RegexLearn And Other RegEx Resources
Written by Nikos Vaggalis   
Wednesday, 08 December 2021

RegexLearn is an intuitive, instruction-led online playground where you learn how to construct regular expressions. We also revisit other tools, advanced regex constructs, regex portability across programming languages and how to deter Regular expression Denial of Service attacks.

I won't go deep into the details of why you need to know regex as a developer. Suffice it to say that:

Regex can be used in programming languages such as Python, SQL, Javascript, R, Google Analytics, Google Data Studio, and throughout the coding process for finding, matching, and editing text.

Despite this age of multimedia, text is still king; from data in humble Excel spreadsheets to ETL, NLP and business intelligence, everything is text.

The other point is that regular expressions are notoriously hard to master. Of course, things have improved since the era of reading about them in books such as the timeless Mastering Regular Expressions by Jeffrey Friedl or the more recent Learning Regular Expressions by Ben Forta, and learning them has become easier thanks to tools such as RegexLearn. Among the tools themselves, the level of difficulty and the target audience vary.

So there are tools such as The Perl Regex Tester, Regexr and Regex101, which address a more advanced audience, and there are others that are friendlier, like Regexplained, Ihateregex or the desktop Regex Coach. All of them, however, mostly let the user test regexes against a piece of text in order to de-obfuscate them and learn by trying them out. None of them provides the step-by-step instructional approach RegexLearn adopts.

Tools aside, there have also been other attempts at curbing the complexity of the expressions, either by adopting scientific solutions such as Genetic Programming, which I took a look at in "Automatically Generating Regular Expressions with Genetic Programming", or with a new language, the Simple Regex Language, examined in Taming Regular Expressions.

Going back to RegexLearn, simplicity is what it offers. You merely type what the instruction tells you and, as a result, you see the text matched and learn how to use the operator at hand.

It starts slowly and very simply: you just type OK in the RegEx field to proceed to the next step.

 

Step 2 explains why learning regex is useful:

Let's say you have a list of filenames and you only want to find the files with the pdf extension. Typing the expression ^\w+\.pdf$ will work.

This step requires just a tap on Next in order to proceed.
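To see what the pattern from that step does outside the playground, here is a minimal Python sketch. The filenames are hypothetical, invented for the illustration, and Python's built-in re module is an assumption on my part rather than something RegexLearn itself uses.

```python
import re

# Pattern from the tutorial step: one or more word characters,
# a literal dot, then "pdf", anchored to the whole string.
PDF_NAME = re.compile(r"^\w+\.pdf$")

# Hypothetical filenames, made up for this example.
filenames = [
    "report.pdf",
    "notes.txt",
    "summary_2021.pdf",
    "my file.pdf",       # contains a space, which \w does not match
    "archive.pdf.bak",   # does not end in "pdf"
]

pdf_files = [name for name in filenames if PDF_NAME.match(name)]
print(pdf_files)
```

Note that \w covers only letters, digits and the underscore, so a name with spaces or hyphens is skipped; a real file filter would probably need a wider character class.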

In step 3 you learn about the Dot . : Any Character

The period . allows selecting any character, including special characters and spaces.
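One caveat worth checking in your own language: in Python's re module the dot matches every character except a newline unless the re.DOTALL flag is set. The short sketch below, using a made-up sample string, counts the matches both ways.

```python
import re

# Sample text with a hyphen, a space and a newline (7 characters in total).
text = "a-b c\nd"

# By default "." matches any character except a newline.
default_matches = re.findall(r".", text)

# With re.DOTALL, "." matches the newline as well.
dotall_matches = re.findall(r".", text, flags=re.DOTALL)

print(len(text), len(default_matches), len(dotall_matches))
```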

Then it moves on to Character Sets and so on, each step increasing in difficulty. If you get stuck, no worries; Alt+H will show you the answer. In total there are 55 steps so a good amount of depth is covered.

Of course, in the end you won't learn very advanced constructs, which are probably programming language specific, such as Perl's embedded code constructs (?{ code }) and (??{ code }), covered in The Pattern Code Expression and Extended Constructs. For examples of that advanced usage check the links.

On the matter of language-specific regex extensions, the pertinent question is Can Regular Expressions Be Safely Reused Across Languages? That is, can I reuse a regular expression crafted in JavaScript verbatim in Python? In doing so, will I get the same results and performance? That article looks into the research, which also covers the security precautions that regex engines take:

PHP and Perl, PHP probably because it utilizes the PCRE (Perl Compatible Regular Expressions) library, were the only ones that had explicit defenses against exponential time behavior.

Exponential time behavior is yet another reason why you should really know your regexes well, in order to avoid Regular Expression DoS Attacks.

The Regular expression Denial of Service (ReDoS) is a Denial of Service attack that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size). An attacker can then cause a program using a Regular Expression to enter these extreme situations and then hang for a very long time.

Due to differences in the underlying algorithms which the regex engines are based on, a match in some languages may require greater than linear time (polynomial or exponential in the worst case) in the length of the regex and the input string. These are called super-linear matches and some regex engines fall prey to this super-linear behavior while the wiser ones avoid it.

Thus regexes that fall into this super-linear category can be exploited by being fed specially crafted strings which subsequently overload the host, e.g. a web server, as in a DoS attack, eventually bringing it to its knees.
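A rough way to observe super-linear matching is to time a backtracking-prone pattern against inputs of growing length. The sketch below uses Python's re module and the textbook pattern ^(a+)+$; neither the pattern nor the helper name time_match comes from the research discussed here, they are assumptions made for illustration. The input sizes are kept small on purpose so that the script finishes quickly.

```python
import re
import time

# Textbook backtracking-prone pattern (an illustrative assumption, not from the article):
# the nested quantifiers let the engine split a run of "a"s in exponentially many ways.
VULNERABLE = re.compile(r"^(a+)+$")

# A pattern that accepts the same strings without the nested quantifier.
SAFE = re.compile(r"^a+$")


def time_match(pattern, text):
    """Return the match result and the elapsed time in seconds."""
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


# The trailing "!" forces the match to fail, so a backtracking engine
# has to try every way of splitting the run of "a"s before giving up.
for n in (10, 14, 18):
    text = "a" * n + "!"
    _, slow = time_match(VULNERABLE, text)
    _, fast = time_match(SAFE, text)
    print(f"n={n:2d}  nested: {slow:.5f}s  flat: {fast:.5f}s")
```

On a backtracking engine the time for the nested pattern should grow sharply with each extra character, while the flat pattern stays cheap. Removing nested or overlapping quantifiers is one common mitigation; capping the input length or using a non-backtracking engine are others.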

That is something to take into account. And because this is a huge problem, we've also reported on a tool that can identify resource-hungry regular expressions; see Regexploit. While this is a scenario that is unlikely to trouble the users of RegexLearn, it is good to know that it exists.

After you complete all the RegexLearn steps, it's time to put your newly found skills to the test. While RegexLearn promises a Practice section, it is not ready yet. However, you can practice with Machine Learning Lab's Regular Expression Game, whose makers we met in Automatically Generating Regular Expressions with Genetic Programming, a game that comprises 12 levels of increasing difficulty. And if you want a more old-fashioned approach, see Can You Do The Regular Expression Crossword?.

A late addition to the list of tools is py_regular_expressions, a GUI app written in tkinter to help you practice Python regular expressions.

RegexLearn's infrastructure is also open source and can be found on its GitHub repo.

 

More Information

RegexLearn

GitHub

Related Articles

Learning Regular Expressions (Book Review)

Automatically Generating Regular Expressions with Genetic Programming

Taming Regular Expressions

The Pattern Code Expression 

Extended Constructs

Machine Learning Lab's Regular Expression Game

Can You Do The Regular Expression Crossword?

 


Last Updated ( Wednesday, 08 December 2021 )