Taming Regular Expressions
Written by Nikos Vaggalis   
Friday, 16 September 2016
SRL-Simple Regex Language

 

 

Simple Regex Language (SRL) embraces principles similar to those of the GP approach, but it targets the very language in which we write regular expressions.

An example demonstrates the concept best. The following expression matches an email address of the "you@example.com" format; written in the traditional form, it looks like the output of an obfuscator:

 


/^(?:[0-9]|[a-z]|[\._%\+-])+(?:@)(?:[0-9]|[a-z]|[\.-])+(?:\.)[a-z]{2,}$/i
 

 

Under SRL the same is naturally expressed as:

 


begin with any of (digit, letter, one of "._%+-")
   once or more,
literally "@", any of (digit, letter, one of ".-")
   once or more,
literally ".", letter at least 2 times,
   must end, case insensitive

This example demonstrates that SRL is, in effect, a Domain Specific Language (DSL) that acts as a targeted solution to its domain's problem.
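To see the traditional email pattern at work, here is a minimal sketch in Python using the standard `re` module. The helper name `looks_like_email` and the sample strings are illustrative assumptions, not part of SRL or of the original example; the pattern itself is the article's, written on a single line, with `re.IGNORECASE` standing in for the trailing `/i` flag.

```python
import re

# The article's traditional pattern, written on one line.
# re.IGNORECASE plays the role of the trailing /i flag.
EMAIL_PATTERN = re.compile(
    r"^(?:[0-9]|[a-z]|[\._%\+-])+(?:@)(?:[0-9]|[a-z]|[\.-])+(?:\.)[a-z]{2,}$",
    re.IGNORECASE,
)


def looks_like_email(text: str) -> bool:
    """Hypothetical helper: True if text matches the article's pattern."""
    return EMAIL_PATTERN.match(text) is not None


if __name__ == "__main__":
    # Illustrative samples only; the first is the format quoted in the article.
    for sample in ("you@example.com", "YOU@Example.COM", "you@example", "no-at-sign.com"):
        print(sample, looks_like_email(sample))
```

The pattern is a format check only; it says nothing about whether an address actually exists.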

DSLs are nothing new and are used in domains ranging from web development (e.g. the Perl Dancer framework, which hosts its own DSL) to databases (e.g. SQL, which takes the pain out of working with RDBMSs) and language-extension DSLs (e.g. LINQ and jOOQ, which emulate SQL in C# and Java respectively by letting you write SQL-like statements as if the programming language natively supported them):

C#-LINQ:


string startFolder =
    @"c:\program files\Microsoft Visual Studio 9.0\";
// Take a snapshot of the file system.
System.IO.DirectoryInfo dir =
    new System.IO.DirectoryInfo(startFolder);
// This method assumes that the application has discovery permissions
// for all folders under the specified path.
IEnumerable<System.IO.FileInfo> fileList =
    dir.GetFiles("*.*", System.IO.SearchOption.AllDirectories);
// Create the query.
IEnumerable<System.IO.FileInfo> fileQuery =
    from file in fileList
    where file.Extension == ".txt"
    orderby file.Name
    select file;

Java-jOOQ:


Factory create = new Factory(connection, dialect);
Result<?> result = create.select()
.from(AUTHOR)
.join(BOOK).on(BOOK.AUTHOR_ID.equal(AUTHOR.ID))
.fetch();

 

With SRL, each part of a regular expression becomes self-explanatory, as we go from this:

    [a-f]{4}                
to this:
    letter from a to f exactly 4 times

And from this:
    [0-9]
to this:
   digit from 0 to 9 exactly 1 time

Or even from this:
    [A-Z]{1,2}
to this:
    letter from A to Z between 1 and 2 times
 
In effect, it trades keystrokes for expressiveness.
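The three rewrites above follow a regular pattern: a character range followed by a quantifier. The sketch below is a deliberately tiny, hypothetical translator written in Python that handles only those phrase shapes. It is not the SRL engine and does not reflect how SRL is implemented; the function name `toy_srl_to_regex` is an assumption made purely for illustration.

```python
import re

# Hypothetical toy translator: it understands only phrases shaped like
#   "<letter|digit> from X to Y exactly N time(s)"
#   "<letter|digit> from X to Y between N and M times"
# It is NOT the real SRL engine.
_PHRASE = re.compile(
    r"^(?:letter|digit) from (\w) to (\w) "
    r"(?:exactly (\d+) times?|between (\d+) and (\d+) times)$"
)


def toy_srl_to_regex(phrase: str) -> str:
    """Translate one supported phrase into the equivalent regex fragment."""
    m = _PHRASE.match(phrase.strip())
    if m is None:
        raise ValueError(f"unsupported phrase: {phrase!r}")
    low, high, exact, at_least, at_most = m.groups()
    char_class = f"[{low}-{high}]"
    if exact is not None:
        # "exactly 1 time" needs no quantifier at all.
        return char_class if exact == "1" else f"{char_class}{{{exact}}}"
    return f"{char_class}{{{at_least},{at_most}}}"


if __name__ == "__main__":
    for phrase in (
        "letter from a to f exactly 4 times",
        "digit from 0 to 9 exactly 1 time",
        "letter from A to Z between 1 and 2 times",
    ):
        print(phrase, "->", toy_srl_to_regex(phrase))
```

Each of the three phrases quoted above maps back to the fragment it was derived from, which is the whole point of the trade: the longer form carries the same information in words.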

Putting it all together, our very first regex example, which matches UK postcodes:

 


/ # UK Postcode:
[A-Z]{1,2}          # Area
[0-9R][0-9A-Z]?     # District
\s                  # exactly one blank space
[0-9]               # Sector
[ABD-HJLNP-UW-Z]{2} # Unit
/x

 

can now be rewritten as:

 


letter from A to Z between 1 and 2 times
any of (digit from 0 to 9, literally "R")
any of (digit from 0 to 9, letter from A to Z) optional
whitespace exactly 1 time
digit from 0 to 9 exactly 1 time
any of (literally "A", literally "B", letter from D to H,
        literally "J", literally "L", literally "N",
        letter from P to U, letter from W to Z) exactly 2 times

 

Note that we haven't used any capturing parentheses, as we are only interested in a proof of concept, that is, in working out a match rather than extracting text.

Testing it against some standard postcode samples:

M1 1AA
M60 1NW
CR2 6XH
DN55 1PT
W1A 1HQ
EC1A 1BB

validates the correctness of the generated regex.
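The same check can be reproduced with the hand-written pattern. The sketch below, in Python, runs the commented postcode regex against the six samples, with the area class written as `[A-Z]{1,2}` to agree with the SRL version. It exercises the traditional regex rather than the output of an SRL implementation; `re.VERBOSE` stands in for the `/x` flag, and the name `is_postcode` is an illustrative assumption.

```python
import re

# re.VERBOSE plays the role of /x: whitespace and # comments are ignored.
UK_POSTCODE = re.compile(
    r"""
    [A-Z]{1,2}          # Area
    [0-9R][0-9A-Z]?     # District
    \s                  # exactly one whitespace character
    [0-9]               # Sector
    [ABD-HJLNP-UW-Z]{2} # Unit
    """,
    re.VERBOSE,
)

# The sample postcodes listed in the article.
SAMPLES = ["M1 1AA", "M60 1NW", "CR2 6XH", "DN55 1PT", "W1A 1HQ", "EC1A 1BB"]


def is_postcode(text: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    return UK_POSTCODE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in SAMPLES:
        print(sample, is_postcode(sample))
```

Using `fullmatch` is a design choice of this sketch: the article's pattern carries no anchors, so without it a longer string that merely contains a postcode would also be accepted.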

The limits
The engine supports quantifiers, groups, anchors and even lookarounds, so you can come up with pretty elaborate expressions. However, one very interesting feature not yet supported is recursive regexes; I am curious to find out how these would be mapped in SRL.

How to use
You can either build your desired SRL query online or import your favourite language's SRL implementation into your code. For the time being there's support for PHP7, Python, C#, JavaScript and Java. More info is available on the project's GitHub repo, where you'll discover that it is looking for contributors to extend its scope.

Wrapping up, SRL goes to great lengths to take the pain out of constructing regular expressions. As such, it could be an invaluable tool for professionals such as scientists, physicists and statisticians, who need the power of regular expressions but are not proficient enough at coding to write them.


 

 

More Information

SRL-Simple Regex Language

SRL Query builder

GitHub repo

Related Articles

Automatically Generating Regular Expressions with Genetic Programming

Advanced Perl Regular Expressions - The Pattern Code Expression

Advanced Perl Regular Expressions - Extended Constructs


Last Updated ( Friday, 16 September 2016 )