.NET Regular Expressions In Depth
Written by Mike James   
Thursday, 16 July 2020

If you think regular expressions are trivial and boring, you've not seen the whole picture. Here we reveal that in .NET they are amazingly powerful and not to be missed.


Regular expressions are addictive.

Playing with these compressed but powerful patterns is better than solving a Sudoku.

If you are wondering what this is all about because, obviously, regular expressions are just the use of “*” and “?”, then read on because the truth is a lot more subtle and the result is a lot more powerful than you might suspect.

Equally, regular expressions are something that you will find in more than just C#. They are useful in JavaScript, Perl, Java, Ruby and even in applications such as word processors.

If you know the basics of regular expressions then jump to the end of the article where you will find some deeper explanations of less used features.




Regular fundamentals

It all starts with the idea of specifying a grammar for a particular set of strings. All you have to do is find a pattern that matches all of the strings you are interested in and use the pattern.

The simplest sort of pattern is the string literal that matches itself. So, for example, if you want to process ISBN numbers you might well want to match the string “ISBN:” which is its own regular expression in the sense that the pattern “ISBN:” will match exactly one string of the form “ISBN:”.

To actually use this you have to first create a Regex object with the regular expression built into it:

Regex ex1 = new Regex(@"ISBN:");

The use of the “@” at the start of the string is optional but it does make it easier when we start to use the “\” escape character.

Recall that strings starting with “@” are verbatim strings, represented “as is” without any additional processing or conversion by C#.
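To see why the “@” prefix helps, here is a minimal sketch comparing a verbatim literal with an ordinary one. The pattern `ISBN:\d` and the variable names are chosen purely for illustration, and the code assumes C# 9 or later so that top-level statements can be used.

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch only: uses C# top-level statements (C# 9 or later).
string verbatim = @"ISBN:\d";   // verbatim literal: the backslash is kept as typed
string escaped  = "ISBN:\\d";   // regular literal: the backslash must be doubled

// Both literals hold the same seven characters, so they are the same pattern.
bool samePattern = verbatim == escaped;
Console.WriteLine(samePattern);

// Either literal can be handed to the Regex constructor.
bool matches = new Regex(verbatim).IsMatch("ISBN:9");
Console.WriteLine(matches);
```

The more backslashes a pattern contains, the more the verbatim form pays off in readability.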

To actually use the regular expression we need one of the methods offered by the Regex object.

The Match method applies the expression to a specified string and returns a Match object.

The Match object contains a range of useful properties and methods that let you track the operation of applying the regular expression to the string.

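For example, if there was a match the Success property is set to true, as in:

Console.WriteLine(ex1.Match(@"ISBN: 978-1871962406").Success.ToString());

The Index property gives the position of the match in the search string:

Console.WriteLine(ex1.Match(@"ISBN: 978-1871962406").Index.ToString());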

which in this case returns zero to indicate that the match is at the start of the string.

To return the actual match in the target string you can use the ToString method. Of course in this case the result is going to be identical to the regular expression but in general this isn’t the case.

Notice that the Match method returns only the first match of the regular expression. To find later matches, call the NextMatch method on the result, which returns another Match object.
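Putting Match, Success, Index and NextMatch together, a loop over every occurrence might look like the following sketch. The input string, which simply repeats the example ISBN twice, and the variable names are assumptions made for illustration; the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

Regex ex1 = new Regex(@"ISBN:");

// Illustrative input containing the literal "ISBN:" twice.
string input = "ISBN: 978-1871962406 ISBN: 978-1871962406";

int count = 0;
int firstIndex = -1;

// Match returns the first match; NextMatch continues from where it ended.
Match m = ex1.Match(input);
while (m.Success)
{
    if (count == 0) firstIndex = m.Index;
    Console.WriteLine($"Found '{m.Value}' at index {m.Index}");
    count++;
    m = m.NextMatch();
}

Console.WriteLine($"Total matches: {count}");
```

When no further match exists, NextMatch returns a Match object whose Success property is false, which is what ends the loop.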


Pattern matching

If this is all there was to regular expressions they wouldn’t be very interesting.

The reason they are so useful is that you can specify patterns that spell out the regularities in a type of data.

For example following the ISBN: we expect to find a digit – any digit.

This can be expressed as “ISBN:\d” where \d is a character class indicator which means “a digit”.

If you try this out you will discover that you don’t get a match with the example string because there is a space following the colon. However “ISBN:\s\d” does match as \s means “any white-space character” and:

Regex ex1 = new Regex(@"ISBN:\s\d");
Console.WriteLine(ex1.Match(@"ISBN: 978-1871962406").ToString());

displays “ISBN: 9”.

There’s a range of useful character classes and you can look them up in the documentation. The most useful are:

  • .          (i.e. a single dot) matches any character except a newline.
  • \d         digit
  • \s         white-space
  • \w        any “word” character including digits

There is also the convention that capital letters match the inverse set of characters:

  • \D       any non-digit
  • \S       any non-white space
  • \W      any non-word character

Notice that the inverse sets can behave unexpectedly unless you are very clear about what they mean.

For example, \D also matches white space and hence

@"ISBN:\D\d"

matches ISBN: 9.
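The behaviour of these classes is easy to check with a few one-line matches. The sketch below uses the static Regex.Match method on the article's example string; the pattern `ISBN:\D\d` is assumed to be the one meant in the text above, and the variable names are illustrative. It assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "ISBN: 978-1871962406";

// \d alone fails because a space, not a digit, follows the colon.
bool digitOnly = Regex.Match(input, @"ISBN:\d").Success;

// \s consumes the space, so the match succeeds.
string withSpace = Regex.Match(input, @"ISBN:\s\d").Value;

// \D means "anything that is not a digit", which includes the space.
string withNonDigit = Regex.Match(input, @"ISBN:\D\d").Value;

// \W is the inverse of \w: the first non-word character here is the colon.
string firstNonWord = Regex.Match(input, @"\W").Value;

Console.WriteLine(digitOnly);
Console.WriteLine(withSpace);
Console.WriteLine(withNonDigit);
Console.WriteLine(firstNonWord);
```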

You can also make up your own character group by listing the set of characters between square brackets.

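So for example, [0-9] matches the same ASCII digits as \d. Negating a character set is also possible: [^0-9] matches anything but those digits and is essentially the same thing as \D. (Strictly, in .NET \d also matches decimal digits from other Unicode scripts unless RegexOptions.ECMAScript is specified, so the two are only identical for ASCII input.)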
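As a rough illustration of hand-made character sets, the following sketch applies a set, a negated set and a set that includes a literal hyphen to the example string. It also shows one case where [0-9] and \d part company in .NET, using the Arabic-Indic digit three (U+0663) as an arbitrarily chosen test character. Variable names are illustrative and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "ISBN: 978-1871962406";

string firstDigit    = Regex.Match(input, @"[0-9]").Value;    // first ASCII digit
string firstNonDigit = Regex.Match(input, @"[^0-9]").Value;   // first character that is not 0-9
string number        = Regex.Match(input, @"[-0-9]+").Value;  // run of digits and hyphens

// U+0663 is a decimal digit in Unicode but is not in the range 0-9.
string arabicThree = "\u0663";
bool classMatches = Regex.IsMatch(arabicThree, @"\d");
bool setMatches   = Regex.IsMatch(arabicThree, @"[0-9]");
bool ecmaMatches  = Regex.IsMatch(arabicThree, @"\d", RegexOptions.ECMAScript);

Console.WriteLine($"{firstDigit} {firstNonDigit} {number}");
Console.WriteLine($"{classMatches} {setMatches} {ecmaMatches}");
```

Placing the hyphen first in the set is a simple way to make sure it is read as a literal character rather than as part of a range.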

There are also character sets that refer to Unicode but these are obvious enough in use not to need additional explanation. 


As well as characters and character sets you can also use location matches or anchors.

For example, the ^ (caret) only matches the start of the string, so @"^ISBN:" will only match if the string starts with ISBN: and doesn’t match if the same substring occurs anywhere else. The most useful anchors are:

  • ^          start of string
  • $          end of string
  • \b         word boundary – i.e. between a \w and \W
  • \B        anywhere but a word boundary

So for example:

@"^\d+$"

specifies a string consisting of nothing but one or more digits. Compare this to

@"^\d*$"

which would also accept an empty string.
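The anchors can be tried out with Regex.IsMatch. This sketch assumes the two patterns being contrasted are `^\d+$` and `^\d*$`; the test strings and variable names are invented for illustration, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// ^ ties the match to the start of the string.
bool atStart    = Regex.IsMatch("ISBN: 978-1871962406", @"^ISBN:");
bool notAtStart = Regex.IsMatch("Book ISBN: 978-1871962406", @"^ISBN:");

// ^...$ forces the pattern to account for the whole string.
bool allDigits = Regex.IsMatch("1871962406", @"^\d+$");
bool hasHyphen = Regex.IsMatch("978-1871962406", @"^\d+$");

// + needs at least one digit, * is satisfied by none at all.
bool emptyWithPlus = Regex.IsMatch("", @"^\d+$");
bool emptyWithStar = Regex.IsMatch("", @"^\d*$");

// \b matches at a word boundary, \B anywhere else.
bool wholeWord  = Regex.IsMatch("the cat sat", @"\bcat\b");
bool insideWord = Regex.IsMatch("concatenate", @"\bcat\b");
bool embedded   = Regex.IsMatch("concatenate", @"\Bcat\B");

Console.WriteLine($"{atStart} {notAtStart}");
Console.WriteLine($"{allDigits} {hasHyphen} {emptyWithPlus} {emptyWithStar}");
Console.WriteLine($"{wholeWord} {insideWord} {embedded}");
```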

One subtle point only emerges when you consider strings with line breaks.

In this case by default the ^ and $ match only the very start and end of the string (strictly, $ also matches just before a final newline).

If you want them to match line beginnings and endings you have to specify the multiline option, which in .NET is RegexOptions.Multiline or (?m) inline, the equivalent of the /m flag in Perl-style notation.

It’s also worth knowing about the \G anchor, which only matches at the point where the previous match ended. It is mainly useful with the NextMatch method, where it forces all matches to be contiguous.
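A short sketch of both ideas follows. The three-line input, the string "123abc456" and the variable names are made up for illustration, and the code assumes C# 9 or later for top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Three lines separated by \n, each starting with "ISBN:".
string lines = "ISBN: 1\nISBN: 2\nISBN: 3";

// By default ^ matches only at the very start of the string.
int defaultCount = Regex.Matches(lines, @"^ISBN:").Count;

// With the multiline option ^ also matches at the start of each line.
int multilineCount = Regex.Matches(lines, @"^ISBN:", RegexOptions.Multiline).Count;

// The same option can be switched on inside the pattern with (?m).
int inlineCount = Regex.Matches(lines, @"(?m)^ISBN:").Count;

// \G forces each match to begin exactly where the previous one ended,
// so the run of matches stops at the first non-digit.
Regex contiguousDigit = new Regex(@"\G\d");
int contiguous = 0;
for (Match m = contiguousDigit.Match("123abc456"); m.Success; m = m.NextMatch())
{
    contiguous++;
}

// Without \G every digit in the string is found.
int everyDigit = Regex.Matches("123abc456", @"\d").Count;

Console.WriteLine($"{defaultCount} {multilineCount} {inlineCount}");
Console.WriteLine($"{contiguous} {everyDigit}");
```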


Last Updated ( Thursday, 16 July 2020 )