Atlas Informatics

Regex Overview

Regular Expressions (Regex)

Regex is a pattern syntax for finding specific combinations of characters and words in a text. The matches can also be replaced if needed.

The Essentials in Brief

  • Escape character for special notations: \
    Examples: \t for a tab, \r for a carriage return (CR), \n for a newline. Hex examples: \x0D, \u000D.
    Entire strings can be escaped easily with Text Expert of Mighty Desktop.

  • Treat the character x literally instead of interpreting it: [x]       (Exceptions: \ must be written as \\, and ^ as \^ because [^ starts a negated set)
    This is needed in particular when x is one of these characters: . ? \ ( ) + * $ ^ { [ |

  • Symbol for any passage, possibly empty: (.*)

  • Symbol for any passage of at least 1 character: (.+)

  • Symbol for any character (except line end): .

  • Symbol for any number (consisting only of digits), possibly empty: (\d*)

  • Symbol for any number, at least one digit: (\d+)

  • Symbol for a number with two to three digits: (\d{2,3})

  • Symbol for a letter (including those of other languages), a digit, or an underscore: \w (similar to [a-zA-Z0-9äöüÄÖÜß_])

  • Symbol for an English letter: ([a-zA-Z])

  • Symbol for a space or tab (or any other whitespace): \s (opposite: \S)

  • Start of the searched text (often a line): ^ (Warning: in a file search it means the start of the file)

  • End of the searched text (often a line): $ (Warning: in a file search it means the end of the file)
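
The notations listed above can be tried out in a few lines. The following is a minimal sketch that assumes Python's re module as the engine; the page itself does not name one, and all sample strings are made up for illustration. Details such as the default behavior of ^ and $ can differ between engines.

```python
import re

# [.] is a literal dot, while a bare . matches any character.
literal_dot = re.findall(r"a[.]b", "a.b axb")   # ['a.b']
any_char = re.findall(r"a.b", "a.b axb")        # ['a.b', 'axb']

# (.*) may match nothing, (.+) needs at least one character.
empty_ok = re.fullmatch(r"<(.*)>", "<>") is not None    # True
needs_one = re.fullmatch(r"<(.+)>", "<>") is not None   # False

# \d{2,3} matches a run of two to three digits; \b keeps longer runs out.
two_to_three = re.findall(r"\b\d{2,3}\b", "7 42 123 4567")  # ['42', '123']

# \w covers letters, digits and the underscore; \s covers whitespace.
words = re.findall(r"\w+", "größe_1 x")      # ['größe_1', 'x']
pieces = re.split(r"\s+", "a \t b\nc")       # ['a', 'b', 'c']

# In Python, ^ and $ refer to the whole text unless MULTILINE is set.
whole_text = re.findall(r"^\w+$", "one\ntwo")                     # []
per_line = re.findall(r"^\w+$", "one\ntwo", flags=re.MULTILINE)   # ['one', 'two']
```

The last two lines mirror the warning about file searches: without a per-line mode, ^ and $ only anchor at the very start and end of the searched text.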

Captured Parts

Regex can carry matching parts over from the searched text to the replacement string. For example, if "a(.*)c" is searched and "abc" is found, "b" is such a Captured Part. Captured parts are numbered upwards starting from 1 and are referenced in the replacement string with a preceding $ sign. This way you can insert the "b" into the replacement string by writing "x$1y", which yields "xby".
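
The "abc" example can be reproduced as a short sketch. It assumes Python's re module, which writes captured part 1 as \1 in the replacement string rather than $1; the search expression itself is unchanged.

```python
import re

# Search "a(.*)c" in "abc"; the part matched by (.*) is captured part 1.
match = re.search(r"a(.*)c", "abc")
captured = match.group(1)   # 'b'

# Python writes the reference as \1 where the notation above uses $1.
result = re.sub(r"a(.*)c", r"x\1y", "abc")
print(result)   # xby
```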

Note that this notation doesn't work if the characters following the number are digits. Example: to produce "x" + $1 + "5" you would write "x$15", which Regex misinterprets as captured part number 15. For that case there is a more complex notation with named captured parts.
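
The same ambiguity can be observed in other engines. The sketch below assumes Python's re module, where the reference is written \1 instead of $1 and the unambiguous form is \g<1>; this is Python's own notation, not necessarily the one used by the tool described here.

```python
import re

# r"x\15" is read as captured part 15, which does not exist in this
# expression, so Python rejects the replacement string.
try:
    re.sub(r"a(.*)c", r"x\15", "abc")
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

# \g<1> delimits the number, so the following 5 is an ordinary character.
fixed = re.sub(r"a(.*)c", r"x\g<1>5", "abc")
print(ambiguous_failed, fixed)   # True xb5
```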

Another way is to capture a passage into a named variable. For example, this expression captures an identifier of 1 to 100 characters into a variable named 'name': "(?<name>[A-Za-z_]{1,100})". In the replacement string you can then insert it by writing "${name}".
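
A named captured part might look as follows in practice. The sketch assumes Python's re module, whose spelling differs from the notation above: the group is written (?P<name>...) and the reference in the replacement string is \g<name>. The input "foo = 1" is a made-up sample.

```python
import re

# Same identifier expression as above, in Python's (?P<name>...) spelling.
pattern = re.compile(r"(?P<name>[A-Za-z_]{1,100})")

# The captured passage can be read back by its name ...
identifier = pattern.search("foo = 1").group("name")   # 'foo'

# ... and inserted into the replacement string with \g<name>.
wrapped = pattern.sub(r"<\g<name>>", "foo = 1")
print(wrapped)   # <foo> = 1
```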

Frequently Used Regexes

Decimal number                                [+-]?(\d+\.?\d*|\.\d+)
Version number, max. 4 parts                  (\d{1,10})([.]\d{1,10}){1,3}
GUID, lower or uppercase, without brackets    [0-9A-Fa-f]{8}[-][0-9A-Fa-f]{4}[-][0-9A-Fa-f]{4}[-][0-9A-Fa-f]{4}[-][0-9A-Fa-f]{12}
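
To get a feeling for what these three expressions accept, they can be checked against a few sample values. This is a minimal sketch assuming Python's re module; the helper name matches_fully and the sample inputs are hypothetical and only serve as illustration.

```python
import re

# The three expressions from the list above, copied unchanged.
DECIMAL = r"[+-]?(\d+\.?\d*|\.\d+)"
VERSION = r"(\d{1,10})([.]\d{1,10}){1,3}"
GUID = (r"[0-9A-Fa-f]{8}[-][0-9A-Fa-f]{4}[-][0-9A-Fa-f]{4}"
        r"[-][0-9A-Fa-f]{4}[-][0-9A-Fa-f]{12}")


def matches_fully(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches_fully(DECIMAL, "-12.5"))      # True
print(matches_fully(DECIMAL, ".5"))         # True
print(matches_fully(DECIMAL, "."))          # False, no digit at all
print(matches_fully(VERSION, "1.2.3.4"))    # True
print(matches_fully(VERSION, "1"))          # False, at least two parts needed
print(matches_fully(VERSION, "1.2.3.4.5"))  # False, more than 4 parts
print(matches_fully(GUID, "123e4567-e89b-12d3-a456-426614174000"))    # True
print(matches_fully(GUID, "{123e4567-e89b-12d3-a456-426614174000}"))  # False
```

The check uses a full match on purpose: in a plain search, the version expression would still find "1.2.3.4" inside "1.2.3.4.5".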

More Information

Web Page                    Content
Regular-Expressions.info    Tutorial for learning Regex step by step
Regex One                   Another tutorial for learning Regex step by step
Zytrax                      Easy-to-understand document
Vogella Java                Extensive, well-structured document
Microsoft                   Complete but very technical description
Wikipedia                   Abstract description of Regex

 

 
