Quick DB Replacer 1.1
Regular expressions
Regular expressions are a powerful tool for searching text and
strings. Either/or conditions, groups of characters, the beginning
of an input string, the beginning or end of a word, a hyperlink or
an email address: these are just a few of the things regular
expressions can search for.
Regular expressions are an advanced topic, and many books have been
written about them. But even with only basic knowledge, regular
expressions are already a powerful tool.
A regular expression is a pattern of text that consists of ordinary
characters (for example, letters a through z) and special
characters, known as meta characters. The pattern describes
one or more strings to match when searching a body of text. The
regular expression serves as a template for matching a character
pattern to the string being searched. Here are some examples of
regular expressions you might encounter:
Example |
Matches |
"^\s*$" |
Match a blank line. |
"\d{2}-\d{5}" |
Validate an ID number consisting of 2 digits,
a hyphen, and another 5 digits. |
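The two example patterns above can be tried out with a short script. This is only a sketch using Python's re module as a stand-in; the engine inside Quick DB Replacer may behave differently in details. The helper names is_blank and is_valid_id are hypothetical, and using fullmatch() to reject extra characters around the ID is an assumption about what "validate" means here.

```python
import re

# Pattern 1 from the table: a blank line (nothing but optional whitespace).
blank_line = re.compile(r"^\s*$")

# Pattern 2 from the table: 2 digits, a hyphen, and another 5 digits.
id_number = re.compile(r"\d{2}-\d{5}")


def is_blank(line: str) -> bool:
    """Return True if the line is empty or contains only whitespace."""
    return blank_line.search(line) is not None


def is_valid_id(text: str) -> bool:
    """Return True if the whole string is an ID such as 12-34567.

    The pattern has no ^ or $ anchors, so fullmatch() is used (an
    assumption) to require that nothing surrounds the ID.
    """
    return id_number.fullmatch(text) is not None


print(is_blank("   \t"))
print(is_blank("  x "))
print(is_valid_id("12-34567"))
print(is_valid_id("123-4567"))
```

With search() instead of fullmatch(), the second pattern would also accept a longer string that merely contains an ID somewhere inside it.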
The following table contains the complete list of meta characters
and their behavior in the context of regular expressions:
Character |
Description |
\ |
Marks the next character as either a special
character, a literal, a backreference, or an octal escape. For
example, 'n' matches the character "n". '\n' matches a newline
character. The sequence '\\' matches "\" and "\(" matches "(". |
^ |
Matches the position at the beginning of the input string. |
$ |
Matches the position at the end of the input string. |
* |
Matches the preceding character or
subexpression zero or more times. For example, zo* matches "z"
and "zoo". * is equivalent to {0,}. |
+ |
Matches the preceding character or
subexpression one or more times. For example, 'zo+' matches
"zo" and "zoo", but not "z". + is equivalent to {1,}. |
? |
Matches the preceding character or
subexpression zero or one time. For example, "do(es)?" matches
"do" or "does". ? is equivalent to {0,1}. |
{n} |
n is a nonnegative integer. Matches
exactly n times. For example, 'o{2}' does not match the
'o' in "Bob," but matches the two o's in "food". |
{n,} |
n is a nonnegative integer. Matches at
least n times. For example, 'o{2,}' does not match the
"o" in "Bob" and matches all the o's in "foooood". 'o{1,}' is
equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'. |
{n,m} |
m and n are nonnegative
integers, where n <= m. Matches at least n
and at most m times. For example, "o{1,3}" matches the
first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'.
Note that you cannot put a space between the comma and the
numbers. |
? |
When this character immediately follows any of
the other quantifiers (*, +, ?, {n}, {n,}, {n,m}),
the matching pattern is non-greedy. A non-greedy pattern
matches as little of the searched string as possible, whereas
the default greedy pattern matches as much of the searched
string as possible. For example, in the string "oooo", 'o+?'
matches a single "o", while 'o+' matches all 'o's. |
. |
Matches any single character except "\n". To
match any character including the '\n', use a pattern such as
'[\s\S]'. |
(?:pattern) |
Matches pattern but does not capture
the match, that is, it is a non-capturing match that is not
stored for possible later use. This is useful for combining
parts of a pattern with the "or" character (|). For example,
'industr(?:y|ies)' is a more economical expression than
'industry|industries'. |
(?=pattern) |
Positive lookahead matches the search string
at any point where a string matching pattern begins.
This is a non-capturing match, that is, the match is not
captured for possible later use. For example 'Windows
(?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not
"Windows" in "Windows 3.1". Lookaheads do not consume
characters, that is, after a match occurs, the search for the
next match begins immediately following the last match, not
after the characters that comprised the lookahead. |
(?!pattern) |
Negative lookahead matches the search string
at any point where a string not matching pattern
begins. This is a non-capturing match, that is, the match is
not captured for possible later use. For example 'Windows
(?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but does
not match "Windows" in "Windows 2000". Lookaheads do not
consume characters, that is, after a match occurs, the search
for the next match begins immediately following the last
match, not after the characters that comprised the lookahead. |
x|y |
Matches either x or y. For
example, 'z|food' matches "z" or "food". '(z|f)ood' matches
"zood" or "food". |
[xyz] |
A character set. Matches any one of the
enclosed characters. For example, '[abc]' matches the 'a' in
"plain". |
[^xyz] |
A negative character set. Matches any
character not enclosed. For example, '[^abc]' matches the 'p'
in "plain". |
[a-z] |
A range of characters. Matches any character
in the specified range. For example, '[a-z]' matches any
lowercase alphabetic character in the range 'a' through 'z'. |
[^a-z] |
A negative range of characters. Matches any
character not in the specified range. For example, '[^a-z]'
matches any character not in the range 'a' through 'z'. |
\b |
Matches a word boundary, that is, the position
between a word and a space. For example, 'er\b' matches the
'er' in "never" but not the 'er' in "verb". |
\B |
Matches a nonword boundary. 'er\B' matches the
'er' in "verb" but not the 'er' in "never". |
\cx |
Matches the control character indicated by
x. For example, \cM matches a Control-M or carriage return
character. The value of x must be in the range of A-Z
or a-z. If not, c is assumed to be a literal 'c' character. |
\d |
Matches a digit character. Equivalent to
[0-9]. |
\D |
Matches a nondigit character. Equivalent to
[^0-9]. |
\f |
Matches a form-feed character. Equivalent to
\x0c and \cL. |
\n |
Matches a newline character. Equivalent to
\x0a and \cJ. |
\r |
Matches a carriage return character.
Equivalent to \x0d and \cM. |
\s |
Matches any whitespace character including
space, tab, form-feed, etc. Equivalent to [ \f\n\r\t\v]. |
\S |
Matches any non-whitespace character.
Equivalent to [^ \f\n\r\t\v]. |
\t |
Matches a tab character. Equivalent to \x09
and \cI. |
\v |
Matches a vertical tab character. Equivalent
to \x0b and \cK. |
\w |
Matches any word character including
underscore. Equivalent to '[A-Za-z0-9_]'. |
\W |
Matches any nonword character. Equivalent to
'[^A-Za-z0-9_]'. |
\xn |
Matches n, where n is a
hexadecimal escape value. Hexadecimal escape values must be
exactly two digits long. For example, '\x41' matches "A".
'\x041' is equivalent to '\x04' & "1". Allows ASCII codes to
be used in regular expressions. |
\num |
Matches num, where num is a
positive integer. A reference back to captured matches. For
example, '(.)\1' matches two consecutive identical characters. |
\n |
Here n stands for a single digit, not the letter n.
Identifies either an octal escape value or a
backreference. If \n is preceded by at least n
captured subexpressions, n is a backreference.
Otherwise, n is an octal escape value if n is an
octal digit (0-7). |
\nm |
Identifies either an octal escape value or a
backreference. If \nm is preceded by at least nm
captured subexpressions, nm is a backreference. If \nm
is preceded by at least n captures, n is a
backreference followed by literal m. If neither of the
preceding conditions exists, \nm matches octal escape
value nm when n and m are octal digits
(0-7). |
\nml |
Matches octal escape value nml when
n is an octal digit (0-3) and m and l are
octal digits (0-7). |
\un |
Matches n, where n is a Unicode
character expressed as four hexadecimal digits. For example,
\u00A9 matches the copyright symbol (©). |
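Several of the table's own examples can be checked in a few lines. The following is a sketch only, again assuming Python's re module as a stand-in for the engine used by Quick DB Replacer; not every escape in the table (\cx, for instance) is accepted by Python, so only entries that both syntaxes share are shown. All variable names are illustrative.

```python
import re

# Greedy vs. non-greedy: on "oooo", 'o+' takes everything,
# while 'o+?' takes as little as possible.
greedy = re.search(r"o+", "oooo").group()
lazy = re.search(r"o+?", "oooo").group()

# {n,m}: 'o{1,3}' stops after the first three o's in "fooooood".
bounded = re.search(r"o{1,3}", "fooooood").group()

# '.' does not match "\n" by default, but '[\s\S]' matches any character.
dot_hit = re.search(r"a.b", "a\nb")
any_hit = re.search(r"a[\s\S]b", "a\nb")

# Non-capturing group: alternation without storing the match.
industry = re.compile(r"industr(?:y|ies)")
plural = industry.search("heavy industries").group()

# Lookaheads test what follows without consuming it.
pos = re.compile(r"Windows (?=95|98|NT|2000)")
neg = re.compile(r"Windows (?!95|98|NT|2000)")
pos_hit = pos.search("Windows 2000")
pos_miss = pos.search("Windows 3.1")
neg_hit = neg.search("Windows 3.1")
neg_miss = neg.search("Windows 2000")

# Word boundary (\b) and non-boundary (\B).
at_end = re.search(r"er\b", "never")
not_at_end = re.search(r"er\b", "verb")
inside = re.search(r"er\B", "verb")
not_inside = re.search(r"er\B", "never")

# Backreference: '(.)\1' finds two consecutive identical characters.
doubled = re.search(r"(.)\1", "food").group()

# Hexadecimal and Unicode escapes.
hex_hit = re.fullmatch(r"\x41", "A")
uni_hit = re.fullmatch(r"\u00A9", "\u00a9")

print(greedy, lazy, bounded, plural, doubled)
print(repr(pos_hit.group()), pos_miss, neg_miss)
```

Note that the positive lookahead match covers only "Windows " (including the trailing space written in the pattern); the version number that follows is checked but is not part of the match.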
More examples and information about the use of regular expressions
are available on the internet, for example through a Google search.
Recommended book: Mastering Regular
Expressions by Jeffrey E. F. Friedl