http://qntm.org/files/re/re.html
1. Literal characters, which just represent themselves, such as ‘c’ or ‘ab’.
2. The dot (.) means any single character.
3. Any metacharacter can be escaped using a backslash, \. This turns it back into a literal.
4. A character class is a collection of characters in square brackets. This means "find any one of these characters", e.g. [0-9] means any digit, [a-z] means any lower-case letter, and [^a] means any character except a.
5. Shorthand character classes:
a. \d means a digit; \D means a non-digit.
b. \w means a word character, [0-9A-Za-z_]; \W means a non-word character.
c. \s means a whitespace character (tab, space, carriage return or line feed); \S means a non-whitespace character.
6. Multipliers use braces to put a multiplier after a literal or a character class. a{3} means "find an a followed by an a followed by an a". a{3,5} means "find aaaaa or aaaa or aaa". Three multipliers have their own symbols:
? means none or once
* means any number of times (none, once, or more than once)
+ means once or more than once
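The six building blocks above can be tried out in a few lines. This is a minimal sketch using Python's standard re module; the sample strings are made up for illustration. Note that in Python 3, \d and \w also match non-ASCII digits and letters unless the re.ASCII flag is given, and the dot does not match a newline by default.

```python
import re

# 1. Literal characters match themselves.
literal = re.findall(r"ab", "cab abacus")

# 2. The dot matches any single character (not a newline by default).
dot = re.findall(r"c.t", "cat cot c-t ct")

# 3. A backslash turns a metacharacter back into a literal.
escaped = re.findall(r"3\.5", "3.5 and 315")

# 4. Character classes: one character from a set, or anything but the set.
digits = re.findall(r"[0-9]", "a1b22")
not_a = re.findall(r"[^a]", "banana")

# 5. Shorthand classes: a digit, then whitespace, then a word character.
shorthand = re.findall(r"\d\s\w", "7 x, 8\ty")

# 6. Multipliers: {n}, {m,n}, ?, * and +.
exactly3 = re.fullmatch(r"a{3}", "aaa") is not None
range35 = re.findall(r"a{3,5}", "aa aaa aaaaaa")
optional = re.findall(r"colou?r", "color colour")
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

print(literal, dot, escaped, digits, not_a, shorthand)
print(exactly3, range35, optional, star, plus)
```

Multipliers are greedy, so a{3,5} takes five of the six a's in the last sample word and the single a left over is too short to match again.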
Alternation with groups: alternatives are divided by |, e.g. ‘cat|dog’ means cat or dog. You may use () to group part of the pattern, for example: (Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day means the days of the week.
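A small sketch of alternation and grouping with Python's re module; the sample sentences are invented for illustration.

```python
import re

animal = re.compile(r"cat|dog")
weekday = re.compile(r"(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day")

# Plain alternation: either word matches.
animals = animal.findall("cat, dog, cow")

# The group limits the alternation to the prefix; "day" is shared.
m = weekday.search("See you on Wednesday.")
full_match = m.group(0)   # the whole match
prefix = m.group(1)       # the part matched inside ()

# A word that merely ends in "day" does not match.
no_match = weekday.search("Someday") is None

print(animals, full_match, prefix, no_match)
```

Without the parentheses, the pattern would read as "Mon, or Tues, ..., or Sunday", because | has the lowest precedence of all the operators.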
\b means a word boundary, i.e. the position between a word character and a non-word character.
^ and $ mean "start-of-line" and "end-of-line".
Advanced topics in regular expressions
Capturing: a () group also captures the text it matched, so that it can be referred to later. For example, (\w+) had a (\w+) (\w+) contains three capture groups.
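As an illustration, the captured text can be read back from a match object in Python. The input sentence here is a made-up example, not taken from the notes.

```python
import re

pattern = re.compile(r"(\w+) had a (\w+) (\w+)")
m = pattern.search("Mary had a little lamb")

whole = m.group(0)      # the entire match
captured = m.groups()   # one entry per () group, numbered from 1
second = m.group(2)     # a single group by number

print(whole)
print(captured)
print(second)
```

Groups are numbered by the position of their opening parenthesis, counting from the left.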
We can use back-references to refer to the captured text. For example, the regular expression ([abc])\1 means "find aa or bb or cc", and ([abc])\1\1 means "find aaa or bbb or ccc".
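A quick check of the two back-reference patterns in Python, with invented sample strings. finditer is used instead of findall because findall would return only the contents of the group rather than the whole match.

```python
import re

doubled = re.compile(r"([abc])\1")
tripled = re.compile(r"([abc])\1\1")

# \1 must repeat exactly what group 1 captured, so "ab" does not match.
pairs = [m.group(0) for m in doubled.finditer("aa ab bb cc dd")]

# "abc" fails, and "bbbb" yields one "bbb" with a single b left over.
triples = [m.group(0) for m in tripled.finditer("aaa abc ccc bbbb")]

print(pairs)
print(triples)
```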