Wednesday, 26 March 2014

Regular Expression

These are my study notes on
http://qntm.org/files/re/re.html



1.       Literal characters simply represent themselves, such as ‘c’ or ‘ab’.
2.       The dot (.) matches any single character.
3.       Any metacharacter can be escaped with a backslash, \. This turns it back into a literal.
4.       A character class is a collection of characters in square brackets. It means "find any one of these characters". E.g. [0-9] means any digit, [a-z] means any lowercase letter, and [^a] means any character except a.
5.       Shorthand character classes
a.       \d: a digit.           \D: a non-digit.
b.      \w: a word character, [0-9A-Za-z_].  \W: a non-word character.
c.       \s: a whitespace character (tab, space, carriage return or line feed). \S: a non-whitespace character.

6.       Multipliers: use braces to put a multiplier after a literal or a character class.
a{3} means "find an a followed by an a followed by an a".
a{3,5} means "find aaaaa or aaaa or aaa".
? means zero times or once
* means any number of times (zero, once, or more than once)
+ means once or more
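
The building blocks in items 1 to 6 can be tried out with a short sketch. It uses Python's standard re module, which is my own choice and not something the tutorial prescribes, and the sample strings are made up for illustration. Note that in Python 3, \w on ordinary strings also matches non-ASCII letters unless the re.ASCII flag is passed.

```python
import re

# Literal characters match themselves.
literal = re.findall(r"ab", "cab drab")

# The dot matches any single character (in Python, not a newline by default).
dot = re.findall(r"c.t", "cat cot c-t ct")

# A backslash turns the metacharacter . back into a literal dot.
escaped = re.findall(r"3\.5", "3.5 and 3x5")

# Character classes: any one of the listed characters, or any one NOT listed.
digits = re.findall(r"[0-9]", "a1b22")
not_a = re.findall(r"[^a]", "banana")

# Shorthand classes (re.ASCII keeps \w to [0-9A-Za-z_]).
words = re.findall(r"\w+", "foo_1 bar!", flags=re.ASCII)
spaces = re.findall(r"\s", "a b\tc")

# Multipliers.
exactly3 = re.fullmatch(r"a{3}", "aaa") is not None
range35 = re.findall(r"a{3,5}", "aa aaa aaaaaa")   # greedy: takes 5 of the 6 a's
optional = re.findall(r"colou?r", "color colour")
star = re.findall(r"ab*", "a ab abbb")
plus = re.findall(r"ab+", "a ab abbb")

print(dot, escaped, not_a, range35, star, plus)
```

The a{3,5} case shows that multipliers are greedy: given six a's, the pattern takes five and leaves one unmatched.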



Alternation with groups: alternatives are separated by |, e.g. ‘cat|dog’ means cat or dog. You can use () to group part of a pattern, for example (Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day matches the name of any day of the week.
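
As a quick check of alternation and grouping, here is a minimal sketch in Python's re module (my choice of language; the test sentences are invented). The group limits the alternation to the part before "day", so the suffix is written only once.

```python
import re

animal = re.compile(r"cat|dog")
day = re.compile(r"(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day")

# Plain alternation: either word matches.
animals = animal.findall("a cat and a dog")

# Grouped alternation: one of the prefixes, then the literal "day".
m = day.search("See you on Wednesday")
matched = m.group(0)   # the whole match
prefix = m.group(1)    # the alternative chosen inside the parentheses

# A prefix that is not in the list does not match.
no_match = day.fullmatch("Funday") is None

print(animals, matched, prefix, no_match)
```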

\b matches a word boundary
^ and $ mean "start-of-line" and "end-of-line"
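
A small sketch of the boundary and anchor metacharacters, again in Python's re module with made-up input. One Python-specific detail that the notes above do not cover: by default ^ and $ anchor to the start and end of the whole string, and only match at every line when the re.MULTILINE flag is given.

```python
import re

text = "cat concat cat"

# \b restricts the match to whole words; without it "concat" also matches.
whole_words = re.findall(r"\bcat\b", text)
anywhere = re.findall(r"cat", text)

lines = "first line\nsecond line"

# Default: ^ matches only at the very start of the string.
starts_default = re.findall(r"^\w+", lines)

# With MULTILINE, ^ and $ match at the start and end of each line.
starts_multi = re.findall(r"^\w+", lines, flags=re.MULTILINE)
ends_multi = re.findall(r"\w+$", lines, flags=re.MULTILINE)

print(len(whole_words), len(anywhere), starts_default, starts_multi, ends_multi)
```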

Advanced topics about Regular Expression
Capturing: a () group also captures the text it matched, so that it can be referred to later. For example, (\w+) had a (\w+)  (\w+) contains three capture groups.
We can use back-references to refer to a captured group. For example, the regular expression ([abc])\1 means "find aa or bb or cc", and ([abc])\1\1 means "find aaa or bbb or ccc".
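
Capture groups and back-references can be sketched as follows in Python's re module (my choice; the sample sentence is invented, and the capture pattern is written here with single spaces). finditer is used for the back-reference pattern because re.findall returns only the group contents when a pattern contains a group.

```python
import re

# Three capture groups, numbered left to right from 1.
m = re.search(r"(\w+) had a (\w+) (\w+)", "Mary had a little lamb")
groups = m.groups()

# \1 must repeat exactly what group 1 captured, not just any of a, b or c.
doubled = re.compile(r"([abc])\1")
pairs = [hit.group(0) for hit in doubled.finditer("aa ab bb cc ca")]

# Two back-references require the same character three times in a row.
tripled = re.compile(r"([abc])\1\1")
is_bbb = tripled.fullmatch("bbb") is not None
is_abc = tripled.fullmatch("abc") is not None

print(groups, pairs, is_bbb, is_abc)
```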
 
