In Table 1-19, the escapes \b, \B, \d, \D, \s, \S, \w, and \W behave differently depending on the flags in effect: with LOCALE (?L), they follow the current 8-bit locale; with UNICODE (?u), they follow the Unicode character properties; with neither flag, they assume 7-bit U.S. ASCII. Tip: write patterns as raw strings (for example r'\n'), so that the backslashes in the Table 1-19 class escapes stay literal and reach the regex engine unchanged.
*************** FROM PERL (common) *************
my $str = "Hello saju how are you"; # Declare and initialize the variable '$str'.
## Finding ##
$str =~ /pattern/; # Conceptually like a (hypothetical) function 'regmatch(pattern,$str)'.
Example: $str =~ /saju/; # Searches for the pattern 'saju' in the variable '$str'.
## Finding and substituting or replacing ##
$str =~ s/pattern/replacement/; # Conceptually like a (hypothetical) function 'regsubstitute(pattern,replacement,$str)'.
Example: $str =~ s/saju/sanu/; # Searches for the pattern 'saju' in the variable '$str' and replaces the first occurrence with 'sanu'.
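The two operators above can be tried out with a short script. This is only a minimal sketch: the variable names $found, $copy and $changed are made up for illustration, and the substitution is done on a copy so the original string stays visible.

```perl
use strict;
use warnings;

my $str = "Hello saju how are you";

# Match: true if 'saju' occurs anywhere in $str.
my $found = ($str =~ /saju/) ? 1 : 0;

# Substitute on a copy: s/// edits the variable in place and
# returns the number of substitutions it made.
my $copy    = $str;
my $changed = ($copy =~ s/saju/sanu/);

print "found:   $found\n";
print "changed: $changed\n";
print "before:  $str\n";
print "after:   $copy\n";
```

Note that s/// without the /g modifier replaces only the first match.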
***************************************
# A regular expression is written between / ... /.
# Eleven characters (or character pairs) have a special meaning between / ... /. They are:
[], {}, (), *, +, ?, ., \, ^, $, |
# All other characters between / ... / simply match themselves.
# \ ---> A backslash removes the special meaning of a special character between / ... /.
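To see what the backslash changes, the sketch below compares an unescaped and an escaped dot, and matches a few other special characters literally. The sample strings and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Single quotes: Perl does not interpolate '$5' here.
my $price = 'Cost: $5.00 (approx)';

# Unescaped '.' is special and matches any character, so '5x00' matches.
my $loose = ('5x00' =~ /5.00/) ? 1 : 0;

# Escaped '\.' matches only a literal dot, so '5x00' does not match.
my $strict = ('5x00' =~ /5\.00/) ? 1 : 0;

# '\$', '\(' and '\)' match the literal characters '$', '(' and ')'.
my $literal = ($price =~ /\$5\.00 \(approx\)/) ? 1 : 0;

print "loose=$loose strict=$strict literal=$literal\n";
```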
**************************************************************
### Character Classes ###
[aeiou] ---> Match any one of these characters a,e,i,o,u.
Example: /p[aeiou]t/ matches pat,pet,pit,pot,put.
Example: /p[^aeiou]t/ does not match pat, pet, pit, pot, put. But it matches pbt, pct, pdt, and so on.
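A quick way to check the two character-class examples is to filter a word list with grep. This is a small sketch; the word list and the array names are made up for illustration.

```perl
use strict;
use warnings;

my @words = qw(pat pet pit pot put pbt pct);

# Words with a vowel between 'p' and 't'.
my @vowel = grep { /p[aeiou]t/ } @words;

# Words with any character other than a vowel between 'p' and 't'.
my @non_vowel = grep { /p[^aeiou]t/ } @words;

print "vowel:     @vowel\n";
print "non-vowel: @non_vowel\n";
```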
## character class shortcuts ##
[0123456789] ----> \d
[abc...xyzABC...XYZ0123456789_] ----> \w # Word characters: letters, digits and underscore (no whitespace).
[ \n\r\t\f] ----> \s # Whitespace character class.
[^0123456789] ----> \D
[^abc...xyzABC...XYZ0123456789_] ----> \W
[^ \n\r\t\f] ----> \S # Opposite of the whitespace character class.
. ---> The 'dot' shortcut matches any single character except a newline (whitespace, abc...xyz, ABC...XYZ, 0123456789, _, punctuation). It is widely used.
## character class shortcuts Examples ##
/\d\d-\d\d-\d\d\d\d/ ---> 12-11-2009
/\d\d\/\d\d\/\d\d\d\d/ ---> 11/10/2009
#or
m{\d\d/\d\d/\d\d\d\d} ---> 11/10/2009 # This form avoids having to escape the delimiter '/' with a backslash '\'. Here we use m{...} instead of /.../.
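The date patterns above can be run as written. The sketch below also checks how the dot treats a newline, which in Perl it matches only when the /s modifier is given. Variable names and the "a\nb" test string are assumptions for illustration.

```perl
use strict;
use warnings;

my $d1 = "12-11-2009";
my $d2 = "11/10/2009";

my $dash   = ($d1 =~ /\d\d-\d\d-\d\d\d\d/)   ? 1 : 0;
my $slash  = ($d2 =~ /\d\d\/\d\d\/\d\d\d\d/) ? 1 : 0;  # '/' escaped
my $braces = ($d2 =~ m{\d\d/\d\d/\d\d\d\d})  ? 1 : 0;  # no escaping needed

# '.' does not match a newline by default; /s makes it match one.
my $dot_default = ("a\nb" =~ /a.b/)  ? 1 : 0;
my $dot_s       = ("a\nb" =~ /a.b/s) ? 1 : 0;

print "dash=$dash slash=$slash braces=$braces\n";
print "dot_default=$dot_default dot_s=$dot_s\n";
```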
***************************************
### Quantifiers ###
# Normally, a literal character or character class (a "thing") matches exactly once.
# Put a quantifier after the thing to change that.
# We can say we want to match:
1) Zero or one of the thing. ---> ?
2) Zero or more of the thing. ---> *
3) One or more of the thing. ---> +
## Zero or One thing --> ? ##
/sa?ju/ ----> Matches 'saju', 'sju'. Does not match 'saaju'. Here the literal character 'a' matches zero or one time.
/s[aj]?u/ ----> Matches 'sju', 'sau', 'su'. Does not match 'saju'. Here a character from the character class '[aj]' matches zero or one time.
## Zero or More thing --> * ##
/sa*ju/ ----> Matches 'saju', 'sju', 'saaaju', etc. Here the literal character 'a' matches zero or more times.
/s[aj]*u/ ----> Matches 'saaaju', 'sjjju', 'saaajjju', 'su', 'saju', 'sajajaajju'. Here characters from the character class '[aj]' match zero or more times.
## One or More thing --> + ##
/sa+ju/ ----> Matches 'saju', 'saaaju'. Does not match 'sju'. Here the literal character 'a' must match one or more times.
/s[aj]+u/ ----> Matches 'saju', 'saaaju', 'sajjju', 'saaajjju', 'sajajaajju'. Does not match 'su'. Here characters from the character class '[aj]' must match one or more times.
## Without Quantifiers ##
/(s[aj]u)/ ----> Matches 'sau', 'sju'. Does not match 'saju', 'su'.
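The quantifier rules above can be compared side by side by running each pattern over the same word list. This is a rough sketch: matching() is a hypothetical helper written for this example (not a Perl built-in), and the word list is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words that match the pattern,
# joined with spaces.
sub matching {
    my ($re, @words) = @_;
    return join ' ', grep { $_ =~ $re } @words;
}

my @words = qw(sju saju saaju su sau);

my $optional = matching(qr/sa?ju/,   @words);  # 'a' zero or one time
my $star     = matching(qr/sa*ju/,   @words);  # 'a' zero or more times
my $plus     = matching(qr/sa+ju/,   @words);  # 'a' one or more times
my $class_q  = matching(qr/s[aj]?u/, @words);  # optional 'a' or 'j'
my $plain    = matching(qr/s[aj]u/,  @words);  # exactly one 'a' or 'j'

print "sa?ju    : $optional\n";
print "sa*ju    : $star\n";
print "sa+ju    : $plus\n";
print "s[aj]?u  : $class_q\n";
print "s[aj]u   : $plain\n";
```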
***************************************
### Anchors ###
\A ---> Matches only at the beginning of the text. It is normally the first thing in the regular expression.
\Z ---> Matches only at the end of the text, or just before a newline at the end. It is normally the last thing in the regular expression.
Examples:
/\A\d+/ ---> The text starts with digits.
/\s\d{5}\Z/ ----> The text ends with a whitespace character followed by 5 digits ({5} means exactly five times).
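The anchor examples can be checked against a few strings, including one with a trailing newline to show that \Z still matches there. The sample strings and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;

# \A: the digits must be at the very beginning.
my $starts     = ("2009 was the year" =~ /\A\d+/) ? 1 : 0;
my $not_starts = ("the year 2009"     =~ /\A\d+/) ? 1 : 0;

# \Z: whitespace plus five digits must be at the end,
# optionally followed by a single trailing newline.
my $ends         = ("Springfield 62704"   =~ /\s\d{5}\Z/) ? 1 : 0;
my $ends_newline = ("Springfield 62704\n" =~ /\s\d{5}\Z/) ? 1 : 0;
my $not_ends     = ("62704 Springfield"   =~ /\s\d{5}\Z/) ? 1 : 0;

print "starts=$starts not_starts=$not_starts\n";
print "ends=$ends ends_newline=$ends_newline not_ends=$not_ends\n";
```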
***************************************
### Capturing Groups ###
# Parentheses '()' capture whatever they match in $1, $2, and so on.
# Count the opening parentheses from left to right to get the corresponding $n.
Example:
/\A(\w+),\s+([A-Z][A-Z])\s+(\d{5})\Z/ then,
print "City: $1 , State: $2 , Zip: $3 ";
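Put together as a runnable script, the capture example might look like the sketch below. The input line "Springfield, IL 62704" and the variables $city, $state and $zip are assumptions added for illustration. The captures are copied out only after a successful match, since $1, $2 and $3 keep their old values when a match fails.

```perl
use strict;
use warnings;

my $line = "Springfield, IL 62704";
my ($city, $state, $zip);

# Group 1: city (word characters), group 2: two capital letters,
# group 3: five digits.
if ($line =~ /\A(\w+),\s+([A-Z][A-Z])\s+(\d{5})\Z/) {
    ($city, $state, $zip) = ($1, $2, $3);
    print "City: $city , State: $state , Zip: $zip\n";
} else {
    print "No match\n";
}
```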
***************************************