REGULAR EXPRESSIONS - PART II
A Primer page 3
(c) C. Geigner, President, UHACC - November 2003
========================================================================
All right, so now we know how to construct some pretty useful
expressions. But what is the context? I purposely left out operators,
modifiers, and all that in Part I so that you could concentrate on the
expressions themselves without worrying about the constructs we use to
apply them. Now is the time to put the rubber to the road, lest I leave
you all out there in theory land.
========================================================================
DELIMITERS:
// Forward slashes are the de facto standard for enclosing the expression
and replacement fields. De facto, I say, because in many languages and
tools you may choose almost any character as the delimiter!
Field 1 is reserved for the expression that you want to match.
Field 2 is optional, but can be used with the "s" operator (more on
that below) to effect a search-and-replace.
EXAMPLE 1: (awk)
awk -F: '/\/bin\/ksh/{print $1" "$7}' /etc/passwd
Slashes delimit the awk pattern, so each slash inside "/bin/ksh" has
to be escaped as "\/". When awk matches a line, it prints fields 1
and 7 (the login name and the shell), using ":" as the field
delimiter (-F:).
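For comparison, here is a minimal Python sketch of the same filter. It assumes the passwd data arrives as a string; the sample lines and the helper name ksh_users are made up for illustration. Because a Python pattern is an ordinary string rather than a slash-delimited literal, the slashes in /bin/ksh need no escaping.

```python
import re

# Hypothetical sample data in /etc/passwd format (not from a real system).
PASSWD_SAMPLE = """\
root:x:0:0:root:/root:/bin/bash
chris:x:1000:1000:Chris:/home/chris:/bin/ksh
"""

def ksh_users(text):
    """Rough equivalent of: awk -F: '/\\/bin\\/ksh/{print $1" "$7}'"""
    results = []
    for line in text.splitlines():
        if re.search(r"/bin/ksh", line):      # the /.../ pattern part of the awk program
            fields = line.split(":")           # -F: makes ":" the field separator
            results.append(fields[0] + " " + fields[6])  # awk's $1 and $7
    return results

for entry in ksh_users(PASSWD_SAMPLE):
    print(entry)
```

Note that awk numbers fields from 1 while Python lists index from 0, hence fields[0] and fields[6].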
EXAMPLE 2: (perl)
=~ m/^Subject:.*/ --> Assigns lines beginning with the string
"Subject:" to variable . You could use any delimiter, though. Here
is the same match using a bang (!) as the delimiter:
=~ m!^Subject:.*!
EXAMPLE 3: (vi/ex)
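:%s/^[0-9][0-9]*// --> Searches line-by-line for lines starting with one
or more digits and replaces them with (hint: the second field holds the
replacement, and here the second field contains nothing, not even a
space!) NOTHING - right! This erases a whole number at the beginning of
each line. Classic vi uses basic regular expressions, in which "+" is a
literal plus sign, so "one or more digits" is spelled [0-9][0-9]* (in
Vim you can also write [0-9]\+).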
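The same substitution can be sketched with Python's re.sub, used here as a stand-in for vi's :s command (the sample text and variable names are illustrative only). The first call mirrors the empty replacement above; the other two show the backreferenced replacements described under the "s" operator, using Python's \1, \2 and \g<0> syntax where Perl would use $1, $2 and $&.

```python
import re

text = "42 apples\n7 oranges\nno number here"

# Like :%s/^[0-9][0-9]*// : re.MULTILINE makes ^ match at the start of
# every line, [0-9]+ is one or more digits, and the replacement is empty.
no_numbers = re.sub(r"^[0-9]+", "", text, flags=re.MULTILINE)

# Backreferences in the replacement: \2 and \1 re-insert what groups 2
# and 1 matched. Lines that do not match are left alone.
swapped = re.sub(r"^([0-9]+) (\w+)$", r"\2: \1", text, flags=re.MULTILINE)

# \g<0> is Python's spelling of the whole match (sed/vi "&", Perl $&).
bracketed = re.sub(r"[0-9]+", r"[\g<0>]", text)

print(no_numbers)
print(swapped)
print(bracketed)
```

Only the digits are removed, so the space after each number survives at the start of the cleaned lines.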
NOTE: Rather than using delimiters to enclose regexes and substitution
patterns, Tcl uses the commands "regexp" and "regsub" to invoke them.
Operators:
s Substitute. Searches for the regex in field 1 and replaces the
matched text with the literal in field 2. (No, you cannot place a
regex in field 2! You can, however, use backreferenced items as
literals: $1, $2, ... in Perl, or \1, \2, ... in sed and vi. In sed
and vi, "&" inserts the whole match.)
m Match (perl). Indicates that we are matching.
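g Global (vi/ex). :g/regex/command runs command on every line that
matches regex; :g/^#/d, for instance, deletes every line that starts
with "#".
q Create a single-quoted string (no variable interpolation), AND
qq Create a double-quoted string (variables are interpolated), so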
qq/This is actually in double quotes/
is the same as
"This is actually in double quotes"
Modifiers:
e Evaluates the replacement field as a Perl expression instead of
treating it as a literal (perl, s/// only): s/(\d+)/$1*2/e doubles a
number.
g Global modifier. Your regex search will end at the first positive
match unless you tell it to keep going by using "g"
s/regex/literal/g
i Makes the regex case-insensitive. Handy in small scripts, but the
extra overhead can add up in big apps. (perl)
=~ m/tom clancy/i
m Multi-line mode. Makes "^" and "$" match at the beginning and end of
every line in the string, not just at the ends of the whole string.
("." still will NOT match the newline char.)
=~ m/tom clancy/mig --> Matches
o Compile once. Useful when a pattern interpolates a variable whose
value will not change, such as a search term used over and over and
over inside a "foreach" or "while" loop. Normally Perl has to rebuild
such a pattern from its variables every time it runs the match (and
recompile it if their value changed). With "o", Perl interpolates and
compiles the regex only the first time, then reuses that compiled form.
The catch: if the variable changes later, the regex will not notice. A
pattern that contains no variables is compiled only once anyway, so "o"
buys you nothing there.
foreach
s Single-line mode. "." WILL match the newline character, so a match
can run across line boundaries as if the input were one big string.
=~ m/tom clancy/s
ms Use this modifier combo to get the best of both worlds: "^" and "$"
work at the beginning and end of each line, yet "." will still match
a newline char.
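x Free formatting. The x modifier tells the regex engine to ignore
whitespace in the expression and lets you add "#" comments. This
allows you to "spread out", mostly to make large expressions more
readable. The example below matches a dotted IPv4 address whose
octets are each 0-255. The octet subexpression is written out four
times on purpose: a backreference such as \1 repeats the text the
group matched, not the pattern, so \.\1\.\1\.\1 would only accept
addresses like 10.10.10.10.
=~ m{
    ^
    ( 1?[0-9]?[0-9] | 2 (?: [0-4][0-9] | 5[0-5] ) )      # octet 1
    \. ( 1?[0-9]?[0-9] | 2 (?: [0-4][0-9] | 5[0-5] ) )   # octet 2
    \. ( 1?[0-9]?[0-9] | 2 (?: [0-4][0-9] | 5[0-5] ) )   # octet 3
    \. ( 1?[0-9]?[0-9] | 2 (?: [0-4][0-9] | 5[0-5] ) )   # octet 4
    $
}x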
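Most of these modifiers have close counterparts in Python's re module, which is swapped in here only because it makes a compact, runnable illustration; it is not Perl, and the mapping is approximate. The sketch below pairs each modifier with its nearest Python equivalent. The sample strings, OCTET, IPV4 and count_matching are hypothetical names chosen for the example. The IPv4 pattern repeats the octet subexpression rather than backreferencing it, since a backreference repeats the matched text, not the pattern.

```python
import re

text = "Tom Clancy novel\ntom clancy again"

# g: Perl needs /g to replace every match. Python's re.sub replaces all
# matches by default; count=1 gives the Perl-without-/g behaviour.
first_only = re.sub(r"o", "0", "foo boo", count=1)
every_one = re.sub(r"o", "0", "foo boo")

# e: Perl evaluates the replacement as code; Python accepts a function.
doubled = re.sub(r"\d+", lambda m: str(int(m.group(0)) * 2), "3 apples, 10 pears")

# i: case-insensitive matching.
ci_hits = re.findall(r"tom clancy", text, flags=re.IGNORECASE)

# m: ^ and $ match at every line boundary, not just at the string's ends.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# s: "." also matches the newline character.
across = re.search(r"novel.tom", text, flags=re.DOTALL)

# x: whitespace in the pattern is ignored and "#" starts a comment.
OCTET = r"(?: 1?[0-9]?[0-9] | 2 (?: [0-4][0-9] | 5[0-5] ) )"
IPV4 = re.compile(rf"""
    ^ {OCTET} (?: \. {OCTET} ){{3}} $   # four dotted octets, each 0-255
""", re.VERBOSE)

# o: the closest Python habit is compiling once, outside the loop,
# and reusing the compiled pattern object on every iteration.
def count_matching(lines, word):
    pattern = re.compile(re.escape(word), re.IGNORECASE)  # built once
    return sum(1 for line in lines if pattern.search(line))
```

In the IPV4 pattern, the doubled braces in {{3}} are how an f-string produces a literal {3} quantifier.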
Using Lookahead & Perl-ish Regex Variables
LOOKAHEAD:
?= Lookahead operator, written (?=...).
George(?= W\. Bush| H\.W\. Bush| Bush) matches "George", but only
if it is followed by one of the alternatives. The lookahead itself
consumes no text, so the match is just "George".
?! Negative lookahead operator, written (?!...).
Tommy(?! Tutone| The Cat) matches every "Tommy" EXCEPT those
followed by one of the alternatives.
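A quick way to see that a lookahead tests without consuming is to substitute on the match. This Python sketch (sample strings invented for the example; Python's (?=...) and (?!...) behave like Perl's here) replaces only the word before the lookahead, leaving the looked-ahead text untouched.

```python
import re

text = "George W. Bush, George H.W. Bush, George Bush, George Washington"
# Positive lookahead: "George" is replaced only when a Bush follows,
# and the " W. Bush" etc. is checked but not consumed.
george_marked = re.sub(r"George(?= W\. Bush| H\.W\. Bush| Bush)", "G.", text)

bands = "Tommy Tutone, Tommy The Cat, Tommy Lee"
# Negative lookahead: "Tommy" is replaced only when neither alternative follows.
tommy_marked = re.sub(r"Tommy(?! Tutone| The Cat)", "TOMMY", bands)

print(george_marked)
print(tommy_marked)
```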
$` Contains the part of the string before the match
$& Contains the match itself
$' Contains the part of the string after the match
Should you use any of these? NOOOOOOO! They are fraught with gotchas and
nightmarish overhead: in the Perl of this writing, once any of them
appears anywhere in a program, every regex match in that program pays
to save a copy of the target string. Use them at your own peril. How
about some alternatives (since I just showed you something cool that
you can't use, hehe)?
OK, fair enough. Here you go:
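$& We all know that you can use backreferences in Perl. Wrap the whole
regex in parentheses: inside the regex, \1 refers to what that group
matched, and outside it, $1 holds the same value, which is your whole
match.
$` can be replaced by a regex! Try (.*?)regex and read the first group.
$' can be captured as well by using regex(?=(.*)) and reading the
lookahead's group (note the lookahead).
Add the s modifier to either one if the text may contain newlines.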
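Here is a small Python sketch of the same three pieces, assuming a made-up mail header as input. Python has no $`, $& or $' variables; the match object's start() and end() positions give the same slices, and the regex-only workarounds above carry over directly, with re.DOTALL playing the role of Perl's s modifier.

```python
import re

s = "From: tom\nSubject: regex primer\nBody text"

# Slicing around a match object: what $`, $& and $' would hold.
m = re.search(r"Subject:", s)
before = s[:m.start()]
matched = m.group(0)
after = s[m.end():]

# The regex-only alternatives; re.DOTALL lets .*? and .* cross newlines.
pre = re.search(r"(.*?)Subject:", s, re.DOTALL).group(1)      # instead of $`
whole = re.search(r"(Subject:)", s).group(1)                  # instead of $&
post = re.search(r"Subject:(?=(.*))", s, re.DOTALL).group(1)  # instead of $'

print(repr(pre), repr(whole), repr(post))
```

Without re.DOTALL, pre would only reach back to the last newline and post would stop at the next one.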
========================================================================
REFERENCES:
All info relayed here is the result of experience and of sticking my nose
into the very useful and detailed O'Reilly "Owl book" for refreshment:
Friedl, Jeffrey E. F. Mastering Regular Expressions. O'Reilly &
Associates, Sebastopol, CA, 1997.
Copyright © 2006: Unix Hobbyists' Administrators' & Coders' Club. All Rights Reserved.