
                  REGULAR EXPRESSIONS - PART II

A Primer                                                          page 3
(c) C. Geigner, President, UHACC - November 2003
========================================================================
All right, so now we know how to construct some pretty useful 
expressions. But what is the context? I purposefully left out operators,
modifiers, and all that in Part I so that you could just concentrate on
expressions without worrying about the constructs that we use to apply
them. I suppose that now is the time to put the rubber to the road, lest
I leave you all out there in theory land.
========================================================================
DELIMITERS:

//   Forward slashes are the de facto standard for containing expression
     and replacement elements. De facto, I say, because in many languages
     you may choose nearly any character to delimit elements!
     Field 1 is reserved for the expression that you want to match.
     Field 2 is optional, but can be used with the "s" operator (more on 
     that below) to effect a search-and-replace. 
  EXAMPLE 1: (awk)
  awk -F: '/\/bin\/ksh/{print $1" "$7}' /etc/passwd
       Slashes delimit the awk search for lines containing "/bin/ksh". 
       When awk makes a match on a line, it will then print fields 1&7
       using ":" as the field delimiter.
  EXAMPLE 2: (perl)
  $var =~ m/^Subject:.*/ --> True when $var holds a line beginning with
       the string "Subject:" ($var is a placeholder for whatever scalar
       you are testing). You could use any delimiter though; in this
       case: $var =~ m!^Subject:.*! I use bang (!) to delimit my
       expression.
  EXAMPLE 3: (vi/ex)
  :%s/^[0-9][0-9]*// --> Searches line-by-line for lines starting with 1
       or more digits and replaces them with (hint: the second field
       holds the replacement, but here it contains nothing, not even a
       space!) NOTHING - right! This erases whole numbers at the
       beginning of a line. ("+" is not a quantifier in plain vi; vim
       accepts [0-9]\+ instead.)

  NOTE: Rather than using delimiters to enclose regexes and 
  substitution patterns, Tcl uses the commands "regexp" & "regsub" to
  invoke regexes.
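
For comparison, here is a rough sketch of the three examples above in
Python, which (like Tcl) has no regex delimiters: the pattern is just a
string handed to the "re" module. The sample passwd lines and the
function names are made up for illustration.

```python
import re

# Hypothetical sample data standing in for /etc/passwd lines.
passwd_lines = [
    "root:x:0:0:root:/root:/bin/bash",
    "alice:x:1001:1001:Alice:/home/alice:/bin/ksh",
]

def ksh_users(lines):
    # Rough equivalent of EXAMPLE 1: keep lines containing "/bin/ksh",
    # then emit fields 1 and 7 using ":" as the field separator.
    out = []
    for line in lines:
        if re.search(r"/bin/ksh", line):
            fields = line.split(":")
            out.append(fields[0] + " " + fields[6])
    return out

def is_subject(line):
    # Rough equivalent of EXAMPLE 2: true when the line begins with "Subject:".
    return re.search(r"^Subject:.*", line) is not None

def strip_leading_digits(line):
    # Rough equivalent of EXAMPLE 3: delete one or more leading digits.
    return re.sub(r"^[0-9]+", "", line)

print(ksh_users(passwd_lines))
print(is_subject("Subject: regex primer"))
print(strip_leading_digits("42 is the answer"))
```

Because no delimiter is involved, the slashes in "/bin/ksh" need no
escaping here, unlike in the awk version.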
 
Operators:

s   Substitute. Searches for regex in field 1 and replaces each matched
    item with the literal in field 2. (No, you cannot place a regex in
    field 2! You can, however, use backreferenced items there, and in
    sed, vi, and awk "&" stands for the whole match.)
   
m   Match (perl). Indicates that we are matching.
g   Global Search. Indicates that we are searching globally for our regex.

q   Create single-quoted string (perl).
qq  Create double-quoted string (perl).
    qq/This is actually in double quotes/
    is the same as
    "This is actually in double quotes" 

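As a small sketch of what the substitute operator described above does,
here is the same idea with Python's re.sub(). This is Python's spelling,
not perl's or sed's: the replacement is a plain string, backreferences
are written \1, \2, and the whole match is \g<0>. The sample strings are
made up.

```python
import re

# s/cat/dog/ without "g": replace only the first match.
first_only = re.sub(r"cat", "dog", "cat cat cat", count=1)

# Backreferences in the replacement: turn "last, first" into "first last".
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Clancy, Tom")

# The whole match in the replacement: Python spells it \g<0>.
bracketed = re.sub(r"[0-9]+", r"[\g<0>]", "room 101")

print(first_only)   # dog cat cat
print(swapped)      # Tom Clancy
print(bracketed)    # room [101]
```
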
Modifiers:

e   Evaluate (perl). Used with "s": the replacement in field 2 is run as
    perl code and its result is substituted for the match.
g   Global modifier. Your regex search will end at the first positive
    match unless you tell it to keep going by using "g"
    s/regex/literal/g

i   Makes regex case-insensitive. Useful in small scripts, but the 
    overhead it reportedly adds can make it costly in big apps. (perl)
     $var =~ m/tom clancy/i

m   Multi-line mode. Treats input lines as separate entities, recognizing
    "^" and "$" at the beginning and end of all lines. ("." will NOT
    match the newline char)
     $var =~ m/tom clancy/mig --> Matches every "tom clancy" in $var,
     in any letter case

o   Compile once ("optimize"). Good for use with expressions that get
    reused over and over and... right! Iteration constructs like
    "foreach" or "while" are perfect candidates for the o modifier.
    This is because the regex engine may re-evaluate a regex that
    contains variables every time it encounters it in your code - even
    when it is in a loop! To allay this problem, we can tell the engine
    to re-use the compiled form it has already built.

    foreach  

s   Single-line mode. Treats input as one big string. "." WILL match the
    newline character.
     $var =~ m/tom clancy/s

ms  Use this modifier combo to get the best of both worlds: all lines 
    treated independently "^" and "$" work for beginning and end of each
    line, yet "." will certainly match a newline char.

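x   Free Formatting. The x modifier makes the regex engine ignore
    whitespace in the expression (except inside a character class) and
    allows "#" comments. This lets you "spread out" - mostly to make
    your code more readable when implementing large expressions.

     $var =~ m{
       (?:     1?[0-9]?[0-9]
         |     2 [0-4] [0-9]
         |     2 5 [0-5]       )           # first octet, 0-255
       (?: \. (?: 1?[0-9]?[0-9] | 2[0-4][0-9] | 25[0-5] ) ){3}
     }x

    (A backreference like \1 repeats the text a group matched, not its
    pattern, so the octet pattern is written out again rather than
    reused as \.\1\.\1\.\1)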
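
To see the modifiers above side by side, here is a minimal sketch using
Python's re module. Python passes them as flags instead of trailing
letters: re.IGNORECASE for i, re.MULTILINE for m, re.DOTALL for s, and
re.VERBOSE for x. There is no g flag (findall() returns every match),
and re.compile() plays roughly the role of o. The sample text and the
names OCTET and ip_address are made up for illustration.

```python
import re

text = "Tom Clancy\ntom clancy wrote\nthis"

# i (plus g): findall() returns every match, ignoring letter case.
all_names = re.findall(r"tom clancy", text, flags=re.IGNORECASE)

# m: "^" matches at the start of every line, not just the first.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)

# s: without it "." stops at a newline; with it "." crosses the newline.
no_s = re.search(r"wrote.this", text)
with_s = re.search(r"wrote.this", text, flags=re.DOTALL)

# x: whitespace inside the pattern is ignored, so it can be spread out.
# Compiling once up front is also the closest Python analogue of "o".
OCTET = r"(?: 1?[0-9]?[0-9] | 2[0-4][0-9] | 25[0-5] )"
ip_address = re.compile(
    r"^ " + OCTET + r" (?: \. " + OCTET + r" ){3} $",
    re.VERBOSE,
)

print(all_names)
print(line_starts)
print(no_s is None, with_s is not None)
print(bool(ip_address.search("192.168.0.255")))
print(bool(ip_address.search("256.1.1.1")))
```

The octet pattern is repeated by reusing the string OCTET, not by a
backreference, since a backreference would only match the same text
again.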


Using Lookahead & Perl-ish Regex Variables

LOOKAHEAD:

?=  Lookahead operator.
    George(?= W\. Bush| H\.W\. Bush| Bush) matches George, but only if
    it is followed by any of the alternatives. The text inside the
    lookahead is checked but is not part of the match.

?!  Negative lookahead operator.
    Tommy(?! TuTone| The Cat) matches all instances of "Tommy" but NOT
    those followed by any of the alternatives.
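
Here is a short sketch of both lookaheads using Python's re module,
which uses the same (?=...) and (?!...) syntax. It assumes the names are
written with a space, as in "George W. Bush", so the space sits inside
the lookahead. The variable names and test strings are made up.

```python
import re

# Positive lookahead: match "George" only when a Bush follows.
bush = re.compile(r"George(?= W\. Bush| H\.W\. Bush| Bush)")

# Negative lookahead: match "Tommy" unless one of these follows.
other_tommy = re.compile(r"Tommy(?! TuTone| The Cat)")

print(bush.search("George W. Bush").group(0))    # George
print(bush.search("George Washington"))          # None
print(other_tommy.search("Tommy TuTone"))        # None
print(other_tommy.search("Tommy Lee").group(0))  # Tommy
```

Note that group(0) is just "George" or "Tommy": the lookahead only
peeks at what follows and never consumes it.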


$`  Contains all data that flowed by before a match was made.
$&  Contains the match
$'  Contains all remaining data after a successful match

Should you use any of these? NOOOOOOO! From what I've read, these are
fraught with gotchas and nightmarish overhead, so use them at your own
peril. How about some alternatives (since I just showed you something 
cool that you can't use, hehe)?
OK fair enough. Here you go:

$& can be replaced with a backreference: wrap the whole regex in
parentheses. Inside the regex, \1 refers to the text captured by the
first group, and $1 can be used on the outside to pull that value.

$` can be replaced by a regex! Try (.*?)regex and read the capture
$' can be captured as well by using regex(?=(.*)) 
   (Note the lookahead)
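
The same capture tricks can be sketched in Python, where plain groups
stand in for the three perl variables. The sample string and the
variable names are made up for illustration.

```python
import re

text = "From: tom clancy <tc@example.com>"

# (.*?) grabs what comes before, (clancy) is the match itself, and the
# lookahead (?=(.*)) captures what comes after without consuming it.
m = re.search(r"(.*?)(clancy)(?=(.*))", text)

before = m.group(1)   # stands in for $`
match = m.group(2)    # stands in for $&
after = m.group(3)    # stands in for $'

print(repr(before))
print(repr(match))
print(repr(after))
```

Since "." does not match a newline by default, this sketch only covers
text on a single line.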
========================================================================
REFERENCES:
All info relayed here was a result of experience and of sticking my nose
into the very useful and detailed O'Reilly Owl book for refreshment:
Mastering Regular Expressions. Jeffrey E.F. Friedl. 1997. 
O'Reilly & Associates, Sebastopol, CA.
