                  REGULAR EXPRESSIONS - PART I (cont'd)

A Primer                                                          page 2
(c) C. Geigner, President, UHACC - October 2003
========================================================================
\<   Left Angle Bracket matches left word boundary.
\>   Right Angle Bracket matches right word boundary.
  EXAMPLE:
  \<[Dd][Oo][Gg]\>  --> matches the word dog, case insensitive, with
                        word-boundary characters* on both ends.
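* \< and \> match a position, not a character: the spot where a word
  meets the start or end of the line, or any character that is not a
  letter, digit, or underscore (space, tab, punctuation such as ' - . ? !).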
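For comparison, here is the same search in Python's re module, used purely as an illustration (the primer's own examples are vi/grep syntax). Python, like perl, has no \< or \>; its \b matches a word boundary on either side of a word. A minimal sketch assuming Python 3; `find_dogs` is a hypothetical helper name:

```python
import re

# \b is Python's (and perl's) word boundary; it plays the role of \< and \>.
DOG = re.compile(r"\b[Dd][Oo][Gg]\b")

def find_dogs(text):
    """Return every standalone 'dog' (any case) found in text."""
    return DOG.findall(text)

print(find_dogs("Dog, dogma, hotdog and DOG!"))  # ['Dog', 'DOG']
```

"dogma" and "hotdog" are skipped because a word character sits right next to "dog" on one side, so there is no boundary there.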
========================================================================
I'll bet you're wondering why you haven't seen the ubiquitous "*" yet!
Here's why. Beginners and even non-beginners sometimes are drawn to the
all-powerful "*", and for all the wrong reasons. The goal of using regex
is to make precise matches, and all too often, we're so anxious to
get a regex that works that we forget to make a regex that works
well. I'd like for you to remember that when searching for the 
correct expression, and especially when you reach for the "*", deal?
========================================================================

*     Star (asterisk) expression matches zero-to-many occurrences of the
      preceding expression. This expression is greedy, so be
      careful, especially when using it in conjunction with "." to find
      "a bunch of whatever."
  EXAMPLE:
  %s/^211\.65\..*// --> search all lines for ones beginning with
                  "211.65." and replace that prefix plus everything
                   after it with nothing, which empties those lines
                   but leaves them in the file. (vi/ex)
                   To remove the lines outright, use g/^211\.65\./d
  graphite[0-9]*[ab]*\.exe --> matches all files named like:
                   graphite.exe
                   graphite2a.exe
                   graphite34.exe
                   graphite9765343453876439b.exe
                   graphite5.exe
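Both star examples can be tried out in Python's re module as a stand-in for vi/ex. The sample log lines below are made up for illustration. re.MULTILINE makes ^ match at the start of every line, and since "." does not match a newline, the greedy .* stops at each line's end:

```python
import re

# Mirror of the vi/ex substitution %s/^211\.65\..*// on made-up log lines.
log = "211.65.3.4 - GET /\n10.0.0.1 - GET /\n211.65.9.9 - POST /login\n"
blanked = re.sub(r"^211\.65\..*", "", log, flags=re.MULTILINE)
print(blanked.splitlines())  # ['', '10.0.0.1 - GET /', '']

# The graphite pattern; fullmatch() requires the whole name to match.
GRAPHITE = re.compile(r"graphite[0-9]*[ab]*\.exe")
names = ["graphite.exe", "graphite2a.exe", "graphite34.exe",
         "graphite9765343453876439b.exe", "graphite5.exe", "graphite-x.exe"]
print([n for n in names if GRAPHITE.fullmatch(n)])  # every name except 'graphite-x.exe'
```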

+     Plus sign expression matches one-to-many occurrences of the
      preceding expression.
  EXAMPLE:
  [0-9]+\.[0-9]+   --> matches a decimal number in n.n form, such as
                       3.14 or 10.5 (one or more digits on each side
                       of the dot).
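A quick check of the n.n pattern with Python's re.findall (illustrative; the sample text is made up). Note that ".5" and "7." are skipped because + demands at least one digit on each side of the dot:

```python
import re

# [0-9]+ needs at least one digit on each side of the escaped dot.
NUMBER = re.compile(r"[0-9]+\.[0-9]+")

print(NUMBER.findall("pi is 3.14, e is 2.718, not .5 or 7."))  # ['3.14', '2.718']
```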

I'd like to stop here and stress that metacharacters such as *, ? and +
are greedy. That is, each one matches as much text as it can while
still letting the rest of the expression match. Keep this in mind
while using them.
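A short Python illustration of greediness (sample text made up). The greedy .* runs to the last ">" it can find; a negated character class such as [^>]* is one precise alternative that cannot run past the first ">":

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# .* grabs as much as it can, so the match runs to the LAST ">" in the text.
print(re.search(r"<.*>", text).group())     # <b>bold</b> and <i>italic</i>

# [^>]* can never cross a ">", so the match stops at the first one.
print(re.search(r"<[^>]*>", text).group())  # <b>
```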

This next feature is supported in perl and most modern egreps and awks
(basic-regex tools like grep and sed need the braces escaped, as noted
below), and it is too cool not to mention, being that we all have been
(or will be at some point) in this boat: you want to control the
greediness of your search and restrict it to a range from a minimum
number of matches to a maximum number of matches.
Intervals are a way to rein in greeeedy matching:
{min,max}    Intervals specify a range: the preceding expression must
             match at least min and at most max times. {n} means
             exactly n times, and {min,} sets a minimum with no maximum.
  EXAMPLE:
  [0-9]{4} acts like [0-9][0-9][0-9][0-9]
  [0-9]{2,4} matches anything from two to four digits, i.e. everything
             from [0-9][0-9] to [0-9][0-9][0-9][0-9]
Note that basic-regex tools such as grep and sed need the braces
escaped thusly: \{min,max\}
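Intervals behave the same way in Python's re module, which accepts unescaped braces like perl. In this sketch `YEAR` and `SHORT` are hypothetical names; notice that {2,4} is still greedy inside its range, so "98765" yields "9876" and the leftover "5" is too short to match:

```python
import re

YEAR = re.compile(r"[0-9]{4}")     # exactly four digits
SHORT = re.compile(r"[0-9]{2,4}")  # two to four digits, greedy within the range

print(bool(YEAR.fullmatch("2003")))     # True
print(bool(YEAR.fullmatch("203")))      # False
print(SHORT.findall("7 42 123 98765"))  # ['42', '123', '9876']
```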
  
========================================================================
APPENDIX 2:

A2: Character & Class Shorthand Notation
Perl supports all of these. awk and flex accept most of the character
escapes, but class shorthand such as \d is largely a perl feature, and
other apps vary. Check compatibility before using.
Character Shorthand (numbers in parentheses are octal ASCII codes):
  \a    --> matches occurrence of BELL (007).
  \b    --> matches occurrence of backspace (010). In perl, \b means
            backspace only inside a character class ([\b]); elsewhere
            it is a word boundary, perl's counterpart to \< and \>.
  \e    --> matches occurrence of escape char (033).
  \f    --> matches occurrence of form feed (014).
  \n    --> matches occurrence of newline (012).
  \r    --> matches occurrence of carriage return (015).
  \t    --> matches occurrence of tab (011).
  \xnn  --> matches occurrence of ASCII char given in hex value.
  \nnn  --> matches occurrence of ASCII char given in octal value.
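Python's re module accepts most of these escapes, which makes it handy for checking them. This sketch assumes Python 3; note that Python's re has no \e (use \x1b or \033 instead), and that \b means backspace only inside brackets:

```python
import re

# \xnn gives a character by hex code, \nnn by three-digit octal code.
assert re.fullmatch(r"\x41", "A")     # hex 41 == decimal 65 == 'A'
assert re.fullmatch(r"\101", "A")     # octal 101 == decimal 65 == 'A'
assert re.fullmatch(r"\033", "\x1b")  # escape char; Python's re has no \e
assert re.fullmatch(r"\a", "\a")      # BELL (octal 007)
assert re.fullmatch(r"\t", "\t")      # tab (octal 011)
assert re.fullmatch(r"\n", "\n")      # newline (octal 012)

# Inside brackets \b is the backspace character (octal 010).
assert re.fullmatch(r"[\b]", "\b")
print("character shorthand checks passed")
```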

Class Shorthand
  \d    --> matches occurrence of a digit
  \D    --> matches occurrence of a non-digit
  \w    --> matches occurrence of a word character (letter, digit,
            or underscore)
  \W    --> matches occurrence of a non-word character
  \s    --> matches occurrence of any whitespace character
  \S    --> matches occurrence of any non-whitespace character
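The class shorthands in Python's re module, shown on a made-up sample string. In Python 3 these classes are Unicode-aware by default (pass re.ASCII to restrict them to ASCII), and \w includes the underscore:

```python
import re

sample = "user_id=42\tname: Ann-Marie"

print(re.findall(r"\d+", sample))  # ['42']
print(re.findall(r"\w+", sample))  # ['user_id', '42', 'name', 'Ann', 'Marie']
print(re.findall(r"\S+", sample))  # ['user_id=42', 'name:', 'Ann-Marie']
```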

NOTE: When using these, it may be necessary to escape your backslashes
(write \\d instead of \d), being that many apps (like emacs) consume
single backslashes in their own string syntax before the regex engine
ever sees them.
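Python has the same pitfall as the note above: an ordinary string literal processes backslashes before re sees the pattern. A minimal sketch of the two usual fixes, doubling the backslash or using a raw string (r"..."):

```python
import re

# "\\d+" and r"\d+" are the same three characters: backslash, d, plus.
assert "\\d+" == r"\d+"

# In a normal literal "\b" is the backspace character, NOT a word
# boundary, so this pattern silently fails to find the word.
assert re.search("\bcat\b", "a cat sat") is None
assert re.search(r"\bcat\b", "a cat sat") is not None
print("backslash checks passed")
```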
========================================================================
REFERENCES:
All info relayed here was a result of experience and of sticking my nose
into the very useful and detailed O'Reilly Owl book for refreshment:
Friedl, Jeffrey E. F. Mastering Regular Expressions. O'Reilly &
Associates, Sebastopol, CA, 1997.

Copyright © 2006: Unix Hobbyists' Administrators' & Coders' Club. All Rights Reserved.
UHACC, P.O. Box 6376 - Bloomington, Illinois 61702-6376