
Most Linux users avoid regular expressions (or "regex"), and unless you're a programmer you don't really need to know much about them. But a little knowledge can be useful, as regular expressions are often offered as an option to enhance searches. You'll find them under More Options in OpenOffice.org's Find & Replace, for example.
Think of regular expressions as a kind of mini-programming language. Rather than go into details -- you'll find a great resource here -- I'll just list some examples you might find useful in your searches.
Metacharacters
Regex will find any character you enter except for the following:
. ? | ^ $ * + [ ] ( ) \
They're known as "metacharacters" and are part of the regex "language".
The full stop (".") will match any character except for line breaks:
Searching on c.t will find cat, cot, cut, c2t, c#t, etc.
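
The full stop rule can be tried out in a few lines. This is a minimal sketch that assumes Python's built-in re module (the article itself is not tied to any one tool, and regex flavours differ slightly); the word list is made up for illustration.

```python
import re

# c.t means: "c", then any single character except a line break, then "t".
words = ["cat", "cot", "cut", "c2t", "c#t", "ct", "c\nt"]

# fullmatch() only succeeds when the whole word fits the pattern.
dot_matches = [w for w in words if re.fullmatch(r"c.t", w)]
print(dot_matches)
```

Here "ct" is rejected because the full stop must consume exactly one character, and "c\nt" is rejected because that character is a line break.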
A question mark ("?") makes the preceding character in the search optional:
Searching on colou?r will match colour and color.
The vertical bar ("|") separates alternatives:
Searching on gray|grey will match both gray and grey.
Searching on abc|def|xyz will match abc, def or xyz.
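
The optional character and the alternatives can be checked the same way. Again this is only a sketch, assuming Python's re module; the sample sentences are invented.

```python
import re

text = "The colour gray and the color grey"

# u? makes the "u" optional, so both spellings are found.
optional_u = re.findall(r"colou?r", text)

# The vertical bar tries each alternative in turn.
alternatives = re.findall(r"gray|grey", text)
three_way = re.findall(r"abc|def|xyz", "xyz abc 123 def")

print(optional_u)    # ['colour', 'color']
print(alternatives)  # ['gray', 'grey']
print(three_way)     # ['xyz', 'abc', 'def']
```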
The caret ("^") matches the start of a string or, in tools that search line by line, the start of a line (that is, just after a line break):
Searching on ^blah will only match the first blah in a new line beginning blah, blah, blah
The dollar sign ("$") matches the end of a string or, in tools that search line by line, the end of a line (that is, just before a line break):
Searching on blah$ will only match the last blah in a line ending in blah, blah, blah
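
Whether ^ and $ work per line or per string depends on the tool. As an illustration only, Python's re module (an assumption here, not something the article prescribes) anchors to the whole string by default and to each line when the re.MULTILINE flag is given:

```python
import re

line_text = "blah, blah, blah\nblah, blah, blah"

# Default: ^ anchors only at the very start of the string.
starts_default = [m.start() for m in re.finditer(r"^blah", line_text)]

# With MULTILINE, ^ also matches just after each line break
# and $ just before each line break.
starts_multiline = [m.start() for m in re.finditer(r"^blah", line_text, re.MULTILINE)]
ends_multiline = [m.start() for m in re.finditer(r"blah$", line_text, re.MULTILINE)]

print(starts_default)    # [0]
print(starts_multiline)  # [0, 17]
print(ends_multiline)    # [12, 29]
```

The numbers are character offsets: only the first blah of each line matches ^blah, and only the last blah of each line matches blah$.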
The asterisk ("*") matches the preceding character zero or more times:
Searching on ab*c will match ac, abc, abbc, abbbc, etc.
The plus sign ("+") matches the preceding character one or more times:
Searching on ab+c will match abc, abbc, abbbc, etc. but not ac.
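
The difference between the asterisk and the plus sign shows up only for the "zero times" case. A short sketch, assuming Python's re module and a made-up list of candidates:

```python
import re

candidates = ["ac", "abc", "abbc", "abbbc", "adc"]

# b* allows zero or more "b" characters, so "ac" is accepted.
star = [s for s in candidates if re.fullmatch(r"ab*c", s)]

# b+ needs at least one "b", so "ac" is rejected.
plus = [s for s in candidates if re.fullmatch(r"ab+c", s)]

print(star)  # ['ac', 'abc', 'abbc', 'abbbc']
print(plus)  # ['abc', 'abbc', 'abbbc']
```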
Square brackets ("[" and "]") find single character matches:
Searching on gr[ae]y will match gray or grey but not graey or gry.
Searching on in[du] will match the ind in Windows and the inu in Linux.
To find multiple characters, repeat the square brackets:
Searching on [1-9][0-9] will match all double-digit numbers from 10 to 99.
Add a hyphen to indicate a range:
Searching on z[1-3] will match z1, z2 and z3 but not z4.
Multiple ranges are allowed:
Searching on z[1-3a-c] will match z1, z2, z3, za, zb and zc but not z4, zd or zA.
Searching on z[1-3a-cA-C] will match all of the above plus zA, zB and zC.
A caret ("^") inside square brackets reverses the sense of the search:
Searching on z[^1-3] will match z4, z0 and zB but not z1, z2 or z3.
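
The square bracket examples above can be run together. This is a sketch assuming Python's re module; the test strings are chosen to mirror the examples in the text.

```python
import re

# A bracket pair matches exactly one character from the set.
gr = [w for w in ["gray", "grey", "graey", "gry"] if re.fullmatch(r"gr[ae]y", w)]
in_du = re.findall(r"in[du]", "Windows and Linux")

# Two bracket pairs in a row match two characters.
two_digit = re.findall(r"[1-9][0-9]", "7 10 42 99 05")

# Several ranges can share one pair of brackets.
ranges = [s for s in ["z1", "z2", "z3", "z4", "za", "zc", "zd", "zA"]
          if re.fullmatch(r"z[1-3a-c]", s)]

# A leading caret inside the brackets negates the set.
negated = [s for s in ["z0", "z1", "z2", "z3", "z4", "zB"]
           if re.fullmatch(r"z[^1-3]", s)]

print(gr)         # ['gray', 'grey']
print(in_du)      # ['ind', 'inu']
print(two_digit)  # ['10', '42', '99']
print(ranges)     # ['z1', 'z2', 'z3', 'za', 'zc']
print(negated)    # ['z0', 'z4', 'zB']
```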
Brackets ("(" and ")") group a series of pattern elements into a single element:
Searching on (.pet) will find matches in carpet and parapet, and in petal only when another character, such as a space, comes before it.
Searching on (g..)|(m..) in the string "program name " will find matches in "gra", "m n" and "me ".
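
Grouping is easier to see when each match reports which group produced it. A minimal sketch, assuming Python's re module; note that the sample string ends with a space so that the last m has two characters after it.

```python
import re

sample_name = "program name "

# Each alternative sits in its own group; lastindex reports which one matched.
grouped = [(m.group(0), m.lastindex)
           for m in re.finditer(r"(g..)|(m..)", sample_name)]
print(grouped)  # [('gra', 1), ('m n', 2), ('me ', 2)]

# .pet needs one character before "pet", so a bare "petal" does not match.
pet_alone = re.search(r"(.pet)", "petal")
pet_in_text = re.findall(r"(.pet)", "carpet parapet petal")
print(pet_alone)    # None
print(pet_in_text)  # ['rpet', 'apet', ' pet']
```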
The backslash ("\") allows you to search for any of the metacharacters:
Searching on $2 in the string $2.50 will find nothing because $ is a metacharacter.
Searching on \$2 in the string $2.50 will find $2.
Searching on \\ in the string C:\filename will find \.
Searching on \\\\ in the string C:\\filename will find \\.
Searching on 1\+ in the string 1+2=3 will find 1+.
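
The escaping examples translate directly into code. This sketch assumes Python's re module; the r"..." prefix is Python's raw-string syntax, used here so that each backslash reaches the regex engine unchanged.

```python
import re

price = "$2.50"

# Unescaped, $ means "end of string", so nothing can follow it.
unescaped = re.findall(r"$2", price)

# Escaped, \$ is a literal dollar sign.
escaped = re.findall(r"\$2", price)

# A literal backslash is written as \\ in the pattern.
one_backslash = re.findall(r"\\", r"C:\filename")
two_backslashes = re.findall(r"\\\\", r"C:\\filename")

plus_sign = re.findall(r"1\+", "1+2=3")

print(unescaped)  # []
print(escaped)    # ['$2']
print(plus_sign)  # ['1+']
```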
Character Classes
The backslash ("\") is also associated with special characters:
\d matches any digit. Equivalent to [0-9].
\D matches any non-digit. Equivalent to [^0-9].
\w matches any alphanumeric character plus the underscore ("_"). Equivalent to [A-Za-z0-9_].
\W matches any non-alphanumeric character excluding underscore. Equivalent to [^A-Za-z0-9_].
\s matches whitespace characters -- including spaces, tabs and line breaks. Equivalent to [ \f\n\r\t\v] (note the leading space).
\S matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t matches tab characters.
\r matches carriage returns.
\n matches line feeds. Note that Windows uses \r\n to terminate lines. Linux just uses \n.
\f matches form feeds.
\v matches vertical tabs.
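
The character classes above can be compared side by side. This is a sketch assuming Python's re module. By default Python's \d, \w and \s also accept non-ASCII characters, so the re.ASCII flag is passed to get the narrower sets listed above. The sample string is made up.

```python
import re

sample = "Room 42_b\tis open\n"

digits = re.findall(r"\d", sample, re.ASCII)
words = re.findall(r"\w+", sample, re.ASCII)       # underscore counts as a word character
non_word = re.findall(r"\W", sample, re.ASCII)
whitespace = re.findall(r"\s", sample, re.ASCII)   # space, tab and line feed

print(digits)      # ['4', '2']
print(words)       # ['Room', '42_b', 'is', 'open']
print(non_word)    # [' ', '\t', ' ', '\n']
print(whitespace)  # [' ', '\t', ' ', '\n']
```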
\b matches a word boundary:
er\b will only match the last er in "observer ".
\bword\b will find "word" in "word ", "word," and "-word." but not crossword or wordy.
\B matches a non-word boundary:
er\B will only match the first er in "observer ".
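
Word boundaries are easiest to follow by looking at where each match starts. A short sketch, assuming Python's re module:

```python
import re

# "observer " contains "er" twice: at offset 3 (before "v") and at offset 6 (before the space).
last_er = [m.start() for m in re.finditer(r"er\b", "observer ")]
first_er = [m.start() for m in re.finditer(r"er\B", "observer ")]

# \bword\b only matches "word" when it stands alone.
samples = ["word ", "word,", "-word.", "crossword", "wordy"]
whole_word = [s for s in samples if re.search(r"\bword\b", s)]

print(last_er)     # [6]
print(first_er)    # [3]
print(whole_word)  # ['word ', 'word,', '-word.']
```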
\A matches the start of a string.
\A. will match the a in abc.
\Z matches the end of the string.
.\Z matches f in abcdef.
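
The following sketch, assuming Python's re module, shows \A and \Z in action and contrasts \A with ^ in multi-line mode, where the two stop being interchangeable:

```python
import re

first_char = re.search(r"\A.", "abc").group()
last_char = re.search(r".\Z", "abcdef").group()

# In MULTILINE mode ^ matches at every line start, but \A still matches only once.
caret_hits = re.findall(r"^.", "abc\ndef", re.MULTILINE)
string_start_hits = re.findall(r"\A.", "abc\ndef", re.MULTILINE)

print(first_char)         # a
print(last_char)          # f
print(caret_hits)         # ['a', 'd']
print(string_start_hits)  # ['a']
```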
Quantifiers
Curly braces ("{" and "}") specify the number of times the preceding character is to be matched:
Searching on o{2} will match the oo's in good, food and Wooooooo! but not oboe.
Two numbers separated by a comma, as in o{2,4}, give a minimum and a maximum number of repetitions.
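
Counted repetition can be sketched as follows, assuming Python's re module. The long "Wooo..." string is built explicitly with seven o's so the count is unambiguous; that length is an illustrative choice, not taken from the text.

```python
import re

long_word = "W" + "o" * 7 + "!"

# o{2} matches exactly two o's at a time, working left to right without overlap.
double_o = {w: re.findall(r"o{2}", w) for w in ["good", "food", long_word, "oboe"]}
print(double_o["good"])            # ['oo']
print(double_o["oboe"])            # []
print(len(double_o[long_word]))    # 3 (the seventh o is left over)

# {2,4} accepts between two and four repetitions.
two_to_four = [s for s in ["o", "oo", "oooo", "ooooo"] if re.fullmatch(r"o{2,4}", s)]
print(two_to_four)                 # ['oo', 'oooo']
```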
Conclusion
By now you'll probably appreciate how powerful regexes can be. By combining metacharacters, you can perform some pretty sophisticated matches. Searching for:
\b[1-9][0-9]{3}\b will find all numbers between 1000 and 9999.
\b[1-9][0-9]{2,4}\b will find all numbers between 100 and 99999.
(\<(/?[^\>]+)\>) will find all HTML tags.
(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6}) will find simple email addresses such as name@example.com; addresses with digits, hyphens or extra dots in the domain are missed or only partly matched.
((?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15}) will find all 8-15 character strings with at least one upper case letter, one lower case letter, and one digit. In short, it'll identify all potential passwords!
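
The combined patterns above can be tried against sample data. This is a sketch only, assuming Python's re module; the sample strings and variable names are invented, and other regex flavours may treat sequences such as \< differently (Python reads it as a literal "<").

```python
import re

numbers = "7 100 1000 9999 12345 100000"
four_digit = re.findall(r"\b[1-9][0-9]{3}\b", numbers)
three_to_five = re.findall(r"\b[1-9][0-9]{2,4}\b", numbers)

html = "<p>Hi <b>there</b></p>"
tags = [m.group(0) for m in re.finditer(r"(\<(/?[^\>]+)\>)", html)]

email_pattern = r"(\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6})"
simple_email = re.findall(email_pattern, "Contact bob@example.com today")
# A dotted name and a multi-part domain are only partly matched.
tricky_email = re.findall(email_pattern, "a.b@mail.example.org")

password_pattern = r"((?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,15})"
candidates = ["Passw0rd1", "password1", "Sh0rt", "ALLUPPER123"]
passwords = [p for p in candidates if re.fullmatch(password_pattern, p)]

print(four_digit)     # ['1000', '9999']
print(three_to_five)  # ['100', '1000', '9999', '12345']
print(tags)           # ['<p>', '<b>', '</b>', '</p>']
print(simple_email)   # ['bob@example.com']
print(tricky_email)   # ['b@mail.exampl']
print(passwords)      # ['Passw0rd1']
```

The password check uses fullmatch() so that each candidate is tested as a whole; when searching inside a longer text, the pattern can also match part of a longer run of characters.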