For example,

    while (<STDIN>) {    # read a line into $_ until EOF on standard input
        if (/PERL/) {    # if $_ (the current line) contains the word "PERL"
            print;       # then print this line
        }
    }
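Because the loop above reads from standard input, here is a minimal sketch of the same loop that runs without any typing. It assumes a made-up multi-line string `$text` and opens a read-only filehandle on that string (standard Perl `open` with a reference to a scalar), then reads from `$fh` where the original reads from `STDIN`.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for standard input.
my $text = "Learning PERL is fun\nperl in lower case\nNo match here\nPERL again\n";

# Open a filehandle on the string itself (an in-memory file).
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";

my @printed;                 # remember which lines matched
while (<$fh>) {              # read one line at a time into $_ until EOF
    if (/PERL/) {            # case-sensitive test of $_ for "PERL"
        print;               # print $_ (the line keeps its newline)
        push @printed, $_;
    }
}
close($fh);
```

Only "Learning PERL is fun" and "PERL again" are printed; "perl in lower case" is skipped because the match is case-sensitive.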
ignore case option
By appending the modifier i to the match operator (/pattern_to_match/i), a "match"
is considered to have been found even if there are case differences
between the target string (i.e., the current line, by default) and
the pattern_to_match.
For example, if in the previous example the if statement were changed
to:
if (/PERL/i)
then lines containing "Perl", "perl", "pERl", etc. would be printed (and not
just those containing "PERL").
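The effect of i can be seen by testing the same lines with and without it. This sketch uses Perl's built-in `grep`, which sets `$_` to each element in turn, so `/PERL/` inside the block is the same match operator as in the loop; the list of lines is hypothetical.

```perl
use strict;
use warnings;

my @lines = ("I like PERL", "I like Perl", "pERl is odd", "I like Python");

my @exact   = grep { /PERL/ }  @lines;   # case must match exactly
my @ignored = grep { /PERL/i } @lines;   # i: case differences are ignored

print "without i: @exact\n";
print "with i:    @ignored\n";
```

Without i only "I like PERL" is selected; with i the first three lines are selected.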
using a different target string than the current line ($_)
$target_string_name =~ /pattern_to_match/
=~ is called the binding operator: it applies the match to
$target_string_name instead of $_. Note that this is not an assignment
operation or an equality test of any form.
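A minimal sketch of the difference, assuming two hypothetical strings: with =~ the match is applied to the named variable, without it the match is applied to $_, and in neither case is the target modified.

```perl
use strict;
use warnings;

my $title = "Programming Perl";        # a target string other than $_
$_ = "nothing relevant here";          # $_ deliberately lacks "Perl"

my $bound   = ($title =~ /Perl/) ? 1 : 0;  # match applied to $title
my $default = (/Perl/)           ? 1 : 0;  # no =~, so match applied to $_

print "title matches: $bound, \$_ matches: $default\n";
# $title is unchanged: =~ only selects which string is tested.
```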
Regular expression element | Meaning |
---|---|
. | any character except a newline |
+ | one or more of the preceding character (or group); for example /be+t/ would match "he bet on the race" and "she ate a beet" and "I hope you feel better" but not "Their debt increased"; similarly /b(an)+/ would match "the band" and "banana", but not "batter" |
? | zero or one of the preceding character (or group); for example, /be?t/ would match "he bet" and "in debt", but not "ate a beet" |
* | zero or more of the preceding character (or group); for example, /be*t/ would match "better", "beet", and "debt" |
[abcd] | match any one of 'a', 'b', 'c', or 'd'; for example, /b[aeiou]t/ would match "rabbit" and "batter", but not "debt" nor "byte" |
[a-d] | match any one in the range of characters from 'a' to 'd' inclusive; for example, /[a-z][A-Z]/ (any lower case letter immediately followed by an upper case letter) would match "deSotto" and "MacIntosh" but not "Susie Smith" |
[a-dA-D0-4] | match any one character in the ranges 'a' to 'd', 'A' to 'D', or '0' to '4' inclusive; for example, /0x[0-9A-F]/i ("0x" followed by a hexadecimal digit, with case ignored) would match "carriage return = 0x0D" and "line feed is 0XA" |
[^....] | match any one character that is not one of the elements listed after the caret (^); for example, /[^a-zA-Z\s]/ matches any character that is neither a letter nor whitespace |
{x,y} | match the preceding character (or group) at least x but no more than y times; for example, /\s[0-9]{2,5}\s/ (at least 2 but not more than 5 digits between two whitespace characters) would match "there are 88 keys on a piano" and "1 year = 365 days", but not "I saw 3 blind mice" nor "there are over 1000000 neurons in the brain" |
{x} | match the preceding character (or group) exactly x times |
{x,} | match the preceding character (or group) at least x times (with no upper limit) |
(abcd|iou|xyz) | match any one of the alternatives "abcd", "iou", or "xyz"; for example, /profit|loss|income|expense/i would match "His net income was $37,000" or "the expenses were quite high", but not "SanPaulos seems like a dream" |
\ | escape the special meaning of the following regular expression character, one of + ? . * ^ $ ( ) { } [ ] | \ or /; for example, /[0-9]\*/ would match "multiply 2*number_of_years_in_jail", but not "there were 357 orangutangs in my bed" |
\r | a carriage return |
\n | a newline (or line feed) |
\t | a tab |
\f | a form feed |
\d | a digit (same as [0-9]) |
\D | a non-digit (same as [^0-9]) |
\w | a word character (same as [0-9a-zA-Z_]; note the underscore) |
\W | a non-word character (same as [^0-9a-zA-Z_]) |
\s | a whitespace character, such as a space, \t, \n, \r, or \f |
\S | a non-whitespace character |
\b | a word boundary: the zero-width position between a word character (\w) and a non-word character (\W), or between a word character and the beginning or end of the string |
\B | any position that is not a word boundary |
^ | the beginning of the string |
$ | the end of the string (or just before a newline at the end of the string) |
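(...) | group the enclosed elements so that a following quantifier applies to the whole group; the text matched by the group is also captured (see $1 below) |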
$1,...,$9 | the text captured by the first, second, ..., ninth parenthesized group of the last successful match (often used in the replacement string of a substitution); groups are numbered in the order of their opening parentheses |
$& | the entire text matched by the last successful pattern match |
$` | the portion of the target string to the left of (before) the last successful match |
$' | the portion of the target string to the right of (after) the last successful match |
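Several of the table's examples can be checked by running them. This sketch stores each pattern as a compiled regex with qr// (a standard Perl operator not otherwise covered here), tests it against the example string from the table, and counts any result that differs from the expected one; the * row is tested with /be*t/. The last lines show $`, $& and $' after one match on a made-up sentence.

```perl
use strict;
use warnings;

# Each entry: [pattern, string, expected (1 = match, 0 = no match)]
my @cases = (
    [ qr/be+t/,           "she ate a beet",                     1 ],
    [ qr/be+t/,           "Their debt increased",               0 ],
    [ qr/be?t/,           "in debt",                            1 ],
    [ qr/be?t/,           "ate a beet",                         0 ],
    [ qr/be*t/,           "debt",                               1 ],
    [ qr/b[aeiou]t/,      "byte",                               0 ],
    [ qr/[a-z][A-Z]/,     "MacIntosh",                          1 ],
    [ qr/0x[0-9A-F]/i,    "line feed is 0XA",                   1 ],
    [ qr/\s[0-9]{2,5}\s/, "there are 88 keys on a piano",       1 ],
    [ qr/\s[0-9]{2,5}\s/, "I saw 3 blind mice",                 0 ],
    [ qr/[0-9]\*/,        "multiply 2*number_of_years_in_jail", 1 ],
);

my $failures = 0;
for my $case (@cases) {
    my ($re, $string, $expected) = @$case;
    my $got = ($string =~ $re) ? 1 : 0;
    $failures++ if $got != $expected;
    printf "%-22s %-38s %s\n", $re, "\"$string\"", $got ? "match" : "no match";
}

# $`, $& and $' after a single successful match.
my $sentence = "the band played on";
my ($before, $matched, $after);
if ($sentence =~ /b(an)+d/) {
    ($before, $matched, $after) = ($`, $&, $');
    print "before=[$before] matched=[$matched] after=[$after]\n";
}
```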
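The substitution operator, s/pattern_to_match/substitution_string/, works like the match operator: if the pattern_to_match is found in the current line ($_), the matched text is replaced by substitution_string. By default this substitution is done only once, on the first match; appending g (as in s/.../.../g) replaces every match.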
For example,

    $_ = "The C++ language is to be used in this course";
    s/C\+\+/Java/;    # the + signs are escaped because + is a quantifier
    print;            # outputs "The Java language is to be used in this course"
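A short sketch contrasting a single substitution with the g modifier, and using captured groups ($1, $2) in the replacement string. The strings are hypothetical, and =~ is used so that $_ is left alone.

```perl
use strict;
use warnings;

my $text = "Java is fun and Java is portable";

my $once = $text;
$once =~ s/Java/Perl/;        # default: only the first "Java" is replaced

my $all = $text;
$all =~ s/Java/Perl/g;        # g modifier: every "Java" is replaced

# Parentheses capture text; $1 and $2 refer to it in the replacement.
my $name = "Wall Larry";
$name =~ s/(\w+) (\w+)/$2 $1/;   # swap the two words

print "$once\n$all\n$name\n";
```

This prints "Perl is fun and Java is portable", then "Perl is fun and Perl is portable", then "Larry Wall".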
The substitution operator returns the number of substitutions made; for example,

    $_ = <STDIN>;    # input a line into $_
    # s///ig also replaces every vowel in $_ with x; "|| 0" prints 0 when there are none
    print "\nThere were ", (s/[aeiou]/x/ig || 0), " vowels in this line\n";
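The same count can be tried without standard input. This sketch assumes a fixed sample line in place of one read from STDIN, and also shows why the "|| 0" is useful: when nothing matches, s/// returns false, which prints as an empty string rather than 0.

```perl
use strict;
use warnings;

# Hypothetical line standing in for one read from STDIN.
my $line = "Regular expressions are useful";

# s///ig returns the number of substitutions; it also modifies $line.
my $count = ($line =~ s/[aeiou]/x/ig) || 0;
print "There were $count vowels in this line\n";
print "The line is now: $line\n";

# No vowels: s/// returns false, so "|| 0" turns it into 0.
my $none = "rhythm";
my $zero = ($none =~ s/[aeiou]/x/ig) || 0;
print "There were $zero vowels in \"$none\"\n";
```

For the sample line this reports 12 vowels and leaves "Rxgxlxr xxprxssxxns xrx xsxfxl" in $line.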