The syntax for using regular expressions to match lines in AWK is as follows:
word ~ /match/
To test that a value does not match, use the following:
word !~ /match/
Remember the following file from the previous episodes:
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
We can run the following command:
$1 ~ /p[elu]/ {print $0}
We will get the following output:
apple red 4
grape purple 10
apple green 8
plum purple 2
pineapple yellow 5
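As a minimal sketch of how the program above might be run from a shell, the following recreates the sample data and applies both the matching and the non-matching operator. The file name fruit.txt is an assumption made for this illustration; the show notes do not name the file.

```shell
# Recreate the sample data. The file name fruit.txt is hypothetical.
cat > fruit.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# Lines whose first field contains "p" followed by e, l, or u.
awk '$1 ~ /p[elu]/ {print $0}' fruit.txt

# The negated form: lines whose first field does NOT contain such a pair.
awk '$1 !~ /p[elu]/ {print $0}' fruit.txt
```

The second command prints the remaining lines: the header, banana, strawberry, kiwi, and potato.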
Another example:
$2 ~ /e{2}/ {print $0}
This produces the following output:
apple green 8
Certain characters have special meaning when using regular expressions.
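^ - beginning of the string being matched (the line, when matching a whole record)
$ - end of the string being matched
\` - beginning of a string (gawk only; the Perl-style \A is not supported)
\' - end of a string (gawk only; the Perl-style \z is not supported)
\y - on a word boundary (gawk only; in AWK, \b is a backspace character)
[ad] - a or d
[a-d] - any character a through d
[^a-d] - not any character a through d
\w - any word character, that is, a letter, digit, or underscore (gawk only)
\s - any white-space character (gawk only)
[0-9] - any digit (AWK has no \d)
The capital versions \W and \S are the negations of \w and \s.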
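To make the anchors and bracket expressions from the list above concrete, here is a small sketch against the sample data. It uses only ^, $, and a negated range, which should behave the same in any POSIX AWK. The file name fruit.txt is an assumption made for this illustration.

```shell
# Recreate the sample data. The file name fruit.txt is hypothetical.
cat > fruit.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# ^ anchors the match at the start of the field: names beginning with "p".
awk '$1 ~ /^p/ {print $1}' fruit.txt

# $ anchors the match at the end of the field: colors ending in "n".
awk '$2 ~ /n$/ {print $0}' fruit.txt

# Negated range: names whose first character is not a through k.
awk '$1 ~ /^[^a-k]/ {print $1}' fruit.txt
```

The first command prints plum, potato, and pineapple. The last one also prints the header word "name", because the header is an ordinary record as far as AWK is concerned.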
Or, you can reference characters the POSIX standard way. These classes are written inside a bracket expression, for example [[:digit:]]:
[:alnum:] - Alphanumeric characters
[:alpha:] - Alphabetic characters
[:blank:] - Space and TAB characters
[:cntrl:] - Control characters
[:digit:] - Numeric characters
[:graph:] - Characters that are both printable and visible (a space is printable but not visible, whereas an ‘a’ is both)
[:lower:] - Lowercase alphabetic characters
[:print:] - Printable characters (characters that are not control characters)
[:punct:] - Punctuation characters (characters that are not letters, digits, control characters, or space characters)
[:space:] - Space characters (such as space, TAB, and formfeed, to name a few)
[:upper:] - Uppercase alphabetic characters
[:xdigit:] - Characters that are hexadecimal digits
. - match any character
+ - match preceding one or more times
* - match preceding zero or more times
? - match preceding zero or one time
{n} - match preceding exactly n times
{n,} - match preceding n or more times
{n,m} - match preceding between n and m times
(...) - Parentheses are used for grouping
| - Means or, typically in the context of a grouped match
The sub command substitutes the match with the replacement string. This only applies to the first match.
The gsub command substitutes all matching items.
The gensub command (a gawk extension) substitutes in a similar way to sub and gsub, but with extra functionality.
The & character in the replacement field references the matched text. You have to use \& (written "\\&" inside a quoted AWK string) to replace the match with the literal & character.
Example:
{ sub(/apple/, "nut", $1);
print $1}
The output is:
name
nut
banana
strawberry
grape
nut
plum
kiwi
potato
pinenut
Another example:
{ sub(/.+(pp|rr)/, "test-&", $1);
print $1}
This produces the following output:
name
test-apple
banana
test-strawberry
grape
test-apple
plum
kiwi
potato
test-pineapple
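The examples above use only sub. As a hedged sketch of how gsub and the literal & differ from it, the following compares them on the same data. The file name fruit.txt and the variable names first, all, and n are assumptions made for this illustration. gensub is left out because it is specific to gawk.

```shell
# Recreate the sample data. The file name fruit.txt is hypothetical.
cat > fruit.txt <<'EOF'
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
EOF

# sub replaces only the first "p" in its target; gsub replaces every "p".
awk '{ first = $1; all = $1
       sub(/p/, "P", first); gsub(/p/, "P", all)
       print first, all }' fruit.txt

# gsub returns the number of replacements it made (here, digits in field 3).
awk '{ n = gsub(/[[:digit:]]/, "#", $3); print $1, $3, n }' fruit.txt

# & inserts the matched text; "\\&" inserts a literal ampersand.
awk '{ sub(/apple/, "[&] \\& co", $1); print $1 }' fruit.txt
```

For the apple lines, the first command prints "aPple aPPle" and the last prints "[apple] & co". The POSIX class [[:digit:]] in the second command may not be recognized by very old AWK implementations.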
Unless otherwise stated, our shows are released under a Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.