Awk is a powerful text-parsing tool for Unix and Unix-like systems.
The basic syntax is:
awk [options] 'pattern {action}' file
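Both parts of 'pattern {action}' are optional. A pattern with no action prints every matching line, and an action with no pattern runs on every line. In this minimal sketch, a few rows from the sample file introduced below are piped in with printf instead of being read from a file. When no file is named, awk reads standard input.

```shell
# Pattern only: the default action prints each matching line.
printf '%s\n' 'apple red 4' 'banana yellow 6' 'apple green 8' | awk '/apple/'

# Action only: with no pattern, the action runs for every input line.
printf '%s\n' 'apple red 4' 'banana yellow 6' 'apple green 8' | awk '{print $1}'
```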
Here is a simple example file that we will be using, called file1.txt:
name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5
First command:
awk '{print $2}' file1.txt
As you can see, the “print” command displays whatever follows it. In this case we are showing the second column using “$2”, which is intuitive. To display the whole line (every column), use “$0”.
This example will output:
color
red
yellow
red
purple
green
purple
brown
brown
yellow
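print can also take several fields at once. In this sketch, which again pipes a couple of sample rows in with printf, a comma between fields inserts the output separator (a space by default), leaving out the comma concatenates the fields, and NF holds the number of fields on the current line, so $NF is always the last column:

```shell
# A comma between fields inserts the output field separator (a space by default).
printf '%s\n' 'apple red 4' 'banana yellow 6' | awk '{print $1, $3}'

# Without the comma, awk concatenates the fields: "apple4", "banana6".
printf '%s\n' 'apple red 4' 'banana yellow 6' | awk '{print $1 $3}'

# NF is the number of fields on the line, so $NF is the last column.
printf '%s\n' 'apple red 4' 'banana yellow 6' | awk '{print $NF}'
```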
Second command:
awk '$2=="yellow"{print $1}' file1.txt
This will output:
banana
pineapple
As you can see, the command selects the lines whose second column is exactly “yellow”, but prints column 1.
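Conditions can be combined with || (or) and && (and). Here is a short sketch. The rows variable is only an illustrative stand-in for a few lines of file1.txt:

```shell
# Illustrative subset of file1.txt, held in a shell variable.
rows='apple red 4
banana yellow 6
strawberry red 3
grape purple 10'

# || : names of every red or yellow fruit (apple, banana, strawberry).
printf '%s\n' "$rows" | awk '$2 == "yellow" || $2 == "red" {print $1}'

# && : red fruit with an amount greater than 3 (apple only).
printf '%s\n' "$rows" | awk '$2 == "red" && $3 > 3 {print $1}'
```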
By default, awk splits each line into fields on whitespace. You can change the field separator with the -F option. For instance, file1.csv looks like this:
name,color,amount
apple,red,4
banana,yellow,6
strawberry,red,3
grape,purple,10
apple,green,8
plum,purple,2
kiwi,brown,4
potato,brown,9
pineapple,yellow,5
A command similar to the one before:
awk -F"," '$2=="yellow" {print $1}' file1.csv
will still output:
banana
pineapple
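-F only changes how input is split. Printed fields are still joined with a space unless you also set OFS, the output field separator. The sketch below keeps the result comma-separated, with a few rows of file1.csv piped in with printf. This simple splitting does not understand quoted CSV fields that themselves contain commas.

```shell
# -F, splits input on commas; -v OFS=, joins the printed fields with commas too.
printf '%s\n' 'name,color,amount' 'banana,yellow,6' 'kiwi,brown,4' 'pineapple,yellow,5' |
  awk -F, -v OFS=, '$2 == "yellow" {print $1, $3}'

# Equivalent: set both separators in a BEGIN block, which runs before any input is read.
printf '%s\n' 'name,color,amount' 'banana,yellow,6' 'kiwi,brown,4' 'pineapple,yellow,5' |
  awk 'BEGIN {FS = ","; OFS = ","} $2 == "yellow" {print $1, $3}'
```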
Regular expressions work as well:
awk '$2 ~ /p.+p/ {print $0}' file1.txt
This returns:
grape purple 10
plum purple 2
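A few more regular-expression forms are shown below, run against a handful of rows from file1.txt held in a shell variable. ^ anchors the match to the start of the field, !~ selects fields that do not match, and a bare /regex/ pattern is tested against the whole line:

```shell
rows='apple red 4
plum purple 2
potato brown 9
pineapple yellow 5'

# Names that start with "p": plum, potato, pineapple.
printf '%s\n' "$rows" | awk '$1 ~ /^p/ {print $1}'

# Names that do NOT start with "p": apple.
printf '%s\n' "$rows" | awk '$1 !~ /^p/ {print $1}'

# A bare regex is matched anywhere in the line: "pineapple yellow 5".
printf '%s\n' "$rows" | awk '/yellow/'
```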
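Numeric fields are compared as numbers automatically:
awk '$3>5 {print $1, $2}' file1.txt
This will output the following (the header line appears because “amount” is not a number, so awk compares it with 5 as a string, and “amount” sorts after “5”):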
name color
banana yellow
grape purple
apple green
potato brown
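NR, the current record (line) number, makes it easy to skip the header. An END block runs once after the last line, which is handy for totals. Here is a sketch with the full sample data held in a shell variable:

```shell
rows='name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5'

# NR > 1 skips the header line, so only data rows are compared.
printf '%s\n' "$rows" | awk 'NR > 1 && $3 > 5 {print $1, $2}'

# Sum the amount column; END runs after the last line has been read (prints 51).
printf '%s\n' "$rows" | awk 'NR > 1 {total += $3} END {print total}'
```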
Using output redirection, you can write your results to a file. For example:
awk -F, '$3>5 {print $1, $2}' file1.csv > output.txt
This writes the matching lines to output.txt instead of printing them to the terminal.
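The > redirection is handled by the shell. It creates output.txt, or truncates it if it already exists. To add to an existing file instead, use >>. Here is a sketch with a few comma-separated rows held in a shell variable:

```shell
rows='name,color,amount
banana,yellow,6
kiwi,brown,4
potato,brown,9'

# > creates output.txt, or overwrites it if it already exists.
printf '%s\n' "$rows" | awk -F, 'NR > 1 && $3 > 5 {print $1, $2}' > output.txt

# >> appends to the end of output.txt instead of overwriting it.
printf '%s\n' "$rows" | awk -F, 'NR > 1 && $3 <= 5 {print $1, $2}' >> output.txt

# output.txt now holds: banana yellow, potato brown, kiwi brown.
cat output.txt
```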
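Here’s a cool trick! You can automatically split a file into multiple files grouped by the values in a column. For example, to split file1.txt into one file per color, use this command:
awk '{print > ($2 ".txt")}' file1.txt
This will produce files named yellow.txt, red.txt, etc., plus color.txt holding the header line. Parenthesizing the file name expression is the portable form; without the parentheses, some awk implementations may not treat the whole concatenation as the file name. In upcoming episodes, we will show how to improve the outputs.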
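The sketch below shows one possible refinement. It is an assumption, not necessarily the approach planned for later episodes. It skips the header line with NR > 1 so that the header is not written out as a file of its own, and it writes into a separate directory to keep the output files together. The directory name by_color is hypothetical.

```shell
rows='name color amount
apple red 4
banana yellow 6
strawberry red 3
grape purple 10
apple green 8
plum purple 2
kiwi brown 4
potato brown 9
pineapple yellow 5'

# Hypothetical output directory for the per-color files.
mkdir -p by_color

# NR > 1 skips the header; the parenthesized expression builds the file name.
# Within one run, awk truncates each file when it first opens it, then keeps appending.
printf '%s\n' "$rows" | awk 'NR > 1 {print > ("by_color/" $2 ".txt")}'

ls by_color
```

Because > truncates each file on its first use in a run, running the command again replaces the files rather than duplicating their contents.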
Unless otherwise stated, our shows are released under a Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.