AWK is an old program, and a bit arcane as a programming language, but it's also very simple to use if you have the right mental model. gawk (GNU AWK) is also available, but I will mostly be referring to the classic version.
The command line options are specified in the awk reference, and that also includes the basic program structure and expressions.
I like to write one-offs to process regular text files. By 'regular text file' I mean 'text file that follows rules', not 'any plain old text file'. If you need to get fancy when processing something, Python is a better language for quick and dirty programs; I've written about text processing in Python before.
I usually approach awk by building up a one-off command line somewhere I can touch it up, then pasting it into a terminal to run.
The starting structure is usually something like this:
awk '{print}' file.txt
This will just print out each line in file.txt - that's what the '{print}' program is. It's time to start writing something up, and here are the basic rules.
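To make the structure concrete, here is a minimal sketch run against a throwaway sample file (the file contents and the shell variable names are made up for illustration). An awk program is a series of pattern { action } rules: leave out the pattern and the action runs on every line, leave out the action and the matching lines are printed.

```shell
# Hypothetical sample file, created only for this sketch.
tmp=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$tmp"

# An action with no pattern runs for every line; print defaults to the whole line.
all=$(awk '{print}' "$tmp")

# A pattern with no action prints every line that matches it.
matched=$(awk '/^b/' "$tmp")

echo "$all"       # prints alpha, beta and gamma on separate lines
echo "$matched"   # prints: beta

rm -f "$tmp"
```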
Often I will use awk to count specific lines or to extract some information from those lines.
I use the BEGIN pattern to initialize counters, then match with regular expressions to increment, and write at the END pattern.
For example, this will count top-level header lines (those starting with a single #) in a markdown file.
awk 'BEGIN {h=0}; /^# (.*)$/ {h+=1}; END {print "headers: " h}' file.md
NOTE: in the Windows command prompt (cmd.exe), the caret (^) character is also used as an escape character, so you would have to double it like this:
awk 'BEGIN {h=0}; /^^# (.*)$/ {h+=1}; END {print "headers: " h}' file.md
To print them out:
awk 'BEGIN {h=0}; /^# (.*)$/ {print;h+=1}; END {print "headers: " h}' file.md
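Here is what that looks like end to end, as a sketch against a small made-up markdown file. Lines starting with ## are not counted, because the pattern requires a space right after the first #. The BEGIN {h=0} is not strictly required, since awk variables start out empty, but it makes sure that "headers: 0" is printed when nothing matches.

```shell
# Made-up sample file for this sketch.
md=$(mktemp)
printf '# Title\nsome text\n## Subsection\n# Another\n' > "$md"

# Print each top-level header as it is matched, then the total at the end.
out=$(awk 'BEGIN {h=0}; /^# (.*)$/ {print;h+=1}; END {print "headers: " h}' "$md")
echo "$out"
# prints:
# # Title
# # Another
# headers: 2

# Without the BEGIN initialization, an untouched h is an empty string here.
empty=$(awk 'END {print "headers: " h}' /dev/null)
echo "[$empty]"   # prints: [headers: ]

rm -f "$md"
```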
Now, if this is all you wanted to do, grep and wc have you covered. awk shines when you need to keep state as you go through your file.
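For comparison, this is roughly what the grep and wc route looks like. It is a sketch against a made-up sample file: grep -c counts the matching lines directly, and piping grep into wc -l does the same thing in two steps (the tr call only strips the padding some wc implementations add).

```shell
# Made-up sample file for this sketch.
md=$(mktemp)
printf '# Title\nsome text\n## Subsection\n# Another\n' > "$md"

# Count top-level headers with grep alone.
n_grep=$(grep -c '^# ' "$md")

# Same count, letting wc do the counting.
n_wc=$(grep '^# ' "$md" | wc -l | tr -d ' ')

echo "$n_grep"   # prints: 2
echo "$n_wc"     # prints: 2

rm -f "$md"
```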
Let's say I only want to count headers after a CONTENT STARTS line.
awk 'BEGIN {c=0;h=0}; /^# (.*)$/ {if (c) {h+=1}}; /CONTENT STARTS/ {c=1}; END {print "headers: " h}' file.md
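Spread over several lines with comments, the same stateful program is easier to follow. This is a sketch run against a made-up sample file; the header before the CONTENT STARTS marker is skipped, and the two after it are counted.

```shell
# Made-up sample file: one header before the marker, two after it.
md=$(mktemp)
printf '# Preface\nintro\nCONTENT STARTS\n# One\ntext\n# Two\n' > "$md"

result=$(awk '
  BEGIN { c = 0; h = 0 }             # c: has the marker been seen, h: header count
  /^# (.*)$/ { if (c) { h += 1 } }   # count a header only once c is set
  /CONTENT STARTS/ { c = 1 }         # flip the flag when the marker line appears
  END { print "headers: " h }
' "$md")

echo "$result"   # prints: headers: 2

rm -f "$md"
```

Because c is only set after the marker line has been read, the order of the rules does not matter much here, but keeping the flag check inside the header rule makes the dependency obvious.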
If you want to see more examples, golinuxcloud has some good ones.
Happy AWK scripting!