Learn the AWK Programming Language
Updated by Linode Contributed by Mihalis Tsoukalos
What is AWK?
AWK is a Turing-complete pattern matching programming language. The name AWK is derived from the family names of its three authors: Alfred Aho, Peter Weinberger and Brian Kernighan. AWK is often associated with sed, which is a UNIX command line tool. However, sed is more appropriate for one line UNIX shell commands and is typically used only for text processing.
AWK is great for data reporting, analysis, and extraction and supports arrays, associative arrays, functions, variables, loops, and regular expressions. Current Linux systems use improved versions of the original AWK utility. The main enhancement to these AWK variants is support for a larger set of built-in functions and variables. The most widely used variants of AWK are: Gawk, Mawk, and Nawk.
NoteThis guide uses the Gawk version of AWK.
There are many practical uses of AWK. For example, you can use AWK and the history command to find your top 10 most frequently issued commands:
history | awk '{CMD[$2]++;count++;} END {for (a in CMD)print CMD[a] " "CMD[a]/count*100 " % " a;} ' | grep -v "./" | column -c3 -s " " -t | sort -rn | head -n10
This guide assumes familiarity with programming language concepts and is meant to provide an overview of some basic elements of the AWK programming language. In this guide you will learn:
- How to write and execute AWK programs
- The basics of the AWK programming language
- Practical uses of AWK
AWK Basics
In this section you will learn basics of the AWK programming language, including:
- How to execute AWK from the command line with one-off commands and by storing AWK code in files.
- Creating and using variables, arrays, and functions.
- Special patterns, like
BEGIN
andEND
.
NoteA pattern in AWK controls the execution of rules and a rule is executed when its pattern is a match for the current input record.
Run an AWK Program
A program in AWK can be written via the command line or by executing a file containing the program. If you want to reuse your code, it is better to store it in a file. AWK reads input from standard input or from files specified as command line arguments. Input is divided into individual records and fields. By default, new lines are parsed as a record and whitespace is parsed as a field. After a record is read, it is split into fields. AWK does not alter the original input.
The next two sections will walk you through creating a Hello World! program that you will run, both as a one-off program on the command line, and as reusable code saved in a file.
Hello World! - Command Line
When an AWK program contains the BEGIN
pattern without another special pattern, AWK will not expect any further command line input and exit. Typically, when an AWK program is executed on the command line, without the BEGIN
special pattern, AWK will continue to expect input until you exit by typing CTRL-D. The example Hello World! program below will print and immediately exit.
Execute the command as follows:
awk 'BEGIN { print "Hello World!" }'
The output will be as follows:
Hello World!
Hello World! - Input File
In this section, you will create an AWK program in an input file and then run it from the command line.
Create a file called
helloworld.awk
with the following content:- helloworld.awk
-
1 2
BEGIN { print "Hello World!" }
On the command line, run the
helloworld.awk
program. The-f
option tells AWK to expect a source file as the program to run.awk -f helloworld.awk
The output will be as follows:
Hello World!
You can also run AWK programs as executable scripts. Open
helloworld.awk
and add a bash script line to the top of the file and save it without the.awk
extension.- helloworld
-
1 2 3 4
#!/usr/bin/awk -f BEGIN { print "Hello World!" }
The
#!/usr/bin/awk -f
line defines the start of script execution.Add execute permissions to
helloworld
:chmod +x helloworld
Execute the
helloworld
program:./helloworld
The output will resemble the following:
Hello World!
Variables in AWK
AWK supports built-in and user defined variables. Built-in variables are native to AWK, whereas user defined variables are ones you define.
Built-in Variables
AWK has many built-in variables that are automatically initialized. Some of the most important ones are the following:
Variable | Definition |
---|---|
NF |
Holds the number of fields in the current input record. Each record can have a different number of fields. |
FS |
Defines the input field separator. The default value is a whitespace, but it also matches any sequence of spaces and tabs. Additionally, any number of leading or trailing whitespaces and tabs are ignored. If the value of FS is set to the null string, then each character in the current line becomes a separate field. |
FILENAME |
Stores the filename of the current input file. You cannot use FILENAME inside a BEGIN block, because there are no input files being processed. |
NR |
Keeps track of the total number of records that have been read so far. |
FNR |
Stores the total number of records that have been read from the current input file. |
IGNORECASE |
Tells AWK whether or not to ignore case in all of its comparisons or regular expressions. If IGNORECASE stores a non-zero or null value, then AWK will ignore case. |
ARGC |
Holds the number of command line arguments. |
ARGV |
Stores the actual command line arguments of an AWK program. |
User Defined Variables
User defined variables can store numeric or string values. AWK dynamically assigns variables a type based on the variable’s initial value. User defined variables, by default, are initialized to the empty string. If you convert a variable from a string to a number, the default value is zero. You can convert a string to a number and vice versa as long as the string can be converted to a valid number. It is important to keep in mind that AWK is not a type safe programming language, since this can sometimes generate bugs.
You can set a variable via the command line using the
-v
option. This command will initialize the variablecount
and print its value:awk -v count=8 'BEGIN { print count }'
To initialize variables within an input file, you can use the form
myvariable = "myvar"
for strings andmyvariable = 10
for numeric values. Create a file namedcount.awk
and add the following content:- count.awk
-
1 2 3 4 5
BEGIN { count = 10 print count }
To run this file, switch back to the command line and execute the following command:
awk -f count.awk
Your output should display:
10
Special Patterns
AWK uses patterns to control how a rule should be executed against an input record. The two main categories of patterns in AWK are regular expressions and expressions. Regular expressions use a special format to target specific sets of strings, while expressions encompass various ways to target patterns in AWK, like comparison expressions that may utilize regular expressions. Special patterns in AWK include reserved keywords that perform special actions within your AWK programs. The sections below discuss the special patterns BEGIN
, END
, BEGINFILE
, and ENDFILE
.
BEGIN and END
BEGIN
and END
are executed only once: before receiving any input and after processing all input, respectively. In this way, they can be used to perform startup and cleanup actions in your AWK programs.
Although it is not required to use BEGIN
and END
at the beginning and end of your AWK programs, it is considered good practice to do so. Additionally, you can include multiple BEGIN
and END
blocks in one program.
If an AWK program uses only BEGIN
rules without any other code, the program terminates without reading any of the specified input. However, if an AWK program contains only END
rules without any additional code, all the specified input is read. This is necessary in case the END
rule references the FNR
and NR
variables.
BEGINFILE and ENDFILE
NoteBEGINFILE
andENDFILE
only work withgawk
.
Two other patterns with special functionality are BEGINFILE
and ENDFILE
. BEGINFILE
is executed before AWK reads the first record from a file, whereas ENDFILE
is executed after AWK is done with the last record of a file.
ENDFILE
is convenient for recovering from I/O errors during processing. The AWK program can pass control to ENDFILE
, and instead of stopping abnormally it sets the ERRNO
variable to describe the error that occurred. AWK clears the ERRNO
variable before it starts processing the next file. Similarly, the nextfile
statement – when used inside BEGINFILE
– allows gawk to move to the next data file instead of exiting with a fatal error and without executing the ENDFILE
block.
As an example, create a file named
beginfile.awk
:- beginfile.awk
-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
BEGIN { numberOfFiles = 0 } BEGINFILE { print "New file", FILENAME # Check if there is an error while trying to read the file if (ERRNO) { print "Cannot read", FILENAME, "– processing next file!" nextfile } } ENDFILE { numberOfFiles++ } END { print "Total number of files processed: ", numberOfFiles }
This program showcases the usage of
BEGIN
,END
,BEGINFILE
, andENDFILE
by printing the total number of files read as well as the filename of each file.If there is a problem while reading a file, the code will report it.
Printing the filename is done with the help of the
FILENAME
variable.
Execute the file with the following command:
gawk -f hw.awk beginfile.awk givenLine.awk doesNotExist
The output will be similar to the following example. The program does not stop abnormally when it does not find an input file and provides a useful error message.
New file hw.awk Cannot read hw.awk – processing next file! New file beginfile.awk New file givenLine.awk Cannot read givenLine.awk – processing next file! New file doesNotExist Cannot read doesNotExist – processing next file! Total number of files processed: 1
Looping in AWK
AWK supports for
, do-while
, and while
loops that behave similarly to control flow statements in other programming languages. Loops execute code contained within a code block as many times as specified in the control flow statement. To illustrate loops in AWK, a working example is provided below.
Create and save a file named
loops.awk
:- loops.awk
-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
BEGIN { for (i = 0; i < ARGC; i++) printf "ARGV[%d] = %s\n", i, ARGV[i] k = 0 while ( k < ARGC ) { printf "ARGV[%d] = %s\n", k, ARGV[k] k++ } m = 0 do { printf "ARGV[%d] = %s\n", m, ARGV[m] m++ } while ( m < ARGC ) } END { for (i = 0; i < 10; i++) printf "%d ", i printf "\n" }
- The program uses the value of the
ARGC
built-in variable to control how many times to loop through each separate block of code. The result will vary depending on how many command line arguments you pass to AWK when executing the program. - The
for
loop after theEND
special pattern will print numbers from 0 - 9.
Execute the
loops.awk
input program with the following command:echo "" | awk -f loops.awk
The output will be similar to the following:
ARGV[0] = awk ARGV[0] = awk ARGV[0] = awk 0 1 2 3 4 5 6 7 8 9
Arrays
AWK does not require array indices to be consecutive integers. Instead, strings and numbers may be used. This is because AWK uses string keys internally to represent an array’s indices, and so arrays in AWK are more like associative arrays that store a collection of pairs. Unlike other programming languages, you do not need to declare an array and its size before using it, and new pairs can be added at any time. The file below serves to illustrate the behavior of arrays in AWK.
Create the file
arrays.awk
:- arrays.awk
-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
BEGIN { a[0] = 1; a[1] = 2; a[2] = 3; a[3] = 4; for (i in a) print "Index:", i, "with value:", a[i]; print "Adding two elements and deleting a[0]"; a["One"] = "One_value"; a["Two"] = "Two_value"; delete a[0]; for (i in a) print "Index:", i, "with value:", a[i]; if (a["1"] == a[1]) printf "a[1] = a[\"1\"] = %s\n", a["1"]; }
- The program creates the
a[]
array and initializes it with four separate numeric values. - The
for
block will loop through the array and print the current index and value. - It then adds two new elements to array
a[]
that use string indices instead of numbers. - It demonstrates how to delete an element from an array by deleting the
a[0]
element. - Finally, the
if
statement evaluates ifa["1"]
anda[1]
are equivalent. Since AWK stores all array elements as string keys, both indices point to the same array element and the code in theif
statement executes.
Run the program with the following command:
awk -f arrays.awk
The output will look similar to the following:
Index: 0 with value: 1 Index: 1 with value: 2 Index: 2 with value: 3 Index: 3 with value: 4 Adding two elements and deleting a[0] Index: Two with value: Two_value Index: One with value: One_value Index: 1 with value: 2 Index: 2 with value: 3 Index: 3 with value: 4 a[1] = a["1"] = 2
Note
The order of the array indices may be out of order. This is because arrays in AWK are associative and not assigned in blocks of contiguous memory.
Functions
Like most programming languages, AWK supports user-defined functions and ships with several useful built-in functions. This section will provide examples demonstrating how to use both types of functions.
Predefined Functions
AWK’s built-in functions provide mechanisms for string manipulation, numeric operations, and I/O functions to work with files and shell commands. The example below utilizes the built-in numeric functions rand()
and int()
to show how to call built-in functions.
Create and save a file named
rand.awk
:- rand.awk
-
1 2 3 4 5 6 7
BEGIN { while (i < 20) { n = int(rand()*10); print "value of n:", n; i++; } }
- The
rand.awk
program uses therand()
function to generate a random number and stores it in then
variable. By default,rand()
returns a random number between 0 and 1. To generate numbers larger than 1, the program multiplies the returned random number by 10. - AWK’s
int()
function rounds the result of therand()
function to the nearest integer.
Execute the
rand.awk
program with the following command:awk -f rand.awk
The output will resemble the following:
value of n: 2 value of n: 2 value of n: 8 value of n: 1 value of n: 5 value of n: 1 value of n: 8 value of n: 1 ...
User Defined Functions
The AWK programming language allows you to define your own functions and call them throughout an AWK program file. A function definition must include a name and can include a parameter list. Function names can only contain a sequence of letters, digits, and underscores. The function name cannot begin with a digit. In the example below, you will declare a function definition and utilize it within the AWK program.
Create and save the
myFunction.awk
file:- myFunction.awk
-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
function isnum(x) { return(x==x+0) } function sumToN(n) { sum = 0 if (n < 0) { n = -n } if ( isnum(n) ) { for (j = 1; j <= n; j++) sum = sum + j } else { return -1 } return sum } { for (i=1; i<=NF; i++) print $i, "\t:", sumToN($i) }
- The user defined function
sumToN()
takes a single parametern
and uses a for loop to increment its value and stores it in thesum
variable. - The program will take command line input, and pass it as a parameter to the
sumToN()
function and print the calculatedsum
.
Execute
myFunction.awk
with the following command:echo "10 12" | awk -f myFunction.awk
Your output will resemble the example below. If you use a different set of numbers, your output will differ from the example.
10 : 55 12 : 78
Practical Examples
This section of the guide provides a variety of practical examples to further demonstrate the AWK programming language. You can try out each example on your own Linux machine or expand on the examples for your own specific needs.
Printing
Printing a Given Line from a File
To use AWK to print a given line from a text file, create and save the
givenLine.awk
file:- givenLine.awk
-
1 2 3 4
{ if (NR == line) print $0; }
- This program will print out the record that corresponds to the value passed to the
line
variable. The program will require input either from the command line or from a file. - You should pass the value of the
line
variable to the AWK program as a command line argument using the-v
option.
By executing the
givenLine.awk
program as follows, it will print out the first line found in themyFunction.awk
program written in the previous section. (You could similarly pass it any text file.)awk -v line=1 -f givenLine.awk myFunction.awk
The output will resemble the following:
function isnum(x) { return(x==x+0) }
Execute
givenLine.awk
again, passing line 4:awk -v line=4 -f givenLine.awk myFunction.awk
This time the output is as follows:
sum = 0
Printing Two Given Fields from a File
In this example, the AWK program will print the values of the first and third fields of any text file.
Create and save the file
field1and3.awk
:- field1and3.awk
-
1 2 3
{ print $1, $3; }
Create and save the file
words.txt
:- words.txt
-
1
one two three
Execute
field1and3.awk
passingwords.txt
as input:awk -f field1and3.awk words.txt
The output will print only the first and third words (fields) contained in the file:
one three
Note
You can also execute the contents of
field1and3.awk
on the command line and passwords.txt
as input:awk '{print $1, $3}' words.txt
Counting
Counting Lines
The following example AWK program will count the number of lines that are found in the given text file(s).
FNR
stores the total number of records that have been read from the current input file.
Create and save the
countLines.awk
file:- countLines.awk
-
1 2 3 4 5 6 7 8
{ if (FNR==1) print "Processing:", FILENAME; } END { print "Read", NR, "records in total"; }
- The use of
FNR
makes sure that the filename of each processed file will be printed only once. END
makes sure that the results will be printed just before AWK finishes executingcountLines.awk
.
Create and save the
data.txt
file. This file will be passed to AWK as input for processing.- data.txt
-
1 2 3 4 5 6 7
one two three 4 6 seven not eight
Execute
countLines.awk
with the following command, passingdata.txt
as input:awk -f countLines.awk data.txt
The output will resemble the following:
Processing: data.txt Read 7 records in total
Execute
countLines.awk
with multiple files for processing. You can usewords.txt
from the previous exercise.awk -f countLines.awk data.txt words.txt
You should see a similar output:
Processing: data.txt Processing: words.txt Read 8 records in total
Counting Lines with a Specific Pattern
The following AWK code uses the variable n
to count the number of lines that contain the string three
:
awk '/three/ { n++ }; END { print n+0 }'
The code above tells AWK to execute
n++
each time there is a match to the/three/
regular expression.When the processing is done, the code in
END
is executed. This code prints the current value ofn
converted to a number by adding the numeral zero.
Create a file named
dataFile.txt
to pass to AWK as input for processing:- dataFile.txt
-
1 2 3 4 5 6 7
one two three four three two one
Execute the example code and pass
dataFile.txt
as input:awk '/three/ { n++ }; END { print n+0 }' dataFile.txt
The output will look as follows:
2
Counting Characters
In this example, the countChars.awk
file calculates the number of characters found in an input file.
Create and save the file
countChars.awk
:- countChars.awk
-
1 2 3 4 5 6 7 8 9 10 11 12 13 14
BEGIN { n = 0; } { if (FNR==1) print "Processing:", FILENAME; n = n + length($0) + 1; } END { print "Read", n, "characters in total"; }
- This program makes use of the built-in string function
length()
, which returns the number of characters in a string. In the case of the program, the string will be provided by the entirety of the current record, which is indicated by$0
. - The
+ 1
appended to thelength()
function is used to account for the new line character that each line includes.
Execute
countChars.awk
by running the following command and pass it thecountLines.awk
file from the previous exercise.awk -f countChars.awk countLines.awk
The output will look similar to the following:
Processing: countLines.awk Read 110 characters in total
Execute
countChars.awk
with multiple files to process as follows:awk -f countChars.awk countLines.awk field1and3.awk
Processing: countLines.awk Processing: field1and3.awk Read 132 characters in total
Calculating Word Frequencies
This example demonstrates some of the advanced capabilities of AWK. The file wordFreq.awk
reads a text file and counts how many times each word appears in the text file using associative arrays.
Create and save the file
wordFreq.awk
:- wordFreq.awk
-
1 2 3 4 5 6 7 8 9 10 11
{ for (i= 1; i<=NF; i++ ) { $i = tolower($i) freq[$i]++ } } END { for (word in freq) print word, ":", freq[word] }
wordFreq.awk
uses a for loop to traverse through an input file and add each record to thefreq[]
array.- The
tolower()
built-in string function is used to ensure the program does not count the same word multiple times based on differences in case, e.g., seven and Seven are not counted as different words. - Before the program exits, the
END
block prints out each word and its frequency with the input file.
Create and save the file
wordFreq.txt
to use as an input file.- wordFreq.txt
-
1 2 3 4 5
one two three one four seven Seven One Two TWO one three five
Execute the
wordFreq.awk
program and passwordFreq.txt
as input:awk -f wordFreq.awk wordFreq.txt | sort -k3rn
Note
Thesort -k3rn
command is used to sort the output ofwordFreq.awk
based on a numeric sort in reverse order.The output will resemble the following:
one : 4 two : 3 seven : 2 three : 2 five : 1 four : 1
Updating Docker Images
Use the following series of piped commands to update all Docker images found on your local machine to their latest version:
docker images | grep -v REPOSITORY | awk '{print $1}' | xargs -L1 docker pull
- In this example, AWK is just a piece of the entire command. AWK does the job of extracting the first field from the result of executing the
docker images
command.
- In this example, AWK is just a piece of the entire command. AWK does the job of extracting the first field from the result of executing the
Finding
Finding the Top-10 Commands of your Command History
Use The following shell command to find your top 10 most used commands by piping the output of
history
to AWK as input:history | awk '{CMD[$2]++;count++;} END {for (a in CMD)print CMD[a] " "CMD[a]/count*100 " % " a;} ' | grep -v "./" | column -c3 -s " " -t | sort -rn | head -n10
First, the command executes the
history
command to be used as AWK’s input.This is processed by a complex
awk
command that calculates the number of times each command appears inhistory
by considering the second field of each record. This is the field that corresponds to the previously issued commands. These values are stored in theCMD[]
associative array.At the same time, the total number of commands that have been processed are stored in the
count
variable.The frequency of each command is calculated with the
CMD[a]/count*100
statement and printed on the screen along with the command name.The formatting and the sorting of the output is handled by the
grep
,column
,sort
, andhead
command line utilities.
Your output should resemble the following:
2318 18.4775 % git 1224 9.75688 % ll 1176 9.37425 % go 646 5.14946 % docker 584 4.65524 % cat 564 4.49582 % brew 427 3.40375 % lenses-cli 421 3.35592 % cd 413 3.29215 % vi 378 3.01315 % rm
Finding the Number of Records that Appear More than Once
This program’s logic utilizes the behavior of AWK associative arrays. The associative array’s keys are the entire lines of the passed input. This means that if a line appears more than once, it will be found in the associative array and will have a value that is different from the default, which is 0
.
Create and save the file
nDuplicates.awk
:- nDuplicates.awk
-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
BEGIN { total = 0; } { i = tolower($0); if (freq[i] == 1) { total = total + 2; } else if (freq[i] > 1) { total++; } freq[i]++; } END { print "Found", total, "lines with duplicate records."; }
Execute the
nDuplicates.awk
file and pass the file to itself as input:awk -f nDuplicates.awk nDuplicates.awk
The output will look similar to the following:
Found 5 lines with duplicate records.
Execute the command again, passing the file twice to itself:
awk -f nDuplicates.awk nDuplicates.awk nDuplicates.awk
The output will look similar to the following:
Found 42 lines with records that already existed
More Information
You may wish to consult the following resources for additional information on this topic. While these are provided in the hope that they will be useful, please note that we cannot vouch for the accuracy or timeliness of externally hosted materials.
Join our Community
Find answers, ask questions, and help others.
This guide is published under a CC BY-ND 4.0 license.