gawk is the GNU implementation of the Awk programming language, which was first developed for the UNIX operating system in the 1970s. The Awk programming language specializes in dealing with the format of data in text files, particularly text data organized into columns.
With the Awk programming language, you can manipulate or extract data, generate reports, match patterns, perform calculations, and more, with great flexibility. Awk allows you to perform somewhat difficult tasks with a single line of code. To achieve the same results using traditional programming languages such as C or Python would require extra effort and many lines of code.
gawk also refers to the command-line utility available by default with most Linux distributions. Most distributions also provide a symbolic link for awk that points to gawk. For simplicity, from now on, we will refer to the utility only as awk.
awk reads data from standard input (STDIN) by default. A common pattern is to pipe the output of other programs into awk to extract and print data, but awk can also process data from files.
In this article, you’ll use awk to analyze data from a file with space-separated columns. Let’s start by reviewing the sample data.
Sample data
For the examples in this guide, let’s use the output of the ps ux command saved in the psux.out file. Here is a sample of the data in the file:
$ head psux.out
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ricardo 1446 0.0 0.2 21644 11536 ? Ss Sep10 0:00 /usr/lib/systemd/systemd -user
ricardo 1448 0.0 0.1 49212 5848 ? S Sep10 0:00 (sd-pam)
ricardo 1459 0.0 0.1 447560 7148 ? Sl Sep10 0:00 /usr/bin/gnome-keyring-daemon -daemonize -login
ricardo 1467 0.0 0.1 369144 6080 tty2 Ssl+ Sep10 0:00 /usr/libexec/gdm-wayland-session /usr/bin/gnome-session
ricardo 1469 0.0 0.1 277692 4112 ? Ss Sep10 0:00 /usr/bin/dbus-broker-launch -scope user
ricardo 1471 0.0 0.1 6836 4408 ? S Sep10 0:00 dbus-broker -log 4 -controller 11 -machine-id 16355057c7274843823dd747f8e2978b -max-bytes 100000000000000 -max-fds 25000000000000 -max-matches 50000000000
ricardo 1474 0.0 0.3 467744 14132 tty2 Sl+ Sep10 0:00 /usr/libexec/gnome-session-binary
ricardo 1531 0.0 0.1 297456 4280 ? Ssl Sep10 0:00 /usr/libexec/gnome-session-ctl -monitor
ricardo 1532 0.0 0.3 1230908 12920 ? S<sl Sep10 0:01 /usr/bin/pulseaudio -daemonize=no
You can download the full file with this command:
$ curl -o psux.out 'https://gitlab.com/-/snippets/2013935/raw?inline=false'
If you decide to use the ps ux output from your own system, adjust the values shown in the examples to match your results.
Next, let’s use awk to view the sample file data.
Basic usage
A basic awk program consists of a pattern followed by an action enclosed in braces. You can provide a program to the awk utility inline by enclosing it in single quotes, like this:
$ awk 'pattern { action }'
awk processes the input data (standard input or file) line by line, executing the given action for each line or record that matches the pattern. If the pattern is omitted, awk executes the action on all records. An action can be as simple as printing data from the line or as complex as an entire program. For example, to print all lines of the sample file, use this command:
$ awk '{ print }' psux.out
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ricardo 1446 0.0 0.2 21644 11536 ? Ss Sep10 0:00 /usr/lib/systemd/systemd -user
... OUTPUT TRUNCATED ...
While this example isn’t really useful, it illustrates the basic use of the awk command.
If you are running the ps ux command on your machine, you can pipe its output directly to awk instead of providing an input file name:
$ ps ux | awk '{ print }'
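As a quick sketch of the pattern { action } structure, the following pipes three made-up lines (not taken from psux.out) into awk. It relies on standard awk behavior: when the action is omitted, awk prints each record that matches the pattern. The variable names are illustrative only.

```shell
# Hypothetical three-line input, used instead of psux.out so the example is self-contained.
sample='alpha 1
beta 2
gamma 3'

# Pattern only: the default action prints each matching record.
pattern_only=$(printf '%s\n' "$sample" | awk '/beta/')

# Action only: with no pattern, the action runs for every record.
action_only=$(printf '%s\n' "$sample" | awk '{ print $1 }')

echo "$pattern_only"
echo "$action_only"
```

The first command prints only the beta line; the second prints the first word of every line.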
Next, let's use awk's column processing capabilities to extract some of the data from the sample file.
The power of awk begins to become apparent when you use its column processing features. awk automatically divides each line (or record) into fields. By default, it splits fields on whitespace (runs of spaces and tabs), but you can change this by providing the -F command-line parameter followed by the desired separator.
After splitting, awk assigns each field to a numbered variable prefixed with the $ character. For example, the first field is $1, the second is $2, and so on. The special variable $0 contains the entire, unsplit record.
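To illustrate the -F parameter and the field variables together, here is a minimal sketch that uses a single made-up, colon-separated record rather than the sample file. The record contents and variable names are assumptions chosen for the example.

```shell
# Hypothetical colon-separated record (not from psux.out).
record='ricardo:1446:/usr/lib/systemd/systemd'

# -F: tells awk to split fields on ":" instead of whitespace.
first_and_third=$(printf '%s\n' "$record" | awk -F: '{ print $1, $3 }')

# $0 still holds the whole, unsplit record.
whole=$(printf '%s\n' "$record" | awk -F: '{ print $0 }')

echo "$first_and_third"
echo "$whole"
```

Note that print joins the selected fields with a space on output, regardless of the input separator.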
By using field variables, you can extract data from the input. For example, to print only the command name of the sample file, use the $11 variable because the command name is the eleventh column of each line:
$ awk '{ print $11 }' psux.out
COMMAND
/usr/lib/systemd/systemd
(sd-pam)
/usr/bin/gnome-keyring-daemon
... OUTPUT TRUNCATED ...
You can also print multiple fields by separating them with commas. For example, to print the command name and CPU utilization in column three, use this command:
$ awk '{ print $11, $3 }' psux.out
COMMAND %CPU
/usr/lib/systemd/systemd 0.0
(sd-pam) 0.0
/usr/bin/gnome-keyring-daemon 0.0
... OUTPUT TRUNCATED ...
Finally, use the built-in printf function to format the output and align the columns. Left-align the first column and pad it to 40 characters to accommodate longer command names:
$ awk '{ printf("%-40s %s\n", $11, $3) }' psux.out
COMMAND                                  %CPU
/usr/lib/systemd/systemd                 0.0
(sd-pam)                                 0.0
/usr/bin/gnome-keyring-daemon            0.0
/usr/libexec/gdm-wayland-session         0.0
... OUTPUT TRUNCATED ...
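The width and alignment parts of a printf format can be sketched on their own. The example below assumes two made-up records and adds a | character between the columns purely to make the padding visible; a leading minus sign left-aligns a field, and a plain width right-aligns it.

```shell
# Two hypothetical records: a command name and a CPU percentage.
rows='/usr/bin/gnome-shell 5.1
(sd-pam) 0.0'

# %-22s left-aligns the name in 22 columns; %5.1f right-aligns the number in 5
# columns with one decimal place.
aligned=$(printf '%s\n' "$rows" | awk '{ printf("%-22s|%5.1f\n", $1, $2) }')

echo "$aligned"
```

Every output line has the same length, so the | separators line up vertically.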
Now that you can manipulate and extract individual fields from each record, let's apply the pattern feature to filter the records.
Pattern matching
In addition to manipulating fields, awk lets you filter which records actions run on through a powerful pattern matching feature. In its most basic form, provide a regular expression enclosed in forward slash (/) characters to match records. For example, to filter for records that match firefox, use /firefox/:
$ awk '/firefox/ { print $11, $3 }' psux.out
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 8.3
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 9.0
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
/usr/lib64/firefox/firefox 0.0
You can also use fields and a comparison expression as pattern matching criteria. For example, to print process data that matches PID 6685, compare the $2 field, like this:
$ awk '$2==6685 { print $11, $3 }' psux.out
/usr/lib64/firefox/firefox 0.0
awk is smart enough to understand numeric fields, allowing you to use relative comparisons such as greater than or less than. For example, to display all processes that use more than 5% CPU, use $3 > 5:
$ awk '$3 > 5 { print $11, $3 }' psux.out
/usr/bin/gnome-shell 5.1
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 8.3
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 9.0
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
You can combine patterns with operators. For example, to show all processes that match firefox and use more than 5% CPU, combine both patterns with the && operator for a logical AND:
$ awk '/firefox/ && $3 > 5 { print $11, $3 }' psux.out
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 8.3
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 9.0
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
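Patterns can be combined with other logical operators in the same way. As a rough sketch, assuming a few made-up command and CPU pairs instead of psux.out, the || operator expresses a logical OR and ! negates a pattern:

```shell
# Hypothetical command/CPU pairs (not from psux.out).
procs='firefox 66.2
gnome-shell 5.1
bash 0.0
firefox 0.0'

# || is a logical OR: records that match firefox or use more than 5% CPU.
either=$(printf '%s\n' "$procs" | awk '/firefox/ || $2 > 5 { print $1, $2 }')

# ! negates a pattern: records that do not match firefox.
not_firefox=$(printf '%s\n' "$procs" | awk '!/firefox/ { print $1, $2 }')

echo "$either"
echo "$not_firefox"
```

Here the OR pattern selects both firefox lines plus gnome-shell, while the negated pattern selects gnome-shell and bash.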
Finally, because you are using pattern matching, awk no longer prints the header line. You can add your own header line using the BEGIN pattern to execute a single action before processing any record:
$ awk 'BEGIN { printf("%-26s %s\n", "Command", "CPU%") } $3 > 10 { print $11, $3 }' psux.out
Command                    CPU%
/usr/lib64/firefox/firefox 66.2
/usr/lib64/firefox/firefox 15.6
/usr/lib64/firefox/firefox 31.5
/usr/lib64/firefox/firefox 20.6
/usr/lib64/firefox/firefox 31.0
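Because the rows in the command above are printed with print, shorter command names would not line up under the padded header. One way to keep everything aligned, sketched here with a few made-up records rather than psux.out, is to reuse the same printf format string for the header and the rows:

```shell
# Hypothetical command/CPU pairs (not from psux.out).
procs='/usr/lib64/firefox/firefox 66.2
/usr/bin/gnome-shell 5.1
/usr/bin/bash 0.0'

report=$(printf '%s\n' "$procs" | awk '
  BEGIN  { printf("%-26s %s\n", "Command", "CPU%") }  # runs once, before any input is read
  $2 > 5 { printf("%-26s %s\n", $1, $2) }             # same format string keeps the columns aligned
')

echo "$report"
```

The 26-character width is simply the length of the longest command name in this made-up input; adjust it to suit your data.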
Next, let's manipulate the data in individual fields.
Field manipulation
As we discussed in the previous section, awk understands numeric fields. This allows you to perform data manipulation, including numerical calculations. For example, consider printing the memory utilization in column six for all firefox processes:
$ awk '/firefox/ { print $11, $6 }' psux.out
/usr/lib64/firefox/firefox 301212
/usr/lib64/firefox/firefox 118220
/usr/lib64/firefox/firefox 168468
/usr/lib64/firefox/firefox 101520
/usr/lib64/firefox/firefox 194336
/usr/lib64/firefox/firefox 111864
/usr/lib64/firefox/firefox 163440
/usr/lib64/firefox/firefox 38496
/usr/lib64/firefox/firefox 174636
/usr/lib64/firefox/firefox 37264
/usr/lib64/firefox/firefox 30608
/usr/lib64/firefox/firefox 174636
/usr/lib64/firefox/firefox 174660
The ps ux command displays memory utilization in kilobytes, which is difficult to read. Let's convert it to megabytes by dividing the value of the field by 1024:
$ awk '/firefox/ { print $11, $6/1024 }' psux.out
/usr/lib64/firefox/firefox 294.152
/usr/lib64/firefox/firefox 115.449
/usr/lib64/firefox/firefox 164.52
/usr/lib64/firefox/firefox 99.1406
/usr/lib64/firefox/firefox 189.781
/usr/lib64/firefox/firefox 109.242
/usr/lib64/firefox/firefox 159.609
/usr/lib64/firefox/firefox 37.5938
/usr/lib64/firefox/firefox 170.543
/usr/lib64/firefox/firefox 36.3906
/usr/lib64/firefox/firefox 29.8906
/usr/lib64/firefox/firefox 170.543
/usr/lib64/firefox/firefox 170.566
You can also round the numbers to the nearest whole number and add the MB suffix using printf to improve readability:
$ awk '/firefox/ { printf("%s %4.0f MB\n", $11, $6/1024) }' psux.out
/usr/lib64/firefox/firefox  294 MB
/usr/lib64/firefox/firefox  115 MB
/usr/lib64/firefox/firefox  165 MB
/usr/lib64/firefox/firefox   99 MB
/usr/lib64/firefox/firefox  190 MB
/usr/lib64/firefox/firefox  109 MB
/usr/lib64/firefox/firefox  160 MB
/usr/lib64/firefox/firefox   38 MB
/usr/lib64/firefox/firefox  171 MB
/usr/lib64/firefox/firefox   36 MB
/usr/lib64/firefox/firefox   30 MB
/usr/lib64/firefox/firefox  171 MB
/usr/lib64/firefox/firefox  171 MB
Finally, combine this idea with the BEGIN and END patterns for more advanced data manipulation. For example, let's calculate the total memory usage for all Firefox processes by defining a variable sum in the BEGIN action, adding the value of column six ($6) to the sum variable for each line that matches firefox, and then printing the result in megabytes with the END action:
$ awk 'BEGIN { sum=0 } /firefox/ { sum+=$6 } END { printf("Total Firefox memory: %.0f MB\n", sum/1024) }' psux.out
Total Firefox memory: 1747 MB
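The same accumulate-then-report approach extends to other aggregates. The sketch below, which assumes three made-up records with a memory value in kilobytes in the second field, counts the matching processes and reports an average as well as a total. It also relies on the fact that awk variables start out as zero, so the BEGIN initialization is optional.

```shell
# Hypothetical process/memory pairs in kilobytes (not from psux.out).
rss='firefox 301212
firefox 118220
bash 5000'

summary=$(printf '%s\n' "$rss" | awk '
  /firefox/ { sum += $2; count++ }   # unset variables start out as 0
  END {
    if (count > 0)                   # avoid dividing by zero when nothing matched
      printf("%d processes, %.0f MB total, %.0f MB average\n", count, sum/1024, sum/count/1024)
  }
')

echo "$summary"
```

With this input, the command reports 2 processes, 410 MB total, 205 MB average.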
What's next?
Gawk is a powerful and flexible tool for processing text data, particularly data organized into columns. This article provided some useful examples of using this tool to extract and manipulate data, but gawk can do much more. For additional information about gawk, refer to the manual pages in your Linux distribution.
The Awk language has many more features than we explored in this guide. For detailed information, see the official GNU Awk User's Guide.