
Cheat sheet of awk

awk is a powerful tool for processing text files. It is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

What is Awk?

Awk is a pattern scanning and processing language, used mostly for data extraction and reporting. It reads its input one record at a time (by default, one line at a time), splits each record into fields, and runs an action for every record that matches a pattern.

Applications of Awk in Linux

Awk is a powerful tool that is widely used in Linux systems. It can be used for various data processing tasks, such as filtering, formatting, and analyzing data. Here are some of the common applications of Awk in Linux:

Data Extraction

Awk can be used to extract specific data from text files. It can search for a specific pattern in a file and extract the data that matches that pattern. For example, if you have a log file with a lot of data, you can use Awk to extract only the data that you need.
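As a rough sketch of this kind of extraction, the commands below pull the time and message out of the ERROR records of a small log. The log layout (date, time, level, message) and its contents are invented for illustration, not taken from a real application.

```shell
# Hypothetical log lines in the form: date time level message...
log=$(mktemp)
cat > "$log" <<'EOF'
2024-01-01 10:00:01 INFO service started
2024-01-01 10:00:05 ERROR disk full
2024-01-01 10:00:09 WARN retrying
2024-01-01 10:00:12 ERROR write failed
EOF

# Print the time (field 2) and the message (fields 4..NF) of ERROR records.
errors=$(awk '$3 == "ERROR" {
  msg = $4
  for (i = 5; i <= NF; i++) msg = msg " " $i
  print $2 ": " msg
}' "$log")
echo "$errors"
rm -f "$log"
```

Comparing the level field with == rather than searching the whole line with /ERROR/ avoids matching messages that merely mention the word.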

Text Formatting

Awk can be used to format text files. It can be used to add or remove spaces, change the case of the text, and perform other formatting tasks. For example, you can use Awk to format a CSV file and remove any unnecessary spaces or characters.
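A minimal sketch of such a cleanup, assuming a simple CSV with no quoted fields or embedded commas (the sample rows are made up):

```shell
# Hypothetical CSV with stray spaces around the values.
csv=$(mktemp)
printf ' alice , 30 ,paris\nbob,25 , berlin \n' > "$csv"

# Trim blanks around each field, upper-case the name, and re-join with commas.
clean=$(awk 'BEGIN { FS = ","; OFS = "," }
{
  for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
  $1 = toupper($1)
  print
}' "$csv")
echo "$clean"
rm -f "$csv"
```

Assigning to a field makes awk rebuild the record with OFS, which is why the output is joined with commas again.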

Data Analysis

Awk can also be used for data analysis tasks. It can be used to calculate averages, sum up values, and perform other mathematical operations on data. For example, if you have a file with sales data, you can use Awk to calculate the total sales for each month.
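The per-month total mentioned above could look roughly like this. The two-column layout (month, amount) and the figures are assumptions made for the example.

```shell
# Hypothetical sales records: month and amount.
sales=$(mktemp)
printf 'Jan 100\nFeb 250\nJan 50\nFeb 25\nMar 10\n' > "$sales"

# Sum the amount per month, then print one line per month.
# The order of "for (m in total)" is unspecified, so sort the result.
totals=$(awk '{ total[$1] += $2 }
END { for (m in total) print m, total[m] }' "$sales" | sort)
echo "$totals"
rm -f "$sales"
```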

awk [options] 'PATTERN { action }' file1 file2 ...
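To make the syntax concrete, here is a small sketch of the three shapes a rule can take: pattern only, action only, and both together. The temporary file and its contents are invented for the demonstration.

```shell
# Hypothetical sample input, created only for this demonstration.
demo=$(mktemp)
printf 'apple 3\nbanana 12\ncherry 7\n' > "$demo"

# Pattern only: the default action prints every matching record.
pattern_only=$(awk '$2 > 5' "$demo")

# Action only: the action runs for every record.
action_only=$(awk '{ print $1 }' "$demo")

# Pattern and action together.
both=$(awk '/^b/ { print $1, $2 * 2 }' "$demo")

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
rm -f "$demo"
```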

Usage

awk built-in variables

FS: Field Separator, the input column separator; the default is a single space, which makes awk split on runs of spaces and tabs;
RS: Record Separator, the input line separator; the default is a newline;
OFS: Output Field Separator, placed between the comma-separated items of print; the default is a single space;
ORS: Output Record Separator, written after each print; the default is a newline;

NR: Number of Records read so far, counted cumulatively across all input files;
FNR: Number of Records read so far from the current file (the count restarts at 1 for each file);
NF: Number of Fields (columns) in the current record;
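The difference between NR and FNR is easiest to see with more than one input file. The sketch below uses two throwaway colon-separated files (names and contents are hypothetical) and prints NR, FNR, NF and the first field of each record.

```shell
# Two hypothetical colon-separated files.
dir=$(mktemp -d)
printf 'a:b:c\nd:e\n' > "$dir/one.txt"
printf 'f:g:h:i\n' > "$dir/two.txt"

# FS splits the input on ":", OFS joins the printed values with "|".
# NR keeps counting across files, FNR restarts at 1 for each file.
vars=$(awk 'BEGIN { FS = ":"; OFS = "|" } { print NR, FNR, NF, $1 }' \
  "$dir/one.txt" "$dir/two.txt")
echo "$vars"
rm -rf "$dir"
```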

For example, suppose there is a text file named test.txt with the following content:

this is a test mail
this is the second line.

1. Output the entire text

awk '{ print $0 }' test.txt

2. Output the second and third columns of the text

awk '{ print $2, $3 }' test.txt

Result:

is a
is the

3. Specify the output delimiter as “##” and output the first three columns

awk 'BEGIN{OFS="##"}{print $1, $2, $3}' test.txt

Result:

this##is##a
this##is##the

4. Specify the output delimiter as “:”, and output the first two columns

awk 'BEGIN{OFS=":"}{print $1, $2}' test.txt

Result:

this:is
this:is

5. Insert specified text during output, such as inserting “hello”

awk 'BEGIN{OFS=":"}{print $1, "hello", $2}' test.txt

Result:

this:hello:is
this:hello:is

6. Output text directly

awk 'BEGIN{ print "line one\nline two\nline three" }'

Result:

line one
line two
line three

7. Count the number of columns in each row

awk '{ print NF }' test.txt

Result:

5
5

8. Output the last column of the file

awk '{ print $NF }' test.txt

Result:

mail
line.

9. Output the penultimate column of the file

awk '{ print $(NF-1) }' test.txt

Result:

test
second

10. Define variables, either on the command line with -v variable="value" or inside the program

awk -v variable="hello world" 'BEGIN{print variable}'
awk 'BEGIN{variable="hello world"; print variable}'

Result:

hello world
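A common use of -v is handing a shell value to awk without splicing it into the program text. A small sketch, where the names limit and max are arbitrary choices for the example:

```shell
# Hypothetical shell variable holding a threshold.
limit=10

# -v copies the shell value into the awk variable "max" before BEGIN runs.
over=$(printf '5\n12\n30\n' | awk -v max="$limit" '$1 > max { print $1 }')
echo "$over"
```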

11. Use printf to display output

printf format, item1, item2, ...

The format includes:

%c: A single character (a numeric argument is treated as a character code, a string argument prints its first character)
%d, %i: Decimal integer
%e, %E: Scientific notation
%f: Floating point number
%g, %G: Scientific notation or floating point number
%s: string
%u: Unsigned number
%%: Display % itself

For example:

awk '{printf "%10s\n", $3}' test.txt

Result:

         a
       the
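Width, alignment and precision can be combined in one format string. The sketch below, with invented input, left-aligns a name in 8 columns, right-aligns an integer in 3, and prints a number with two decimals in 6.

```shell
# Format a left-aligned name, an integer, and a float with 2 decimals.
formatted=$(printf 'widget 3 4.5\ngadget 12 2.25\n' |
  awk '{ printf "%-8s|%3d|%6.2f\n", $1, $2, $3 }')
echo "$formatted"
```

Unlike print, printf does not append ORS, so the format string has to end with \n itself.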

12. Operators: assignment, arithmetic, comparison, conditional, logical

Assignment: = += -= *= /= %= ^=
Arithmetic: -x +x x^y x*y x/y x-y x+y x%y
Comparison: x<y x>=y x!=y x~y (pattern matching)
Conditional expression: selector ? if-true-exp : if-false-exp
Logical: && ||

For example, find usernames starting with the letter “r” in /etc/passwd

awk -F: '/^r/{print $1}' /etc/passwd
awk 'BEGIN{FS=":"}/^r/{print $1}' /etc/passwd

Result:

root
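Several of the operator groups from item 12 can appear in one short program. The following is only an illustrative sketch with made-up names and scores:

```shell
# Hypothetical input: a name and a score per line.
ops=$(printf 'ann 82\nbob 47\nroy 66\n' | awk '{
  total += $2                                  # assignment operator
  grade = ($2 >= 60) ? "pass" : "fail"         # conditional expression
  flag = ($1 ~ /^r/ || $2 > 80) ? "*" : "-"    # pattern match and logical OR
  print $1, grade, flag
}
END { print "average", total / NR }')
echo "$ops"
```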

13. Display usernames with uid greater than or equal to 500 (non-system users)

awk -F: '$3>=500{print $1, $3}' /etc/passwd

Result:

cactiuser 500
test001 501
test002 502

14. Display users whose default login shell is bash (pattern matching)

awk -F: '$7~"bash$"{print $1, $7}' /etc/passwd

Result:

root /bin/bash
mysql /bin/bash
cactiuser /bin/bash
test001 /bin/bash
test002 /bin/bash

15. Run once before/after processing the input: BEGIN/END

For example, add a title

awk 'BEGIN{print "ROW1    ROW2    ROW3"}{printf "%-8s%-8s%-8s\n", $1, $2, $3}' test.txt

Result:

ROW1    ROW2    ROW3
this    is      a
this    is      the

Then add ending information

awk 'BEGIN{print "ROW1    ROW2    ROW3"}{printf "%-8s%-8s%-8s\n", $1, $2, $3}END{print "date:--/--/--"}' test.txt

Result:

ROW1    ROW2    ROW3
this    is      a
this    is      the
date:--/--/--

16. Control statements (if-else)

For example, if the user is root, print “admin”; otherwise, print “Common User”

awk -F: '{if ($1=="root") print $1, ": admin"; else print $1, ": Common User"}' /etc/passwd

Result:

root : admin
bin : Common User
daemon : Common User
adm : Common User
lp : Common User

Another example, count the number of non-system users (uid greater than or equal to 500)

awk -F: -v sum=0 '{if ($3>=500) sum++}END{print sum}' /etc/passwd

The result is the number of non-system users

17. Loop through columns using while or for

For example, print all columns with at least 4 characters in the test.txt file

awk '{i=1; while (i<=NF) { if (length($i)>=4) {print $i}; i++}}' test.txt
awk '{for(i=1;i<=NF;i++){ if (length($i)>=4) {print $i} }}' test.txt

Result:

this
test
mail
this
second
line.

18. Use “next” to end processing of the current line and move on to the next
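A minimal sketch of next, assuming a configuration-style input in which lines starting with # are comments (the sample lines are invented):

```shell
# Hypothetical config-style input with comments and blank lines.
kept=$(printf '# comment\n\nname=awk\n# another\nmode=fast\n' | awk '
/^#/ || NF == 0 { next }   # skip comments and blank lines; later rules never see them
{ print NR ": " $0 }')
echo "$kept"
```

Because next abandons the current record immediately, the second rule only runs for the lines that survive the first one, while NR still reflects the original line numbers.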

19. Arrays

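Awk arrays are associative: the index can be any string, and numeric indices are stored as strings as well. There is no fixed starting index, although arrays filled by split() are numbered from 1.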
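The basic array operations (assigning by string key, testing membership with in, deleting an element, and filling an array with split()) can be sketched as follows. The array names and values are arbitrary.

```shell
# Keys are arbitrary strings; nothing has to be declared in advance.
arr=$(awk 'BEGIN {
  color["apple"] = "red"
  color["kiwi"] = "green"
  n = split("x y z", parts, " ")     # split() fills parts[1]..parts[n]
  if ("apple" in color) print "apple is", color["apple"]
  delete color["kiwi"]
  if (!("kiwi" in color)) print "kiwi removed"
  print n, parts[1], parts[n]
}')
echo "$arr"
```

Testing with in is preferable to comparing color["kiwi"] with an empty string, because merely referencing an element creates it.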

For example, count the number of users corresponding to different login shells

awk -F: '{shell[$NF]++}END{for(A in shell) {print A, shell[A]}}' /etc/passwd

Result:

/sbin/shutdown 1
/bin/bash 5
/sbin/nologin 25
/sbin/halt 1
/bin/sync 1

Another example, count the number of entries with LISTEN and ESTABLISHED states in TCP network connections

netstat -tan | awk '/^tcp/{STATE[$NF]++}END{for(A in STATE) {print A, STATE[A]}}'

Result:

ESTABLISHED 2
LISTEN 5

Another example, count the number of visits from each IP address in the nginx log, and sort them in descending order

awk '{count[$1]++}; END{ for(ip in count) print ip,": " count[ip]}' /usr/local/nginx/logs/access.log | sort -n -k3 -r

The result is similar to:

185.130.5.224 : 15
103.41.53.252 : 13
120.26.55.211 : 10
120.26.207.203 : 8
23.251.63.45 : 6
45.79.204.72 : 5
169.229.3.91 : 5
120.26.227.63 : 5
121.42.0.35 : 4
58.96.181.111 : 3
189.141.160.11 : 3
123.56.233.103 : 3

20. More complex example

#!/bin/bash
# Keep only the requests logged during the 15:00 hour in July 2016.
awk '$4 ~ /Jul\/2016:15:/' /data/log/access.log > ~/access-1.log
# Find the three client IPs with the most requests.
awk '{count[$1]++} END{ for(ip in count) print ip,": " count[ip]}' ~/access-1.log | sort -nrk3 | head -3 | awk '{print $1}' > IP3.txt
# Read the IPs into a bash array, one element per line.
mapfile -t IP < IP3.txt
# For each of those IPs, list its three most requested URLs (field 7).
for i in 0 1 2; do
  awk -v ip="${IP[$i]}" '
    $1 == ip { c_ip[$7]++ }
    END {
      print ip
      cmd = "sort -nrk2 | head -3"
      for (url in c_ip) print url, c_ip[url] | cmd
      close(cmd)
    }' ~/access-1.log
  echo "--------------------"
done

Conclusion

Awk is a powerful tool that is widely used in Linux systems. It is a versatile programming language that can be used for various data processing tasks. Awk can be used for data extraction, text formatting, and data analysis. It is a must-know tool for any Linux user who works with data.

Reference

  • https://www.gnu.org/software/gawk/manual/gawk.html
  • https://man.linuxde.net/awk
  • https://github.com/TheMozg/awk-raycaster