awk command cheat sheet
awk is a powerful tool for processing text files. It is a domain-specific language designed for text processing, typically used for data extraction and reporting, and it is a standard feature of most Unix-like operating systems.
What is Awk?
Awk is a pattern scanning and processing language. It reads its input line by line, splits each line into fields, and performs actions on the lines that match the patterns you specify, which makes it well suited to data extraction and reporting tasks of all sizes.
Applications of Awk in Linux
Awk is a powerful tool that is widely used in Linux systems. It can be used for various data processing tasks, such as filtering, formatting, and analyzing data. Here are some of the common applications of Awk in Linux:
Data Extraction
Awk can be used to extract specific data from text files. It can search for a specific pattern in a file and extract the data that matches that pattern. For example, if you have a log file with a lot of data, you can use Awk to extract only the data that you need.
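As a minimal sketch of this idea (the log lines below are invented sample data, not from a real log):

```shell
# Feed a few fake log lines to awk and keep only the ERROR entries.
printf 'INFO service started\nERROR disk full\nINFO heartbeat\nERROR timeout\n' |
awk '/^ERROR/ { print $0 }'
# Prints only the two lines beginning with ERROR.
```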
Text Formatting
Awk can be used to format text files. It can be used to add or remove spaces, change the case of the text, and perform other formatting tasks. For example, you can use Awk to format a CSV file and remove any unnecessary spaces or characters.
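For instance, a small sketch that strips stray spaces around the fields of a CSV file (the data here is made up for illustration):

```shell
# Remove leading/trailing spaces from every comma-separated field.
# Assigning to $i rebuilds the record using OFS, so the output is clean CSV.
printf 'alice , 30 ,  london\nbob,25, paris\n' |
awk -F, 'BEGIN{OFS=","} { for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i); print }'
# Prints: alice,30,london  and  bob,25,paris
```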
Data Analysis
Awk can also be used for data analysis tasks. It can be used to calculate averages, sum up values, and perform other mathematical operations on data. For example, if you have a file with sales data, you can use Awk to calculate the total sales for each month.
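A sketch of the monthly-totals idea, using an associative array to accumulate sums (the sales figures are invented):

```shell
# Sum the second column per month; the for-in loop order is unspecified,
# so pipe through sort for a stable result.
printf 'Jan 100\nFeb 250\nJan 50\nFeb 25\n' |
awk '{ total[$1] += $2 } END { for (m in total) print m, total[m] }' | sort
# Prints: Feb 275  then  Jan 150
```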
Usage
awk [options] 'PATTERN { action }' file1 file2 ...
awk built-in variables
FS: Field Separator, the input column separator (the character that separates fields); default is whitespace;
RS: Record Separator, the input line separator (the character that separates records); default is the newline character;
OFS: Output Field Separator, the output column separator;
ORS: Output Record Separator, the output line separator;
NR: Number of Records, the number of records (lines) processed so far; the count is cumulative across all input files;
FNR: like NR, but reset for each input file (each file is counted separately);
NF: Number of Fields, the total number of fields (columns) in the current record;
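A quick illustration of several of these variables at once, on two invented colon-separated lines:

```shell
# FS splits input on ":", OFS joins output with "-";
# NR is the line number, NF the field count, $NF the last field.
printf 'a:b:c\nd:e\n' |
awk 'BEGIN{FS=":"; OFS="-"} { print NR, NF, $1, $NF }'
# Prints: 1-3-a-c  then  2-2-d-e
```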
For example, suppose there is a text file named test.txt with the following content:
this is a test mail
this is the second line.
1. Output the entire text
awk '{ print $0 }' test.txt
2. Output the second and third columns of the text
awk '{ print $2, $3 }' test.txt
Result:
is a
is the
3. Specify the output delimiter as “##” and output the first three columns
awk 'BEGIN{OFS="##"}{print $1, $2, $3}' test.txt
Result:
this##is##a
this##is##the
4. Specify the output delimiter as “:”, and output the first two columns
awk 'BEGIN{OFS=":"}{print $1, $2}' test.txt
Result:
this:is
this:is
5. Insert specified text during output, such as inserting “hello”
awk 'BEGIN{OFS=":"}{print $1, "hello", $2}' test.txt
Result:
this:hello:is
this:hello:is
6. Output text directly
awk 'BEGIN{ print "line one\nline two\nline three" }'
Result:
line one
line two
line three
7. Count the number of columns in each row
awk '{ print NF }' test.txt
Result:
5
5
8. Output the last column of the file
awk '{ print $NF }' test.txt
Result:
mail
line.
9. Output the penultimate column of the file
awk '{ print $(NF-1) }' test.txt
Result:
test
second
10. Define variables with -v variable="variable value"
awk -v variable="hello world" 'BEGIN{print variable}'
awk 'BEGIN{variable="hello world"; print variable}'
Result:
hello world
11. Use printf to display output
printf format, item1, item2, ...
The format includes:
%c: Display a character (a numeric argument is treated as an ASCII code)
%d, %i: Decimal integer
%e, %E: Scientific notation
%f: Floating point number
%g, %G: Scientific notation or floating point number
%s: string
%u: Unsigned number
%%: Display % itself
For example:
awk '{printf "%10s\n", $3}' test.txt
Result (each value right-aligned in a 10-character field):
         a
       the
12. Assignment, arithmetic, comparison, and logical operators
Assignment: = += -= *= /= %= ^=
Arithmetic: -x +x x^y x*y x/y x+y x-y x%y
Comparison: x<y x<=y x>y x>=y x==y x!=y x~y (pattern matching)
Conditional expression: selector ? if-true-exp : if-false-exp
Logical: && ||
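The conditional (ternary) expression can be sketched like this, on two invented /etc/passwd-style lines:

```shell
# Label each account by its UID (field 3): >= 500 means a regular user here.
printf 'root:x:0\nalice:x:1000\n' |
awk -F: '{ print $1, ($3 >= 500 ? "regular" : "system") }'
# Prints: root system  then  alice regular
```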
For example, find usernames starting with the letter “r” in /etc/passwd
awk -F: '/^r/{print $1}' /etc/passwd
awk 'BEGIN{FS=":"}/^r/{print $1}' /etc/passwd
Result:
root
13. Display usernames with uid greater than 500 (non-system users)
awk -F: '$3>=500{print $1, $3}' /etc/passwd
Result:
cactiuser 500
test001 501
test002 502
14. Display users whose default login script is bash (pattern matching)
awk -F: '$7~"bash$"{print $1, $7}' /etc/passwd
Result:
root /bin/bash
mysql /bin/bash
cactiuser /bin/bash
test001 /bin/bash
test002 /bin/bash
15. BEGIN/END blocks: run once before/after processing the input
For example, add a title
awk 'BEGIN{print "ROW1 ROW2 ROW3"}{printf "%-8s%-8s%-8s\n", $1, $2, $3}' test.txt
Result:
ROW1 ROW2 ROW3
this is a
this is the
Then add ending information
awk 'BEGIN{print "ROW1 ROW2 ROW3"}{printf "%-8s%-8s%-8s\n", $1, $2, $3}END{print "date:--/--/--"}' test.txt
Result:
ROW1 ROW2 ROW3
this is a
this is the
date:--/--/--
16. Control statements (if-else)
For example, if the user is root, print "admin"; otherwise, print "Common User"
awk -F: '{if ($1=="root") print $1, ": admin"; else print $1, ": Common User"}' /etc/passwd
Result:
root : admin
bin : Common User
daemon : Common User
adm : Common User
lp : Common User
Another example, count the number of non-system users with uid greater than 500
awk -F: -v sum=0 '{if ($3>=500) sum++}END{print sum}' /etc/passwd
The result is the number of non-system users
17. Loop through columns using while
For example, print all fields with at least 4 characters in the test.txt file
awk '{i=1; while (i<=NF) { if (length($i)>=4) {print $i}; i++}}' test.txt
awk '{for(i=1;i<=NF;i++){ if (length($i)>=4) {print $i} }}' test.txt
Result:
this
test
mail
this
second
line.
18. Use “next” to end processing of the current line and move on to the next
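For example, a small sketch (with invented input) that uses next to skip comment lines entirely, so no later rule sees them:

```shell
# Lines starting with "#" hit next and are skipped; everything else is printed.
printf '# header\ndata1\n# note\ndata2\n' |
awk '/^#/ { next } { print }'
# Prints: data1  then  data2
```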
19. Arrays
Awk arrays are associative: the subscript can be any string (or number), and an element is created the first time it is referenced.
For example, count the number of users corresponding to different login shells
awk -F: '{shell[$NF]++}END{for(A in shell) {print A, shell[A]}}' /etc/passwd
Result:
/sbin/shutdown 1
/bin/bash 5
/sbin/nologin 25
/sbin/halt 1
/bin/sync 1
Another example, count the number of entries with LISTEN and ESTABLISHED states in TCP network connections
netstat -tan | awk '/^tcp/{STATE[$NF]++}END{for(A in STATE) {print A, STATE[A]}}'
Result:
ESTABLISHED 2
LISTEN 5
Another example, count the number of visits from each IP address in the nginx log, and sort them in descending order
awk '{count[$1]++}; END{ for(ip in count) print ip,": " count[ip]}' /usr/local/nginx/logs/access.log | sort -n -k3 -r
The result is similar to:
185.130.5.224 : 15
103.41.53.252 : 13
120.26.55.211 : 10
120.26.207.203 : 8
23.251.63.45 : 6
45.79.204.72 : 5
169.229.3.91 : 5
120.26.227.63 : 5
121.42.0.35 : 4
58.96.181.111 : 3
189.141.160.11 : 3
123.56.233.103 : 3
20. More complex example
#!/bin/bash
# Extract the entries from 15:00 on a July 2016 day (the timestamp is field 4 of the log).
awk '{ if ($4 ~ /Jul\/2016:15:/) print $0 }' /data/log/access.log > ~/access-1.log
# Count hits per client IP and keep the top 3 addresses.
awk '{count[$1]++}; END{ for(ip in count) print ip, ": " count[ip]}' ~/access-1.log | sort -nrk3 | head -3 | awk '{print $1}' > IP3.txt
# Read the three addresses into a bash array (a plain IP=$(cat ...) string would break ${IP[$i]} below).
IP=($(cat IP3.txt))
for i in $(seq 0 1 2); do
awk '
BEGIN{
FS=" ";
ip="'${IP[$i]}'";   # splice the shell variable into the awk program
}
{
if($1==ip) c_ip[$7]++   # field 7 is the requested URL
}
END{
print ip;
# Pipe the per-URL counts through sort/head to keep the top 3 URLs.
for(url in c_ip) { print url, c_ip[url] | "sort -nrk2 | head -3" };
}' ~/access-1.log
echo "--------------------"
done
Conclusion
Awk is a versatile tool for data extraction, text formatting, and data analysis, and a must-know for any Linux user who works with data.
Reference
https://www.gnu.org/software/gawk/manual/gawk.html
https://man.linuxde.net/awk