
Most useful awk usage examples

awk is a powerful tool for processing text files. It is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems.

awk [options] 'PATTERN { action }' file1 file2 ...
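As a quick illustration of the pattern/action form, here is a minimal sketch. The sample input is made up for this example. When the pattern is omitted, the action runs for every line; when the action is omitted, awk prints each matching line.

```shell
# Hypothetical sample input, created only for this illustration.
sample=$(printf 'apple 3\nbanana 7\ncherry 5\n')

# Pattern only: lines whose second field exceeds 4 are printed in full.
pattern_only=$(printf '%s\n' "$sample" | awk '$2 > 4')

# Action only: the action runs for every line.
action_only=$(printf '%s\n' "$sample" | awk '{ print $1 }')

# Pattern and action together: print the first field of matching lines.
both=$(printf '%s\n' "$sample" | awk '$2 > 4 { print $1 }')

printf '%s\n' "$pattern_only" "$action_only" "$both"
```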

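awk built-in variables

FS: Field Separator, the input field separator (the character that splits a line into columns); the default is whitespace.
RS: Record Separator, the input record separator (the character that splits the input into lines); the default is a newline.
OFS: Output Field Separator, placed between fields on output; the default is a single space.
ORS: Output Record Separator, placed after each record on output; the default is a newline.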
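The four separators can be seen in action in the following sketch. The input strings are invented for this illustration. Note that OFS only affects output when fields are printed with commas or when $0 is rebuilt, which is why the first command assigns $1 to itself.

```shell
# FS and OFS: read comma-separated fields, write them pipe-separated.
# Assigning $1 to itself forces awk to rebuild $0 using OFS.
converted=$(printf 'a,b,c\n1,2,3\n' | awk 'BEGIN { FS = ","; OFS = "|" } { $1 = $1; print }')

# RS: treat ";" as the record separator instead of a newline.
records=$(printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')

# ORS: end each output record with a space instead of a newline.
joined=$(printf 'one\ntwo\n' | awk 'BEGIN { ORS = " " } { print $1 }')

printf '%s\n' "$converted" "$records" "$joined"
```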

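NR: Number of Records, the number of input records (lines) awk has read so far, counted continuously across all input files.
FNR: the record number within the current input file; counting restarts at 1 for each file.
NF: Number of Fields, the number of fields (columns) in the current record.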
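The difference between NR and FNR only shows up with more than one input file. The sketch below uses two throwaway files (their names and contents are assumptions made for this example) and prints NR, FNR, and NF for every line.

```shell
# Two throwaway files, created only for this illustration.
dir=$(mktemp -d)
printf 'a b\nc d e\n' > "$dir/one.txt"
printf 'f\n' > "$dir/two.txt"

# Print the running record number, the per-file record number,
# and the field count of every line.
counts=$(awk '{ print NR, FNR, NF }' "$dir/one.txt" "$dir/two.txt")
printf '%s\n' "$counts"
rm -r "$dir"
```

NR keeps counting into the second file (3), while FNR restarts at 1.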

The examples below use a text file named test.txt with the following content:

this is a test mail
this is the second line.

1. Print the whole file

awk '{ print $0 }' test.txt

2. Print the second and third columns

awk '{ print $2, $3 }' test.txt

Result:

is a
is the

3. Print the first three columns with "##" as the output separator

awk 'BEGIN{OFS="##"}{print $1, $2, $3}' test.txt

Result:

this##is##a
this##is##the

4. Print the first two columns with ":" as the output separator

awk 'BEGIN{OFS=":"}{print $1, $2}' test.txt

Result:

this:is
this:is

5. Insert literal text into the output, for example "hello"

awk 'BEGIN{OFS=":"}{print $1, "hello", $2}' test.txt

Result:

this:hello:is
this:hello:is

6. Print text directly, without reading any input

awk 'BEGIN{ print "line one\nline two\nline three" }'

Result:

line one
line two
line three

7. Count the number of columns in each line

awk '{ print NF }' test.txt

Result:

5
5

8. Print the last column of each line

awk '{ print $NF }' test.txt

Result:

mail
line.

9. Print the second-to-last column of each line

awk '{ print $(NF-1) }' test.txt

Result:

test
second

10. Define variables in an awk command with -v name="value"

awk -v variable="hello world" 'BEGIN{print variable}'
awk 'BEGIN{variable="hello world"; print variable}'

Result (the same for both commands):

hello world
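A common use of -v is to hand a shell variable to awk without splicing it into the program text. The following is a minimal sketch; the shell variable limit and the awk variable max are names chosen for this example.

```shell
# Hypothetical shell variable holding a threshold.
limit=3

# -v copies the shell value into the awk variable "max" before BEGIN runs.
big=$(printf '1\n5\n2\n8\n' | awk -v max="$limit" '$1 > max { print $1 }')
printf '%s\n' "$big"
```

Because the value arrives as an awk variable, the program itself can stay inside single quotes.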

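11. Formatted output with printf

printf format, item1, item2, ...

Unlike print, printf does not append a newline automatically, so the format string usually ends with \n.

The format specifiers include:

%c: a single character (a numeric argument is treated as a character code)
%d, %i: decimal integer
%e, %E: scientific notation
%f: floating-point number
%g, %G: scientific notation or floating point, whichever is shorter
%s: string
%u: unsigned integer
%%: a literal %

For example:

awk '{printf "%10s\n", $3}' test.txt

Result: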

         a
       the
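Several specifiers can be combined in one format string. The sketch below, with values picked arbitrarily for illustration, shows left alignment (%-6s), a padded integer (%5d), a fixed number of decimals (%8.2f), a character from its code (%c), and a literal percent sign.

```shell
formatted=$(awk 'BEGIN {
    # Columns are separated by "|" so the padding is easy to see.
    printf "%-6s|%5d|%8.2f|%c|%%\n", "item", 42, 3.14159, 65
}')
printf '%s\n' "$formatted"
```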

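12. Assignment, arithmetic, comparison, and logical operators

Assignment: =  +=  -=  *=  /=  %=  ^=  **=
Arithmetic: -x  +x  x^y  x*y  x/y  x-y  x+y  x%y
Comparison: x < y  x >= y  x != y  x ~ y (pattern match)
Conditional expression: selector ? if-true-exp : if-false-exp
Logical: &&  ||

The ** and **= forms are not part of POSIX awk; ^ and ^= are the portable spellings.

For example, to find the usernames in /etc/passwd that start with the letter r:

awk -F: '/^r/{print $1}' /etc/passwd
awk 'BEGIN{FS=":"}/^r/{print $1}' /etc/passwd

Result: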

root
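The operators listed in this section can be combined freely. As a small sketch on made-up numeric input, the command below uses a compound assignment, the modulo operator, a conditional expression, and exponentiation.

```shell
result=$(printf '4\n7\n10\n' | awk '
    {
        total += $1                                  # compound assignment
        print $1, ($1 % 2 == 0 ? "even" : "odd")     # modulo and conditional expression
    }
    END { print "total", total, "squared", total ^ 2 }
')
printf '%s\n' "$result"
```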

13. Show users whose UID is 500 or greater (non-system users)

awk -F: '$3>=500{print $1, $3}' /etc/passwd

Result:

cactiuser 500
test001 501
test002 502

14. Show users whose default login shell is bash (pattern matching)

awk -F: '$7~"bash$"{print $1, $7}' /etc/passwd

Result:

root /bin/bash
mysql /bin/bash
cactiuser /bin/bash
test001 /bin/bash
test002 /bin/bash
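The match operator ~ has a negated form, !~. The sketch below runs both against a few made-up passwd-style records (the account names are hypothetical) so that it does not depend on the local /etc/passwd.

```shell
# Made-up passwd-style records, used only for this illustration.
accounts=$(printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\nalice:x:1000:1000::/home/alice:/bin/zsh\n')

# ~ keeps records whose 7th field matches the regex; !~ keeps the rest.
interactive=$(printf '%s\n' "$accounts" | awk -F: '$7 ~ /sh$/ { print $1 }')
no_login=$(printf '%s\n' "$accounts" | awk -F: '$7 !~ /sh$/ { print $1 }')

printf '%s\n' "$interactive" "$no_login"
```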

15. Run an action once before or after processing the input: BEGIN/END

For example, add a title row:

awk 'BEGIN{print "ROW1    ROW2    ROW3"}{printf "%-8s%-8s%-8s\n", $1, $2, $3}' test.txt

Result:

ROW1    ROW2    ROW3
this    is      a
this    is      the

Then add a closing line:

awk 'BEGIN{print "ROW1    ROW2    ROW3"}{printf "%-8s%-8s%-8s\n", $1, $2, $3}END{print "date:--/--/--"}' test.txt

Result:

ROW1    ROW2    ROW3
this    is      a
this    is      the
date:--/--/--
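END is also the natural place to print totals accumulated while reading the input. The following sketch, on an invented price list, prints a header in BEGIN, one formatted row per line, and a sum in END.

```shell
report=$(printf 'pen 2\nbook 10\nbag 25\n' | awk '
    BEGIN { print "ITEM    PRICE" }             # runs once, before any input
    { printf "%-8s%5d\n", $1, $2; sum += $2 }   # runs for every line
    END { printf "%-8s%5d\n", "TOTAL", sum }    # runs once, after all input
')
printf '%s\n' "$report"
```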

16. Control statements (if-else)

For example, label root as admin and every other user as Common User:

awk -F: '{if ($1=="root") print $1, ": admin"; else print $1, ": Common User"}' /etc/passwd

Result (first lines):

root : admin
bin : Common User
daemon : Common User
adm : Common User
lp : Common User

As another example, count the users whose UID is 500 or greater (non-system users):

awk -F: -v sum=0 '{if ($3>=500) sum++}END{print sum}' /etc/passwd

The result is the number of non-system users.

17. Looping over columns with while and for

For example, print every column of test.txt that is at least 4 characters long:

awk '{i=1; while (i<=NF) { if (length($i)>=4) {print $i}; i++}}' test.txt
awk '{for(i=1;i<=NF;i++){ if (length($i)>=4) {print $i} }}' test.txt

Result:

this
test
mail
this
second
line.
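The same loop shape works for any per-field computation. As a sketch on made-up numeric input, the command below sums the fields of each line; the condition is i <= NF so that the last field is included.

```shell
# Reset the accumulator for every line, then add up fields 1..NF.
sums=$(printf '1 2 3\n10 20\n' | awk '{ s = 0; for (i = 1; i <= NF; i++) s += $i; print s }')
printf '%s\n' "$sums"
```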

18. next: stop processing the current line and move on to the next one
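A typical use of next is to discard lines early so that later rules never see them. The sketch below, on invented input, skips comment lines and blank lines and numbers whatever remains.

```shell
kept=$(printf '# comment\nalpha\n\nbeta\n' | awk '
    /^#/ || NF == 0 { next }   # skip comments and blank lines
    { print NR ": " $0 }       # only reached for the remaining lines
')
printf '%s\n' "$kept"
```

NR still counts the skipped lines, so the output is numbered 2 and 4.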

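19. Arrays

awk arrays are associative: a subscript can be any string or number, and there is no fixed starting index (arrays filled by split() are numbered from 1). The order in which for (A in array) visits the elements is unspecified.

For example, count how many users use each login shell:

awk -F: '{shell[$NF]++}END{for(A in shell) {print A, shell[A]}}' /etc/passwd

Result: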

/sbin/shutdown 1
/bin/bash 5
/sbin/nologin 25
/sbin/halt 1
/bin/sync 1
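A few more array operations are worth knowing: split() to fill an array, the in operator to test membership, and delete to remove an element. The sketch below uses array names (color, stock) invented for this example.

```shell
array_demo=$(awk 'BEGIN {
    n = split("red green blue", color, " ")   # split fills indices 1..n
    print n, color[1], color[n]

    stock["apple"] = 3                        # string subscripts
    stock["pear"] = 0
    delete stock["pear"]                      # remove one element

    has_apple = ("apple" in stock)            # 1 if the subscript exists
    has_pear = ("pear" in stock)              # 0 after the delete
    print has_apple, has_pear
}')
printf '%s\n' "$array_demo"
```

Testing with in does not create the element, whereas merely referencing stock["pear"] would.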

As another example, count TCP connections by state (such as LISTEN and ESTABLISHED):

netstat -tan | awk '/^tcp/{STATE[$NF]++}END{for(A in STATE) {print A, STATE[A]}}'

Result:

ESTABLISHED 2
LISTEN 5

As another example, count the requests per IP address in an nginx access log and sort them from most to least:

awk '{count[$1]++}; END{ for(ip in count) print ip,": " count[ip]}' /usr/local/nginx/logs/access.log | sort -n -k3 -r

The result looks like:

185.130.5.224 : 15
103.41.53.252 : 13
120.26.55.211 : 10
120.26.207.203 : 8
23.251.63.45 : 6
45.79.204.72 : 5
169.229.3.91 : 5
120.26.227.63 : 5
121.42.0.35 : 4
58.96.181.111 : 3
189.141.160.11 : 3
123.56.233.103 : 3

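20. A more complex example

The script below extracts the requests logged between 15:00 and 15:59 in July 2016, finds the three busiest client IPs, and prints the three most requested URLs for each of them.

#!/bin/bash
LOG="$HOME/access-1.log"
# Keep only the requests whose timestamp matches Jul/2016:15:.
awk '{ if ($4 ~ /Jul\/2016:15:/) print $0 }' /data/log/access.log > "$LOG"
# Write the three IPs with the most requests to IP3.txt, one per line.
awk '{count[$1]++}; END{ for(ip in count) print ip,": " count[ip]}' "$LOG" | sort -nrk3 | head -3 | awk '{print $1}' > IP3.txt
# Read the IPs into a bash array so that ${IP[$i]} selects one address.
IP=($(cat IP3.txt))
for i in $(seq 0 1 2); do
  # Pass the address with -v instead of splicing it into the program text.
  awk -v ip="${IP[$i]}" '
    {
      if($1==ip) c_ip[$7]++
    }
    END{
      print ip;
      for(url in c_ip) { print url, c_ip[url] |"sort -nrk2 |head -3"};
    }' "$LOG"
  echo "--------------------"
done;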
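The per-IP URL counts can also be gathered in a single pass by using the IP and URL together as the array subscript. This is only a sketch of that alternative, not the script above: the log lines are fabricated sample data in the common nginx access-log layout, and it prints every (IP, URL) pair rather than the top three.

```shell
# Fabricated sample log, created only for this illustration.
log=$(mktemp)
cat > "$log" <<'EOF'
10.0.0.1 - - [01/Jul/2016:15:00:01 +0800] "GET /a HTTP/1.1" 200 10
10.0.0.1 - - [01/Jul/2016:15:00:02 +0800] "GET /a HTTP/1.1" 200 10
10.0.0.1 - - [01/Jul/2016:15:00:03 +0800] "GET /b HTTP/1.1" 200 10
10.0.0.2 - - [01/Jul/2016:15:00:04 +0800] "GET /a HTTP/1.1" 200 10
EOF

# Count each "IP URL" pair, then sort by IP and by count (descending).
pairs=$(awk '{ hits[$1 " " $7]++ } END { for (k in hits) print k, hits[k] }' "$log" |
        sort -k1,1 -k3,3nr)
printf '%s\n' "$pairs"
rm "$log"
```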


Disclaimer
  1. License under CC BY-NC 4.0
  2. Copyright issue feedback me#imzye.me, replace # with @
  3. Not all the commands and scripts are tested in production environment, use at your own risk
  4. No privacy information is collected here