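Introduction to awk
awk is a convenient and expressive programming language that can be applied to a wide variety of computing and data-processing tasks. Every awk program is a sequence of one or more pattern-action statements, run from the command line as: awk 'pattern { action }' file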
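As a minimal sketch of the pattern-action form, the three commands below run against a small made-up data set (the `data` variable and its contents are assumptions for illustration, not a file from this post). A pattern with no action prints each matching line, an action with no pattern runs on every line, and the two can be combined.

```shell
# Hypothetical sample data: name, hourly rate, hours worked
data='Beth 4.00 0
Kathy 4.00 10
Mark 5.00 20'

# Pattern only: the default action prints the whole matching line
pattern_only=$(printf '%s\n' "$data" | awk '$3 == 0')

# Action only: the action runs for every input line
action_only=$(printf '%s\n' "$data" | awk '{ print $1 }')

# Pattern and action together: act only on the lines that match
both=$(printf '%s\n' "$data" | awk '$3 > 0 { print $1, $2 * $3 }')

printf '%s\n---\n%s\n---\n%s\n' "$pattern_only" "$action_only" "$both"
```

The first command prints only Beth's line, the second prints the three names, and the third prints the pay for Kathy and Mark.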
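awk built-in variables
Variable | Meaning | Default |
---|---|---|
ARGC | Number of command-line arguments | - |
ARGV | Array of command-line arguments | - |
FILENAME | Name of the current input file | - |
FNR | Number of records read so far from the current input file | - |
FS | Input field separator | " " |
NF | Number of fields in the current record | - |
NR | Total number of records read so far | - |
OFMT | Output format for numbers | "%.6g" |
OFS | Output field separator | " " |
ORS | Output record separator | "\n" |
RLENGTH | Length of the string matched by the match function | - |
RS | Input record separator | "\n" |
RSTART | Start position of the string matched by the match function | - |
SUBSEP | Subscript separator | "\034" |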
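To see a few of these variables at work, here is a small sketch. The colon-separated input is invented for the example. FS and OFS are set in a BEGIN block, and NR and NF are read for each record.

```shell
# Hypothetical colon-separated input, fed through a pipe
result=$(printf 'root:x:0\ndaemon:x:1\n' |
  awk 'BEGIN { FS = ":"; OFS = "-" }
       { print NR, NF, $1 }          # record number, field count, first field
       END { print "records", NR }')
echo "$result"
```

Because print joins its comma-separated arguments with OFS, each output line reads like `1-3-root`; in the END block NR still holds the total number of records read.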
Formatted output in awk
[aidu35@aidu35 awk]$ cat 1.txt
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
[aidu35@aidu35 awk]$ awk ' $3 > 0 {print "total pay for", $1, "is", $2*$3}' 1.txt
total pay for Kathy is 40
total pay for Mark is 100
total pay for Mary is 121
total pay for Susie is 76.5
Formatting awk output with printf
[aidu35@aidu35 awk]$ awk '$3 > 0 {printf("total pay for %s is %.2f\n",$1,$2*$3 )}' 1.txt
total pay for Kathy is 40.00
total pay for Mark is 100.00
total pay for Mary is 121.00
total pay for Susie is 76.50
printf does not add spaces or newlines automatically; you have to write them into the format string yourself.
Sorting formatted awk output with sort
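A short sketch of that behaviour, using throwaway input rather than 1.txt: without "\n" in the format, every record ends up on the same line, and width specifiers such as %-6s and %6.2f control the padding.

```shell
# No "\n" in the per-record format, so all records land on one line;
# the newline is printed once, in END
joined=$(printf 'a\nb\nc\n' | awk '{ printf("%s,", $1) } END { printf("\n") }')

# Width and precision: %-6s left-justifies in 6 columns,
# %6.2f right-justifies in 6 columns with 2 decimals
padded=$(awk 'BEGIN { printf("[%-6s][%6.2f]\n", "ab", 3.14159) }')

echo "$joined"
echo "$padded"
```

The first command prints `a,b,c,` on a single line; the second prints `[ab    ][  3.14]`.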
[aidu35@aidu35 awk]$ awk '$3 > 0 {printf("%-8s is %6.2f\n",$1,$2*$3 )}' 1.txt | sort -k 3 -n
Kathy    is  40.00
Susie    is  76.50
Mark     is 100.00
Mary     is 121.00
Pattern matching in awk
[aidu35@aidu35 awk]$ awk '$1 ~ /Sus/ {print $0}' 1.txt
Susie 4.25 18
awk BEGIN/END
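The special pattern BEGIN matches before the first line of the first input file is read, and END matches after the last line of the last input file has been processed.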
[aidu35@aidu35 awk]$ awk 'BEGIN {print "NAME RATE HOURS";print ""}{print }END {print "DONE"}' 1.txt
NAME RATE HOURS

Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
DONE
Computing with awk
[aidu35@aidu35 awk]$ awk '{pay = pay+$2*$3}END {print "total pay is", pay, "average pay is", pay/NR}' 1.txt
total pay is 337.5 average pay is 56.25
An awk variable used as a number starts at 0, and one used as a string starts as the empty string, so variables need no explicit initialization.
[aidu35@aidu35 awk]$ awk '$3 > 15 {emp = emp+1}END {print emp, "employees worked more than 15 hours"}' 1.txt
3 employees worked more than 15 hours
[aidu35@aidu35 awk]$ awk '{names = names $1 " "}END {print names}' 1.txt
Beth Dan Kathy Mark Mary Susie
Control flow in awk
if/else/while/for
awk provides if-else statements for making decisions, along with loop statements; all of them can be used only inside an action.
[aidu35@aidu35 awk]$ awk '{for(i=0;i<$2;i=i+1) if(i==4){print $0, count} else{ count = count + 1} count = 0}' 1.txt
Mark 5.00 20 4
Mary 5.50 22 4
Susie 4.25 18 4
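The example above uses for and if-else. As a complementary sketch, a while loop can walk over the fields of each record. The input lines and the threshold of 10 are made up for illustration.

```shell
# Hypothetical input: each line holds a few numbers to add up
totals=$(printf '1 2 3\n10 20\n' | awk '{
    i = 1; sum = 0
    while (i <= NF) {          # visit every field of the current record
        sum += $i
        i++
    }
    if (sum > 10) {
        print sum, "large"
    } else {
        print sum, "small"
    }
}')
echo "$totals"
```

This prints `6 small` for the first line and `30 large` for the second.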
awk arrays
An awk array stores a set of related values.
Counting occurrences with an array
[aidu35@aidu35 awk]$ cat 2.txt
1
2
3
4
5
1
2
4
5
7
[aidu35@aidu35 awk]$ awk '{count[$1]++}END{for(i in count) {printf( "%d appears %d times\n", i,count[i])}}' 2.txt | sort -n
1 appears 2 times
2 appears 2 times
3 appears 1 times
4 appears 2 times
5 appears 2 times
7 appears 1 times
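The same counting idea extends to pairs of values. In this sketch (with invented input), a subscript written as `$1, $2` is stored as a single string joined by SUBSEP, and split recovers the two parts. The iteration order of for-in is unspecified, so the output is piped through sort.

```shell
# Hypothetical input: two columns whose combinations are counted
pairs=$(printf 'a x\na y\na x\nb x\n' | awk '
    { seen[$1, $2]++ }                  # key is stored as $1 SUBSEP $2
    END {
        for (key in seen) {
            split(key, part, SUBSEP)    # recover the two subscripts
            print part[1], part[2], seen[key]
        }
    }' | LC_ALL=C sort)
echo "$pairs"
```

The result lists `a x 2`, `a y 1`, and `b x 1`.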
Combining arrays with pattern matching
[aidu35@aidu35 awk]$ cat countries
USSR 8649 275 Asia
Canada 3852 25 North America
China 3705 1032 Asia
USA 3615 237 North America
Brazil 3286 134 South America
India 1267 746 Asia
Mexico 762 78 North America
France 211 55 Europe
Japan 144 120 Asia
Germany 96 61 Europe
England 94 56 Europe
[aidu35@aidu35 awk]$ awk '$4 ~ /Asia/ {pop["Asia"] += $3}; $4 ~ /Europe/ {pop["Europe"] += $3} END {print "Asian population is", pop["Asia"], "million"; print "European population is", pop["Europe"], "million"}' countries
Asian population is 2173 million
European population is 172 million
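Instead of writing one pattern per continent, the continent itself can serve as the array subscript. The sketch below assumes a tab-separated copy of a few of the rows above, so that "North America" stays in a single field; that tab-separated layout is an assumption, not necessarily how the countries file is stored.

```shell
# A few rows from the table above, rewritten with tabs between fields (assumed layout)
by_continent=$(printf 'USSR\t8649\t275\tAsia\nCanada\t3852\t25\tNorth America\nChina\t3705\t1032\tAsia\nFrance\t211\t55\tEurope\n' |
  awk 'BEGIN { FS = "\t" }
       { pop[$4] += $3 }                # one running total per continent
       END { for (c in pop) printf("%s: %d million\n", c, pop[c]) }' |
  LC_ALL=C sort)
echo "$by_continent"
```

With these four rows the totals are 1307 for Asia, 55 for Europe, and 25 for North America. Adding a new continent to the data needs no change to the program.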