Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/4200392/
Awk parsing of unique IP addresses from maillog
Asked by f10bit
Yesterday I asked a question here about a oneliner, and mjschultz gave me an answer that I instantly fell in love with :) Awk just destroyed the task at hand, parsing a large logfile (500+ MB) in a matter of seconds. Now I'm trying to port my other oneliners to awk.
This is the one in question:
grep "pop3\[" maillog | grep "User logged in" |
egrep -o '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | sort -u
I need the list of all unique IP addresses using pop3 to connect to the mail server.
This is an example log entry:
Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext
User logged in
So I find all the lines containing "pop3" and parse them for the "User logged in" part. Next I use egrep and a regex to match IP addresses, and I use sort to filter out the duplicate addresses.
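To sanity-check that pipeline, it can be run against a couple of hand-made sample entries; the log lines and the /tmp path below are made up for illustration:

```shell
# Made-up sample entries mimicking the maillog format from the question
printf '%s\n' \
  'Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in' \
  'Nov 15 00:50:02 hostname imap[19419]: login: [10.10.10.11] username plaintext User logged in' \
  'Nov 15 00:51:13 hostname pop3[19420]: login: [10.10.10.10] username plaintext User logged in' \
  > /tmp/maillog.sample

# Only the pop3 logins survive the greps, and sort -u collapses the repeat
grep "pop3\[" /tmp/maillog.sample | grep "User logged in" |
egrep -o '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | sort -u
```

The imap line is filtered out and the duplicated pop3 address is printed once, so the output is the single line 10.10.10.10.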
This is what I have so far for my awk version:
awk '/pop3\[.*User logged in/ {ip[$7]=0} END {for (address in ip)
{ print address} }' maillog
This works perfectly, but as always not all log entries are identical; for example, sometimes the IP gets moved to the 8th field, like here:
Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20]
username plaintext User logged in
What would be the best way to catch those entries with awk as well?
As always thanks for all the great responses in advance, you've taught me so much already :)
Answered by Dr. belisarius
AWK code
Just match your IP format ... but be careful that there are no other formats ...
/pop3\[.*User logged in/ {
    where = match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)
    if (where)
        ip[substr($0, RSTART+1, RLENGTH-1)] = 0
}
END {for (address in ip)
    { print address} }
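A self-contained way to exercise this logic (the sample lines and /tmp path are made up; the script body mirrors the answer's approach of regex-matching the bracketed IP rather than relying on a field position):

```shell
# Save the answer's awk program to a file
cat > /tmp/pop3ips.awk <<'EOF'
/pop3\[.*User logged in/ {
    where = match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)
    if (where)
        ip[substr($0, RSTART+1, RLENGTH-1)] = 0
}
END { for (address in ip) print address }
EOF

# Feed it both log layouts from the question; match() finds the bracketed
# IP wherever it sits, so the extra hostname field does not matter.
# The trailing sort only makes the (unordered) array output deterministic.
printf '%s\n' \
  'Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in' \
  'Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in' |
awk -f /tmp/pop3ips.awk | sort
```

Both addresses come out without their brackets, because substr() starts one character after RSTART and stops one short of RLENGTH.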
running at ideone
Answered by Jonathan Leffler
That looks more like Perl territory than Awk to me:
my %ip_addresses = ();
while (<>)
{
    next unless m/pop3\[/;
    next unless m/User logged in/;
    if (my($ip) = $_ =~ m/( \d{1,3} (?: [.] \d{1,3} ){3} )/msx)
    {
        $ip_addresses{$ip} = 1;
    }
}
foreach my $ip (sort keys %ip_addresses)
{
    print "$ip\n";
}

The sort is less than perfect - being alphabetic rather than numeric (so 192.1.168.10 will appear before 9.25.13.26). That can be fixed, of course.
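One way to fix the ordering (not part of the answer; a sketch using POSIX sort keys, treating each dot-separated octet as a numeric field):

```shell
# Sort IPv4 addresses octet by octet, numerically rather than lexically
printf '%s\n' 192.1.168.10 9.25.13.26 10.10.10.10 |
sort -t . -k1,1n -k2,2n -k3,3n -k4,4n
```

This puts 9.25.13.26 before 10.10.10.10 and 192.1.168.10, which a plain alphabetic sort gets wrong.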
Answered by f10bit
After seeing and trying these approaches I got a new idea.
belisarius's code does what I asked for but since it has to do all the regex matching it's not the fastest one and speed is what I'm after.
So I came up with this. As you can see, the "problematic" log lines have an extra field, making them 13 fields long instead of the normal 12, so I just delete the extra field; this gives me the correct list of IP addresses. Next I use awk again to delete all duplicate entries:
awk '/pop3\[.*User logged in/ {{if (NF == 13) $7="";gsub(FS "+",FS)};print $7}' /var/log/maillog |
awk '!($0 in a){a[$0];print}'

Ideone link if you want to see the code in action
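The field-count trick can be checked against both layouts from the question (sample lines and /tmp path are made up; note this variant prints the address with its surrounding brackets, unlike the regex-based answers):

```shell
# Two made-up entries: a normal 12-field line, and a 13-field line
# where the reverse-resolved hostname pushes the IP to field 8
printf '%s\n' \
  'Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in' \
  'Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in' \
  > /tmp/maillog.mixed

# Blank out field 7 on 13-field lines, squeeze the doubled separator,
# and print field 7 (now the bracketed IP); the second awk removes
# duplicates without needing a sort
awk '/pop3\[.*User logged in/ {{if (NF == 13) $7="";gsub(FS "+",FS)};print $7}' /tmp/maillog.mixed |
awk '!($0 in a){a[$0];print}'
```

Assigning to $7 and then gsub'ing $0 forces awk to rebuild and re-split the record, which is why field 7 holds the IP on both layouts afterwards.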

