Note: this page is taken from a popular StackOverflow question and its answers, which are provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4200392/

Date: 2020-09-17 22:55:59  Source: igfitidea

Awk parsing of unique IP addresses from maillog

Tags: regex, bash, awk

Asked by f10bit

Yesterday I asked a question here about a one-liner, and mjschultz gave me an answer that I instantly fell in love with :) Awk just destroyed the task at hand, parsing a large logfile (500+ MB) in a matter of seconds. Now I'm trying to port my other one-liners to awk.

This is the one in question:

grep "pop3\[" maillog | grep "User logged in" |  
egrep -o '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | sort -u

I need the list of all unique IP addresses using pop3 to connect to the mail server.

This is an example log entry:

Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in

So I find all the lines containing "pop3[" and filter them for the "User logged in" part. Next I use egrep and a regex to match IP addresses, and I use sort -u to filter out the duplicate addresses.

This is what I have so far for my awk version:

awk '/pop3\[.*.User logged in/ {ip[$7]=0} END {for (address in ip) { print address} }' maillog

This works perfectly, but as always, not all log entries are identical. For example, sometimes the IP gets moved to the 8th field, like here:

Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in
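
To make the failure concrete, here is a minimal sketch that feeds the two sample entries above to the `$7` version. The function names `sample_log` and `ips_from_field7` are made up for this illustration, and the output is piped through `sort` only so that the order is predictable.

```shell
# Hypothetical helper: prints the two sample log entries shown above.
sample_log() {
    cat <<'EOF'
Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in
Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in
EOF
}

# Hypothetical helper: collects field 7 of every matching line as an array key,
# then prints the keys. Reads files given as arguments, or stdin if none.
ips_from_field7() {
    awk '/pop3\[.*User logged in/ { ip[$7] = 0 }
         END { for (address in ip) print address }' "$@" | LC_ALL=C sort
}

sample_log | ips_from_field7
```

On the second entry, field 7 is the client host name, so this prints `hostname.domain.com` where `[20.20.20.20]` was wanted.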

What would be the best way to catch those entries with awk as well?

As always thanks for all the great responses in advance, you've taught me so much already :)

Answered by Dr. belisarius

AWK code

Just match your IP format, but check that the log contains no other formats.

/pop3\[.*.User logged in/ {
    # Locate the first "[a.b.c.d" on the line; match() sets RSTART and RLENGTH
    where = match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)
    # Skip the leading "[" and store the address as an array key
    if (where) ip[substr($0, RSTART+1, RLENGTH-1)] = 0
}
END { for (address in ip) { print address } }
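
As a hedged illustration of the `match()` approach, the sketch below wraps a similar program in a shell function and runs it on a few sample lines. The names `unique_pop3_ips` and `sample_log` are hypothetical, the third and fourth log lines are invented for the demo, and the pattern here also consumes the closing `]` (hence `RLENGTH - 2` rather than `RLENGTH - 1`).

```shell
# Hypothetical helper: prints each unique bracketed IPv4 address found on
# POP3 "User logged in" lines. Reads files given as arguments, or stdin.
unique_pop3_ips() {
    awk '
        /pop3\[.*User logged in/ {
            # match() sets RSTART and RLENGTH to the span of "[a.b.c.d]"
            if (match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\]/))
                ip[substr($0, RSTART + 1, RLENGTH - 2)] = 1
        }
        END { for (address in ip) print address }
    ' "$@" | sort
}

# Two entries from the question, one invented repeat, one invented IMAP line.
sample_log() {
    cat <<'EOF'
Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in
Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in
Nov 15 10:50:02 hostname pop3[2301]: login: [10.10.10.10] username plaintext User logged in
Nov 15 11:00:00 hostname imap[4242]: login: [30.30.30.30] username plaintext User logged in
EOF
}

sample_log | unique_pop3_ips
```

Because the address is located by pattern rather than by field number, both the 12-field and the 13-field entries are handled by the same rule.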

running at ideone

Answered by Jonathan Leffler

That looks more like Perl territory than Awk to me:

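my %ip_addresses = ();
while (<>)
{
    next unless m/pop3\[/;
    next unless m/User logged in/;
    if (my($ip) = $_ =~ m/( \d{1,3} (?: [.] \d{1,3} ){3} )/msx)
    {
         $ip_addresses{$ip} = 1;
    }
}
foreach my $ip (sort keys %ip_addresses)
{
    print "$ip\n";
}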

The sort is less than perfect - being alphabetic rather than numeric (so 192.1.168.10 will appear before 9.25.13.26). That can be fixed, of course.
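
One possible fix, sketched here with the `sort` utility rather than in Perl, is to split each address on dots and compare the four fields numerically. The function name `sort_ips_numerically` is made up for this example.

```shell
# Hypothetical helper: sorts dotted-quad addresses from stdin by numeric
# value of each octet (-t . sets the separator, -k N,Nn compares field N
# as a number).
sort_ips_numerically() {
    sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n
}

printf '%s\n' 192.1.168.10 9.25.13.26 10.10.10.10 | sort_ips_numerically
```

With this ordering `9.25.13.26` comes first and `192.1.168.10` last.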

Answered by f10bit

After seeing and trying these approaches I got a new idea.

belisarius's code does what I asked for, but since it has to do all the regex matching, it's not the fastest one, and speed is what I'm after.

So I came up with this. As you can see, the "problematic" log lines have an extra field, making them 13 fields long instead of the normal 12, so I just delete the extra field. This gives me the correct list of IP addresses. Next I use awk again to delete all duplicate entries:

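awk '/pop3\[.*.User logged in/ {{if (NF == 13) $7="";gsub(FS "+",FS)};print $7}' /var/log/maillog | awk '!($0 in a){a[$0];print}'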
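
A related sketch, not the code above: since the field count tells the two layouts apart, the address can also be picked by `NF` directly and de-duplicated in the same awk process. The names `pop3_ips_by_field` and `sample_log` are hypothetical, the third log line is an invented repeat, and stripping the brackets is an extra step added for this example.

```shell
# Hypothetical helper: picks the IP field by field count (8th field on
# 13-field lines, 7th otherwise), strips the brackets, and prints each
# address the first time it is seen.
pop3_ips_by_field() {
    awk '
        /pop3\[.*User logged in/ {
            addr = (NF == 13) ? $8 : $7
            gsub(/\[|\]/, "", addr)
            if (!(addr in seen)) { seen[addr]; print addr }
        }
    ' "$@"
}

# Two entries from the question plus one invented repeat.
sample_log() {
    cat <<'EOF'
Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in
Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in
Nov 15 10:50:02 hostname pop3[2301]: login: [10.10.10.10] username plaintext User logged in
EOF
}

sample_log | pop3_ips_by_field
```

Addresses come out in order of first appearance, so no separate sort or second awk pass is needed. Like the pipeline above, this assumes matching lines only ever have 12 or 13 fields.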

Ideone link, if you want to see the code in action