Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/4200392/
Awk parsing of unique IP addresses from maillog
Asked by f10bit
Yesterday I asked a question here about a oneliner, and mjschultz gave me an answer that I instantly fell in love with :) Awk just destroyed the task at hand, parsing a large logfile (500+ MB) in a matter of seconds. Now I'm trying to port my other oneliners to awk.
This is the one in question:
grep "pop3\[" maillog | grep "User logged in" |
egrep -o '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | sort -u
I need the list of all unique IP addresses using pop3 to connect to the mail server.
This is an example log entry:
Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext
User logged in
So I find all the lines containing "pop3" and parse them for the "User logged in" part. Next I use egrep and a regex to match IP addresses, and I use sort to filter out the duplicate addresses.
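To sanity-check that pipeline, it can be run against a couple of hand-made sample entries; the log lines and the /tmp path below are made up for illustration:

```shell
# Made-up sample entries mimicking the maillog format from the question
printf '%s\n' \
  'Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in' \
  'Nov 15 00:50:02 hostname imap[19419]: login: [10.10.10.11] username plaintext User logged in' \
  'Nov 15 00:51:13 hostname pop3[19420]: login: [10.10.10.10] username plaintext User logged in' \
  > /tmp/maillog.sample

# Only the pop3 logins survive the greps, and sort -u collapses the repeat
grep "pop3\[" /tmp/maillog.sample | grep "User logged in" |
egrep -o '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | sort -u
```

The imap line is filtered out and the duplicated pop3 address is printed once, so the output is the single line 10.10.10.10.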
This is what I have so far for my awk version:
awk '/pop3\[.*User logged in/ {ip[$7]=0} END {for (address in ip)
{ print address} }' maillog
This works perfectly, but as always not all log entries are identical; for example, sometimes the IP gets moved to the 8th field, like here:
Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20]
username plaintext User logged in
What would be the best way to catch those entries with awk as well?
As always thanks for all the great responses in advance, you've taught me so much already :)
Answered by Dr. belisarius
AWK code
Just match your IP format ... but be careful that there are no other formats ...
/pop3\[.*User logged in/ {
    where = match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)
    if (where)
        ip[substr($0, RSTART+1, RLENGTH-1)] = 0
}
END {for (address in ip)
    { print address} }
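A self-contained way to exercise this logic (the sample lines and /tmp path are made up; the script body mirrors the answer's approach of regex-matching the bracketed IP rather than relying on a field position):

```shell
# Save the answer's awk program to a file
cat > /tmp/pop3ips.awk <<'EOF'
/pop3\[.*User logged in/ {
    where = match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)
    if (where)
        ip[substr($0, RSTART+1, RLENGTH-1)] = 0
}
END { for (address in ip) print address }
EOF

# Feed it both log layouts from the question; match() finds the bracketed
# IP wherever it sits, so the extra hostname field does not matter.
# The trailing sort only makes the (unordered) array output deterministic.
printf '%s\n' \
  'Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in' \
  'Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in' |
awk -f /tmp/pop3ips.awk | sort
```

Both addresses come out without their brackets, because substr() starts one character after RSTART and stops one short of RLENGTH.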
running at ideone
Answered by Jonathan Leffler
That looks more like Perl territory than Awk to me:
my %ip_addresses = ();
while (<>)
{
    next unless m/pop3\[/;
    next unless m/User logged in/;
    if (my($ip) = $_ =~ m/( \d{1,3} (?: [.] \d{1,3} ){3} )/msx)
    {
        $ip_addresses{$ip} = 1;
    }
}
foreach my $ip (sort keys %ip_addresses)
{
    print "$ip\n";
}

The sort is less than perfect - being alphabetic rather than numeric (so 192.1.168.10 will appear before 9.25.13.26). That can be fixed, of course.
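One way to fix the ordering (not part of the answer; a sketch using POSIX sort keys, treating each dot-separated octet as a numeric field):

```shell
# Sort IPv4 addresses octet by octet, numerically rather than lexically
printf '%s\n' 192.1.168.10 9.25.13.26 10.10.10.10 |
sort -t . -k1,1n -k2,2n -k3,3n -k4,4n
```

This puts 9.25.13.26 before 10.10.10.10 and 192.1.168.10, which a plain alphabetic sort gets wrong.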
Answered by f10bit
After seeing and trying these approaches I got a new idea.
belisarius's code does what I asked for but since it has to do all the regex matching it's not the fastest one and speed is what I'm after.
So I came up with this. As you can see, the "problematic" log lines have an extra field, making them 13 fields long instead of the normal 12, so I just delete the extra field; this gives me the correct list of IP addresses. Next I use awk again to delete all duplicate entries:
awk '/pop3\[.*User logged in/ {{if (NF == 13) $7="";gsub(FS "+",FS)};print $7}' /var/log/maillog |
awk '!($0 in a){a[$0];print}'

Ideone link if you want to see the code in action
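The field-count trick can be checked against both layouts from the question (sample lines and /tmp path are made up; note this variant prints the address with its surrounding brackets, unlike the regex-based answers):

```shell
# Two made-up entries: a normal 12-field line, and a 13-field line
# where the reverse-resolved hostname pushes the IP to field 8
printf '%s\n' \
  'Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in' \
  'Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in' \
  > /tmp/maillog.mixed

# Blank out field 7 on 13-field lines, squeeze the doubled separator,
# and print field 7 (now the bracketed IP); the second awk removes
# duplicates without needing a sort
awk '/pop3\[.*User logged in/ {{if (NF == 13) $7="";gsub(FS "+",FS)};print $7}' /tmp/maillog.mixed |
awk '!($0 in a){a[$0];print}'
```

Assigning to $7 and then gsub'ing $0 forces awk to rebuild and re-split the record, which is why field 7 holds the IP on both layouts afterwards.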

