bash: Sort unique IP addresses from an Apache log
Original URL: http://stackoverflow.com/questions/18682308/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Sort uniq IP address from Apache log
Asked by Arthur
I'm trying to extract IP addresses from my apache log, count them, and sort them.
And for whatever reason, the sorting part is horrible.
Here is the command:
cat access.* | awk '{ print $1 }' | sort | uniq -c | sort -n
Output example:
16789 65.X.X.X
19448 65.X.X.X
1995 138.X.X.X
2407 213.X.X.X
2728 213.X.X.X
5478 188.X.X.X
6496 176.X.X.X
11332 130.X.X.X
I don't understand why these values aren't really sorted. I've also tried removing blanks at the start of the line (sed 's/^[\t ]*//g') and using sort -n -t" " -k1, which doesn't change anything.
Any hints?
Answered by linsort
This may be late, but making the first sort numeric (sort -n) will give you the desired result:
cat access.log | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -20
Output:
29877 93.xxx.xxx.xxx
17538 80.xxx.xxx.xxx
5895 198.xxx.xxx.xxx
3042 37.xxx.xxx.xxx
2956 208.xxx.xxx.xxx
2613 94.xxx.xxx.xxx
2572 89.xxx.xxx.xxx
2268 94.xxx.xxx.xxx
1896 89.xxx.xxx.xxx
1584 46.xxx.xxx.xxx
1402 208.xxx.xxx.xxx
1273 93.xxx.xxx.xxx
1054 208.xxx.xxx.xxx
860 162.xxx.xxx.xxx
830 208.xxx.xxx.xxx
606 162.xxx.xxx.xxx
545 94.xxx.xxx.xxx
480 37.xxx.xxx.xxx
446 162.xxx.xxx.xxx
398 162.xxx.xxx.xxx
Answered by Benjamin Dupuis
Why use cat | awk? You only need to use awk:
awk '{ print $1 }' /var/log/*access*log | sort -n | uniq -c | sort -nr | head -20
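The role of the first sort in these pipelines can be illustrated with a minimal sketch. It relies on one fact: uniq -c only merges adjacent identical lines, so the input has to be grouped before counting. The addresses below are made-up sample values, not data from the question, and the variable names are only for illustration.

```shell
# Hypothetical first-field values, deliberately unsorted.
ips=$(printf '%s\n' 10.0.0.2 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.2 10.0.0.1)

# Without sorting first, uniq -c only merges adjacent duplicates,
# so every line here stays its own group.
unsorted_groups=$(printf '%s\n' "$ips" | uniq -c | wc -l | tr -d ' ')

# Sorting first puts identical addresses next to each other.
sorted_groups=$(printf '%s\n' "$ips" | sort | uniq -c | wc -l | tr -d ' ')

# Full pipeline: group, count, order by count (descending).
# The final awk only strips the padding that uniq -c adds.
top=$(printf '%s\n' "$ips" | sort | uniq -c | LC_ALL=C sort -nr | awk '{ print $1, $2 }')

echo "groups without sort: $unsorted_groups, with sort: $sorted_groups"
printf '%s\n' "$top"
```

Whether that first sort is numeric or not should not matter for the counts, since its only job is to bring identical lines together.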
Answered by Arthur
I don't know why a simple sort -n didn't work, but adding a non-numeric character between the counter and the IP solved my issue.
cat access.* | awk '{ print $1 } ' | sort | uniq -c | sed -r 's/^[ \t]*([0-9]+) (.*)$/\1 --- \2/' | sort -rn
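As a rough sketch of what the sed step does, here it is applied to a few made-up lines in the shape that uniq -c produces. Two substitutions are assumptions of this sketch rather than part of the answer: -E is used in place of -r, and [[:space:]] in place of [ \t], on the assumption that both are more portable across sed implementations.

```shell
# Hypothetical `uniq -c` output: right-aligned counts followed by an address.
sample=$(printf '%s\n' '      3 10.0.0.2' '     12 10.0.0.1' '      7 10.0.0.3')

# Strip the leading padding and put " --- " between count and address,
# then sort numerically on the leading count, highest first.
result=$(printf '%s\n' "$sample" \
  | sed -E 's/^[[:space:]]*([0-9]+) (.*)$/\1 --- \2/' \
  | LC_ALL=C sort -rn)

printf '%s\n' "$result"
```

With the separator in place, the count is followed by non-numeric text, so the numeric comparison has nothing after the count that it could read as part of the number.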
Answered by tue
This should work
cat access.* | awk '{ print $1 }' | sort | awk '{print $1 " " $2;}' | sort -n
I can't see a problem.
Control characters in the files?
File system full (temp files)?
Answered by Antony Gibbs
If sort isn't giving the expected result, it's probably due to a locale issue.
| LC_ALL=C sort -rn
awk '{array[$1]++}END{ for (ip in array) print array[ip] " " ip}' <path/to/apache/*.log> | LC_ALL=C sort -rn
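A self-contained sketch of this approach, run against a small made-up log: awk keeps one counter per address in an associative array, so no sort or uniq is needed before counting. The log lines, the temporary file, and the array name hits are hypothetical; the sketch assumes the client address is the first whitespace-separated field, as in Apache's common log format.

```shell
# Build a tiny sample log (hypothetical data; only the first field matters here).
tmpdir=$(mktemp -d)
cat > "$tmpdir/access.log" <<'EOF'
10.0.0.2 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512
10.0.0.1 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 200 128
10.0.0.2 - - [01/Jan/2024:00:00:03 +0000] "GET /b HTTP/1.1" 404 0
10.0.0.3 - - [01/Jan/2024:00:00:04 +0000] "GET / HTTP/1.1" 200 512
10.0.0.2 - - [01/Jan/2024:00:00:05 +0000] "GET /c HTTP/1.1" 200 64
10.0.0.1 - - [01/Jan/2024:00:00:06 +0000] "GET / HTTP/1.1" 200 512
EOF

# Count hits per address in awk, then sort by count (highest first) in the C locale.
counts=$(awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$tmpdir/access.log" \
  | LC_ALL=C sort -rn)

printf '%s\n' "$counts"
rm -rf "$tmpdir"
```

Setting LC_ALL=C only for the sort command keeps the rest of the session's locale untouched.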
Sources: sort not sorting as expected (space and locale)
https://www.commandlinefu.com/commands/view/9744/sort-ip-by-count-quickly-with-awk-from-apache-logs