How do you extract IP addresses from files using a regex in a linux shell?
Notice: this page is derived from a popular StackOverflow question (originally published as a Chinese/English side-by-side translation) and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/427979/
Asked by Kazimieras Aliulis
How do you extract part of a text with a regexp in a linux shell? Let's say I have a file in which every line contains an IP address, but at a different position. What is the simplest way to extract those IP addresses using common unix command-line tools?
Accepted answer by brien
Answer by PolyThinker
I'd suggest perl. (\d+\.\d+\.\d+\.\d+) should probably do the trick.
EDIT: Just to make it more like a complete program, you could do something like the following (not tested):
#!/usr/bin/perl -w
use strict;
while (<>) {
    # Print the first dotted quad found on each input line.
    if (/(\d+\.\d+\.\d+\.\d+)/) {
        print "$1\n";
    }
}
This handles one IP per line. If you have more than one IP per line, you need to use the /g option. man perlretut gives you a more detailed tutorial on regular expressions.
Answer by Avi
Answer by Allen Ratcliff
You could use awk as well, with something like:
awk '{i=1; if (NF > 0) do {if ($i ~ /regexp/) print $i; i++;} while (i <= NF);}' file
This may need cleaning up; it's just a quick and dirty response showing basically how to do it with awk.
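To make the awk idea concrete, here is a sketch with the placeholder regexp filled in by a loose dotted-quad pattern. The function name awk_ips is an assumption for illustration. Because it tests whole fields, it only finds addresses that are separated from their neighbours by whitespace.

```shell
#!/bin/sh
# awk_ips is a hypothetical name; the pattern is a loose dotted-quad check.
# [0-9]+ is used rather than {1,3} because some awk builds lack interval expressions.
awk_ips() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
                print $i
    }' "$@"
}

printf 'host 10.1.2.3 up\nno address here\n192.168.0.1 gateway\n' | awk_ips
```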
Answer by JB.
I usually start with grep, to get the regexp right.
# [multiple failed attempts here]
grep '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*' file # good?
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file # good enough
Then I'd try to convert it to sed to filter out the rest of the line. (After reading this thread, you and I aren't going to do that anymore: we're going to use grep -o instead.)
sed -ne 's/.*\([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\).*/\1/p' file # FAIL
That's when I usually get annoyed with sed for not using the same regexes as everyone else. So I move to perl.
$ perl -nle '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ and print $&'
Perl's good to know in any case. If you've got a teeny bit of CPAN installed, you can even make it more reliable at little cost:
$ perl -MRegexp::Common=net -nE '/$RE{net}{IPv4}/ and say $&' file(s)
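For completeness, the sed attempt above fails because sed uses basic regular expressions by default, where the interval has to be written \{1,3\}. A sketch of a version that works, assuming a sed that accepts -E (GNU and BSD sed both do), could look like this. The function name sed_ip is made up for the example.

```shell
#!/bin/sh
# sed_ip is a hypothetical name. -E switches sed to extended regular expressions.
# The leading (^|.*[^0-9]) stops the greedy .* from swallowing the first digits
# of the address; \2 is the address itself.
sed_ip() {
    sed -nE 's/(^|.*[^0-9])(([0-9]{1,3}\.){3}[0-9]{1,3}).*/\2/p' "$@"
}

printf 'foo 192.168.1.10 bar\nno address\n' | sed_ip
```

This prints at most one address per input line, which is one more reason to prefer grep -o when lines may contain several.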
Answer by Sarel Botha
Most of the examples here will match 999.999.999.999, which is not a valid IP address.
The following matches only valid IP addresses (including network and broadcast addresses):
grep -E -o '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' file.txt
Omit the -o if you want to see the entire line that matched.
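One caveat worth illustrating: because the pattern is not anchored, grep -o can still return the tail of an out-of-range address, for example 56.1.1.1 from 256.1.1.1. The sketch below adds -w so that a match must begin and end on a word boundary. It assumes GNU grep, and the names octet and extract_valid_ipv4 are made up for this example.

```shell
#!/bin/sh
# octet and extract_valid_ipv4 are hypothetical names; the octet pattern is the
# one from the answer above, factored into a variable for readability.
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

extract_valid_ipv4() {
    # -w (GNU grep assumed) rejects matches that are glued to other
    # letters, digits or underscores, such as the 56.1.1.1 inside 256.1.1.1.
    grep -owE "${octet}\.${octet}\.${octet}\.${octet}" "$@"
}

printf 'bad 256.1.1.1 good 10.0.0.1\n' | extract_valid_ipv4
```

Note that -w treats '.' as a boundary, so a longer dotted string such as 1.2.3.4.5 would still yield 1.2.3.4.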
Answer by James
Answer by apachebeard
For CentOS 6.3:
ifconfig eth0 | grep 'inet addr' | awk '{print $2}' | awk 'BEGIN {FS=":"} {print $2}'
Answer by Phil L.
You can use some shell helpers I made: https://github.com/philpraxis/ipextract
They are included here for convenience:
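#!/bin/sh
# Each helper reads standard input and prints only the matching text.
# grep -E replaces the obsolescent egrep (and the redundant "egrep -E").

# IPv4 addresses
ipextract ()
{
    grep -E --only-matching '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
}
# IPv4 networks in address/prefix notation
ipextractnet ()
{
    grep -E --only-matching '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[[:digit:]]+'
}
# Ports written as number/protocol
ipextracttcp ()
{
    grep -E --only-matching '[[:digit:]]+/tcp'
}
ipextractudp ()
{
    grep -E --only-matching '[[:digit:]]+/udp'
}
ipextractsctp ()
{
    grep -E --only-matching '[[:digit:]]+/sctp'
}
# Fully qualified domain names. Inside a bracket expression a backslash is
# literal, so the hyphen goes last instead of being written as \-
ipextractfqdn ()
{
    grep -E --only-matching '[a-zA-Z0-9]+[a-zA-Z0-9.-]*\.[a-zA-Z]{2,}'
}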
Load (source) it from the shell, assuming it is stored in a file named ipextract:
$ . ipextract
Use them:
$ ipextract < /etc/hosts
127.0.0.1
255.255.255.255
$
Some examples of real use:
ipextractfqdn < /var/log/snort/alert | sort -u
dmesg | ipextractudp
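Helpers like these combine naturally with sort and uniq. As a rough sketch, the following counts how often each address appears and lists the most frequent first. The name top_ips is made up for this example, and ipextract is repeated (in its grep -E form) only so the snippet is self-contained.

```shell
#!/bin/sh
# ipextract as in the answer above, repeated so the sketch runs on its own.
ipextract() {
    grep -oE '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
}

# top_ips is a hypothetical helper: count each address, most frequent first.
# The final awk only strips the padding that uniq -c puts before the count.
top_ips() {
    ipextract | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

printf '10.0.0.1 GET /\n10.0.0.2 GET /\n10.0.0.1 GET /a\n' | top_ips
```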
Answer by Sankalp
This works fine for me on access logs.
cat access_log | egrep -o '([0-9]{1,3}\.){3}[0-9]{1,3}'
Let's break it down part by part.
[0-9]{1,3}
means one to three occurrences of the range given in []; in this case 0-9. So it matches patterns like 10 or 183. It is followed by a '.', which we need to escape as '\.' because '.' is a metacharacter with a special meaning in regular expressions (it matches any character).
So now we are at patterns like '123.', '12.', etc.
This pattern repeats three times (including the '.'), so we enclose it in parentheses:
([0-9]{1,3}\.){3}
Lastly, the pattern repeats once more, but this time without the '.'. That is why it is written separately as the final step:
[0-9]{1,3}
If the IPs are at the beginning of each line, as in my case, use:
egrep -o '^([0-9]{1,3}\.){3}[0-9]{1,3}'
where '^' is an anchor that restricts the match to the start of a line.
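A small sketch of the anchored form in action, using a made-up access log line. The function name client_ips and the sample addresses are assumptions for illustration; grep -E is used as the current spelling of egrep.

```shell
#!/bin/sh
# client_ips is a hypothetical name for the anchored pattern from the answer.
client_ips() {
    grep -oE '^([0-9]{1,3}\.){3}[0-9]{1,3}' "$@"
}

# Only the address at the start of the line is printed; the one inside the
# request string is ignored because of the ^ anchor.
printf '203.0.113.7 - - "GET /?ip=198.51.100.9 HTTP/1.1" 200\n' | client_ips
```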