Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/427979/

Time: 2020-08-03 16:51:50  Source: igfitidea

How do you extract IP addresses from files using a regex in a linux shell?

Tags: regex, linux, bash, unix, command-line

Asked by Kazimieras Aliulis

How do I extract part of a text with a regexp in a Linux shell? Let's say I have a file in which every line contains an IP address, but at a different position. What is the simplest way to extract those IP addresses using common Unix command-line tools?

Accepted answer by brien

You could use grep to pull them out.

grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt
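
As a quick illustration of what this prints, here is a minimal sketch. The helper name extract_ips and the sample lines are made up for the example, and it assumes a grep that supports -o (GNU and BSD grep do; the option is not part of POSIX).

```shell
# Hypothetical helper wrapping the command above; it reads the files given
# as arguments, or standard input when there are none.
extract_ips() {
    grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' "$@"
}

# Made-up sample lines with the address at a different position each time.
printf '%s\n' \
    '10.0.0.1 connected' \
    'user bob logged in from 192.168.1.20' \
    'error: host 172.16.5.4 unreachable' |
    extract_ips
```

With -o, only the matched text is printed, so each of the three sample lines is reduced to its address.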

Answer by PolyThinker

I'd suggest perl. (\d+\.\d+\.\d+\.\d+) should probably do the trick.

EDIT: Just to make it more like a complete program, you could do something like the following (not tested):

#!/usr/bin/perl -w
use strict;

while (<>) {
    # Capture the first dotted quad on the line and print it.
    if (/(\d+\.\d+\.\d+\.\d+)/) {
        print "$1\n";
    }
}

This handles one IP per line. If you have more than one IP per line, you need to use the /g option. man perlretut gives you a more detailed tutorial on regular expressions.

Answer by Avi

You can use sed. But if you know perl, that might be easier, and more useful to know in the long run:

perl -ne '/(\d+\.\d+\.\d+\.\d+)/ && print "$1\n"' < file

Answer by Allen Ratcliff

You could use awk as well. Something like this:

awk '{i=1; if (NF > 0) do {if ($i ~ /regexp/) print $i; i++;} while (i <= NF);}' file

-- May need cleaning up. Just a quick and dirty response to show basically how to do it with awk.
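
To make the awk idea concrete, here is a minimal sketch rather than a tested tool. It assumes a POSIX awk and, unlike the field loop above, runs match() over the whole line, so addresses wrapped in punctuation are found too. The function name awk_ips and the sample input are made up for illustration.

```shell
# Print every dotted quad found on each input line, one per output line.
# match() sets RSTART and RLENGTH; the loop resumes after each hit so that
# several addresses on the same line are all reported.
awk_ips() {
    awk '{
        rest = $0
        while (match(rest, /[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)) {
            print substr(rest, RSTART, RLENGTH)
            rest = substr(rest, RSTART + RLENGTH)
        }
    }' "$@"
}

printf 'from 10.0.0.1 to 10.0.0.2\nno address here\n[192.168.1.20]:443\n' | awk_ips
```

The pattern uses + instead of {1,3} because some older awk implementations do not support interval expressions; it therefore does no range checking on the octets.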

Answer by JB.

I usually start with grep, to get the regexp right.

# [multiple failed attempts here]
grep    '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*'                 file  # good?
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file  # good enough

Then I'd try to convert it to sed to filter out the rest of the line. (After reading this thread, you and I aren't going to do that anymore: we're going to use grep -o instead.)

sed -ne 's/.*\([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\).*/\1/p' file  # FAIL: without -E, sed treats {1,3} literally

That's when I usually get annoyed with sed for not using the same regexes as everyone else. So I move to perl.
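
For reference, the sed attempt above fails because, without -E, sed uses basic regular expressions, where an interval has to be written \{1,3\}. A leading greedy .* is a second trap: it can swallow the first digits of the address. The following is a sketch of one way around both, assuming a sed that accepts -E (GNU and BSD sed do). The name sed_ips and the sample lines are made up, and it prints at most one address per line.

```shell
# Group 1 forces the address to start at the beginning of the line or right
# after a non-digit, so a greedy .* cannot eat part of the first octet.
# Group 2 is the address itself and is the only thing kept.
sed_ips() {
    sed -nE 's/(^|.*[^0-9])(([0-9]{1,3}\.){3}[0-9]{1,3}).*/\2/p' "$@"
}

printf 'host 192.168.1.10 is up\nnothing here\n10.0.0.1 first\n' | sed_ips
```

Lines without an address produce no output because of -n combined with the p flag.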

$ perl -nle '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ and print $&'

Perl's good to know in any case. If you've got a teeny bit of CPAN installed, you can even make it more reliable at little cost:

$ perl -MRegexp::Common=net -nE '/$RE{net}{IPV4}/ and say $&' file(s)

Answer by Sarel Botha

Most of the examples here will match 999.999.999.999, which is not a valid IP address.

The following will match only valid IP addresses (including network and broadcast addresses).

grep -E -o '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' file.txt

Omit the -o if you want to see the entire line that matched.
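
One caveat worth knowing: with -o, the pattern can still match in the middle of a longer run of digits, so an input such as 256.1.1.1 yields 56.1.1.1. The sketch below is one possible way to tighten this, assuming a grep with the -w (whole word) option, which GNU and BSD grep provide. The variable octet, the function strict_ips, and the sample lines are made up for illustration.

```shell
# One octet in the range 0-255, same alternation as in the command above.
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# -w only accepts a match that is not directly preceded or followed by a
# letter, digit, or underscore, so partial hits inside longer numbers are
# rejected.
strict_ips() {
    grep -E -o -w "$octet\\.$octet\\.$octet\\.$octet" "$@"
}

printf '%s\n' \
    'ok 192.168.1.20' \
    'bad 256.1.1.1' \
    'bad 999.99.99.999' \
    'ok 255.255.255.255' |
    strict_ips
```

Only the two lines marked ok should contribute an address here.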

Answer by James

I wrote a little script to view my log files more easily. It's nothing special, but it might help people who are learning perl. It does DNS lookups on the IP addresses after extracting them.

Answer by apachebeard

For CentOS 6.3:

ifconfig eth0 | grep 'inet addr' | awk '{print $2}' | awk 'BEGIN {FS=":"} {print $2}'

Answer by Phil L.

You can use some shell helpers I made: https://github.com/philpraxis/ipextract

I've included them here for convenience:

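#!/bin/sh
# Helper functions; each reads standard input and prints only the matches.
# grep -E replaces the obsolescent egrep (the extra -E was redundant with it).

# IPv4 addresses, each octet limited to 0-255.
ipextract ()
{
grep --only-matching -E  '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
}

# IPv4 networks written as address/prefix-length.
ipextractnet ()
{
grep --only-matching -E  '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[[:digit:]]+'
}

# Ports written as <number>/tcp.
ipextracttcp ()
{
grep --only-matching -E  '[[:digit:]]+/tcp'
}

# Ports written as <number>/udp.
ipextractudp ()
{
grep --only-matching -E  '[[:digit:]]+/udp'
}

# Ports written as <number>/sctp.
ipextractsctp ()
{
grep --only-matching -E  '[[:digit:]]+/sctp'
}

# Fully qualified domain names. Inside a bracket expression a backslash is
# literal, so the dot and hyphen are listed unescaped, with the hyphen last.
ipextractfqdn ()
{
grep --only-matching -E  '[a-zA-Z0-9]+[a-zA-Z0-9.-]*\.[a-zA-Z]{2,}'
}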

Load it / source it (when stored in ipextract file) from shell:

$ . ipextract

Use them:

$ ipextract < /etc/hosts
127.0.0.1
255.255.255.255
$

Some examples of real use:

ipextractfqdn < /var/log/snort/alert | sort -u
dmesg | ipextractudp

Answer by Sankalp

This works fine for me in access logs.

cat access_log | egrep -o '([0-9]{1,3}\.){3}[0-9]{1,3}'

Let's break it down part by part.

  • [0-9]{1,3} means one to three occurrences of the range given in []. In this case the range is 0-9, so it matches strings like 10 or 183.

  • Followed by a '.'. We need to escape this as '\.' because '.' is a metacharacter with a special meaning in regular expressions (it matches any character).

So now we match patterns like '123.', '12.', etc.

  • This pattern repeats three times (including the '.'), so we enclose it in parentheses: ([0-9]{1,3}\.){3}

  • Lastly, the pattern repeats once more, but this time without the '.'. That is why it is written separately at the end: [0-9]{1,3}

If the IPs are at the beginning of each line, as in my case, use:

egrep -o '^([0-9]{1,3}\.){3}[0-9]{1,3}'

where '^' is an anchor that restricts the match to the start of a line.
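
As a rough illustration of the anchored form, here is a sketch assuming a grep that supports -E and -o (grep -E is the current spelling of egrep). The helper name client_ips and the two log lines are invented for the example; they only imitate a typical access log layout.

```shell
# Keep only an address that sits at the very start of a line.
client_ips() {
    grep -E -o '^([0-9]{1,3}\.){3}[0-9]{1,3}' "$@"
}

# The second line also contains an address in the request path; because of
# the ^ anchor it is not reported.
printf '%s\n' \
    '10.0.0.1 - - [03/Aug/2020:16:51:50 +0000] "GET / HTTP/1.1" 200 512' \
    '10.0.0.2 - - [03/Aug/2020:16:51:51 +0000] "GET /from/192.168.1.1 HTTP/1.1" 404 0' |
    client_ips
```

Piping the result through sort -u would then give the distinct client addresses.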
