How to extract IP addresses from a file using a regular expression in the Linux shell?

Question

Asked by Kazimieras Aliulis

How do I extract part of a text with a regexp in the Linux shell? Let's say I have a file in which every line contains an IP address, but at a different position. What is the simplest way to extract those IP addresses using common Unix command-line tools?

Answer 1

Accepted answer by brien

You could use grep to pull them out.

grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' file.txt
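
As a rough illustration of what -o does here, the sketch below wraps the same pattern in a helper and feeds it two made-up log lines. The function name extract_ips and the sample lines are assumptions for the demo, not part of the answer. Each match is printed on its own line, so a line holding two addresses yields two lines of output.

```shell
# Hypothetical helper wrapping the pattern from the answer; reads stdin.
extract_ips() {
    grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}'
}

# Illustrative log lines, not taken from the question.
printf '%s\n' \
    'Jan 1 sshd: failed login from 192.168.1.10 port 22' \
    'proxy 10.0.0.1 forwarded request to 172.16.0.5' |
    extract_ips
```

Lines without a match produce no output at all, which is usually what you want when collecting addresses.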

Answer 2

Answered by PolyThinker

I'd suggest perl. (\d+\.\d+\.\d+\.\d+) should probably do the trick.

EDIT: Just to make it more like a complete program, you could do something like the following (not tested):

#!/usr/bin/perl -w
use strict;

# Print the first dotted quad found on each input line.
while (<>) {
    if (/(\d+\.\d+\.\d+\.\d+)/) {
        print "$1\n";
    }
}

This handles one IP per line. If you have more than one IP per line, you need to use the /g option. man perlretut gives you a more detailed tutorial on regular expressions.
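
A minimal sketch of the /g variant, written as a shell one-liner. The helper name perl_extract_ips and the sample line are assumptions for illustration. In list context a /g match returns every captured group, so "print for" emits one address per output line.

```shell
# Hypothetical helper: print every dotted quad on every line of stdin.
# -n loops over input lines, -l appends a newline to each print.
perl_extract_ips() {
    perl -nle 'print for /(\d+\.\d+\.\d+\.\d+)/g'
}

# Illustrative input with two addresses on one line.
printf '%s\n' 'from 10.0.0.1 to 10.0.0.2' | perl_extract_ips
```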

Answer 3

Answered by Avi

You can use sed. But if you know perl, that might be easier, and more useful to know in the long run:

perl -ne '/(\d+\.\d+\.\d+\.\d+)/ && print "$1\n"' < file

Answer 4

Answered by Allen Ratcliff

You could use awk, as well. Something like ...

awk '{i=1; if (NF > 0) do {if ($i ~ /regexp/) print $i; i++;} while (i <= NF);}' file

This may need cleaning up. It's just a quick and dirty response to show roughly how to do it with awk; replace regexp with a pattern that matches an IP address.
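
One way the field loop above could look with a concrete pattern filled in. This is only a sketch: the helper name awk_extract_ips and the sample lines are made up, and it assumes the address is a whole whitespace-separated field (a field like "10.0.0.1," with trailing punctuation would be skipped).

```shell
# Hypothetical helper: print every whitespace-separated field that looks
# like a dotted quad. [0-9]+ is used instead of {1,3} because some older
# awk implementations do not support interval expressions.
awk_extract_ips() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/)
                print $i
    }'
}

# Illustrative input lines.
printf '%s\n' 'host 10.0.0.1 is up' 'gateway: 192.168.0.254' | awk_extract_ips
```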

Answer 5

Answered by JB.

I usually start with grep, to get the regexp right.

# [multiple failed attempts here]
grep    '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*'                 file  # good?
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' file  # good enough

Then I'd try to convert it to sed to filter out the rest of the line. (After reading this thread, you and I aren't going to do that anymore: we're going to use grep -o instead.)

sed -ne 's/.*\([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\).*/\1/p'  # FAIL

That's when I usually get annoyed with sed for not using the same regexes as everyone else. So I move to perl.

$ perl -nle '/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/ and print $&'

Perl's good to know in any case. If you've got a teeny bit of CPAN installed, you can even make it more reliable at little cost:

$ perl -MRegexp::Common=net -nE '/$RE{net}{IPV4}/ and say $&' file(s)
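
For what it's worth, the sed attempt above fails because sed uses basic regular expressions by default, where {1,3} is read as literal braces. A sketch of one possible fix, assuming a sed that accepts -E for extended regexes (GNU and BSD sed both do); the helper name sed_extract_ip and the sample input are made up. The (^|.*[^0-9]) prefix is there so the greedy .* cannot swallow the leading digits of the address. It prints at most one address per line.

```shell
# Hypothetical helper: print the IP from each line that contains one.
# -n suppresses default output, the trailing p prints only substituted lines.
sed_extract_ip() {
    sed -nE 's/(^|.*[^0-9])([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\2/p'
}

# Illustrative input: one line with an address, one without.
printf '%s\n' 'src=192.168.1.10 ok' 'no address here' | sed_extract_ip
```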

Answer 6

Answered by Sarel Botha

Most of the examples here will match on 999.999.999.999 which is not technically a valid IP address.

The following will match on only valid IP addresses (including network and broadcast addresses).

grep -E -o '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)' file.txt

Omit the -o if you want to see the entire line that matched.
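
One caveat with -o: the pattern can still report part of a longer digit run, so 256.1.1.1 yields 56.1.1.1. A possible refinement, offered only as a sketch, is to add grep's -w option so that a match must not touch a neighbouring digit or letter. The variable names, the helper extract_valid_ips, and the sample line are assumptions for the demo.

```shell
# Octet alternation from the answer above: 250-255, 200-249, or 0-199.
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
ipv4="${octet}\.${octet}\.${octet}\.${octet}"

# Hypothetical helper: -w rejects matches that are directly preceded or
# followed by a word character (letter, digit, or underscore).
extract_valid_ips() {
    grep -E -o -w "$ipv4"
}

# Illustrative input: only the middle token is a well-formed address.
printf '%s\n' '256.1.1.1 10.0.0.1 1.2.3.456' | extract_valid_ips
```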

Answer 7

Answered by James

I wrote a little script to view my log files more easily. It's nothing special, but it might help people who are learning perl. It does DNS lookups on the IP addresses after it extracts them.

Answer 8

Answered by apachebeard

For CentOS 6.3:

ifconfig eth0 | grep 'inet addr' | awk '{print $2}' | awk 'BEGIN {FS=":"} {print $2}'
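
The same extraction can probably be done in a single awk call. The sketch below assumes the older ifconfig output format that prints "inet addr:ADDRESS"; newer ifconfig versions print "inet ADDRESS" without the "addr:" prefix, so it would need adjusting there. The helper name inet_addr and the sample line are made up for the demo.

```shell
# Hypothetical helper: read ifconfig-style text on stdin and print the
# address that follows "inet addr:" (second field, split on the colon).
inet_addr() {
    awk '/inet addr/ { split($2, parts, ":"); print parts[2] }'
}

# Illustrative sample of the older output format.
printf '%s\n' '          inet addr:192.168.1.5  Bcast:192.168.1.255  Mask:255.255.255.0' |
    inet_addr
```

On a real system this would be used as: ifconfig eth0 | inet_addr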

Answer 9

Answered by Phil L.

You can use some shell helper I made: https://github.com/philpraxis/ipextract

I've included them here for convenience:

#!/bin/sh
# Each function reads stdin and prints only the matching parts.

# IPv4 addresses with every octet in the range 0-255.
ipextract ()
{
    grep -E --only-matching '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
}

# IPv4 networks in address/prefix notation.
ipextractnet ()
{
    grep -E --only-matching '(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)/[[:digit:]]+'
}

# Port numbers written as NUMBER/tcp, NUMBER/udp, NUMBER/sctp.
ipextracttcp ()
{
    grep -E --only-matching '[[:digit:]]+/tcp'
}

ipextractudp ()
{
    grep -E --only-matching '[[:digit:]]+/udp'
}

ipextractsctp ()
{
    grep -E --only-matching '[[:digit:]]+/sctp'
}

# Fully qualified domain names. The hyphen goes last in the bracket
# expression so that it is taken literally.
ipextractfqdn ()
{
    grep -E --only-matching '[a-zA-Z0-9]+[a-zA-Z0-9.-]*\.[a-zA-Z]{2,}'
}

Source it from the shell (assuming it is stored in a file named ipextract):

$ . ipextract

Use them:

$ ipextract < /etc/hosts
127.0.0.1
255.255.255.255
$

Some examples of real use:

ipextractfqdn < /var/log/snort/alert | sort -u
dmesg | ipextractudp

Answer 10

Answered by Sankalp

This works fine for me in access logs.

cat access_log | egrep -o '([0-9]{1,3}\.){3}[0-9]{1,3}'

Let's break it down part by part.

[0-9]{1,3} means one to three occurrences of the range given in [], in this case 0-9, so it matches strings like 10 or 183.
It is followed by a '.'. We need to escape this as '\.' because '.' is a metacharacter with a special meaning in regular expressions (it matches any single character).

So now we are at patterns like '123.' '12.' etc.

This pattern repeats three times (with the '.'), so we enclose it in parentheses: ([0-9]{1,3}\.){3}
Lastly, the pattern repeats once more, this time without the '.', which is why it is written separately at the end: [0-9]{1,3}
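
The breakdown can be mirrored in the shell by assembling the pattern from its parts. This is just a sketch: the variable names, the helper match_ips, and the sample access-log line are assumptions for illustration.

```shell
# One octet: one to three digits.
octet='[0-9]{1,3}'
# Three "octet plus dot" groups, then a final octet without the dot.
ipv4="(${octet}\.){3}${octet}"

# Hypothetical helper: print every match from stdin, one per line.
match_ips() {
    grep -E -o "$ipv4"
}

# Illustrative access-log style line.
printf '%s\n' '203.0.113.7 - - "GET /index.html HTTP/1.1" 200' | match_ips
```

Building the pattern from named pieces keeps the repeated part in one place, which makes it easier to tighten the octet definition later.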

If the IPs are at the beginning of each line, as in my case, use:

egrep -o '^([0-9]{1,3}\.){3}[0-9]{1,3}'

where '^' is an anchor that restricts the match to the start of a line.
