Parsing a file for IP addresses in Python
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/14026529/
python parse file for ip addresses
Asked by Mark Hill
I have a file with several IP addresses. There are about 900 IPs on 4 lines of text. I would like the output to be 1 IP per line. How can I accomplish this? Based on other code, I have come up with this, but it fails because multiple IPs are on single lines:
import sys
import re

try:
    if sys.argv[1:]:
        print "File: %s" % (sys.argv[1])
        logfile = sys.argv[1]
    else:
        logfile = raw_input("Please enter a log file to parse, e.g /var/log/secure: ")
    try:
        file = open(logfile, "r")
        ips = []
        for text in file.readlines():
            text = text.rstrip()
            regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
            if regex is not None and regex not in ips:
                ips.append(regex)

        for ip in ips:
            outfile = open("/tmp/list.txt", "a")
            addy = "".join(ip)
            if addy is not '':
                print "IP: %s" % (addy)
                outfile.write(addy)
                outfile.write("\n")
    finally:
        file.close()
        outfile.close()
except IOError, (errno, strerror):
    print "I/O Error(%s) : %s" % (errno, strerror)
Accepted answer by Martijn Pieters
The $ anchor in your expression is preventing you from finding anything but the last entry on each line. Remove it, then use the list returned by .findall():
found = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})',text)
if found:
    ips.extend(found)
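To see the effect of the anchor, here is a minimal sketch (written for Python 3) that runs the anchored and unanchored patterns over a line holding several addresses. The sample line and the names IP_PATTERN and sample are made up for illustration.

```python
import re

# Same pattern as in the question, without the trailing "$".
IP_PATTERN = r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})'

# Hypothetical sample line with several addresses on it.
sample = "10.0.0.1 10.0.0.2 10.0.0.3"

# With "$", only the address that ends the line can match.
anchored = re.findall(IP_PATTERN + '$', sample)

# Without "$", every address on the line is returned.
unanchored = re.findall(IP_PATTERN, sample)

print(anchored)    # ['10.0.0.3']
print(unanchored)  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```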
Answer by Walk
The findall function returns a list of matches, but you aren't iterating through each match.
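# Note: the trailing $ still limits matches to the end of each line; drop it to catch every address.
regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
if regex is not None:
    for match in regex:
        if match not in ips:
            ips.append(match)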
Answer by jfs
Without the re.MULTILINE flag, $ matches only at the end of the string.
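As a small illustration of that flag (a sketch for Python 3, with made-up sample data and names), the snippet below applies an anchored pattern to a multi-line string with and without re.MULTILINE. Note that the flag only helps when each address ends its own line; it does not help when several addresses share a line, as in the question.

```python
import re

# Anchored pattern: an address must be followed by "$".
PATTERN = r"\d{1,3}(?:\.\d{1,3}){3}$"

# Hypothetical input read as one string, one address per line.
data = "10.0.0.1\n10.0.0.2\n10.0.0.3"

# Default: "$" matches only at the end of the whole string.
end_only = re.findall(PATTERN, data)

# re.MULTILINE: "$" also matches at the end of every line.
per_line = re.findall(PATTERN, data, re.MULTILINE)

print(end_only)  # ['10.0.0.3']
print(per_line)  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```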
To make debugging easier, split the code into several parts that you can test independently.
import re

def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)
The regex filters out some valid IPs, e.g., 2130706433 and "1::1". Conversely, the regex matches invalid strings, e.g., 999.999.999.999. You could validate an IP string using socket.inet_aton() or the more general socket.inet_pton(). You could even split the input into pieces without searching for IPs and use these functions to keep only the valid ones.
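The split-and-validate idea could look roughly like the sketch below (Python 3). The helper names is_valid_ip and keep_valid_ips are hypothetical, and the code assumes the addresses in the input are separated by whitespace; other separators would need a different split.

```python
import socket

def is_valid_ip(text, family=socket.AF_INET):
    """Return True if text parses as an address of the given family."""
    try:
        socket.inet_pton(family, text)
    except (OSError, ValueError):
        # OSError: not a valid address; ValueError: unusable input string.
        return False
    return True

def keep_valid_ips(data):
    """Split data on whitespace and keep the tokens that are valid IPv4."""
    return [token for token in data.split() if is_valid_ip(token)]

# Hypothetical sample input mixing valid and invalid tokens.
print(keep_valid_ips("10.0.0.1 foo 999.1.1.1 192.168.1.5"))
```

Passing socket.AF_INET6 as the family would check IPv6 strings such as "1::1" instead.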
If the input file is small and you don't need to preserve the original order of the IPs:
with open(filename) as infile, open(outfilename, "w") as outfile:
    outfile.write("\n".join(set(extract_ips(infile.read()))))
Otherwise:
with open(filename) as infile, open(outfilename, "w") as outfile:
    seen = set()
    for line in infile:
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                print >>outfile, ip  # Python 2 syntax; in Python 3 use print(ip, file=outfile)
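For readers on Python 3, the order-preserving variant can be wrapped into a single function along these lines. This is only a sketch: the function name write_unique_ips and its return value are additions for illustration, not part of the original answer.

```python
import re

def extract_ips(data):
    """Return every dotted-quad candidate found in data, in order."""
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)

def write_unique_ips(in_path, out_path):
    """Write each distinct candidate from in_path to out_path, one per line.

    Returns the number of addresses written.
    """
    seen = set()
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            for ip in extract_ips(line):
                if ip not in seen:
                    seen.add(ip)
                    print(ip, file=outfile)
    return len(seen)
```

It could be called as write_unique_ips("input.txt", "/tmp/list.txt"), where both paths are placeholders. As noted above, the pattern still accepts strings such as 999.999.999.999, so the output holds candidates rather than validated addresses.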
Answer by Johnny
Extracting IP Addresses From File
I answered a similar question in this discussion. In short, it's a solution based on one of my ongoing projects for extracting Network and Host Based Indicators from different types of input data (e.g. string, file, blog posting, etc.): https://github.com/JohnnyWachter/intel
I would import the IPAddresses and Data classes, then use them to accomplish your task in the following manner:
#!/usr/bin/env python

"""Extract IPv4 Addresses From Input File."""

from Data import CleanData  # Format and Clean the Input Data.
from IPAddresses import ExtractIPs  # Extract IPs From Input Data.


def get_ip_addresses(input_file_path):
    """
    Read contents of input file and extract IPv4 Addresses.
    :param input_file_path: fully qualified path to input file. Expecting str
    :returns: dictionary of IPv4 and IPv4-like Address lists
    :rtype: dict
    """

    input_data = []  # Empty list to house formatted input data.

    input_data.extend(CleanData(input_file_path).to_list())

    results = ExtractIPs(input_data).get_ipv4_results()

    return results
Now that you have a dictionary of lists, you can easily access the data you want and output it in whatever way you desire. The example below makes use of the above function, printing the results to the console and writing them to a specified output file:
# Extract the desired data using the aforementioned function.
ipv4_list = get_ip_addresses('/path/to/input/file')

# Open your output file in 'append' mode.
with open('/path/to/output/file', 'a') as outfile:

    # Ensure that the list of valid IPv4 Addresses is not empty.
    if ipv4_list['valid_ips']:

        for ip_address in ipv4_list['valid_ips']:

            # Print to console
            print(ip_address)

            # Write to output file, one address per line.
            outfile.write(ip_address + '\n')

