Python using re module to parse an imported text file
Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors: Stack Overflow
Original question: http://stackoverflow.com/questions/14840310/
Asked by user1478335
def regexread():
    import re
    result = ''
    savefileagain = open('sliceeverfile3.txt','w')
    #text=open('emeverslicefile4.txt','r')
    text='09,11,14,34,44,10,11, 27886637, 0\n561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07, 19070109, 0\n560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n559, Tue,29,Jan,2013,'
    pattern='\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
    #with open('emeverslicefile4.txt') as text:
    f = re.findall(pattern,text)
    for item in f:
        print(item)
    savefileagain.write(item)
    #savefileagain.close()
The above function as written parses the text and returns sets of seven numbers. I have three problems.
- Firstly, reading from the 'read' file, which contains exactly the same text as text='09,...etc', raises "TypeError: expected string or buffer", which I cannot solve even after reading some of the posts.
- Secondly, when I try to write the results to the 'write' file, nothing is written.
- Thirdly, I am not sure how to get the same output in the file that I get with the print statement: three lines of seven numbers each, which is the output that I want.
This is the first time that I have used regex, so be gentle please!
Accepted answer by OmegaOuter
This should do the trick. Check the comments for an explanation of what I'm doing here =) Good luck
import re

filename = 'sliceeverfile3.txt'
# Raw string so the backslashes reach the regex engine unchanged
pattern = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
new_file = []

# Make sure file gets closed after being iterated
with open(filename, 'r') as f:
    # Read the file contents and generate a list with each line
    lines = f.readlines()

# Iterate each line
for line in lines:
    # Regex applied to each line
    match = re.search(pattern, line)
    if match:
        # Make sure to add \n to display correctly when we write it back
        new_line = match.group() + '\n'
        print(new_line)
        new_file.append(new_line)

# Note: this overwrites the same file that was read above
with open(filename, 'w') as f:
    # Mode 'w' truncates the file, so writing starts at the beginning
    # actually write the lines
    f.writelines(new_file)
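The TypeError from the question most likely comes from handing the file object returned by open() straight to re.findall, which expects a string. The sketch below is only an illustration under that assumption: the file name input.txt and the temporary directory are hypothetical, created here so the example is self-contained, and the sample text is copied from the question.

```python
import os
import re
import tempfile

PATTERN = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'

# Sample text copied from the question
SAMPLE = ('09,11,14,34,44,10,11, 27886637, 0\n'
          '561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07, 19070109, 0\n'
          '560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n'
          '559, Tue,29,Jan,2013,')

# Hypothetical input file, written only so the sketch can run on its own
path = os.path.join(tempfile.mkdtemp(), 'input.txt')
with open(path, 'w') as f:
    f.write(SAMPLE)

with open(path) as f:
    try:
        re.findall(PATTERN, f)   # f is a file object, not a string
    except TypeError as exc:
        error_message = str(exc)
    contents = f.read()          # read() returns the whole file as one string

matches = re.findall(PATTERN, contents)
print(matches)
```

Calling read() first, or looping over the lines as the answer above does, gives re a string to work with.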
Answer by brwnj
You're sort of on the right track...
You'll want to iterate over the file (see "How to iterate over the file in python") and apply the regex to each line. That link should really answer all 3 of your questions once you realize you're trying to write 'item' outside of that loop, where it only refers to the last match.
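As a rough sketch of that advice, the write can be moved inside the loop so that each match lands on its own line. The function name extract_draws and the file names input.txt and output.txt are hypothetical, chosen for illustration; the sample text is copied from the question. Unlike the accepted answer, this version assumes separate input and output files.

```python
import os
import re
import tempfile

PATTERN = re.compile(r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d')

# Sample text copied from the question
SAMPLE = ('09,11,14,34,44,10,11, 27886637, 0\n'
          '561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07, 19070109, 0\n'
          '560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n'
          '559, Tue,29,Jan,2013,')

def extract_draws(in_path, out_path):
    """Write the first seven-number run found on each input line to out_path."""
    count = 0
    # Both files are closed (and the output flushed) when the block ends
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:
            match = PATTERN.search(line)
            if match:
                # Write inside the loop, adding the newline ourselves
                dst.write(match.group() + '\n')
                count += 1
    return count

# Hypothetical files in a temporary directory, so the sketch is self-contained
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, 'input.txt')
out_path = os.path.join(workdir, 'output.txt')
with open(in_path, 'w') as f:
    f.write(SAMPLE)

written = extract_draws(in_path, out_path)
with open(out_path) as f:
    result = f.read()
print(result)
```

Using with for the output file also addresses the empty 'write' file from the question, since the data is flushed when the file is closed.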

