Splitting a file into lines in Python using re.split
Original question: http://stackoverflow.com/questions/818705/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors.
Asked by Ashton
I'm trying to split a file with a list comprehension using code similar to:
lines = [x for x in re.split(r"\n+", file.read()) if not re.match(r"com", x)]
However, the lines list always has an empty string as the last element. Does anyone know a way to avoid this (excluding the kludge of putting a pop() afterwards)?
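The behaviour described in the question can be reproduced without a file. The sketch below assumes the file contents end with a newline (the sample string is made up for illustration): re.split then sees a separator after the last line and reports the empty text that follows it, whereas str.splitlines() treats the final newline as a line terminator.

```python
import re

# Hypothetical file contents that end with a newline, as most text files do
text = "first\ncomment\nsecond\n"

# re.split treats the final newline as a separator, so an empty string follows it
split_with_re = re.split(r"\n+", text)

# str.splitlines() treats the final newline as a terminator instead
split_with_splitlines = text.splitlines()

print(split_with_re)          # ['first', 'comment', 'second', '']
print(split_with_splitlines)  # ['first', 'comment', 'second']
```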
Answered by John Fouhy
Put the regular expression hammer away :-)
- You can iterate over a file directly; readlines() is almost obsolete these days.
- Read about str.strip() (and its friends, lstrip() and rstrip()).
- Don't use file as a variable name. It's bad form, because file is the name of a built-in (a built-in type in Python 2).
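As a quick illustration of the strip family mentioned in the list above, here is a small sketch. The sample string is invented for the demo; the point is only to show which side of the string each method trims.

```python
# Hypothetical line as it might come out of a file, with padding and a newline
raw = "  com: hello \n"

both_sides = raw.strip()          # removes whitespace on both ends
left_only = raw.lstrip()          # removes leading whitespace only
right_only = raw.rstrip()         # removes trailing whitespace, including the newline
newline_only = raw.rstrip("\n")   # removes only trailing newline characters

print(repr(both_sides))    # 'com: hello'
print(repr(left_only))     # 'com: hello \n'
print(repr(right_only))    # '  com: hello'
print(repr(newline_only))  # '  com: hello '
```

Passing an argument, as in rstrip("\n"), is useful when leading or trailing spaces are meaningful and only the line ending should go.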
You can write your code as:
lines = []
f = open(filename)
for line in f:
    if not line.startswith('com'):
        lines.append(line.strip())
If you are still getting blank lines in there, you can add in a test:
lines = []
f = open(filename)
for line in f:
    if line.strip() and not line.startswith('com'):
        lines.append(line.strip())
If you really want it in one line:
lines = [line.strip() for line in open(filename) if line.strip() and not line.startswith('com')]
Finally, if you're on Python 2.6 or later, look at the with statement to improve things a little more.
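One possible way to apply the with statement to the loop above is sketched here. The helper name read_lines and the temporary sample file are assumptions made for the demo, not part of the original answer; the filtering logic is the same as in the one-line version.

```python
import os
import tempfile


def read_lines(filename):
    """Return stripped, non-blank lines that do not start with 'com'."""
    # The with block closes the file automatically, even if an error occurs
    with open(filename) as f:
        return [line.strip() for line in f
                if line.strip() and not line.startswith("com")]


# Hypothetical sample file, created only so the sketch can run on its own
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\ncomment line\n\nsecond\n")
    sample_path = tmp.name

result = read_lines(sample_path)
os.remove(sample_path)
print(result)  # ['first', 'second']
```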
Answered by Alex
lines = file.readlines()
Edit: or, if you don't want blank lines in there, you can do
lines = filter(lambda a:(a!='\n'), file.readlines())
Edit 2: to remove trailing newlines, you can do
lines = [re.sub('\n','',line) for line in filter(lambda a:(a!='\n'), file.readlines())]
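Note that in Python 3, filter returns a lazy iterator rather than a list. A sketch of an equivalent that works the same way in Python 2 and 3 is a single list comprehension with str.rstrip. Here io.StringIO stands in for an open file, which is an assumption made so the example is self-contained.

```python
import io

# io.StringIO stands in for an open text file (assumption for the demo)
f = io.StringIO("first\n\nsecond\n")

# Drop blank lines and trailing newlines in one pass, without re or filter
lines = [line.rstrip("\n") for line in f if line != "\n"]

print(lines)  # ['first', 'second']
```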
Answered by si28719e
Another handy trick, especially when you need the line number, is to use enumerate:
fp = open("myfile.txt", "r")
for n, line in enumerate(fp.readlines()):
    dosomethingwith(n, line)
I only found out about enumerate quite recently, but it has come in handy quite a few times since then.
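A variation on the enumerate idea, sketched under the assumption that 1-based line numbers are wanted: enumerate accepts a start argument, and the file object can be passed to it directly without calling readlines(). io.StringIO is used here in place of a real file so the example runs on its own.

```python
import io

# Stands in for open("myfile.txt"); an assumption so the sketch is self-contained
fp = io.StringIO("alpha\nbeta\n")

numbered = []
for n, line in enumerate(fp, start=1):  # start=1 gives 1-based line numbers
    numbered.append((n, line.rstrip("\n")))

print(numbered)  # [(1, 'alpha'), (2, 'beta')]
```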
Answered by Ryan Ginstrom
This should work, and eliminate the regular expressions as well:
all_lines = (line.rstrip()
for line in open(filename)
if "com" not in line)
# filter out the empty lines
lines = filter(lambda x : x, all_lines)
Since you're using a list comprehension and not a generator expression (so the whole file gets loaded into memory anyway), here's a shortcut that avoids code to filter out empty lines:
lines = [line
for line in open(filename).read().splitlines()
if "com" not in line]
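One thing worth checking before adopting the snippets in this answer: "com" not in line rejects any line that contains "com" anywhere, while the question's re.match(r"com", x) only rejects lines that begin with it. The sketch below uses made-up sample lines to show the difference; which test is right depends on the data.

```python
# Hypothetical sample lines for comparing the two tests
sample = ["comment", "welcome", "data"]

# Substring test: drops every line containing "com" anywhere
not_containing = [s for s in sample if "com" not in s]

# Prefix test: drops only lines that begin with "com", like re.match(r"com", s)
not_starting = [s for s in sample if not s.startswith("com")]

print(not_containing)  # ['data']
print(not_starting)    # ['welcome', 'data']
```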