Splitting a file into lines in Python using re.split

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/818705/

Time: 2020-11-03 20:54:42  Source: igfitidea

python regex list-comprehension

Asked by Ashton

I'm trying to split a file with a list comprehension using code similar to:

lines = [x for x in re.split(r"\n+", file.read()) if not re.match(r"com", x)]

However, the lines list always has an empty string as the last element. Does anyone know a way to avoid this (excluding the kludge of putting a pop() afterwards)?
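
The trailing empty string appears whenever the text ends with a newline, because re.split also returns whatever follows the last separator, even when that is nothing. A minimal sketch of the behaviour, using an in-memory string in place of file.read() (the sample text is made up for illustration):

```python
import re

# Hypothetical sample standing in for file.read(); note the final newline.
text = "alpha\ncomment\n\nbeta\n"

parts = re.split(r"\n+", text)
print(parts)  # ['alpha', 'comment', 'beta', '']

# Same filter as in the question: the empty string does not match "com",
# so it survives the comprehension.
lines = [x for x in parts if not re.match(r"com", x)]
print(lines)  # ['alpha', 'beta', '']
```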

Answered by John Fouhy

Put the regular expression hammer away :-)

  1. You can iterate over a file directly; readlines() is almost obsolete these days.
  2. Read about str.strip() (and its friends, lstrip() and rstrip()).
  3. Don't use file as a variable name. It's bad form, because file is a built-in type in Python 2.
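
As a rough illustration of the first two points, the sketch below iterates over a file-like object directly and shows what each strip variant removes. The io.StringIO object and the sample strings are stand-ins chosen for the example, not part of the original answer:

```python
import io

# io.StringIO stands in for an open text file so the sketch is self-contained.
f = io.StringIO("first\nsecond\n")
direct = [line for line in f]  # iterating yields one line at a time, newline included
print(direct)  # ['first\n', 'second\n']

raw = "  some text \n"
print(repr(raw.strip()))       # 'some text'      (both ends)
print(repr(raw.lstrip()))      # 'some text \n'   (left end only)
print(repr(raw.rstrip()))      # '  some text'    (right end only)
print(repr(raw.rstrip("\n")))  # '  some text '   (only trailing newlines)
```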

You can write your code as:

lines = []
f = open(filename)
for line in f:
    if not line.startswith('com'):
        lines.append(line.strip())

If you are still getting blank lines in there, you can add in a test:

lines = []
f = open(filename)
for line in f:
    if line.strip() and not line.startswith('com'):
        lines.append(line.strip())

If you really want it in one line:

lines = [line.strip() for line in open(filename) if line.strip() and not line.startswith('com')]

Finally, if you're on Python 2.6 or later, look at the with statement to improve things a little more.
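
One way the with statement might be applied to the loop above is sketched here. The temporary file and its contents are assumptions made only so the example runs on its own; in real code filename would point at your own file:

```python
import os
import tempfile

# Create a throwaway file so the sketch is self-contained (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("keep me\ncomment line\n\nkeep me too\n")
    filename = tmp.name

lines = []
with open(filename) as f:  # f is closed automatically when the block exits
    for line in f:
        if line.strip() and not line.startswith("com"):
            lines.append(line.strip())

os.remove(filename)
print(lines)  # ['keep me', 'keep me too']
```

The benefit over a bare open() is that the file is closed even if the loop raises an exception.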

Answered by Alex

lines = file.readlines()

Edit: or, if you don't want blank lines in there, you can do:

lines = filter(lambda a:(a!='\n'), file.readlines())

Edit^2: to remove trailing newlines, you can do:

lines = [re.sub('\n','',line) for line in filter(lambda a:(a!='\n'), file.readlines())]
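
Note that in Python 3, filter returns a lazy iterator rather than a list, so the filter-only version would need list(...) around it to behave as it did in Python 2. A possible regex-free equivalent of the last snippet, sketched with io.StringIO as a stand-in for the open file (the variable f and the sample text are assumptions for illustration):

```python
import io

# Stand-in for an open file; the contents are made up for the example.
f = io.StringIO("one\n\ntwo\n")

# Skip lines that consist only of a newline, then drop the trailing newline.
lines = [line.rstrip("\n") for line in f if line != "\n"]
print(lines)  # ['one', 'two']
```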

Answered by si28719e

Another handy trick, especially when you need the line number, is to use enumerate:

fp = open("myfile.txt", "r")
for n, line in enumerate(fp.readlines()):
    dosomethingwith(n, line)

I only found out about enumerate quite recently, but it has come in handy quite a few times since then.
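
A small variation on the enumerate idea, offered as a sketch: enumerate accepts a start argument, so line numbers can begin at 1, and the file object can be passed directly without calling readlines(). The io.StringIO object below is a stand-in for open("myfile.txt"):

```python
import io

# Stand-in for open("myfile.txt", "r"); contents are hypothetical.
fp = io.StringIO("alpha\nbeta\n")

numbered = []
for n, line in enumerate(fp, start=1):  # count from 1 instead of 0
    numbered.append((n, line.rstrip("\n")))

print(numbered)  # [(1, 'alpha'), (2, 'beta')]
```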

Answered by Ryan Ginstrom

This should work, and eliminate the regular expressions as well:

all_lines = (line.rstrip()
             for line in open(filename)
             if "com" not in line)
# filter out the empty lines
lines = filter(lambda x : x, all_lines)

Since you're using a list comprehension and not a generator expression (so the whole file gets loaded into memory anyway), here's a shortcut that avoids code to filter out empty lines:

lines = [line
     for line in open(filename).read().splitlines()
     if "com" not in line]
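
Two details of this shortcut may be worth checking against your own data. splitlines() does not produce a trailing empty string, but blank lines in the middle of the file are still returned as empty strings; and "com" not in line drops any line containing "com" anywhere, whereas the question's re.match only tests the start of the line. A short sketch with made-up sample text:

```python
# Hypothetical sample text standing in for open(filename).read().
text = "alpha\nwelcome\n\ncomment\nbeta\n"

pieces = text.splitlines()
print(pieces)  # ['alpha', 'welcome', '', 'comment', 'beta']

# Substring test: also removes "welcome", and keeps the inner blank line.
by_substring = [line for line in pieces if "com" not in line]
print(by_substring)  # ['alpha', '', 'beta']

# Prefix test plus an explicit emptiness check.
by_prefix = [line for line in pieces if line and not line.startswith("com")]
print(by_prefix)  # ['alpha', 'welcome', 'beta']
```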