Using re.split to split a file into lines in Python

Question

Asked by Ashton

I'm trying to split a file with a list comprehension using code similar to:

lines = [x for x in re.split(r"\n+", file.read()) if not re.match(r"com", x)]

However, the lines list always has an empty string as its last element. Does anyone know a way to avoid this (other than the kludge of calling pop() afterwards)?
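For reference, the behaviour can be reproduced without a file. This is a minimal sketch that assumes the file ends with a newline; the sample text is made up for illustration. re.split returns whatever follows the last separator, and after a final newline that is the empty string.

```python
import re

# Hypothetical file contents; note the final newline.
text = "alpha\ncomment\n\nbeta\n"

parts = re.split(r"\n+", text)
# The text after the last run of newlines is empty, so it appears as a final "".
print(parts)  # ['alpha', 'comment', 'beta', '']

# Same filter as in the question: drop entries that start with "com".
lines = [x for x in parts if not re.match(r"com", x)]
print(lines)  # ['alpha', 'beta', '']
```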

Answer 1

Answered by John Fouhy

Put the regular expression hammer away :-)

You can iterate over a file directly; readlines() is almost obsolete these days.
Read about str.strip() (and its friends, lstrip() and rstrip()).
Don't use file as a variable name. It's bad form, because file is the name of a built-in type in Python 2.

You can write your code as:

lines = []
f = open(filename)
for line in f:
    if not line.startswith('com'):
        lines.append(line.strip())

If you are still getting blank lines in there, you can add in a test:

lines = []
f = open(filename)
for line in f:
    if line.strip() and not line.startswith('com'):
        lines.append(line.strip())

If you really want it in one line:

lines = [line.strip() for line in open(filename) if line.strip() and not line.startswith('com')]

Finally, if you're on Python 2.6 or later, look at the with statement to improve things a little more.
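As a rough sketch of what that could look like: the with statement closes the file once the block exits, even if an exception is raised. The function name read_lines and the temporary sample file are assumptions for illustration, not part of the original answer.

```python
import os
import tempfile


def read_lines(filename):
    """Return stripped, non-blank lines that do not start with 'com'."""
    with open(filename) as f:  # the file is closed when the block exits
        return [line.strip() for line in f
                if line.strip() and not line.startswith("com")]


# Usage with a throwaway file holding made-up contents.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\ncomment\n\n  beta  \n")
    path = tmp.name

result = read_lines(path)
os.remove(path)
print(result)  # ['alpha', 'beta']
```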

Answer 2

Answered by Alex

lines = file.readlines()

Edit: or, if you don't want blank lines in there, you can do

lines = filter(lambda a:(a!='\n'), file.readlines())

Edit^2: to remove trailing newlines, you can do

lines = [re.sub('\n','',line) for line in filter(lambda a:(a!='\n'), file.readlines())]
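Note that on Python 3, filter returns a lazy iterator rather than a list. A possible alternative that does both steps in one list comprehension, without re, is sketched below; io.StringIO stands in for an open file here and its contents are made up.

```python
import io

# io.StringIO stands in for an open file; the contents are hypothetical.
f = io.StringIO("alpha\n\nbeta\n")

# Drop blank lines and strip the trailing newline in a single pass.
lines = [line.rstrip("\n") for line in f if line != "\n"]
print(lines)  # ['alpha', 'beta']
```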

Answer 3

Answered by si28719e

Another handy trick, especially when you need the line number, is to use enumerate:

fp = open("myfile.txt", "r")
for n, line in enumerate(fp.readlines()):
    dosomethingwith(n, line)

I only found out about enumerate quite recently, but it has come in handy quite a few times since then.
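A small variation, offered as a sketch: enumerate accepts any iterable, so the file object can be passed directly without readlines(), and the start argument (available since Python 2.6) gives 1-based line numbers. io.StringIO stands in for an open file and the sample text is invented.

```python
import io

# io.StringIO stands in for an open file; the contents are hypothetical.
fp = io.StringIO("first\nsecond\n")

numbered = []
for n, line in enumerate(fp, start=1):  # count lines from 1 instead of 0
    numbered.append((n, line.rstrip("\n")))

print(numbered)  # [(1, 'first'), (2, 'second')]
```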

Answer 4

Answered by Ryan Ginstrom

This should work, and eliminate the regular expressions as well:

all_lines = (line.rstrip()
             for line in open(filename)
             if "com" not in line)
# filter out the empty lines
lines = filter(lambda x : x, all_lines)

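Since you're using a list comprehension and not a generator expression (so the whole file gets loaded into memory anyway), here's a shortcut: str.splitlines() does not produce a trailing empty string, so nothing needs to be popped or filtered for that case (blank lines inside the file are still kept):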

lines = [line
     for line in open(filename).read().splitlines()
     if "com" not in line]
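To see the difference between splitting on "\n" and using splitlines(), here is a short sketch on a made-up string. The extra truthiness test in the last step is an addition for illustration, in case blank lines inside the text should be dropped as well.

```python
# Hypothetical text that ends with a newline and contains one blank line.
text = "alpha\n\nbeta\n"

by_split = text.split("\n")
by_splitlines = text.splitlines()

print(by_split)       # ['alpha', '', 'beta', '']
print(by_splitlines)  # ['alpha', '', 'beta']

# Adding "if line" also removes the blank line in the middle.
non_blank = [line for line in text.splitlines() if line and "com" not in line]
print(non_blank)      # ['alpha', 'beta']
```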
