Python: matching blank lines with regular expressions

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1197600/

Time: 2020-11-03 21:41:38  Source: igfitidea

Matching blank lines with regular expressions

python regex

Asked by John Fouhy

I've got a string that I'm trying to split into chunks based on blank lines.

Given a string s, I thought I could do this:

re.split('(?m)^\s*$', s)

This works in some cases:

>>> s = 'foo\nbar\n \nbaz'
>>> re.split('(?m)^\s*$', s)
['foo\nbar\n', '\nbaz']

But it doesn't work if the line is completely empty:

>>> s = 'foo\nbar\n\nbaz'
>>> re.split('(?m)^\s*$', s)
['foo\nbar\n\nbaz']

What am I doing wrong?

[Python 2.5; it makes no difference if I compile '^\s*$' with re.MULTILINE and use the compiled expression instead.]

Answered by Glenn Maynard

Try this instead:

re.split('\n\s*\n', s)

The problem is that "^\s*$" only matches the whitespace (if any) that is alone on a line, not the newlines themselves. This leaves the delimiter empty when there is nothing on the line, and splitting on an empty delimiter doesn't make sense.

This version also gets rid of the delimiting newlines themselves, which is probably what you want. Otherwise, you'll have the newlines stuck to the beginning and end of each split part.
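
To see the empty delimiter concretely, here is a minimal sketch that lists where the pattern from the question matches in both example strings. The helper name match_spans is made up for illustration. Note that the behaviour in the question is specific to older Pythons: since Python 3.7, re.split also splits on empty matches, so there the second string would be split at the zero-width match as well.

```python
import re

# Pattern from the question: whitespace-only lines in MULTILINE mode.
BLANK_LINE = re.compile(r'(?m)^\s*$')

def match_spans(s):
    """Return (start, end) for every match of BLANK_LINE in s (hypothetical helper)."""
    return [m.span() for m in BLANK_LINE.finditer(s)]

# The lone space is matched, so the delimiter is one character long.
print(match_spans('foo\nbar\n \nbaz'))   # [(8, 9)]

# On a completely empty line start == end: a zero-width match.
print(match_spans('foo\nbar\n\nbaz'))    # [(8, 8)]
```

In both cases the newlines on either side of the blank line lie outside the match, which is why they stay attached to the split parts.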

Treating multiple consecutive blank lines as defining an empty block ("abc\n\n\ndef" -> ["abc", "", "def"]) is trickier...
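
One possible way to get that behaviour is to skip the regex and group lines by hand, treating every whitespace-only line as a delimiter. This is only a sketch under that assumption, not part of the original answer; the function name split_on_blank_lines is hypothetical, and str.splitlines also breaks on "\r\n" and a few other line boundaries.

```python
def split_on_blank_lines(s):
    """Split s into blocks, treating every whitespace-only line as a delimiter."""
    blocks = [[]]
    for line in s.splitlines():
        if line.strip():
            blocks[-1].append(line)   # ordinary line: extend the current block
        else:
            blocks.append([])         # blank line: close the block, open a new one
    return ['\n'.join(block) for block in blocks]

print(split_on_blank_lines('abc\n\n\ndef'))      # ['abc', '', 'def']
print(split_on_blank_lines('foo\nbar\n \nbaz'))  # ['foo\nbar', 'baz']
```

Because each blank line starts a new block, two consecutive blank lines produce an empty block between them.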

Answered by Sascha Gottfried

The re library can split on one or more empty lines! An empty line is a string that consists of zero or more whitespace characters, starts at the start of a line, and ends at the end of a line. The special character '$' "matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline" (excerpt from the docs). Because '$' matches before the newline without consuming it, we need to append '\s*' to the pattern to consume the line break. Everything is possible :-)

>>> import re
>>> text = "foo\n   \n    \n    \nbar\n"
>>> re.split("(?m)^\s*$\s*", text)
['foo\n', 'bar\n']

The same regex works with Windows-style line breaks.

>>> import re
>>> text = "foo\r\n       \r\n     \r\n   \r\nbar\r\n"
>>> re.split("(?m)^\s*$\s*", text)
['foo\r\n', 'bar\r\n']

Answered by Leroy Scandal

Try this:

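blank = ''
with open('fu.txt') as txt:
    # Read the whole file and walk through it line by line.
    for line in txt.read().split('\n'):
        # Compare by value: "is" tests object identity, which is unreliable for strings.
        if line == blank:
            print('blank')
        else:
            print(line)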

Answered by Sinan Ünür

Is this what you want?

>>> s = 'foo\nbar\n\nbaz'
>>> re.split('\n\s*\n',s)
['foo\nbar', 'baz']

>>> s = 'foo\nbar\n \nbaz'
>>> re.split('\n\s*\n',s)
['foo\nbar', 'baz']

>>> s = 'foo\nbar\n\t\nbaz'
>>> re.split('\n\s*\n',s)
['foo\nbar', 'baz']

Answered by Instance Hunter

What you're doing wrong is using regular expressions. What is wrong with ('Some\ntext.').split('\n')?
