Python: regex to split on successions of newline characters
Note: this page reproduces a popular StackOverFlow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute the content to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/2596771/
Regex to split on successions of newline characters
Asked by Humphrey Bogart
I'm trying to split a string on newline characters (catering for Windows, OS X, and Unix text file newline characters). If there is any succession of these, I want to split on that too and not include any of them in the result.
So, when splitting the following:
"Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"
The result would be:
['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']
What regex should I use?
Answered by magcius
If the lines contain no whitespace other than the newlines themselves, you can use line.split() with no arguments. It splits on any run of whitespace, so repeated newlines are collapsed, but spaces inside a line are treated as separators too.
If not, you can use [a for a in line.split("\r\n") if a], which drops the empty strings but only splits on the Windows "\r\n" sequence.
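To make the limits of these two approaches concrete, here is a small sketch run against the string from the question. The variable names (s, by_whitespace, by_crlf) are only illustrative.

```python
s = "Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

# split() with no arguments splits on every run of whitespace, spaces included,
# so "Double Windows" ends up as two separate items.
by_whitespace = s.split()

# Splitting on the literal "\r\n" only recognises Windows line endings;
# the lone "\r" and "\n" separators stay inside the pieces.
by_crlf = [part for part in s.split("\r\n") if part]

print(by_whitespace)
print(by_crlf)
```

Neither result matches the list asked for in the question, which is why the other answers reach for a regular expression.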
EDIT: the str type also has a method called "splitlines". Note that it returns an empty string for each extra line break in a succession, so those still need to be filtered out.
"Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix".splitlines()
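As a sketch of how splitlines() could be used for the question's input: it recognises "\r\n", "\r" and "\n" (and, for str, a few other line boundaries such as "\x0b" and "\x0c"), but it yields an empty string between consecutive line breaks, so a filter is needed. The variable names are illustrative only.

```python
s = "Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

# splitlines() alone keeps an empty string between consecutive line breaks.
raw_lines = s.splitlines()

# Dropping the empty strings gives the list requested in the question.
lines = [line for line in raw_lines if line]

print(raw_lines)
print(lines)
```

This avoids a regex entirely, at the cost of also splitting on the extra boundary characters that splitlines() recognises.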
Answered by Alex Martelli
The simplest pattern for this purpose is r'[\r\n]+', which you can pronounce as "one or more carriage-return or newline characters".
Answered by Ignacio Vazquez-Abrams
re.split(r'[\n\r]+', line)
Answered by ghostdog74
>>> s="Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"
>>> import re
>>> re.split("[\r\n]+",s)
['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']
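One edge case worth noting: if the input starts or ends with a line break, re.split returns an empty string at that end. The sketch below wraps the same pattern in a hypothetical helper, split_nonempty_lines (not part of any answer), that filters those out.

```python
import re

# One or more carriage-return or newline characters, as in the answers above.
NEWLINES = re.compile(r'[\r\n]+')

def split_nonempty_lines(text):
    """Split on runs of line breaks and drop empty leading/trailing pieces."""
    return [part for part in NEWLINES.split(text) if part]

unfiltered = NEWLINES.split("\r\nFoo\n\nBar\n")
filtered = split_nonempty_lines("\r\nFoo\n\nBar\n")

print(unfiltered)  # ['', 'Foo', 'Bar', '']
print(filtered)    # ['Foo', 'Bar']
```

For input without leading or trailing line breaks, such as the string in the question, the helper returns the same list as the plain re.split call.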
Answered by jlettvin
Paying attention to the greediness rules for patterns, this splits only on two or more consecutive line breaks (paragraph boundaries):
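import re
# Non-capturing groups: capturing groups would make split() return the group contents (or None) as well.
pattern = re.compile(r'(?:\r\n){2,}|(?:\n\r){2,}|\r{2,}|\n{2,}')
paragraphs = pattern.split(text)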
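To show how this differs from the single-character-class pattern, here is a sketch applying a paragraph-style pattern to the question's string. It uses non-capturing groups, (?:...), because re.split includes the contents of capturing groups in its result; the name PARAGRAPH_BREAK is illustrative.

```python
import re

text = "Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

# Matches only a repeated line break of the same kind (two or more in a row).
PARAGRAPH_BREAK = re.compile(r'(?:\r\n){2,}|(?:\n\r){2,}|\r{2,}|\n{2,}')

paragraphs = PARAGRAPH_BREAK.split(text)
print(paragraphs)
```

The single line breaks in the last piece are left in place, so this approach separates paragraphs rather than producing the seven-item list from the question.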