Python regex: splitting on a succession of newline characters

Question

Asked by Humphrey Bogart

I'm trying to split a string on newline characters (catering for Windows, OS X, and Unix text file newline characters). If there is any succession of these, I want to split on that too and not include any empty strings in the result.

So, when splitting the following:

"Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

The result would be:

['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']

What regex should I use?

Answer 1

Answered by magcius

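If the lines contain no whitespace other than the newlines themselves, you can use line.split() with no arguments. It splits on any run of whitespace, so it collapses doubled newlines, but it also splits on spaces, which would break entries such as "Double Windows" in the example. Otherwise, you can use [a for a in line.split("\r\n") if a], which drops the empty entries but only recognizes the Windows "\r\n" sequence.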

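EDIT: the str type also has a method called splitlines. It recognizes "\r\n", "\r", and "\n", but it returns an empty string for each extra newline in a run, so the empty entries still have to be filtered out.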

"Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix".splitlines()
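A minimal sketch of that filtering step, assuming the sample string from the question (the names text, raw, and lines are only illustrative):

```python
text = "Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"

# splitlines() treats \r\n, \r and \n as line boundaries, but it keeps
# the empty "lines" that sit between consecutive newlines.
raw = text.splitlines()

# Dropping the empty strings leaves only the non-empty lines.
lines = [line for line in raw if line]
print(lines)
```

Keep in mind that str.splitlines() also treats a few other characters (form feed, vertical tab, and some Unicode separators) as line boundaries, which may or may not be what you want.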

Answer 2

Answered by Alex Martelli

The simplest pattern for this purpose is r'[\r\n]+', which you can pronounce as "one or more carriage-return or newline characters".

Answer 3

Answered by Ignacio Vazquez-Abrams

import re
re.split(r'[\n\r]+', line)

Answer 4

Answered by ghostdog74

>>> s="Foo\r\n\r\nDouble Windows\r\rDouble OS X\n\nDouble Unix\r\nWindows\rOS X\nUnix"
>>> import re
>>> re.split("[\r\n]+",s)
['Foo', 'Double Windows', 'Double OS X', 'Double Unix', 'Windows', 'OS X', 'Unix']
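
One edge case the session above does not cover: if the string starts or ends with a newline, re.split returns an empty string at that end. A minimal sketch of a helper that filters those out follows. The names NEWLINES and split_lines are hypothetical and not part of any answer here.

```python
import re

# One or more carriage-return or newline characters, in any mix.
NEWLINES = re.compile(r"[\r\n]+")

def split_lines(text):
    """Split on runs of newlines and drop any empty pieces."""
    return [part for part in NEWLINES.split(text) if part]

print(split_lines("\r\nFoo\r\n\r\nBar\rBaz\n"))
```

Without the filter, the same input would produce an empty string at both ends of the list.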

Answer 5

Answered by jlettvin

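Paying attention to the greediness rules for patterns, the following splits only on runs of two or more newlines of the same kind (that is, on paragraph breaks), so single newlines stay inside each piece:

import re
# Non-capturing groups keep the matched separators (and None entries) out of the result.
pattern = re.compile(r'(?:\r\n){2,}|(?:\n\r){2,}|\r{2,}|\n{2,}')
paragraphs = pattern.split(text)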
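
To make the difference from the [\r\n]+ approach concrete, here is a small sketch on made-up sample text. It uses non-capturing groups, since re.split adds the text of any capturing group (or None) to its result. The names paragraph_break and sample are illustrative assumptions.

```python
import re

# Runs of two or more identical newline sequences mark a paragraph break.
paragraph_break = re.compile(r"(?:\r\n){2,}|(?:\n\r){2,}|\r{2,}|\n{2,}")

sample = "line 1\nline 2\n\nline 3\r\n\r\nline 4"
paragraphs = paragraph_break.split(sample)
print(paragraphs)
```

The single "\n" between "line 1" and "line 2" is left in place, so this pattern suits paragraph splitting and does not, on its own, produce the per-line list asked for in the question.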
