Divide string by line break or period with Python regular expressions

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/17618149/

Time: 2020-08-19 08:41:07  Source: igfitidea

Tags: python, regex, string, split

Asked by David Y. Stephenson

I have a string:

"""Hello. It's good to meet you.
My name is Bob."""

I'm trying to find the best way to split this into a list divided by periods and line breaks:

["Hello", "It's good to meet you", "My name is Bob"]

I'm pretty sure I should use regular expressions, but, having no experience with them, I'm struggling to figure out how to do this.

Accepted answer by falsetru

You don't need regex.

>>> txt = """Hello. It's good to meet you.
... My name is Bob."""
>>> txt.split('.')
['Hello', " It's good to meet you", '\nMy name is Bob', '']
>>> [x for x in map(str.strip, txt.split('.')) if x]
['Hello', "It's good to meet you", 'My name is Bob']
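
The session above splits only on periods, which is enough for the example because its one line break directly follows a period. A minimal sketch for input where a line break may appear without a period before it, still without regex; the helper name `split_sentences` is made up for illustration and is not part of the original answer:

```python
def split_sentences(text):
    """Split text on periods and line breaks, dropping empty pieces."""
    # Treat every line break as a period so one str.split call covers both.
    pieces = text.replace("\n", ".").split(".")
    # Strip surrounding whitespace and discard pieces that end up empty.
    return [piece.strip() for piece in pieces if piece.strip()]


txt = """Hello. It's good to meet you.
My name is Bob."""
result = split_sentences(txt)
print(result)
# ['Hello', "It's good to meet you", 'My name is Bob']

mixed = split_sentences("First line\nSecond line. Third")
print(mixed)
# ['First line', 'Second line', 'Third']
```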

Answer by zhangyangyu

>>> s = """Hello. It's good to meet you.
... My name is Bob."""
>>> import re
>>> p = re.compile(r'[^\s\.][^\.\n]+')
>>> p.findall(s)
['Hello', "It's good to meet you", 'My name is Bob']
>>> s = "Hello. #It's good to meet you # .'"
>>> p.findall(s)
['Hello', "#It's good to meet you # "]

Answer by Tim Pietzcker

For your example, it would suffice to split on dots, optionally followed by whitespace (and to ignore empty results):

>>> s = """Hello. It's good to meet you.
... My name is Bob."""
>>> import re
>>> re.split(r"\.\s*", s)
['Hello', "It's good to meet you", 'My name is Bob', '']
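
The output above still ends with an empty string, because the text finishes with a period. One way to ignore empty results, as a small sketch (the variable name `parts` is only illustrative):

```python
import re

s = """Hello. It's good to meet you.
My name is Bob."""

# re.split leaves an empty string after the final period; filter it out.
parts = [part for part in re.split(r"\.\s*", s) if part]
print(parts)
# ['Hello', "It's good to meet you", 'My name is Bob']
```

Note that this pattern does not treat a line break as a split point unless a period comes before it.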

In real life, you'd have to handle Mr. Orange, Dr. Greene and George W. Bush, though...
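
One possible way to approach the abbreviation problem, as a rough sketch rather than a robust sentence splitter: put a negative lookbehind in front of the period for each abbreviation that should not end a sentence. The list `ABBREVIATIONS` and the helper `split_sentences` are assumptions made up for this illustration, and the list is deliberately incomplete.

```python
import re

# Hypothetical, deliberately incomplete list of abbreviations to protect.
ABBREVIATIONS = ["Mr", "Mrs", "Dr"]

# One fixed-width negative lookbehind per abbreviation, plus one for a
# single capital letter (initials such as the "W" in "George W. Bush").
_guards = "".join(rf"(?<!\b{re.escape(a)})" for a in ABBREVIATIONS)
SENTENCE_END = re.compile(_guards + r"(?<!\b[A-Z])\.\s*")


def split_sentences(text):
    """Split on periods that do not follow a listed abbreviation or initial."""
    return [part for part in SENTENCE_END.split(text) if part]


text = "Mr. Orange met Dr. Greene and George W. Bush. They talked."
sentences = split_sentences(text)
print(sentences)
# ['Mr. Orange met Dr. Greene and George W. Bush', 'They talked']
```

The initial guard has a cost: a sentence that really ends in a single capital letter is not split there, so the list and the guards would need tuning for real text.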

Answer by Casimir et Hippolyte

You can use this split:

re.split(r"(?<!^)\s*[.\n]+\s*(?!$)", s)
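
To see what this split returns, here is a small runnable sketch, assuming `s` holds the string from the question:

```python
import re

# Assumption: s is the example string from the question.
s = """Hello. It's good to meet you.
My name is Bob."""

# (?<!^) rejects a match that starts at the very beginning of the string,
# and (?!$) rejects one that stops at the very end, so the result has no
# empty string at either edge.
pattern = re.compile(r"(?<!^)\s*[.\n]+\s*(?!$)")
pieces = pattern.split(s)
print(pieces)
# ['Hello', "It's good to meet you", 'My name is Bob.']
```

Because the final period can never be a split point, it stays attached to the last item; something like `pieces[-1].rstrip('.')` would remove it if that is not wanted.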

Answer by eyquem

Mine:

re.findall(r'(?=\S)[^.\n]+(?<=\S)', su)
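
A short sketch of how this pattern behaves, assuming `su` holds the string from the question (the answer does not define it):

```python
import re

# Assumption: su is the example string from the question.
su = """Hello. It's good to meet you.
My name is Bob."""

# (?=\S)   the match must start on a non-whitespace character
# [^.\n]+  then run up to the next period or line break
# (?<=\S)  and, after backtracking if needed, end on a non-whitespace one
found = re.findall(r"(?=\S)[^.\n]+(?<=\S)", su)
print(found)
# ['Hello', "It's good to meet you", 'My name is Bob']
```

The two lookarounds do the trimming that `str.strip` would otherwise do, so no separate cleanup of empty or padded items is needed here.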