Python re.split() to split by spaces, commas, and periods, but not in cases like 1,000 or 1.50

Notice: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors. Original StackOverflow question: http://stackoverflow.com/questions/12683201/

Time: 2020-08-18 11:35:37

Tags: python, regex

Asked by rohanag

I want to use Python's re.split() to split a string into individual words by spaces, commas and periods. But I don't want "1,200" to be split into ["1", "200"] or "1.2" to be split into ["1", "2"].

Example:

l = "one two 3.4 5,6 seven.eight nine,ten"

The result should be ["one", "two", "3.4", "5,6", "seven", "eight", "nine", "ten"]

Accepted answer by João Silva

Use a negative lookahead and a negative lookbehind:

>>> import re
>>> s = "one two 3.4 5,6 seven.eight nine,ten"
>>> re.split(r'\s|(?<!\d)[,.](?!\d)', s)
['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten']

In other words, you always split on \s (whitespace), and only split on a comma or period if it is neither preceded (?<!\d) nor followed (?!\d) by a digit.
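As a minimal sketch of how this pattern behaves, the snippet below wraps it in a small helper. The names `PATTERN` and `split_words` are illustrative assumptions, not part of the original answer; only `re.compile` and the pattern's `split` method come from the standard library.

```python
import re

# Pattern from the answer above: whitespace, or a comma/period
# that has no digit directly before it and no digit directly after it.
PATTERN = re.compile(r"\s|(?<!\d)[,.](?!\d)")


def split_words(text):
    """Split text on whitespace and on commas/periods not adjacent to digits."""
    return PATTERN.split(text)


print(split_words("one two 3.4 5,6 seven.eight nine,ten"))
# ['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten']

print(split_words("1,200 items"))
# ['1,200', 'items']

# A period followed by a space gives two adjacent split points,
# so an empty string appears between them; filter it out if unwanted.
print([w for w in split_words("end. Next") if w])
# ['end', 'Next']
```

Note that ordinary punctuation followed by whitespace produces empty strings in the raw result, so filtering as in the last call may be needed for real text.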

EDIT: As per @verdesmarald's comment, you may want to use the following instead:

>>> import re
>>> s = "one two 3.4 5,6 seven.eight nine,ten,1.2,a,5"
>>> re.split(r'\s|(?<!\d)[,.]|[,.](?!\d)', s)
['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten', '1.2', 'a', '5']

This will split "1.2,a,5" into ["1.2", "a", "5"]. The first pattern leaves that string unsplit, because each of its commas has a digit on one side.
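The difference between the two patterns can be seen by running both on the same input. This is only a sketch: `STRICT` and `RELAXED` are hypothetical names chosen here for the first and the edited pattern.

```python
import re

# First pattern: split on a comma/period only when NEITHER neighbor is a digit.
STRICT = re.compile(r"\s|(?<!\d)[,.](?!\d)")

# Edited pattern: split when AT LEAST ONE neighbor is not a digit,
# i.e. keep the separator only when it sits between two digits.
RELAXED = re.compile(r"\s|(?<!\d)[,.]|[,.](?!\d)")

text = "1.2,a,5"

strict_parts = STRICT.split(text)
relaxed_parts = RELAXED.split(text)

print(strict_parts)   # ['1.2,a,5']
print(relaxed_parts)  # ['1.2', 'a', '5']
```

Both patterns still keep "1.2" together, since the period there has a digit on each side.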

Answer by Niet the Dark Absol

So you want to split on spaces, and on commas and periods that aren't surrounded by numbers. This should work:

r" |(?<![0-9])[.,](?![0-9])"
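A short sketch of this pattern passed to `re.split()`, using the example string from the question. The variable names `pattern`, `words` and `tabbed` are illustrative assumptions.

```python
import re

# Pattern from this answer: a literal space, or a comma/period
# with no ASCII digit directly before or after it.
pattern = r" |(?<![0-9])[.,](?![0-9])"

words = re.split(pattern, "one two 3.4 5,6 seven.eight nine,ten")
print(words)
# ['one', 'two', '3.4', '5,6', 'seven', 'eight', 'nine', 'ten']

# A tab is not a literal space, so this pattern does not split on it.
tabbed = re.split(pattern, "one\ttwo")
print(tabbed)
# ['one\ttwo']
```

Unlike `\s`, the literal space here does not match tabs or newlines. Also, `[0-9]` matches only ASCII digits, whereas `\d` in a Python 3 `str` pattern also matches other Unicode decimal digits.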