Python: Split a string by a list of separators

Question

Asked by blah238

In Python, I'd like to split a string using a list of separators. The separators could be either commas or semicolons. Whitespace should be removed unless it is in the middle of non-whitespace, non-separator characters, in which case it should be preserved.

Test case 1: ABC,DEF123,GHI_JKL,MN OP
Test case 2: ABC;DEF123;GHI_JKL;MN OP
Test case 3: ABC ; DEF123,GHI_JKL ; MN OP

Sounds like a case for regular expressions, which is fine, but if it's easier or cleaner to do it another way that would be even better.

Thanks!

Answer 1

Accepted answer by Joschua

This should be much faster than a regex, and you can pass a list of separators as you wanted:

def split(txt, seps):
    default_sep = seps[0]

    # we skip seps[0] because that's the default separator
    for sep in seps[1:]:
        txt = txt.replace(sep, default_sep)
    return [i.strip() for i in txt.split(default_sep)]

How to use it:

>>> split('ABC ; DEF123,GHI_JKL ; MN OP', (',', ';'))
['ABC', 'DEF123', 'GHI_JKL', 'MN OP']
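To check the function against all three test cases from the question, something like the following sketch should work. The name split_on_separators is only a stand-in chosen here to avoid shadowing anything called split; the body is the same replace-then-split idea. It assumes seps is non-empty (seps[0] would raise IndexError otherwise) and it keeps empty fields.

```python
def split_on_separators(txt, seps):
    """Split txt on any separator in seps and trim each field."""
    default_sep = seps[0]
    # Normalise every other separator to the first one, then split once.
    for sep in seps[1:]:
        txt = txt.replace(sep, default_sep)
    return [field.strip() for field in txt.split(default_sep)]


test_cases = [
    'ABC,DEF123,GHI_JKL,MN OP',
    'ABC;DEF123;GHI_JKL;MN OP',
    'ABC ; DEF123,GHI_JKL ; MN OP',
]

results = [split_on_separators(case, (',', ';')) for case in test_cases]
for case, fields in zip(test_cases, results):
    print(repr(case), '->', fields)

# Consecutive separators produce an empty field rather than being merged.
with_gap = split_on_separators('a,,b', (',', ';'))
print(with_gap)
```

All three inputs should come out as the same four fields, with the space inside 'MN OP' preserved.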

Performance test:

import timeit
import re


TEST = 'ABC ; DEF123,GHI_JKL ; MN OP'
SEPS = (',', ';')


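# escape each separator so regex metacharacters (e.g. '|' or '.') are matched literally
rsplit = re.compile("|".join(map(re.escape, SEPS))).split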
print(timeit.timeit(lambda: [s.strip() for s in rsplit(TEST)]))
# 1.6242462980007986

print(timeit.timeit(lambda: split(TEST, SEPS)))
# 1.3588597209964064

And with a much longer input string:

TEST = 100 * 'ABC ; DEF123,GHI_JKL ; MN OP , '

print(timeit.timeit(lambda: [s.strip() for s in rsplit(TEST)]))
# 130.67168392999884

print(timeit.timeit(lambda: split(TEST, SEPS)))
# 50.31940778599528

Answer 2

Answer by Sven Marnach

Using regular expressions, try

[s.strip() for s in re.split(",|;", string)]

or, without regular expressions,

[t.strip() for s in string.split(",") for t in s.split(";")]
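As a quick sanity check, the two one-liners can be wrapped in functions and run side by side on the test cases from the question. The function names with_regex and without_regex are made up for this sketch.

```python
import re


def with_regex(string):
    # Split on either separator, then trim whitespace around each piece.
    return [s.strip() for s in re.split(",|;", string)]


def without_regex(string):
    # Split on commas first, then split each piece on semicolons.
    return [t.strip() for s in string.split(",") for t in s.split(";")]


test_cases = [
    'ABC,DEF123,GHI_JKL,MN OP',
    'ABC;DEF123;GHI_JKL;MN OP',
    'ABC ; DEF123,GHI_JKL ; MN OP',
]

for case in test_cases:
    print(with_regex(case), with_regex(case) == without_regex(case))
```

Note that the nested comprehension grows by one loop per separator, so it reads best with two or three separators.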

Answer 3

Answer by Raph Levien

>>> re.split(r'\s*,\s*|\s*;\s*', 'a , b; cdf')
['a', 'b', 'cdf']
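The same idea can probably be generalised to an arbitrary list of separators by building the pattern with re.escape. This is a sketch, not part of the original answer: the helper name split_trimmed is hypothetical, and it strips the whole input first because the pattern only absorbs whitespace that sits next to a separator.

```python
import re


def split_trimmed(text, seps):
    """Split text on any of seps, swallowing whitespace around each separator."""
    if not seps:
        raise ValueError("at least one separator is required")
    # re.escape keeps separators such as '|' or '.' from acting as regex syntax.
    alternatives = "|".join(re.escape(sep) for sep in seps)
    pattern = r"\s*(?:" + alternatives + r")\s*"
    # Leading/trailing whitespace has no adjacent separator, so strip it up front.
    return re.split(pattern, text.strip())


print(split_trimmed('ABC ; DEF123,GHI_JKL ; MN OP', [',', ';']))
print(split_trimmed('  a , b; cdf  ', [',', ';']))
print(split_trimmed('a | b|c', ['|']))
```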

Answer 4

Answer by tmarthal

Building on the answer above: for your test cases, you want a regular expression that matches one or more separation characters. In your case, the separation characters seem to be ',', '|', ';' and whitespace. In a Python regex, whitespace is '\s' (not '\w', which matches word characters); the pattern below uses '\W', which matches any non-word character and therefore covers whitespace too, so the comprehension is:

import re
# raw string so that \W reaches the regex engine intact; avoid shadowing the built-in list
parts = [s for s in re.split(r"[,|;\W]+", string)]

I cannot reply to Sven's answer above, but this splits on one or more of the characters inside the brackets, so it doesn't have to use the strip() method.

Yikes, I didn't read the question correctly... Sven's answer with strip() works; mine treats whitespace as another separator.
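To make the difference concrete, here is a small comparison on test case 3 from the question. The variable names are chosen for this sketch only.

```python
import re

text = 'ABC ; DEF123,GHI_JKL ; MN OP'

# \W matches every non-word character, so the space inside 'MN OP' also splits.
by_non_word = re.split(r"[,|;\W]+", text)

# Splitting only on the separators and stripping afterwards keeps inner spaces.
by_sep_then_strip = [s.strip() for s in re.split(",|;", text)]

print(by_non_word)        # ['ABC', 'DEF123', 'GHI_JKL', 'MN', 'OP']
print(by_sep_then_strip)  # ['ABC', 'DEF123', 'GHI_JKL', 'MN OP']
```

The underscore in 'GHI_JKL' survives in both cases because '_' counts as a word character.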
