Python: Split string by list of separators
Source: http://stackoverflow.com/questions/4697006/
License notice: this question and its answers come from Stack Overflow and are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must keep the same license and attribute them to the original authors.
Asked by blah238
In Python, I'd like to split a string using a list of separators. The separators could be either commas or semicolons. Whitespace should be removed unless it is in the middle of non-whitespace, non-separator characters, in which case it should be preserved.
Test case 1: ABC,DEF123,GHI_JKL,MN OP
Test case 2: ABC;DEF123;GHI_JKL;MN OP
Test case 3: ABC ; DEF123,GHI_JKL ; MN OP
Sounds like a case for regular expressions, which is fine, but if it's easier or cleaner to do it another way that would be even better.
Thanks!
Accepted answer by Joschua
This should be much faster than regex and you can pass a list of separators as you wanted:
def split(txt, seps):
    default_sep = seps[0]
    # we skip seps[0] because that's the default separator
    for sep in seps[1:]:
        txt = txt.replace(sep, default_sep)
    return [i.strip() for i in txt.split(default_sep)]
How to use it:
>>> split('ABC ; DEF123,GHI_JKL ; MN OP', (',', ';'))
['ABC', 'DEF123', 'GHI_JKL', 'MN OP']
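As a quick check, the same approach can be run against all three test cases from the question. This is only a sketch: the function is repeated here under the hypothetical name `split_on` so the block is self-contained, and `CASES` is an illustrative name, not part of the original answer.

```python
def split_on(txt, seps):
    # Same idea as the answer: rewrite every separator to the first one,
    # then split once and strip surrounding whitespace from each piece.
    default_sep = seps[0]
    for sep in seps[1:]:
        txt = txt.replace(sep, default_sep)
    return [part.strip() for part in txt.split(default_sep)]

# The three test cases from the question.
CASES = [
    'ABC,DEF123,GHI_JKL,MN OP',
    'ABC;DEF123;GHI_JKL;MN OP',
    'ABC ; DEF123,GHI_JKL ; MN OP',
]

results = [split_on(case, (',', ';')) for case in CASES]
for result in results:
    print(result)
# Each line prints: ['ABC', 'DEF123', 'GHI_JKL', 'MN OP']
```

Note that `seps` must contain at least one separator, because `seps[0]` raises an `IndexError` on an empty sequence.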
Performance test:
import timeit
import re
TEST = 'ABC ; DEF123,GHI_JKL ; MN OP'
SEPS = (',', ';')
rsplit = re.compile("|".join(SEPS)).split
print(timeit.timeit(lambda: [s.strip() for s in rsplit(TEST)]))
# 1.6242462980007986
print(timeit.timeit(lambda: split(TEST, SEPS)))
# 1.3588597209964064
And with a much longer input string:
TEST = 100 * 'ABC ; DEF123,GHI_JKL ; MN OP , '
print(timeit.timeit(lambda: [s.strip() for s in rsplit(TEST)]))
# 130.67168392999884
print(timeit.timeit(lambda: split(TEST, SEPS)))
# 50.31940778599528
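The absolute numbers above depend on the machine and the Python version, and `timeit.timeit` runs each statement 1,000,000 times by default. A sketch for reproducing the comparison more quickly might look like the following. It assumes a much smaller `number` and takes the best of a few runs with `timeit.repeat`; the names `regex_split`, `replace_split`, and `best_time` are illustrative and not from the original answer.

```python
import re
import timeit

TEST = 100 * 'ABC ; DEF123,GHI_JKL ; MN OP , '
SEPS = (',', ';')

# re.escape keeps separators that are regex metacharacters (such as '|') literal.
rsplit = re.compile("|".join(re.escape(s) for s in SEPS)).split

def regex_split(txt):
    return [s.strip() for s in rsplit(txt)]

def replace_split(txt, seps=SEPS):
    default_sep = seps[0]
    for sep in seps[1:]:
        txt = txt.replace(sep, default_sep)
    return [s.strip() for s in txt.split(default_sep)]

def best_time(func, number=1000, repeat=3):
    # The minimum over several runs is less sensitive to background noise.
    return min(timeit.repeat(lambda: func(TEST), number=number, repeat=repeat))

print("regex  :", best_time(regex_split))
print("replace:", best_time(replace_split))
```

No expected timings are shown because they vary from system to system; only the relative order is worth comparing.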
Answer by Sven Marnach
Using regular expressions, try
[s.strip() for s in re.split(",|;", string)]
or
[t.strip() for s in string.split(",") for t in s.split(";")]
without.
Answer by Raph Levien
>>> re.split(r'\s*,\s*|\s*;\s*', 'a , b; cdf')
['a', 'b', 'cdf']
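The pattern above hard-codes the two separators. A sketch that builds an equivalent pattern from a list could look like this, assuming the separators are literal strings (escaped with `re.escape`) and that whitespace at the very start and end of the input should be dropped as well. `split_regex` is a hypothetical helper name, not part of the original answer.

```python
import re

def split_regex(txt, seps):
    # Escape each separator so characters such as '|' or '.' match literally.
    alternatives = "|".join(re.escape(sep) for sep in seps)
    # Absorb whitespace on both sides of a separator into the match itself.
    pattern = r"\s*(?:" + alternatives + r")\s*"
    # strip() removes leading/trailing whitespace that no separator touches.
    return re.split(pattern, txt.strip())

print(split_regex('ABC ; DEF123,GHI_JKL ; MN OP', [',', ';']))
# ['ABC', 'DEF123', 'GHI_JKL', 'MN OP']
print(split_regex(' a , b; cdf ', [',', ';']))
# ['a', 'b', 'cdf']
```

Because the whitespace is consumed by the pattern, no separate `strip()` call per item is needed.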
Answer by tmarthal
Taking the above answer, with your test cases, you want to use a regular expression and one or more separation characters. In your case, the separation characters seem to be ',', '|', ';' and whitespace. Whitespace in a Python regular expression is '\s' ('\w' matches word characters, and '\W' matches everything else, including punctuation), so the split is:
import re
parts = re.split(r"[,|;\s]+", string)
I cannot comment on Sven's answer above, but here I split on one or more of the characters inside the brackets, so I don't have to use the strip() method.
Yikes, I didn't read the question correctly... Sven's answer with the strip works; mine assumes the whitespace is another separation.
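The difference between the two readings shows up on test case 3 from the question. A small sketch, with illustrative variable names and `\s` used for whitespace:

```python
import re

text = 'ABC ; DEF123,GHI_JKL ; MN OP'

# Whitespace treated as one more separator: 'MN OP' is broken apart.
as_separator = re.split(r"[,;\s]+", text)
print(as_separator)
# ['ABC', 'DEF123', 'GHI_JKL', 'MN', 'OP']

# Split on ',' or ';' only, then strip: inner whitespace is preserved.
stripped = [s.strip() for s in re.split(r"[,;]", text)]
print(stripped)
# ['ABC', 'DEF123', 'GHI_JKL', 'MN OP']
```

Only the second form satisfies the requirement that whitespace inside a field be kept.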

