Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/4094382/
python re - split a string before a character
Asked by kakarukeys
How can I split a string at the positions before a character?
- split a string before 'a'
- input: "fffagggahhh"
- output: ["fff", "aggg", "ahhh"]
The obvious way doesn't work:
>>> import re
>>> h = re.compile("(?=a)")
>>> h.split("fffagggahhh")
['fffagggahhh']
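A note for readers on a newer interpreter: the session in the question appears to come from an older Python, where re.split did not split on zero-width matches. Since Python 3.7, re.split does split on a pattern that matches an empty string, so the lookahead approach works directly. A minimal sketch, assuming Python 3.7 or later (the variable names are only for illustration):

```python
import re

# Zero-width lookahead: matches the empty position just before each 'a'.
before_a = re.compile(r"(?=a)")

parts = before_a.split("fffagggahhh")
print(parts)    # ['fff', 'aggg', 'ahhh']

# A leading 'a' yields an empty first piece, because the first
# split position is at index 0.
leading = before_a.split("afff")
print(leading)  # ['', 'afff']
```

On older versions the answers in this thread still apply.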
Answered by adamk
>>> r=re.compile("(a?[^a]+)")
>>> r.findall("fffagggahhh")
['fff', 'aggg', 'ahhh']
EDIT:
This won't correctly handle double a's in the string:
>>> r.findall("fffagggaahhh")
['fff', 'aggg', 'ahhh']
KennyTM's re seems better suited.
Answered by pyfunc
OK, this is not exactly the solution you wanted, but I thought it would be a useful addition to the problem here.
Solution without re:
>>> x = "fffagggahhh"
>>> k = x.split('a')
>>> j = [k[0]] + ['a'+l for l in k[1:]]
>>> j
['fff', 'aggg', 'ahhh']
>>>
Answered by Igor Serebryany
split() takes an argument for the character to split on:
>>> "fffagggahhh".split('a')
['fff', 'ggg', 'hhh']
Answered by kennytm
>>> rx = re.compile("(?:a|^)[^a]*")
>>> rx.findall("fffagggahhh")
['fff', 'aggg', 'ahhh']
>>> rx.findall("aaa")
['a', 'a', 'a']
>>> rx.findall("fgh")
['fgh']
>>> rx.findall("")
['']
Answered by Amber
>>> foo = "abbcaaaabbbbcaaab"
>>> bar = foo.split("c")
>>> baz = [bar[0]] + ["c"+x for x in bar[1:]]
>>> baz
['abb', 'caaaabbbb', 'caaab']
Due to how slicing works, this will work properly even if there are no occurrences of c in foo.
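The same idea can be wrapped in a small helper. This is only a sketch; the function name split_before_char is made up for illustration and does not come from the answer:

```python
def split_before_char(text, char):
    """Split text before each occurrence of char, keeping char on the pieces."""
    pieces = text.split(char)
    # pieces[1:] is an empty list when char never occurs, so the
    # comprehension adds nothing and only the original string remains.
    return [pieces[0]] + [char + piece for piece in pieces[1:]]


with_c = split_before_char("abbcaaaabbbbcaaab", "c")
without_c = split_before_char("abbaaa", "c")
print(with_c)     # ['abb', 'caaaabbbb', 'caaab']
print(without_c)  # ['abbaaa']
```

Note that if the text starts with the character, the first piece is an empty string, since str.split returns an empty first element in that case.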
Answered by Terrel Shumway
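import re

def split_before(pattern, text):
    """Yield the pieces of text, splitting before each match of pattern."""
    prev = 0
    for m in re.finditer(pattern, text):
        # Emit everything up to, but not including, the current match.
        yield text[prev:m.start()]
        prev = m.start()
    # Emit the remainder, which starts with the last match (if any).
    yield text[prev:]

if __name__ == '__main__':
    print(list(split_before("a", "fffagggahhh")))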
re.split treats the pattern as a delimiter.
>>> print(list(split_before("a", "afffagggahhhaab")))
['', 'afff', 'aggg', 'ahhh', 'a', 'ab']
>>> print(list(split_before("a", "ffaabcaaa")))
['ff', 'a', 'abc', 'a', 'a', 'a']
>>> print(list(split_before("a", "aaaaa")))
['', 'a', 'a', 'a', 'a', 'a']
>>> print(list(split_before("a", "bbbb")))
['bbbb']
>>> print(list(split_before("a", "")))
['']
Answered by John Machin
This one works on repeated a's:
>>> re.findall("a[^a]*|^[^a]*", "aaaaa")
['a', 'a', 'a', 'a', 'a']
>>> re.findall("a[^a]*|[^a]+", "ffaabcaaa")
['ff', 'a', 'abc', 'a', 'a', 'a']
Approach: the main chunks that you are looking for are an a followed by zero or more not-a. That covers all possibilities except for zero or more not-a with no leading a, which can happen only at the start of the input string.
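The approach described above can be sketched as a reusable function. The name chunks_before and the use of re.escape are additions for illustration, not part of the answer, and the sketch assumes char is a single character:

```python
import re


def chunks_before(char, text):
    """Return the pieces of text that start at each occurrence of char."""
    c = re.escape(char)  # so characters such as ']' or '^' are taken literally
    # First alternative: char followed by zero or more non-char characters.
    # Second alternative: a run of non-char characters, which can only
    # match at the start of the input.
    pattern = "{0}[^{0}]*|[^{0}]+".format(c)
    return re.findall(pattern, text)


repeated = chunks_before("a", "aaaaa")
mixed = chunks_before("a", "ffaabcaaa")
print(repeated)  # ['a', 'a', 'a', 'a', 'a']
print(mixed)     # ['ff', 'a', 'abc', 'a', 'a', 'a']
```

With the [^a]+ form of the second alternative, an empty input gives an empty list rather than [''], which differs from some of the other answers here.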

