Tokenize a string keeping delimiters in Python

Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute the original authors (not this site). Original question: http://stackoverflow.com/questions/1820336/

Tags: python, string, split, tokenize

Asked by fortran

Is there any equivalent to str.split in Python that also returns the delimiters?

I need to preserve the whitespace layout for my output after processing some of the tokens.


Example:


>>> s="\tthis is an  example"
>>> print s.split()
['this', 'is', 'an', 'example']

>>> print what_I_want(s)
['\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']

Thanks!


Answer by Jonathan Feinberg

How about


import re
# Match either a run of whitespace or a run of non-whitespace,
# so findall() returns tokens and the gaps between them, in order.
splitter = re.compile(r'(\s+|\S+)')
splitter.findall(s)
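
Applied to the string from the question, that gives (a quick interpreter check):

>>> import re
>>> s = "\tthis is an  example"
>>> re.compile(r'(\s+|\S+)').findall(s)
['\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']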

Answer by Denis Otkidach

>>> import re
>>> re.compile(r'(\s+)').split("\tthis is an  example")
['', '\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']

Answer by Tim Pietzcker

The re module provides this functionality:

>>> import re
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']

(quoted from the Python documentation).


For your example (split on whitespace), use re.split(r'(\s+)', '\tThis is an example').

The key is to enclose the regex on which to split in capturing parentheses. That way, the delimiters are added to the list of results.

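To see what the capturing group changes, compare the two calls below on the question's string (a minimal sketch):

>>> import re
>>> re.split(r'\s+', "\tthis is an  example")
['', 'this', 'is', 'an', 'example']
>>> re.split(r'(\s+)', "\tthis is an  example")
['', '\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']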

Edit: As pointed out, any preceding/trailing delimiters will of course also be added to the list. To avoid that, you can use the .strip() method on your input string first.
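
For instance, stripping first drops the empty leading element along with the leading tab (a quick sketch; only appropriate if that leading whitespace isn't needed):

>>> import re
>>> re.split(r'(\s+)', "\tthis is an  example".strip())
['this', ' ', 'is', ' ', 'an', '  ', 'example']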

Answer by jcdyer

Have you looked at pyparsing? Example borrowed from the pyparsing wiki:


>>> from pyparsing import Word, alphas
>>> greet = Word(alphas) + "," + Word(alphas) + "!"
>>> hello1 = 'Hello, World!'
>>> hello2 = 'Greetings, Earthlings!'
>>> for hello in hello1, hello2:
...     print (u'%s \u2192 %r' % (hello, greet.parseString(hello))).encode('utf-8')
... 
Hello, World! → (['Hello', ',', 'World', '!'], {})
Greetings, Earthlings! → (['Greetings', ',', 'Earthlings', '!'], {})
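
If a plain Python list is preferred over the ParseResults shown above, the result can be converted with asList() (a short sketch using the same grammar):

>>> from pyparsing import Word, alphas
>>> greet = Word(alphas) + "," + Word(alphas) + "!"
>>> greet.parseString('Hello, World!').asList()
['Hello', ',', 'World', '!']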

Answer by fortran

Thanks guys for pointing out the re module; I'm still trying to decide between that and using my own function that returns a sequence...

def split_keep_delimiters(s, delims="\t\n\r "):
    """Yield alternating runs of delimiter and non-delimiter characters."""
    if not s:  # an empty string would otherwise raise an IndexError below
        return
    delim_group = s[0] in delims
    start = 0
    for index, char in enumerate(s):
        if delim_group != (char in delims):
            # The character class changed, so emit the group we just finished.
            delim_group = not delim_group
            yield s[start:index]
            start = index
    yield s[start:]  # emit the final group
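
For what it's worth, a quick interpreter check of the generator above against the question's example string (using the version as written here) reproduces the desired layout-preserving split:

>>> s = "\tthis is an  example"
>>> list(split_keep_delimiters(s))
['\t', 'this', ' ', 'is', ' ', 'an', '  ', 'example']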

If I had time I'd benchmark them xD
