Disclaimer: this page reproduces a popular StackOverFlow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/199059/

A pythonic way to insert a space before capital letters

Tags: python, regex, text-files

Asked by Electrons_Ahoy

I've got a file whose format I'm altering via a python script. I have several camel cased strings in this file where I just want to insert a single space before the capital letter - so "WordWordWord" becomes "Word Word Word".

My limited regex experience just stalled out on me - can someone think of a decent regex to do this, or (better yet) is there a more pythonic way to do this that I'm missing?

Answered by Greg Hewgill

You could try:

>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWord")
'Word Word Word'

Answered by Greg Hewgill

If there are consecutive capitals, then Greg's result might not be what you are looking for, since the \w consumes the character in front of the capital letter to be replaced.

>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWWWWWWWord")
'Word Word WW WW WW Word'

A look-behind would solve this:

>>> re.sub(r"(?<=\w)([A-Z])", r" \1", "WordWordWWWWWWWord")
'Word Word W W W W W W Word'
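
To see the difference between consuming the preceding character and merely looking behind it, here is a minimal side-by-side sketch. The variable names are only for illustration.

```python
import re

text = "WordWordWWWWWWWord"

# Consuming version: (\w) eats the character before each capital, so that
# character cannot also serve as the left-hand side of the next match.
consuming = re.sub(r"(\w)([A-Z])", r"\1 \2", text)

# Look-behind version: (?<=\w) only checks the previous character without
# consuming it, so every capital after the first character gets a space.
lookbehind = re.sub(r"(?<=\w)([A-Z])", r" \1", text)

print(consuming)   # Word Word WW WW WW Word
print(lookbehind)  # Word Word W W W W W W Word
```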

Answered by tzot

Perhaps shorter:

>>> re.sub(r"\B([A-Z])", r" \1", "DoIThinkThisIsABetterAnswer?")
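
As a rough sketch of what the \B version does, it can be wrapped in a small helper (the name space_before_caps is made up for this example). Since \B only matches where there is no word boundary, capitals that already start a word are left alone.

```python
import re

def space_before_caps(text):
    # \B matches only where there is no word boundary, so a capital at the
    # start of the string or right after a space is not touched.
    return re.sub(r"\B([A-Z])", r" \1", text)

print(space_before_caps("DoIThinkThisIsABetterAnswer?"))
# Do I Think This Is A Better Answer?
print(space_before_caps("Already Spaced"))
# Already Spaced
```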

Answered by Markus Jarderot

Have a look at my answer on .NET - How can you split a “caps” delimited string into an array?

Edit: Maybe better to include it here.

re.sub(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))', r'\1 ', text)
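
A small sketch of how this pattern behaves when wrapped in a function (the name split_camel_case is hypothetical). The first alternative handles a lowercase letter followed by a capital; the second handles the last capital of an acronym, which is followed by a capital and then a lowercase letter.

```python
import re

def split_camel_case(text):
    # Insert a space after a lowercase letter that precedes a capital, or
    # after a capital that precedes a capital-plus-lowercase pair (the end
    # of an acronym such as "HTTP" in "HTTPServer").
    return re.sub(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))', r'\1 ', text)

spaced = split_camel_case("SimpleHTTPServer")
print(spaced)          # Simple HTTP Server
print(spaced.split())  # ['Simple', 'HTTP', 'Server']
```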

For example:

"SimpleHTTPServer" => ["Simple", "HTTP", "Server"]

Answered by Yaroslav Surzhikov

Maybe you would be interested in a one-liner implementation that does not use regexes:

''.join(' ' + char if char.isupper() else char.strip() for char in text).strip()

Answered by Dan Lenski

With regexes you can do this:

re.sub(r'([A-Z])', r' \1', text)

Of course, that will only work for ASCII characters; if you want to do Unicode, it's a whole new can of worms :-)
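
One possible way around the ASCII limitation, offered only as a sketch and not as part of the regex above, is to skip the [A-Z] character class and test each character with str.isupper(), which follows Unicode case rules. The helper name space_before_upper is made up, and unlike the regex it does not add a space before the very first character or after existing whitespace.

```python
def space_before_upper(text):
    # str.isupper() is Unicode-aware, so capitals such as "É" are detected
    # as well as plain A-Z.
    pieces = []
    for i, ch in enumerate(text):
        # Skip the first character and capitals that already follow whitespace.
        if i > 0 and ch.isupper() and not text[i - 1].isspace():
            pieces.append(" ")
        pieces.append(ch)
    return "".join(pieces)

print(space_before_upper("ÉcoleNormaleSupérieure"))  # École Normale Supérieure
```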

Answered by David Underhill

If you have acronyms, you probably do not want spaces between them. This two-stage regex will keep acronyms intact (and will also treat punctuation and other non-uppercase characters as something to add a space after):

import re

re_outer = re.compile(r'([^A-Z ])([A-Z])')
re_inner = re.compile(r'(?<!^)([A-Z])([^A-Z])')
# Inner pass first (space before a capital that starts a word), then outer pass
re_outer.sub(r'\1 \2', re_inner.sub(r' \1\2', 'DaveIsAFKRightNow!Cool'))

The output will be: 'Dave Is AFK Right Now! Cool'
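
To make the two stages easier to follow, here is a sketch that wraps them in a function and names the intermediate result. The names space_out and step1 are only for illustration.

```python
import re

re_outer = re.compile(r'([^A-Z ])([A-Z])')
re_inner = re.compile(r'(?<!^)([A-Z])([^A-Z])')

def space_out(text):
    # Stage 1 (inner): put a space before a capital that is followed by a
    # non-capital, except at the very start, e.g. "KRi" -> "K Ri".
    step1 = re_inner.sub(r' \1\2', text)
    # Stage 2 (outer): put a space between a non-capital, non-space
    # character and the capital that follows it, e.g. "sA" -> "s A".
    return re_outer.sub(r'\1 \2', step1)

print(space_out('DaveIsAFKRightNow!Cool'))  # Dave Is AFK Right Now! Cool
print(space_out('SimpleHTTPServer'))        # Simple HTTP Server
```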

Answered by monkut

I agree that the regex solution is the easiest, but I wouldn't say it's the most pythonic.

How about:

text = 'WordWordWord'
new_text = ''

for i, letter in enumerate(text):
    if i and letter.isupper():
        new_text += ' '

    new_text += letter

Answered by Brian

I think regexes are the way to go here, but just to give a pure Python version without (hopefully) any of the problems ΤΖΩΤΖΙΟΥ has pointed out:

def splitCaps(s):
    result = []
    # "nxt" rather than "next", to avoid shadowing the built-in next()
    for ch, nxt in window(s+" ", 2):
        result.append(ch)
        if nxt.isupper() and not ch.isspace():
            result.append(' ')
    return ''.join(result)

window() is a utility function I use to operate on a sliding window of items, defined as:

import collections, itertools

def window(it, winsize, step=1):
    it = iter(it)  # Ensure we have an iterator
    l = collections.deque(itertools.islice(it, winsize))
    while True:
        yield tuple(l)
        for i in range(step):
            try:
                l.append(next(it))  # Python 3 spelling of it.next()
            except StopIteration:
                return  # End the generator cleanly once the input runs out (PEP 479)
            l.popleft()
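
For the specific case of a window of size 2, the same pairing idea can be sketched without a helper by zipping the string with a shifted copy of itself. This is an alternative illustration, not the window() function above, and the name split_caps_pairs is made up.

```python
def split_caps_pairs(s):
    # zip(s, s[1:] + " ") yields each character together with its
    # successor; the trailing space pads the last pair.
    result = []
    for ch, following in zip(s, s[1:] + " "):
        result.append(ch)
        # Add a space only if the next character is a capital and the
        # current one is not already whitespace.
        if following.isupper() and not ch.isspace():
            result.append(" ")
    return "".join(result)

print(split_caps_pairs("WordWordWord"))   # Word Word Word
print(split_caps_pairs("Word WordWord"))  # Word Word Word
```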