Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/4145451/

Using a regular expression to replace upper case repeated letters in python with a single lowercase letter

Tags: python, regex, capitalization

Asked by ajt

I am trying to replace every uppercase letter that appears twice in a row in a string with a single lowercase instance of that letter. I am using the following regular expression, which matches the repeated uppercase letters, but I am unsure how to make the replacement letter lowercase.

import re
s = 'start TT end'
re.sub(r'([A-Z]){2}', r"\1", s)
>>> 'start T end'

How can I make the "\1" lower case? Should I not be using a regular expression to do this?

Accepted answer by jensgram

Pass a function as the repl argument. The MatchObject is passed to this function, and .group(1) gives the first parenthesized subgroup:

import re
s = 'start TT end'
callback = lambda pat: pat.group(1).lower()
re.sub(r'([A-Z]){2}', callback, s)

EDIT
And yes, you should use ([A-Z])\1 instead of ([A-Z]){2} in order not to match, e.g., AZ. (See @bobince's answer.)

import re
s = 'start TT end'
re.sub(r'([A-Z])\1', lambda pat: pat.group(1).lower(), s) # Inline

Gives:

'start t end'
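
To see why the backreference matters, here is a minimal sketch comparing the two patterns on a string that also contains two different capitals. The sample string and the helper name `lower_first_group` are made up for illustration and are not part of the original answer.

```python
import re

def lower_first_group(match):
    # Return the letter captured by group 1, in lower case.
    return match.group(1).lower()

# Hypothetical sample: one doubled letter (TT) and one mixed pair (AZ).
sample = 'start TT AZ end'

# ([A-Z]){2} matches any two capitals; group 1 holds the last one captured.
loose = re.sub(r'([A-Z]){2}', lower_first_group, sample)

# ([A-Z])\1 matches only a capital followed by the same capital.
strict = re.sub(r'([A-Z])\1', lower_first_group, sample)

print(loose)   # the mixed pair AZ is collapsed as well
print(strict)  # only TT is collapsed
```

With the `{2}` quantifier the pair AZ is replaced too, which is probably not what the question wants; the backreference version leaves it alone.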

Answer by Ignacio Vazquez-Abrams

You can do it with a regular expression; just pass a function as the replacement, as the docs say. The problem is your pattern.

As it is, your pattern matches any run of two capital letters, whether or not they are the same letter. I'll leave the actual pattern to you, but it starts with AA|BB|CC|.
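
One possible way to finish that alternation without typing it out is to build it from the ASCII uppercase letters. This is only a sketch of the idea; the names `doubled`, `pattern`, and `lower_single` and the sample string are assumptions for illustration, not the answerer's code.

```python
import re
import string

# Build 'AA|BB|...|ZZ' from the ASCII uppercase letters.
doubled = '|'.join(c * 2 for c in string.ascii_uppercase)
pattern = re.compile(doubled)

def lower_single(match):
    # The whole match is one doubled letter; keep one copy, lowercased.
    return match.group(0)[0].lower()

# Hypothetical sample: TT is a doubled letter, AZ is not.
result = pattern.sub(lower_single, 'start TT AZ end')
print(result)
```

Because each alternative names the same letter twice, a mixed pair such as AZ is never matched.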

Answer by bobince

You can't change case in a replacement string. You would need a replacement function:

>>> import re
>>> def replacement(match):
...     return match.group(1).lower()
...
>>> re.sub(r'([A-Z])\1', replacement, 'start TT end')
'start t end'

Answer by bgporter

The 'repl' parameter that identifies the replacement can be either a string (as you have it here) or a function. This will do what you wish:

import re

def toLowercase(matchobj):
   return matchobj.group(1).lower()

s = 'start TT end'
re.sub(r'([A-Z]){2}', toLowercase, s)
>>> 'start t end'

Answer by khachik

Try this:

import re

def tol(m):
   # m.group(0) is the whole run of capitals; keep its first letter, lowercased.
   return m.group(0)[0].lower()

s = 'start TTT AAA end'
re.sub(r'([A-Z]){2,}', tol, s)

Note that this doesn't replace single uppercase letters. If you want that, use r'([A-Z]){1,}'.
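
Note also that ([A-Z]){2,} collapses any run of capitals, even when the letters differ. A hedged sketch of a variant that only collapses runs of the same letter, using a backreference, is shown below; the helper name `lower_first_char` and the sample text are assumptions for illustration.

```python
import re

def lower_first_char(match):
    # Keep the first letter of the matched run, lowercased.
    return match.group(0)[0].lower()

# Hypothetical sample: TTT is a run of one letter, AZB is a mixed run.
text = 'start TTT AZB end'

# Any run of two or more capitals, same letter or not.
mixed = re.sub(r'([A-Z]){2,}', lower_first_char, text)

# A capital followed by one or more copies of the same capital.
same = re.sub(r'([A-Z])\1+', lower_first_char, text)

print(mixed)
print(same)
```

Which behaviour is wanted depends on the input; the question only talks about repeated letters, so the backreference form may be the closer fit.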

Answer by Tony Veijalainen

WARNING! This post does not use re as requested. Continue at your own risk!

I do not know how likely the corner cases are, but this is how plain Python handles it with my naive coding.

import string
s = 'start TT end AAA BBBBBBB'
# Replace each doubled capital with one lowercase letter, one pair at a time.
for c in string.ascii_uppercase:
    s = s.replace(c + c, c.lower())
print(s)
""" Output:
start t end aA bbbB
"""