Python: can a regular expression match a repeated character? How?

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/1023902/

Time: 2020-11-03 21:17:22  Source: igfitidea

Is it possible to match a character repetition with regex? How?

python regex

Asked by Andrea Ambu

Question:
Is it possible, with regex, to match a word that contains the same character in different positions?

Condition:
All words have the same length. You know the positions of the repeated character (for example the 1st, the 2nd and the 4th), but you don't know which character it is.

Examples:
Using lowercase 6-character words, I'd like to match words where the 3rd and the 4th characters are the same.

parrot <- match for double r
follia <- match for double l 
carrot <- match for double r
mattia <- match for double t
rettoo <- match for double t
melone <- doesn't match

I can't use a quantifier such as [\d]{2} because it matches any succession of two chars. And what if I want the 2nd and the 4th positions instead of the 3rd and the 4th?

Is it possible to do what I want with regex? If yes, how can I do that?

EDIT:
As asked in the comments, I'm using Python.

Answered by Gumbo

You can use a backreference to do this:

您可以使用反向引用来执行此操作:

(.)\1

This will match consecutive occurrences of any character.

Edit: Here's a Python example:

import re

# (.) captures any character; \1 requires the same character again.
regexp = re.compile(r"(.)\1")
data = ["parrot", "follia", "carrot", "mattia", "rettoo", "melone"]

for word in data:
    match = regexp.search(word)
    if match:
        print(word, "<- match for double", match.group(1))
    else:
        print(word, "<- doesn't match")
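
The question also asks about fixed positions in fixed-length words (for example the 2nd and the 4th character). A minimal sketch of one way to do that, assuming lowercase words as in the question; the helper name same_char_pattern is hypothetical and not part of the original answer. It captures the character at the first listed position and puts a backreference at every other listed position.

```python
import re

def same_char_pattern(length, positions):
    """Build a regex for words of `length` lowercase letters whose
    characters at the given 1-based `positions` are all identical.
    (Hypothetical helper, for illustration only.)"""
    positions = sorted(positions)
    first = positions[0]
    parts = []
    for i in range(1, length + 1):
        if i == first:
            parts.append("([a-z])")   # capture the character to repeat
        elif i in positions:
            parts.append(r"\1")       # must equal the captured character
        else:
            parts.append("[a-z]")     # any other lowercase letter
    return re.compile("".join(parts))

words = ["parrot", "follia", "carrot", "mattia", "rettoo", "melone"]

third_fourth = same_char_pattern(6, [3, 4])   # [a-z][a-z]([a-z])\1[a-z][a-z]
second_fourth = same_char_pattern(6, [2, 4])  # [a-z]([a-z])[a-z]\1[a-z][a-z]

for word in words:
    # fullmatch requires the whole word to fit the pattern
    print(word, bool(third_fourth.fullmatch(word)), bool(second_fourth.fullmatch(word)))
```

With the sample words, only melone fails the 3rd/4th pattern. None of them has equal 2nd and 4th characters; a word such as banana would.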

Answered by Arvind

You need to use backreferences for such cases. I am not sure which language you are using; I tried the following example in the vi editor to search for any repeated letter. Pattern: \([a-z]\)\1

In this example, [a-z] is the pattern you are searching for, enclosed in parentheses (the parentheses must be escaped in some languages). Once a pattern is in parentheses, it is a group and can be referred to again later in the regex using \1. If there is more than one group, you can use \1, \2, etc. \1 matches whatever was matched by the first group.
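
In Python's re module the grouping parentheses are written unescaped. A small sketch of the numbered references described above; the variable names and sample words are made up for illustration.

```python
import re

# One group: a lowercase letter followed by the same letter again.
double_letter = re.compile(r"([a-z])\1")

# Two groups: \1 and \2 refer back to the first and second group,
# so this matches four-letter runs of the form "xyyx" such as "abba".
mirrored = re.compile(r"([a-z])([a-z])\2\1")

print(double_letter.search("carrot").group(0))   # rr
print(mirrored.search("abba").groups())          # ('a', 'b')
print(mirrored.search("carrot"))                 # None
```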

Thanks Arvind

Answered by Martijn Laarman

/(\b\w*?(\w)\2.*?\b)/

will match any word with at least one character repetition, $1 being the word and $2 the first repeated character.
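
A short sketch of the same pattern in Python, where the surrounding slashes are dropped and the groups are read with group(1) and group(2) rather than $1 and $2. The sample text is an assumption for illustration.

```python
import re

# Group 1 is the whole word, group 2 the first repeated character.
word_with_repeat = re.compile(r"(\b\w*?(\w)\2.*?\b)")

text = "a small coffee"
found = [(m.group(1), m.group(2)) for m in word_with_repeat.finditer(text)]
print(found)   # [('small', 'l'), ('coffee', 'f')]
```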

Answered by SO User

Yes, you can use a backreference construct to match the double letters.

The regular expression (?<char>\w)\k<char>, using named groups and backreferencing, searches for adjacent paired characters. When applied to the string "I'll have a small coffee," it finds matches in the words "I'll", "small", and "coffee". The metacharacter \w finds any single word character. The grouping construct (?<char>) encloses the metacharacter to force the regular expression engine to remember a subexpression match (which, in this case, will be any single character) and save it under the name "char". The backreference construct \k<char> causes the engine to compare the current character to the previously matched character stored under "char". The entire regular expression successfully finds a match wherever a single character is the same as the preceding character.
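
Since the question is about Python: as far as I know, Python's re module does not accept the (?<char>...) and \k<char> spelling used above; it writes named groups as (?P<char>...) and named backreferences as (?P=char). A minimal sketch of the equivalent pattern, with variable names chosen for illustration:

```python
import re

# Python spelling of the named-group pattern: (?P<char>...) defines the
# group and (?P=char) refers back to whatever it matched.
paired = re.compile(r"(?P<char>\w)(?P=char)")

text = "I'll have a small coffee"
pairs = [m.group(0) for m in paired.finditer(text)]
print(pairs)   # ['ll', 'll', 'ff', 'ee']
```

Note that "coffee" contributes two matches, one for "ff" and one for "ee".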
