Python regex - why do the end-of-string anchors ($ and \Z) not work inside group expressions?

Question

Asked by Piotr Migdal

In Python 2.6, it seems that the end-of-string markers $ and \Z are not compatible with group expressions. For example

import re
re.findall("\w+[\s$]", "green pears")

returns

['green ']

(so $ effectively does not work). And using

re.findall("\w+[\s\Z]", "green pears")

results in an error:

/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/re.pyc in findall(pattern, string, flags)
    175 
    176     Empty matches are included in the result."""
--> 177     return _compile(pattern, flags).findall(string)
    178 
    179 if sys.hexversion >= 0x02020000:

/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/re.pyc in _compile(*key)
    243         p = sre_compile.compile(pattern, flags)
    244     except error, v:
--> 245         raise error, v # invalid expression
    246     if len(_cache) >= _MAXCACHE:
    247         _cache.clear()

error: internal: unsupported set operator

Why does it work that way, and how can I work around it?

Answer 1

Accepted answer by Martijn Pieters

A [..] expression is a character class (also called a character set), meaning it matches any one character listed inside it. You are thus matching a literal $ character. A character class always consumes exactly one input character, and thus can never contain an anchor.
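To make this concrete, here is a small sketch. The input string "green$ pears" is made up for illustration, and the second half assumes a current Python 3 release, where the error differs from the 2.6 traceback in the question: as far as I know the pattern is rejected at compile time.

```python
import re

# Inside [...], "$" is an ordinary character, so it only matches a literal dollar sign.
literal_dollar = re.findall(r"\w+[\s$]", "green$ pears")
print(literal_dollar)  # ['green$']

# "\Z" has no meaning inside a set. Assumption: current Python 3 releases
# reject the pattern with re.error instead of treating it as an anchor.
try:
    re.compile(r"\w+[\s\Z]")
    set_with_anchor_compiles = True
except re.error:
    set_with_anchor_compiles = False
print(set_with_anchor_compiles)
```

Note that "pears" is not matched in the first call: nothing in the set can match the end of the string.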

If you want to match either a whitespace character or the end of the string, use a non-capturing group instead, combined with the | alternation operator:

r"\w+(?:\s|$)"

Alternatively, look at the \b word boundary anchor. It matches wherever a run of \w characters starts or ends (that is, at points in the text where a \w character is preceded or followed by a \W character, or is at the start or end of the string).
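A minimal sketch of the \b variant, using the question's input string. Because \b is zero-width, the trailing space is not part of the match, which may or may not be what you want:

```python
import re

# \b is zero-width: it marks the end of each word without consuming the space.
words = re.findall(r"\w+\b", "green pears")
print(words)  # ['green', 'pears']
```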

Answer 2

Answer by BrenBarn

Square brackets don't indicate a group; they indicate a character set, which matches one character (any one of those in the brackets). As documented, "special characters lose their special meaning inside sets" (except where indicated otherwise, as with classes like \s).

If you want to match \s or the end of the string, use something like \s|$.
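One thing to watch when dropping \s|$ into a larger pattern: | has the lowest precedence, so the alternation usually needs to be wrapped in a group. The sketch below, run against the question's input on Python 3, compares the ungrouped and grouped forms:

```python
import re

text = "green pears"

# Ungrouped: parsed as (\w+\s) OR ($), so "pears" is never matched as a word.
ungrouped = re.findall(r"\w+\s|$", text)
print(ungrouped)  # ['green ', '']

# Grouped: the alternation applies only to what follows the word.
grouped = re.findall(r"\w+(?:\s|$)", text)
print(grouped)  # ['green ', 'pears']
```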

Answer 3

Answer by Junji Zhi

Martijn Pieters' answer is correct. To elaborate a bit, if you use a capturing group

r"\w+(\s|$)"

you get:

>>> re.findall("\w+(\s|$)", "green pears")
[' ', '']

That's because, when the pattern contains a capturing group, re.findall() returns the text captured by that group, here (\s|$), rather than the whole match.

Parentheses () serve two purposes: grouping and capturing. To group without capturing, use the (?:...) syntax:

>>> re.findall("\w+(?:\s|$)", "green pears")
['green ', 'pears']
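
If the capturing group has to stay (for example, because the captured text is needed elsewhere), one possible alternative is re.finditer(), which yields match objects whose group(0) is always the whole match. A minimal sketch:

```python
import re

# group(0) is the full match, regardless of any capturing groups in the pattern.
whole_matches = [m.group(0) for m in re.finditer(r"\w+(\s|$)", "green pears")]
print(whole_matches)  # ['green ', 'pears']
```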
