Whitespace in Python regular expressions

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/23224889/

Date: 2020-08-19 02:32:25  Source: igfitidea

whitespace in regular expression

python regex

Asked by Tirtha

I have a question: can I say \t is equivalent to \s+ in a regular expression? I have some lines of code:

>>> import re
>>> b = '\tNadya Carson'
>>> c = re.compile(r'\s\s*')
>>> c
<_sre.SRE_Pattern object at 0x02729800>
>>> c.sub('',b)
'NadyaCarson'
>>> c = re.compile(r'\s\s+')
>>> c
<_sre.SRE_Pattern object at 0x027292F0>

Up to this point I get a pattern object, but when I try to replace the whitespace with an empty string, the result still contains \t instead of having it removed:

>>> c.sub('',b)
'\tNadya Carson'

Why is the sub method not working in this case? Thank you!

Accepted answer by Rob Watts

\t is not equivalent to \s+, but \s+ should match a tab (\t).

The problem in your example is that the second pattern, \s\s+, is looking for two or more whitespace characters, and \t is only one whitespace character.
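
As a rough sketch of that point, the snippet below reruns the substitution from the question with both patterns. It assumes Python 3, and the variable names `two_or_more` and `one_or_more` are only illustrative, not from the original post.

```python
import re

name = '\tNadya Carson'

# \s\s+ needs at least two consecutive whitespace characters,
# so neither the lone tab nor the lone space is touched.
two_or_more = re.sub(r'\s\s+', '', name)

# \s+ needs only one, so the tab and the space are both removed.
one_or_more = re.sub(r'\s+', '', name)

print(repr(two_or_more))  # '\tNadya Carson'
print(repr(one_or_more))  # 'NadyaCarson'
```

So sub itself is working; the second pattern simply finds nothing to replace in this string.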

Here are some examples that should help you understand:

>>> result = re.match(r'\s\s+', '\t')
>>> print(result)
None
>>> result = re.match(r'\s\s+', '\t\t')
>>> print(result)
<_sre.SRE_Match object at 0x10ff228b8>

\s\s+ would also match ' \t', '\n\t', and ' \n \t \t\n'.
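
If you want to check those strings yourself, something like the following should do it. This is a minimal sketch assuming Python 3 (`re.fullmatch` needs 3.4 or later); the `samples` list and `results` dict are my own names.

```python
import re

pattern = re.compile(r'\s\s+')

# One single tab, plus strings made of two or more whitespace characters.
samples = ['\t', '\t\t', ' \t', '\n\t', ' \n \t \t\n']

# fullmatch succeeds only if the whole string is matched by the pattern.
results = {s: pattern.fullmatch(s) is not None for s in samples}

for s, matched in results.items():
    print(repr(s), matched)
```

Only the single tab fails, because it is the only sample with fewer than two whitespace characters.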

Also, \s\s* is equivalent to \s+. Both will match one or more whitespace characters.
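
One way to convince yourself of the equivalence is to run both patterns over a few strings and compare what they find. This is just a spot check on sample inputs I made up, not a proof, and it assumes Python 3.

```python
import re

# Hypothetical sample inputs, including no whitespace and an empty string.
samples = ['\tNadya Carson', 'a  b\t\nc', 'no_whitespace', '   ', '']

# For each sample, both patterns should find the same runs of whitespace
# and produce the same string after substitution.
same = all(
    re.findall(r'\s\s*', s) == re.findall(r'\s+', s)
    and re.sub(r'\s\s*', '', s) == re.sub(r'\s+', '', s)
    for s in samples
)

print(same)  # True
```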

Answer by CONvid19

Can I say \t is equivalent to \s+ in a regular expression?

No.

\t

Matches a tab character

\s+

Matches a "whitespace character" (spaces, tabs, and line breaks) between one and unlimited times, as many times as possible, giving back as needed (greedy): +
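
To see what "greedy" means in practice, you can compare \s+ with its lazy counterpart \s+?. The lazy form is not part of the original answer; I bring it in only for contrast. A small sketch, assuming Python 3:

```python
import re

run = ' \t\n '  # four whitespace characters in a row

# Greedy: \s+ grabs the whole run as a single match.
greedy = re.findall(r'\s+', run)

# Lazy: \s+? stops as soon as it can, so each character is its own match.
lazy = re.findall(r'\s+?', run)

print(greedy)  # [' \t\n ']
print(lazy)    # [' ', '\t', '\n', ' ']
```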

Answer by Adam Smith

\s+ is not equivalent to \t because \s does not mean <space>, but instead means <whitespace>. A literal space (sometimes four of which are used for tabs, depending on the application used to display them) is simply ' '. That is, hitting the spacebar creates a literal space. That's hardly surprising.

\s\s will never match a single \t: because \t IS whitespace, the first \s matches it, and nothing is left for the second \s. It will match \t\t, but that's because there are two characters of whitespace (both tab characters). When your regex runs \s\s+, it's looking for one character of whitespace followed by one, two, three, or really ANY number more. When it reads your regex it does this:

\s\s+

[Regular expression visualization of \s\s+ (Debuggex Demo)]

The \t matches the first \s, but when it hits the second one, with no whitespace left to match, your regex spits it back out saying "Oh, nope, never mind."

Your first regex does this:

\s\s*

[Regular expression visualization of \s\s* (Debuggex Demo)]

Again, the \t matches your first \s, and when the regex continues it sees that nothing matches the second \s, so it takes the "high road" instead and jumps over it. That's why \s\s* matches: the * quantifier includes "or zero," while the + quantifier does not.
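
Here is roughly what that "or zero" difference looks like from Python. It is a sketch assuming Python 3; the variable names are mine.

```python
import re

# On a single tab, the trailing \s* is allowed to match nothing...
star = re.match(r'\s\s*', '\t')
# ...but the trailing \s+ insists on at least one more whitespace character.
plus = re.match(r'\s\s+', '\t')

print(star.span())  # (0, 1)
print(plus)         # None

# The same difference on its own: * accepts zero repetitions, + does not.
empty_star = re.match(r'\s*', 'abc')
empty_plus = re.match(r'\s+', 'abc')

print(repr(empty_star.group()))  # ''
print(empty_plus)                # None
```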

Answer by shubham khantwal

No way. \s+ means one or more whitespace characters, BUT \t is just one particular whitespace character occurring once.

So \s+ matches anything \t matches, but the reverse is not true.
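
That one-way relationship can be checked directly. The sketch below assumes Python 3, and the `whitespace` dict is just a handful of characters I picked for illustration.

```python
import re

whitespace = {'tab': '\t', 'space': ' ', 'newline': '\n'}

# \s+ accepts every one of these characters...
matched_by_s = {name: bool(re.fullmatch(r'\s+', ch)) for name, ch in whitespace.items()}
# ...while \t accepts only the tab.
matched_by_t = {name: bool(re.fullmatch(r'\t', ch)) for name, ch in whitespace.items()}

print(matched_by_s)  # {'tab': True, 'space': True, 'newline': True}
print(matched_by_t)  # {'tab': True, 'space': False, 'newline': False}
```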