Python regex match space only
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/38162444/
Asked by Dimitry
In Python 3, how do I match exactly the space character, and not newline \n or tab \t?
I've seen the \s+[^\n] answer to "Regex match space not \n", but it does not work for the following example:
import re
a='rasd\nsa sd'
print(re.search(r'\s+[^ \n]',a))
The result is <_sre.SRE_Match object; span=(4, 6), match='\ns'>, which means the newline was matched.
Answered by Resonance
No need for special groups. Just create a regex with a space character. Outside of re.VERBOSE mode, the space character has no special meaning; it just means "match a space".
import re
RE = re.compile(' +')
So for your case
a='rasd\nsa sd'
print(re.search(' +', a))
would give
<_sre.SRE_Match object; span=(7, 8), match=' '>
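The same behaviour can be checked on a slightly richer input. This is a minimal sketch: the sample string `text` is made up for illustration and is not from the question. It mixes a newline, a tab, and runs of spaces.

```python
import re

# Hypothetical sample (not from the question): newline, tab, two spaces, one space.
text = "a\nb\tc  d e"

# ' +' matches runs of the literal space character (U+0020) and nothing else.
space_runs = [(m.start(), m.group()) for m in re.finditer(" +", text)]

# For contrast, r'\s+' also picks up the newline and the tab.
whitespace_runs = re.findall(r"\s+", text)

print(space_runs)
print(whitespace_runs)
```

Only the two runs of spaces are reported by the first pattern, while the second also returns the newline and the tab.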
Answered by Wiktor Stribiżew
If you want to match one or more whitespace characters except newline and tab, use
r"[^\S\n\t]+"
[^\S] matches any character that is not a non-whitespace character, i.e. any whitespace character. Because the character class is negated, any characters you add to it (here \n and \t) are excluded from the match.
import re
a='rasd\nsa sd'
print(re.findall(r'[^\S\n\t]+',a))
# => [' ']
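To see which characters the negated class lets through, the sketch below tests each ASCII whitespace character against the pattern individually. The names `pattern`, `candidates` and `results` are illustrative only.

```python
import re

pattern = re.compile(r"[^\S\n\t]")

# The six ASCII whitespace characters that \s covers.
candidates = [" ", "\t", "\n", "\r", "\f", "\v"]

# Map each character to whether the class matches it on its own.
results = {ch: bool(pattern.fullmatch(ch)) for ch in candidates}

for ch, matched in results.items():
    print(repr(ch), matched)
```

Note that \r, \f and \v still match. If only the space character should match, those would need to be added to the class as well, or a literal space used instead.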
Some more considerations: \s matches [ \t\n\r\f\v] if the ASCII flag is used. So, if you only plan to match ASCII whitespace, you might as well use [ \r\f\v], which leaves out the characters you want to exclude. If you need to work with Unicode strings, the solution above is a viable one.
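The difference between the Unicode-aware class and the ASCII alternatives can be checked with a small sketch. The sample string, which contains a no-break space (U+00A0), is an assumption chosen for illustration.

```python
import re

# Hypothetical sample: a no-break space (U+00A0) and a regular space.
text = "a\u00a0b c"

# str patterns are Unicode-aware by default, so U+00A0 counts as whitespace.
unicode_matches = re.findall(r"[^\S\n\t]+", text)

# With re.ASCII, \S treats U+00A0 as non-whitespace, so the class excludes it.
ascii_flag_matches = re.findall(r"[^\S\n\t]+", text, flags=re.ASCII)

# The explicit ASCII class lists the allowed characters directly.
explicit_matches = re.findall(r"[ \r\f\v]+", text)

print(unicode_matches, ascii_flag_matches, explicit_matches)
```

The default pattern reports both the no-break space and the regular space, while the two ASCII variants report only the regular space.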