Python: use a regular expression to remove the whitespace from all lines
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/3984539/
Asked by user469652
^(\s+) only removes the whitespace from the first line. How do I remove the leading whitespace from all the lines?
Accepted answer by AndiDog
Python's re module does not default to multi-line ^ matching, so you need to specify that flag explicitly.
import re

r = re.compile(r"^\s+", re.MULTILINE)
r.sub("", "a\n b\n c")  # "a\nb\nc"
# or without compiling (only possible for Python 2.7+ because the flags option
# didn't exist in earlier versions of re.sub)
re.sub(r"^\s+", "", "a\n b\n c", flags=re.MULTILINE)
# but mind that \s includes newlines, so blank lines are removed as well:
r.sub("", "a\n\n\n\n b\n c")  # "a\nb\nc"
It's also possible to include the flag inline in the pattern:
re.sub(r"(?m)^\s+", "", "a\n b\n c")
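To see what the flag actually changes, here is a small sketch comparing the same pattern with and without re.MULTILINE. The sample string and variable names are only illustrative, not part of the original answer.

```python
import re

text = "  a\n b\n c"

# Without re.MULTILINE, ^ anchors only at the very start of the string,
# so only the first line loses its leading whitespace.
first_only = re.sub(r"^\s+", "", text)

# With re.MULTILINE, ^ also anchors right after each newline.
every_line = re.sub(r"^\s+", "", text, flags=re.MULTILINE)

# The inline (?m) flag has the same effect as flags=re.MULTILINE.
inline_flag = re.sub(r"(?m)^\s+", "", text)

print(repr(first_only))
print(repr(every_line))
print(repr(inline_flag))
```

The first result still has indented second and third lines, which is the behaviour described in the question.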
An easier solution is to avoid regular expressions because the original problem is very simple:
content = 'a\n b\n\n c'
stripped_content = ''.join(line.lstrip(' \t') for line in content.splitlines(True))
# stripped_content == 'a\nb\n\nc'
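If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name lstrip_lines and its chars parameter are made up for illustration.

```python
def lstrip_lines(content, chars=" \t"):
    """Remove leading spaces and tabs from every line, keeping line endings."""
    # splitlines(True) keeps each line's terminator, so joining the pieces
    # restores the original line structure, including empty lines.
    return "".join(line.lstrip(chars) for line in content.splitlines(True))


result = lstrip_lines("a\n  b\n\n\t c\r\nd")
print(repr(result))
```

Because only spaces and tabs are passed to lstrip(), empty lines and the existing line endings (including \r\n) are left alone.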
Answer by ghostdog74
You can try strip() if you want to remove both leading and trailing whitespace, or lstrip() if you only want to remove leading whitespace.
>>> s=" string with front spaces and back "
>>> s.strip()
'string with front spaces and back'
>>> s.lstrip()
'string with front spaces and back '
with open("file") as f:
    for line in f:
        # lstrip() keeps the line's own trailing newline, so suppress print's
        print(line.lstrip(), end="")
If you really want to use a regex:
>>> import re
>>> re.sub(r"^\s+", "", s)  # remove leading whitespace
'string with front spaces and back '
>>> re.sub(r"\s+\Z", "", s)  # remove trailing whitespace
' string with front spaces and back'
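As a quick sanity check, the two regex calls give the same results as the string methods on a single string like this one. A minimal sketch, with an illustrative sample string:

```python
import re

s = "  string with front spaces and back  "

front_removed = re.sub(r"^\s+", "", s)   # same effect as s.lstrip() here
back_removed = re.sub(r"\s+\Z", "", s)   # same effect as s.rstrip() here

print(front_removed == s.lstrip())
print(back_removed == s.rstrip())
```

So for one string at a time, the string methods are usually the simpler choice.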
Answer by Tony Veijalainen
nowhite = ''.join(mytext.split())
No whitespace at all will remain, as you asked (everything is joined into one word). It is usually more useful to join everything with ' ' or '\n' to keep the words separate.
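A short sketch of the three variants, using a made-up sample for mytext:

```python
mytext = "  some   text\n  spread over\n\tlines "

# split() with no arguments splits on any run of whitespace
# (spaces, tabs, newlines) and drops empty pieces.
words = mytext.split()

no_white = "".join(words)        # everything run together
one_line = " ".join(words)       # words separated by single spaces
one_per_line = "\n".join(words)  # one word per line

print(no_white)
print(one_line)
print(one_per_line)
```

Note that all of these also discard the original line breaks, so they go further than removing leading whitespace.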
Answer by tzot
You'll have to use the re.MULTILINE option:
re.sub(r"(?m)^\s+", "", text)
The "(?m)" part enables multiline mode.
Answer by John Machin
@AndiDog acknowledges in his (currently accepted) answer that it munches consecutive newlines.
Here's how to fix that deficiency, which is caused by the fact that \n is BOTH whitespace and a line separator. What we need to do is make a regex character class that includes only whitespace characters other than newline.
We want whitespace and not newline, which can't be expressed directly in a regex character class. Let's rewrite that as not not (whitespace and not newline), i.e. not (not whitespace or not not newline) (thanks, Augustus), i.e. not (not whitespace or newline), i.e. [^\S\n] in re notation.
So:
>>> re.sub(r"(?m)^[^\S\n]+", "", " a\n\n \n\n b\n c\nd e")
'a\n\n\n\nb\nc\nd e'
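For comparison, here is a sketch that runs both patterns on the same input, so the difference in how blank lines are handled is visible. The variable names are illustrative.

```python
import re

text = " a\n\n \n\n b\n c\nd e"

# \s also matches "\n", so a match starting on a blank line runs on
# through the following newlines and swallows the blank lines.
greedy = re.sub(r"(?m)^\s+", "", text)

# [^\S\n] means "not (non-whitespace or newline)", i.e. whitespace other
# than "\n", so every line break is preserved.
careful = re.sub(r"(?m)^[^\S\n]+", "", text)

print(repr(greedy))
print(repr(careful))
print(text.count("\n"), greedy.count("\n"), careful.count("\n"))
```

The second pattern keeps the same number of newlines as the input, while the first one loses the blank lines.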
Answer by Tim McNamara
You don't actually need regular expressions for this most of the time. If you are only looking to remove common indentation across multiple lines, try the textwrap module:
>>> import textwrap
>>> messy_text = " grrr\n whitespace\n everywhere"
>>> print(textwrap.dedent(messy_text))
grrr
whitespace
everywhere
Note that if the indentation is irregular, the extra indentation will be maintained:
>>> very_messy_text = " grrr\n \twhitespace\n everywhere"
>>> print(textwrap.dedent(very_messy_text))
grrr
	whitespace
everywhere
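To make the difference from per-line stripping concrete, here is a sketch that applies both approaches to an indented snippet. The sample text is made up for illustration.

```python
import textwrap

text = "    def f():\n        return 1\n"

# dedent() removes only the whitespace common to all lines,
# so relative indentation survives.
dedented = textwrap.dedent(text)

# Stripping each line individually removes all leading spaces and tabs,
# which flattens the structure.
flattened = "".join(line.lstrip(" \t") for line in text.splitlines(True))

print(dedented)
print(flattened)
```

Which one you want depends on whether the relative indentation carries meaning, as it does for source code.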

