Python: deleting specific control characters (\n \r \t) from a string
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/4955452/
Deleting specific control characters(\n \r \t) from a string
Asked by Hossein
I have quite a large amount of text which includes control characters like \n, \t and \r. I need to replace each of them with a simple space (" "). What is the fastest way to do this? Thanks
Accepted answer by Sven Marnach
I think the fastest way is to use str.translate():
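import string

# Python 2: string.maketrans() needs two strings of equal length,
# so each of the three control characters maps to its own space.
s = "a\nb\rc\td"
print s.translate(string.maketrans("\n\t\r", "   "))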
prints
a b c d
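The snippet above is Python 2 (print statement, string.maketrans). As a minimal sketch of the same idea in Python 3, assuming the text is a str: the table can be built with str.maketrans() instead. The names CONTROL_TO_SPACE and replace_controls are made up for this illustration and are not part of the original answer.

```python
# Python 3 sketch: map each control character to a space.
# Both arguments to str.maketrans() must have the same length.
CONTROL_TO_SPACE = str.maketrans("\n\t\r", "   ")


def replace_controls(text):
    """Return text with newline, tab and carriage return each replaced by a space."""
    return text.translate(CONTROL_TO_SPACE)


print(replace_controls("a\nb\rc\td"))  # a b c d
```

Building the table once and reusing it avoids rebuilding it for every string.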
EDIT: As this once again turned into a discussion about performance, here are some numbers. For long strings, translate() is way faster than using regular expressions:
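# IPython session (Python 2); %timeit is an IPython magic
import re
import string

s = "a\nb\rc\td " * 1250000

regex = re.compile(r'[\n\r\t]')
%timeit t = regex.sub(" ", s)
# 1 loops, best of 3: 1.19 s per loop

table = string.maketrans("\n\t\r", "   ")  # arguments must have equal length
%timeit s.translate(table)
# 10 loops, best of 3: 29.3 ms per loop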
That's about a factor of 40.
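The measurement above was taken in an IPython session on Python 2. As a rough sketch for repeating the comparison with the standard timeit module on Python 3: the input here is deliberately smaller than in the original, and the ratio you see may differ from the figures quoted above depending on interpreter version and hardware.

```python
import re
import timeit

# Smaller input than the original 1,250,000 repetitions, to keep the run short.
text = "a\nb\rc\td " * 100_000

pattern = re.compile(r"[\n\r\t]")
table = str.maketrans("\n\t\r", "   ")

# Time three runs of each approach on the same input.
regex_time = timeit.timeit(lambda: pattern.sub(" ", text), number=3)
translate_time = timeit.timeit(lambda: text.translate(table), number=3)

print(f"regex:     {regex_time:.4f} s")
print(f"translate: {translate_time:.4f} s")
```

Both approaches produce the same output string, so only the timing differs.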
Answer by Michal Chruszcz
You may also try regular expressions:
import re

regex = re.compile(r'[\n\r\t]')
# my_str is the input text; sub() returns a new string
my_str = regex.sub(' ', my_str)
Answer by Ignacio Vazquez-Abrams
>>> re.sub(r'[\t\n\r]', ' ', '1\n2\r3\t4')
'1 2 3 4'
Answer by kurumi
Using regex:
re.sub(r'\s+', ' ', '1\n2\r3\t4')
Without regex:
>>> ' '.join('1\n\n2\r3\t4'.split())
'1 2 3 4'
Answer by John Machin
If you want to normalise whitespace (replace runs of one or more whitespace characters with a single space, and strip leading and trailing whitespace), this can be accomplished by using string methods:
>>> text = ' foo\tbar\r\nFred Nurke\t Joe Smith\n\n'
>>> ' '.join(text.split())
'foo bar Fred Nurke Joe Smith'
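Normalising whitespace is not the same as replacing each control character with a space. A small side-by-side sketch on a made-up sample string (the variable names are illustrative only):

```python
import re

text = " foo\tbar\r\nFred\n\n"

# One space per control character: runs are kept, the ends are not stripped.
per_char = re.sub(r"[\t\n\r]", " ", text)

# Whitespace normalisation: runs collapse to one space, the ends are stripped.
normalised = " ".join(text.split())

print(repr(per_char))    # ' foo bar  Fred  '
print(repr(normalised))  # 'foo bar Fred'
```

Which one fits depends on whether the original spacing and the string length need to be preserved.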
Answer by Srikanth
's' is the string from which you want to delete specific control characters. As strings are immutable in Python, the substitution returns a new string, so you need to assign the result to a variable.
s = re.sub(r'[\n\r\t]*', '', s)
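Note that this answer deletes the characters, whereas the question asks for a space in their place. A short sketch of both variants on a made-up sample string: it uses `+` rather than `*` because a pattern with `*` can also match the empty string, which matters once the replacement is non-empty.

```python
import re

s = "one\ntwo\r\nthree\tfour"

# Delete every run of control characters outright.
deleted = re.sub(r"[\n\r\t]+", "", s)

# Replace every run of control characters with a single space.
spaced = re.sub(r"[\n\r\t]+", " ", s)

print(deleted)  # onetwothreefour
print(spaced)   # one two three four
```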

