Python: Deleting specific control characters (\n \r \t) from a string

Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original address: StackOverFlow, http://stackoverflow.com/questions/4955452/

Time: 2020-08-18 18:19:52  Source: igfitidea

python string

Asked by Hossein

I have quite a large amount of text which includes control characters like \n, \t and \r. I need to replace them with a simple space (" "). What is the fastest way to do this? Thanks

Accepted answer by Sven Marnach

I think the fastest way is to use str.translate():

import string
s = "a\nb\rc\td"
# Python 2 syntax: print statement and string.maketrans
print s.translate(string.maketrans("\n\t\r", "   "))

prints

a b c d
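
The snippet above relies on Python 2 features (the print statement and string.maketrans). A minimal sketch of the same idea, assuming a Python 3 interpreter, uses the str.maketrans static method instead. The names table and cleaned are illustrative and do not come from the answer.

```python
# Python 3 sketch of the same approach (assumption: Python 3.x interpreter).
s = "a\nb\rc\td"

# Map each of \n, \t, \r to a single space.
# With two string arguments, both must have the same length.
table = str.maketrans("\n\t\r", "   ")

# translate() returns a new string; the original is left unchanged.
cleaned = s.translate(table)
print(cleaned)  # a b c d
```

Building the table once and reusing it for many strings avoids repeating the setup work.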

EDIT: As this once again turned into a discussion about performance, here are some numbers. For long strings, translate() is way faster than using regular expressions:

# IPython session (Python 2); %timeit is an IPython magic command
import re
import string

s = "a\nb\rc\td " * 1250000

regex = re.compile(r'[\n\r\t]')
%timeit t = regex.sub(" ", s)
# 1 loops, best of 3: 1.19 s per loop

table = string.maketrans("\n\t\r", "   ")
%timeit s.translate(table)
# 10 loops, best of 3: 29.3 ms per loop

That's about a factor of 40.
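
The timings above come from the answer's author. To repeat the comparison outside IPython, a rough sketch using the standard timeit module might look like the following. It assumes Python 3 and uses a smaller input so it finishes quickly; the measured ratio depends on the machine and interpreter version and may differ from the factor quoted above.

```python
import re
import timeit

# Smaller input than in the answer so the sketch runs quickly (assumption).
s = "a\nb\rc\td " * 100_000

regex = re.compile(r"[\n\r\t]")
table = str.maketrans("\n\t\r", "   ")

# Sanity check: both approaches should produce identical output.
assert regex.sub(" ", s) == s.translate(table)

# Time five runs of each approach.
regex_time = timeit.timeit(lambda: regex.sub(" ", s), number=5)
translate_time = timeit.timeit(lambda: s.translate(table), number=5)

print(f"regex:     {regex_time:.4f} s for 5 runs")
print(f"translate: {translate_time:.4f} s for 5 runs")
```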

Answer by Michal Chruszcz

You may also try regular expressions:

import re
regex = re.compile(r'[\n\r\t]')
regex.sub(' ', my_str)
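
In the snippet above, my_str is not defined and the return value of sub() is discarded. A self-contained sketch, with a made-up sample string and a hypothetical result variable, could look like this:

```python
import re

# Hypothetical sample input; `my_str` is not defined in the answer.
my_str = "line one\nline two\r\ttabbed"

# Character class matching exactly one \n, \r, or \t.
regex = re.compile(r"[\n\r\t]")

# sub() returns a new string, so the result has to be captured.
result = regex.sub(" ", my_str)
print(repr(result))
```

Each control character becomes its own space, so adjacent control characters leave adjacent spaces.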

Answer by Ignacio Vazquez-Abrams

>>> re.sub(r'[\t\n\r]', ' ', '1\n2\r3\t4')
'1 2 3 4'

Answer by kurumi

Using regex:

re.sub(r'\s+', ' ', '1\n2\r3\t4')

Without regex:

>>> ' '.join('1\n\n2\r3\t4'.split())
'1 2 3 4'

Answer by John Machin

If you want to normalise whitespace (replace runs of one or more whitespace characters with a single space, and strip leading and trailing whitespace), this can be accomplished by using string methods:

>>> text = '   foo\tbar\r\nFred  Nurke\t Joe Smith\n\n'
>>> ' '.join(text.split())
'foo bar Fred Nurke Joe Smith'
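
Normalising whitespace is not the same as replacing each control character with one space. The following sketch, assuming Python 3, contrasts the two on the same input; the names normalised and one_to_one are illustrative.

```python
text = "   foo\tbar\r\nFred  Nurke\t Joe Smith\n\n"

# split() with no arguments splits on runs of any whitespace and drops
# leading/trailing whitespace, so the join collapses everything.
normalised = " ".join(text.split())

# translate() replaces each \n, \r, \t with exactly one space and leaves
# existing spaces (including the leading ones) untouched.
one_to_one = text.translate(str.maketrans("\n\r\t", "   "))

print(repr(normalised))
print(repr(one_to_one))
```

The one-to-one result keeps the original length, which matters if character offsets into the text have to stay valid.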

Answer by Srikanth

's' is the string from which you want to delete specific control characters. As strings are immutable in Python, you need to assign the result of the substitution to a variable.

import re

s = re.sub(r'[\n\r\t]*', '', s)
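
Note that this pattern deletes the control characters, whereas the question asks for them to be replaced with a space. A short sketch showing both variants, with a sample string and variable names chosen only for illustration:

```python
import re

s = "a\nb\rc\td"

# re.sub() never modifies `s` in place; it returns a new string each time.
deleted = re.sub(r"[\n\r\t]", "", s)   # control characters removed entirely
spaced = re.sub(r"[\n\r\t]", " ", s)   # each one replaced by a single space

print(repr(s))        # original string is unchanged
print(repr(deleted))  # 'abcd'
print(repr(spaced))   # 'a b c d'
```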