Disclaimer: this page is based on a popular StackOverflow question and its answers and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/3939361/

Date: 2020-08-18 13:25:56  Source: igfitidea

Remove specific characters from a string in Python

python, string, immutability

Question by Matt Phillips

I'm trying to remove specific characters from a string using Python. This is the code I'm using right now. Unfortunately it appears to do nothing to the string.

for char in line:
    if char in " ?.!/;:":
        line.replace(char,'')

How do I do this properly?

Accepted answer by intuited

Strings in Python are immutable (can't be changed). Because of this, the effect of line.replace(...) is just to create a new string, rather than changing the old one. You need to rebind (assign) it to line in order to have that variable take the new value, with those characters removed.
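To make the immutability point concrete, here is a minimal Python 3 sketch (the variable names are illustrative): replace hands back a new string and leaves the original untouched until the name is rebound.

```python
line = "a?b!c"

# replace() builds and returns a new string; the object bound to `line` is unchanged
result = line.replace("?", "")
print(line)    # a?b!c
print(result)  # ab!c

# Rebinding the name is what makes `line` refer to the cleaned-up string
line = line.replace("?", "").replace("!", "")
print(line)    # abc
```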

Also, the way you are doing it is going to be kind of slow, relatively. It's also likely to be a bit confusing to experienced pythonators, who will see a doubly-nested structure and think for a moment that something more complicated is going on.

Starting in Python 2.6 and newer Python 2.x versions*, you can instead use str.translate (but read on for Python 3 differences):

line = line.translate(None, '!@#$')

or use regular-expression replacement with re.sub:

import re
line = re.sub('[!@#$]', '', line)

The characters enclosed in brackets constitute a character class. Any characters in line which are in that class are replaced with the second parameter to sub: an empty string.
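If the characters to delete come from a variable rather than being typed by hand, characters such as ], \, ^ or - have special meaning inside a class. A small Python 3 sketch, using a hypothetical helper named remove_chars, escapes them with re.escape before building the class:

```python
import re

def remove_chars(text, chars):
    """Hypothetical helper: delete every character of `chars` from `text`."""
    if not chars:
        return text  # an empty class "[]" would be a regex syntax error
    # re.escape neutralises characters like ']', '\\', '^' and '-' inside the class
    pattern = "[" + re.escape(chars) + "]"
    return re.sub(pattern, "", text)

print(remove_chars("a]b^c-d\\e", "]^-\\"))   # abcde
print(remove_chars("H E?.LL!/;O:: ", " ?.!/;:"))  # HELLO
```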

In Python 3, strings are Unicode. You'll have to translate a little differently. kevpie mentions this in a comment on one of the answers, and it's noted in the documentation for str.translate.

When calling the translate method of a Unicode string, you cannot pass the second parameter that we used above. You also can't pass None as the first parameter. Instead, you pass a translation table (usually a dictionary) as the only parameter. This table maps the ordinal values of characters (i.e. the result of calling ord on them) to the ordinal values of the characters that should replace them, or (usefully for us) to None to indicate that they should be deleted.

So to do the above dance with a Unicode string, you would call something like:

translation_table = dict.fromkeys(map(ord, '!@#$'), None)
unicode_line = unicode_line.translate(translation_table)

Here dict.fromkeys and map are used to succinctly generate a dictionary containing

{ord('!'): None, ord('@'): None, ...}
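The values in such a table are not limited to None. According to the str.translate documentation, a value may be an ordinal, a replacement string, or None; a short Python 3 sketch mixing all three (the table contents are just an example):

```python
# Keys are ordinals; values are ordinals, replacement strings, or None (delete).
# Characters that are not keys in the table are left unchanged.
table = {
    ord("!"): None,       # delete '!'
    ord("?"): ord("."),   # replace '?' with '.' via its ordinal
    ord("&"): " and ",    # replace '&' with a multi-character string
}
print("Tom&Jerry?!".translate(table))  # Tom and Jerry.
```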

Even simpler, as another answer puts it, create the translation table in place:

unicode_line = unicode_line.translate({ord(c): None for c in '!@#$'})

Or create the same translation table with str.maketrans:

unicode_line = unicode_line.translate(str.maketrans('', '', '!@#$'))
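All three ways of building the Python 3 table produce the same mapping, which is easy to check with a minimal sketch:

```python
chars = "!@#$"

by_fromkeys = dict.fromkeys(map(ord, chars), None)
by_comprehension = {ord(c): None for c in chars}
by_maketrans = str.maketrans("", "", chars)  # third argument: characters mapped to None

print(by_fromkeys == by_comprehension == by_maketrans)  # True
print("a!b@c#d$".translate(by_maketrans))                 # abcd
```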


* For compatibility with earlier Pythons, you can create a "null" translation table to pass in place of None:

import string
line = line.translate(string.maketrans('', ''), '!@#$')

Here string.maketrans is used to create a translation table, which is just a string containing the characters with ordinal values 0 to 255.
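In Python 3, the two-argument translate(None, deletechars) form still exists, but on bytes rather than str: the bytes.translate documentation allows None as the table for delete-only translations. A minimal sketch for byte data:

```python
raw = b"a!b@c#d$"

# bytes.translate accepts None as the table, meaning "delete only, no remapping"
cleaned = raw.translate(None, b"!@#$")
print(cleaned)  # b'abcd'
```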

Answer by Greg Hewgill

Strings are immutable in Python. The replace method returns a new string after the replacement. Try:

for char in line:
    if char in " ?.!/;:":
        line = line.replace(char,'')

Answer by Muhammad Alkarouri

line = line.translate(None, " ?.!/;:")

Answer by ghostdog74

>>> line = "abc#@!?efg12;:?"
>>> ''.join( c for c in line if  c not in '?:!/;' )
'abc#@efg12'
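Wrapped in a function and using a set for the membership test, the same generator-expression approach might look like this in Python 3 (the function name strip_chars and its default argument are only illustrative):

```python
def strip_chars(text, unwanted="?:!/;"):
    """Illustrative helper: keep only characters not found in `unwanted`."""
    unwanted = set(unwanted)  # average O(1) membership checks
    return "".join(c for c in text if c not in unwanted)

print(strip_chars("abc#@!?efg12;:?"))  # abc#@efg12
```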

Answer by gsbabil

Am I missing the point here, or is it just the following:

string = "ab1cd1ef"
string = string.replace("1","") 

print string
# result: "abcdef"

Put it in a loop:

a = "a!b@c#d$"
b = "!@#$"
for char in b:
    a = a.replace(char,"")

print a
# result: "abcd"

Answer by mgold

The asker almost had it. Like most things in Python, the answer is simpler than you think.

>>> line = "H E?.LL!/;O:: "  
>>> for char in ' ?.!/;:':  
...  line = line.replace(char,'')  
...
>>> print line
HELLO

You don't have to do the nested if/for loop thing, but you DO need to check each character individually.

Answer by cod3monk3y

For the inverse requirement of only allowing certain characters in a string, you can use regular expressions with a set complement operator [^ABCabc]. For example, to remove everything except ASCII letters, digits, and the hyphen:

>>> import string
>>> import re
>>>
>>> phrase = '  There were "nine" (9) chick-peas in my pocket!!!      '
>>> allow = string.ascii_letters + string.digits + '-'
>>> re.sub('[^%s]' % allow, '', phrase)
'Therewerenine9chick-peasinmypocket'

From the Python regular expression documentation:

Characters that are not within a range can be matched by complementing the set. If the first character of the set is '^', all the characters that are not in the set will be matched. For example, [^5] will match any character except '5', and [^^] will match any character except '^'. ^ has no special meaning if it's not the first character in the set.
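The three cases in that quotation can be checked directly with re.sub in Python 3; a small sketch (the input strings are arbitrary):

```python
import re

print(re.sub("[^5]", "", "a5b55c"))  # 555: everything except '5' is removed
print(re.sub("[^^]", "", "a^b^"))    # ^^: everything except '^' is removed
print(re.sub("[5^]", "", "5^x^5"))   # x: '^' is not first, so it is a literal member
```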

Answer by pkm

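#!/usr/bin/python
import re

strs = "how^ much for{} the maple syrup? .99? That's[] ricidulous!!!"
print strs
# Replace each listed character (?, $, ., !, a, b) with a space; add any other
# characters to remove inside the brackets. A '|' inside [...] is a literal pipe,
# not "or", so no separators are needed.
nstr = re.sub(r'[?$.!ab]', r' ', strs)
print nstr
# Remove every remaining character that is not a letter, digit, or space
nestr = re.sub(r'[^a-zA-Z0-9 ]', r'', nstr)
print nestr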

Answer by Wariat

How about this:

def text_cleanup(text):
    new = ""
    for i in text:
        if i not in " ?.!/;:":
            new += i
    return new
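Repeated new += i can be slow for long inputs because each concatenation may copy the string (CPython sometimes optimizes this in place, but that is an implementation detail). A Python 3 sketch of the same idea that collects characters in a list and joins once; the extra unwanted parameter is an illustrative addition:

```python
def text_cleanup(text, unwanted=" ?.!/;:"):
    """Same filtering as above, but builds the result with a single join."""
    kept = []
    for ch in text:
        if ch not in unwanted:
            kept.append(ch)
    return "".join(kept)

print(text_cleanup("H E?.LL!/;O:: "))  # HELLO
```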

Answer by Sadheesh

The one below works without regular expressions:

ipstring = "text with symbols!@#$^&*( ends here"
opstring = ''
for i in ipstring:
    # keep letters, digits and spaces; drop everything else
    if i.isalnum() or i == ' ':
        opstring += i
print opstring
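One thing to be aware of with isalnum: in Python 3 it is Unicode-aware, so accented letters are kept as well. If only ASCII letters and digits should survive, str.isascii (Python 3.7+) can be added to the test. A sketch with an arbitrary sample string:

```python
ipstring = "naïve café, 2 cups!"

# isalnum() alone keeps non-ASCII letters such as 'ï' and 'é'
unicode_kept = "".join(c for c in ipstring if c.isalnum() or c == " ")
# adding isascii() restricts the kept characters to ASCII letters and digits
ascii_kept = "".join(c for c in ipstring if (c.isascii() and c.isalnum()) or c == " ")

print(unicode_kept)  # naïve café 2 cups
print(ascii_kept)    # nave caf 2 cups
```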