Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/26068832/

Time: 2020-08-19 00:00:29  Source: igfitidea

How to remove this \xa0 from a string in python?

python unicode

Asked by slopeofhope

I have the following string:

 word = u'Buffalo,\xa0IL\xa060625'

I don't want the "\xa0" in there. How can I get rid of it? The string I want is:

word = 'Buffalo, IL 60625'

Accepted answer by mgilson

If you know for sure that it is the only character you don't want, you can .replace it:

>>> word.replace(u'\xa0', ' ')
u'Buffalo, IL 60625'

If you need to handle all non-ASCII characters, encoding and replacing the bad characters might be a good start:

>>> word.encode('ascii', 'replace')
'Buffalo,?IL?60625'
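
The output above is from Python 2. On Python 3, str.encode returns a bytes object, so the result has to be decoded again if text is wanted. A minimal sketch, assuming Python 3 (the names encoded and cleaned are only illustrative):

```python
word = 'Buffalo,\xa0IL\xa060625'

# 'replace' substitutes '?' for every character ASCII cannot represent.
encoded = word.encode('ascii', 'replace')   # bytes on Python 3

# Decode back to str if the rest of the program works with text.
cleaned = encoded.decode('ascii')
print(cleaned)
```

Note that this leaves question marks where the no-break spaces were, so it is mostly useful for spotting the offending characters rather than for producing the final string.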

Answered by khelwood

This seems to work for getting rid of non-ascii characters:

fixedword = word.encode('ascii','ignore')
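
One thing to watch for with 'ignore' is that the characters are dropped rather than replaced, so the words end up joined together. A small sketch of that behavior, assuming Python 3 (hence the extra decode step, which is not part of the original answer):

```python
word = 'Buffalo,\xa0IL\xa060625'

# 'ignore' silently drops every non-ASCII character, so nothing is left
# between the words once the no-break spaces are gone.
fixedword = word.encode('ascii', 'ignore').decode('ascii')
print(fixedword)
```

This prints the string with no separators at all, which may or may not be what is wanted for address-like data.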

Answered by abarnert

There is no \xa there. If you try to put that into a string literal, you're going to get a syntax error if you're lucky, or it's going to swallow up the next attempted character if you're not, because \x sequences always have to be followed by two hexadecimal digits.

What you have is \xa0, which is an escape sequence for the character U+00A0, aka "NO-BREAK SPACE".
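
The identity of the character can be checked from the interpreter. A quick sketch using the standard unicodedata module, assuming Python 3 (the variable name ch is only illustrative):

```python
import unicodedata

ch = '\xa0'
print(unicodedata.name(ch))   # official Unicode name of the character
print(hex(ord(ch)))           # its code point
print(ch == '\u00a0')         # both escapes denote the same character
print(ch.isspace())           # Python classifies it as whitespace
```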

I think you want to replace them with spaces, but whatever you want to do is pretty easy to write:

word.replace(u'\xa0', u' ') # replaced with space
word.replace(u'\xa0', u'0') # closest to what you were literally asking for
word.replace(u'\xa0', u'')  # removed completely
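
If the goal is simply to end up with ordinary single spaces, a split-and-join is another option. This is a sketch under the assumption of Python 3, where str.split() with no arguments treats U+00A0 as whitespace; it is not taken from the answer above, and the name normalized is illustrative:

```python
word = 'Buffalo,\xa0IL\xa060625'

# str.split() with no arguments splits on any Unicode whitespace,
# which on Python 3 includes U+00A0; joining restores single spaces.
normalized = ' '.join(word.split())
print(normalized)
```

Unlike .replace, this also collapses runs of whitespace and strips leading and trailing whitespace, so it changes more than just the no-break spaces.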

Answered by Mark Ransom

The most robust way would be to use the unidecode module to convert all non-ASCII characters to their closest ASCII equivalent automatically.

The character \xa0 (not \xa as you stated) is a NO-BREAK SPACE, and the closest ASCII equivalent would of course be a regular space.

import unidecode
word = unidecode.unidecode(word)

Answered by Amir Imani

You can easily use unicodedata to get rid of all the \x... characters.

>>> from unicodedata import normalize
>>> normalize('NFKD', word)
'Buffalo, IL 60625'
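
NFKD normalization affects more than the no-break space, which is worth knowing before applying it to arbitrary text. A short sketch, assuming Python 3; the 'café' sample string is an illustration and not from the original answer:

```python
from unicodedata import normalize

word = 'Buffalo,\xa0IL\xa060625'
print(normalize('NFKD', word) == 'Buffalo, IL 60625')

accented = 'caf\xe9'                        # 'café' with a precomposed é
decomposed = normalize('NFKD', accented)    # 'e' followed by U+0301
composed = normalize('NFKC', accented)      # é stays a single character
print(len(accented), len(decomposed), len(composed))
```

Both forms map U+00A0 to a regular space, but NFKD also splits accented letters into a base letter plus a combining mark, so NFKC may be the gentler choice when the text contains other non-ASCII characters that should stay intact.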