How to remove this \xa0 from a string in Python?
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Source: http://stackoverflow.com/questions/26068832/
Asked by slopeofhope
I have the following string:
word = u'Buffalo,\xa0IL\xa060625'
I don't want the "\xa0" in there. How can I get rid of it? The string I want is:
word = 'Buffalo, IL 60625'
Accepted answer by mgilson
If you know for sure that this is the only character you don't want, you can .replace() it:
>>> word.replace(u'\xa0', ' ')
u'Buffalo, IL 60625'
If you need to handle all non-ASCII characters, encoding the string and replacing the bad characters might be a good start:
>>> word.encode('ascii', 'replace')
'Buffalo,?IL?60625'
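The session above is from Python 2. As a rough sketch of the same two approaches on Python 3 (where the u prefix is optional and encode() returns bytes), something like the following should behave the same way. The decode step and the names cleaned and ascii_only are my additions, assuming you want a str back:

```python
# Python 3 sketch of the two approaches above.
word = 'Buffalo,\xa0IL\xa060625'

# Targeted fix: swap only the no-break space for a regular space.
cleaned = word.replace('\xa0', ' ')
print(cleaned)  # Buffalo, IL 60625

# Blanket fix: every non-ASCII character becomes '?'.
# encode() returns bytes on Python 3, so decode to get a str again.
ascii_only = word.encode('ascii', 'replace').decode('ascii')
print(ascii_only)  # Buffalo,?IL?60625
```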
Answer by khelwood
This seems to work for getting rid of non-ASCII characters:
fixedword = word.encode('ascii','ignore')
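One thing worth checking with 'ignore' is that the character is dropped rather than replaced, so the words on either side run together. A small Python 3 sketch (fixed_text is a hypothetical name for the decoded result):

```python
word = 'Buffalo,\xa0IL\xa060625'

# 'ignore' silently drops anything that cannot be encoded as ASCII.
fixedword = word.encode('ascii', 'ignore')
print(fixedword)  # b'Buffalo,IL60625'  (bytes on Python 3, no spaces left)

# Decode if a str is needed for further processing.
fixed_text = fixedword.decode('ascii')
print(fixed_text)  # Buffalo,IL60625
```

If the goal is the string from the question, with spaces preserved, replacing \xa0 with a space is probably the better fit.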
Answer by abarnert
There is no \xa there. If you try to put that into a string literal, you're going to get a syntax error if you're lucky, or it's going to swallow up the next attempted character if you're not, because \x sequences always have to be followed by two hexadecimal digits.
What you have is \xa0, which is an escape sequence for the character U+00A0, aka "NO-BREAK SPACE".
I think you want to replace them with spaces, but whatever you want to do is pretty easy to write:
word.replace(u'\xa0', u' ') # replaced with space
word.replace(u'\xa0', u'0') # closest to what you were literally asking for
word.replace(u'\xa0', u'') # removed completely
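To make the difference between the three options concrete, here is a short Python 3 sketch that confirms the character's Unicode name and stores each result. The variable names are mine, chosen for illustration:

```python
import unicodedata

word = 'Buffalo,\xa0IL\xa060625'

# \xa0 is a single character whose official Unicode name is NO-BREAK SPACE.
print(unicodedata.name('\xa0'))  # NO-BREAK SPACE

with_space = word.replace('\xa0', ' ')  # replaced with a space
with_zero = word.replace('\xa0', '0')   # only the "\xa" part "removed"
removed = word.replace('\xa0', '')      # removed completely

print(with_space)  # Buffalo, IL 60625
print(with_zero)   # Buffalo,0IL060625
print(removed)     # Buffalo,IL60625
```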
Answer by Mark Ransom
The most robust way would be to use the unidecode module to convert all non-ASCII characters to their closest ASCII equivalent automatically.
The character \xa0 (not \xa as you stated) is a NO-BREAK SPACE, and the closest ASCII equivalent would of course be a regular space.
import unidecode
word = unidecode.unidecode(word)
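Note that unidecode is a third-party package, not part of the standard library. The sketch below assumes it has been installed (for example with pip install Unidecode) and, purely so the snippet still runs without it, falls back to a plain replace that only covers this one character. The fallback is my assumption, not part of the answer:

```python
word = 'Buffalo,\xa0IL\xa060625'

try:
    from unidecode import unidecode  # third-party package
except ImportError:
    unidecode = None

if unidecode is not None:
    # Transliterates every non-ASCII character to an ASCII approximation.
    result = unidecode(word)
else:
    # Fallback (assumption): handles only the no-break space.
    result = word.replace('\xa0', ' ')

print(result)  # Buffalo, IL 60625
```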
Answer by Amir Imani
You can easily use unicodedata to get rid of all of the \x... characters.
from unicodedata import normalize
normalize('NFKD', word)
# result: 'Buffalo, IL 60625'
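As a hedged check of what this actually does: NFKD applies compatibility decomposition, which maps U+00A0 to an ordinary space. As far as I can tell it does not strip every non-ASCII character; an accented letter, for instance, is split into a base letter plus a combining mark. The string 'café' below is my own illustration and is not from the original question:

```python
from unicodedata import normalize

word = 'Buffalo,\xa0IL\xa060625'

# The no-break space has a compatibility mapping to a regular space.
normalized = normalize('NFKD', word)
print(normalized)  # Buffalo, IL 60625

# An accented letter is decomposed, not removed: e + COMBINING ACUTE ACCENT.
accented = normalize('NFKD', 'caf\xe9')
print(len(accented))                        # 5
print(all(ord(c) < 128 for c in accented))  # False
```

So this works well for \xa0 specifically, but a further step (such as encoding with 'ignore') would still be needed to end up with pure ASCII.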

