Python Pandas Replace Special Character
Original question: http://stackoverflow.com/questions/23839465/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow, not to this page.
Asked by user3221876
For some reason, I cannot get this simple statement to work on the ñ. It seems to work on anything else but doesn't like that character. Any ideas?
DF['NAME']=DF['NAME'].str.replace("ñ","n")
Thanks
Answered by jdotjdot
I'm assuming you're using Python 2.x here, so this is likely a Unicode problem. Don't worry, you're not alone: Unicode is tough in general and especially in Python 2, which is why Unicode strings became the default in Python 3.
If all you're concerned about is the ñ, you should decode from UTF-8 and then replace just that one character.
That would look something like the following:
DF['name'] = DF['name'].str.decode('utf-8').str.replace(u'\xf1', 'n')
As an example:
>>> "sureño".decode("utf-8").replace(u"\xf1", "n")
u'sureno'
If your string is already Unicode, then you can (and actually have to) skip the decode step:
>>> u"sureño".replace(u"\xf1", "n")
u'sureno'
Note here that u'\xf1' uses the hex escape for the character in question.
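For readers on Python 3, where every str is already Unicode, the decode step drops out entirely. The following is only a minimal sketch under that assumption; the sample DataFrame and its values are made up for illustration, and regex=False is passed so the pattern is treated as a literal character rather than a regular expression.

```python
import pandas as pd

# Hypothetical sample data; the column name NAME follows the question.
df = pd.DataFrame({"NAME": ["sureño", "niño", "plain"]})

# "\xf1" is the hex escape for ñ; regex=False forces a literal match.
df["NAME"] = df["NAME"].str.replace("\xf1", "n", regex=False)

print(df["NAME"].tolist())
```

Writing "ñ" directly instead of "\xf1" should behave the same, as long as the source file is saved as UTF-8.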
Update
I was informed in the comments that <>.str.replace is a pandas Series method, which I hadn't realized. The answer might then be something like the following:
DF['name'] = DF['name'].map(lambda x: x.decode('utf-8').replace(u'\xf1', 'n'))
or something along those lines. A Series can be mapped over element by element, so the .str accessor is not needed here.
Another update
It actually just occurred to me that your issue may be as simple as the following:
DF['NAME']=DF['NAME'].str.replace(u"ñ","n")
Note how I've added the u in front of the string to make it Unicode.
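If more than one accented character needs the same treatment, a different technique from the one above may help: Unicode normalization from the standard library. This is a sketch rather than part of the original answer; strip_accents is a hypothetical helper name, and it assumes Python 3 strings.

```python
import unicodedata

def strip_accents(text):
    """Return text with combining marks (accents, tildes) removed."""
    # NFKD splits a character such as ñ into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("sureño"))
```

On a pandas column this could be applied with something like DF['NAME'].map(strip_accents), assuming every value in the column is a string.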
Answered by user8336233
You can use the replace function to swap a special character for a value of your choice, as follows.
If your DataFrame is df and you need to do this in all columns that hold strings (in my case, for "\n"):
df = df.applymap(lambda x: x.replace("\n", " ") if isinstance(x, str) else x)
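As a possible alternative to applying a function cell by cell, DataFrame.replace with regex=True also substitutes inside string cells and leaves numeric columns alone. This is a minimal sketch with made-up sample data, not the answerer's own method.

```python
import pandas as pd

# Hypothetical frame mixing a text column and a numeric column.
df = pd.DataFrame({"text": ["a\nb", "c"], "num": [1, 2]})

# With regex=True the pattern is matched inside each string value.
df = df.replace(r"\n", " ", regex=True)

print(df["text"].tolist())
print(df["num"].tolist())
```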

