How to remove special characters from a dataframe column using the re module?
Source: http://stackoverflow.com/questions/33257344/ (StackOverflow, licensed under CC BY-SA 4.0; attribute to the original authors)
Asked by Rahul Shrivastava
Hey, I have seen that link, but nowhere in it is the re module used, which is why I have posted here. I hope you understand and remove the duplicate mark.
Here is the link. I want to use the re module.
Table:
A B C D
1 Q! W@ 2
2 1$ E% 3
3 S2# D! 4
Here I want to remove the special characters from columns B and C. I have used .transform(), but I would like to do it using re if possible; however, I am getting errors.
Output:
A B C D E F
1 Q! W@ 2 Q W
2 1$ E% 3 1 E
3 S2# D! 4 S2 D
My Code:
df['E'] = df['B'].str.translate(None, ",!.; -@!%^&*)(")
It works only if I know in advance what the special characters are.
But I want to use re, which would be the best way.
import re
#re.sub(r'\W+', '', your_string)
df['E'] = re.sub(r'\W+', '', df['B'].str)
Here I am getting this error:
TypeError: expected string or buffer
So how should I pass the values to get the correct output?
Accepted answer by TigerhawkT3
As this answer shows, you can use map() with a lambda function that will assemble and return any expression you like:
df['E'] = df['B'].map(lambda x: re.sub(r'\W+', '', x))
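To make the line above concrete, here is a minimal, self-contained sketch that rebuilds the table from the question and applies the same map() plus re.sub approach to both columns. The DataFrame construction and the precompiled pattern are assumptions added for illustration; the sketch also assumes every cell in B and C is a string, since re.sub raises a TypeError on non-string values such as NaN.

```python
import re

import pandas as pd

# Rebuild the table from the question (construction is an assumption;
# the question only shows the printed table).
df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": ["Q!", "1$", "S2#"],
        "C": ["W@", "E%", "D!"],
        "D": [2, 3, 4],
    }
)

# \W+ matches runs of characters that are not letters, digits, or underscore.
# Compiling once avoids re-parsing the pattern for every cell.
pattern = re.compile(r"\W+")

# map() calls the lambda once per cell, so re.sub receives a plain string
# rather than the whole column.
df["E"] = df["B"].map(lambda x: pattern.sub("", x))
df["F"] = df["C"].map(lambda x: pattern.sub("", x))

print(df)
```

Note that the underscore counts as a word character, so \W leaves it in place; a different character class would be needed if underscores should also be removed.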
lambda simply defines anonymous functions. You can leave them anonymous, or assign them to a reference like any other object. my_function = lambda x: x.my_method(3) is equivalent to def my_function(x): return x.my_method(3).
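The equivalence between a lambda and a def can be checked directly with the regex from this answer. The function names below are hypothetical and chosen only for this sketch.

```python
import re

# Anonymous function bound to a name (hypothetical name, for illustration).
strip_special = lambda x: re.sub(r"\W+", "", x)


# The same behavior written as a regular named function.
def strip_special_def(x):
    return re.sub(r"\W+", "", x)


# Both forms take one string and return it with non-word characters removed.
print(strip_special("S2#"))
print(strip_special_def("S2#"))
```

Either form can be passed to map(); the def version is usually easier to document and reuse.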
Answer by Amir Imani
A one-liner without map is:
df['E'] = df['B'].str.replace(r'\W', '', regex=True)
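As a sketch of the vectorized approach on the table from the question, the following applies Series.str.replace to both columns. The DataFrame construction and the loop over column pairs are assumptions added for illustration. Passing regex=True explicitly is the safer choice, because recent pandas versions treat the pattern as a literal string by default, in which case '\W' would match nothing in this data.

```python
import pandas as pd

# Rebuild the table from the question (construction is an assumption).
df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": ["Q!", "1$", "S2#"],
        "C": ["W@", "E%", "D!"],
        "D": [2, 3, 4],
    }
)

# Clean each source column into its target column without an explicit
# Python-level function call per cell.
for src, dst in [("B", "E"), ("C", "F")]:
    # regex=True makes pandas interpret the pattern as a regular expression.
    df[dst] = df[src].str.replace(r"\W+", "", regex=True)

print(df)
```

Unlike the map() version, the .str accessor passes missing values through as NaN instead of raising an error.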