Python: How to remove special characters from a DataFrame column using the re module?

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/33257344/

Time: 2020-08-19 13:06:18  Source: igfitidea

How to remove special characters from a column of a dataframe using the re module?

python, string, pandas

Asked by Rahul Shrivastava

Hey, I have seen that link, but nowhere there have they used the re module, which is why I have posted here. I hope you understand and remove the duplicate.


Here is the link. I want to use the re module.


Table:


A    B    C    D
1    Q!   W@   2
2    1$   E%   3
3    S2#  D!   4

Here I want to remove the special characters from columns B and C. I have used .transform(), but I want to do it using re if possible, and I am getting errors.


Desired output:


A    B    C    D   E   F
1    Q!   W@   2   Q   W
2    1$   E%   3   1   E
3    S2#  D!   4   S2  D

My Code:


df['E'] = df['B'].str.translate(None, ",!.; -@!%^&*)(")

It works only if I know in advance what the special characters are.


But I want to use re, which would be the best way.


import re
#re.sub(r'\W+', '', your_string)
df['E'] = re.sub(r'\W+', '', df['B'].str)

Here I am getting this error:


TypeError: expected string or buffer

So how should I pass the value to get the correct output?


Accepted answer by TigerhawkT3

As this answer shows, you can use map() with a lambda function that will assemble and return any expression you like:


df['E'] = df['B'].map(lambda x: re.sub(r'\W+', '', x))

lambda simply defines an anonymous function. You can leave it anonymous, or assign it to a name like any other object. my_function = lambda x: x.my_method(3) is equivalent to def my_function(x): return x.my_method(3).
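
To make the approach above concrete, here is a minimal self-contained sketch that rebuilds the sample table from the question and applies map() with re.sub to columns B and C. The helper name strip_special is made up for illustration, and the sketch assumes every cell in those columns is a string (re.sub raises a TypeError on NaN or numeric values).

```python
import re

import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": ["Q!", "1$", "S2#"],
    "C": ["W@", "E%", "D!"],
    "D": [2, 3, 4],
})

# Compile once; \W matches anything that is not a letter, digit, or underscore.
NON_WORD = re.compile(r"\W+")


def strip_special(value):
    """Hypothetical helper: drop non-word characters from one string."""
    return NON_WORD.sub("", value)


# map() calls the helper on each cell, so re always receives a plain string.
df["E"] = df["B"].map(strip_special)
df["F"] = df["C"].map(strip_special)

print(df)
```

The attempt in the question failed because re.sub was handed df['B'].str, an accessor object, rather than a string; map() sidesteps that by passing one value at a time. Note that \W keeps underscores, so a character class such as [^A-Za-z0-9] may be a better fit if underscores should count as special characters.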


Answer by Amir Imani

A one-liner without map is:


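# regex=True is needed on pandas 2.0+, where str.replace treats the pattern literally by default
df['E'] = df['B'].str.replace(r'\W', '', regex=True)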
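
As a rough, runnable sketch of this vectorized variant, the snippet below applies Series.str.replace to both columns from the question. The two-column frame and the loop over (source, target) pairs are illustrative choices, not part of the original answer; regex=True is passed explicitly because the default for that parameter has differed between pandas versions.

```python
import pandas as pd

# Only the two text columns from the question's table are needed here.
df = pd.DataFrame({"B": ["Q!", "1$", "S2#"], "C": ["W@", "E%", "D!"]})

# Strip runs of non-word characters from each source column into a new column.
for source, target in [("B", "E"), ("C", "F")]:
    df[target] = df[source].str.replace(r"\W+", "", regex=True)

print(df)
```

Unlike map() with re.sub, the .str accessor passes missing values through as NaN instead of raising, which can matter when a column has gaps.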