Python: remove punctuation in pandas

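Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/39782418/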

Date: 2020-08-19 22:42:40  Source: igfitidea

Tags: python, string, pandas, replace

Question by vikky

Code:
    df['review'].head()

Output:
    index   review
    0       These flannel wipes are OK, but in my opinion

I want to remove punctuation from a column of the DataFrame and store the result in a new column.

Code:
    import string

    def remove_punctuations(text):
        return text.translate(None, string.punctuation)

    df["new_column"] = df['review'].apply(remove_punctuations)

Error:
  return text.translate(None,string.punctuation)
  AttributeError: 'float' object has no attribute 'translate'

I am using Python 2.7. Any suggestions would be appreciated.
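
The AttributeError most likely means that some entries in the review column are not strings: pandas stores missing values as NaN, which is a float, and a float has no translate method. A minimal sketch, assuming Python 3 (where str.translate takes a table built with str.maketrans instead of the Python 2 translate(None, chars) form) and assuming missing reviews should simply stay missing; the sample DataFrame is hypothetical:

```python
import string

import numpy as np
import pandas as pd

# Hypothetical sample data: the second review is missing (NaN, which is a float).
df = pd.DataFrame({"review": ["These flannel wipes are OK, but in my opinion", np.nan]})

# Translation table that deletes every character in string.punctuation (Python 3 API).
TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuations(text):
    # Return non-string values (e.g. NaN) unchanged instead of calling .translate on a float.
    if not isinstance(text, str):
        return text
    return text.translate(TABLE)


df["new_column"] = df["review"].apply(remove_punctuations)
print(df["new_column"].tolist())
```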

Answer by Bob Haffner

Using pandas str.replace and a regular expression:

df["new_column"] = df['review'].str.replace(r'[^\w\s]', '', regex=True)
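
The str accessor methods skip missing values and return NaN for them, which is why this approach avoids the AttributeError. A sketch assuming pandas 2.x, where str.replace defaults to regex=False, so the pattern has to be flagged as a regex explicitly; the sample data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; NaN stands in for a missing review.
df = pd.DataFrame({
    "review": ["These flannel wipes are OK, but in my opinion", "under_score!", np.nan]
})

# [^\w\s] matches any character that is neither a word character nor whitespace.
# regex=True is needed in pandas >= 2.0, where str.replace defaults to regex=False.
df["new_column"] = df["review"].str.replace(r"[^\w\s]", "", regex=True)
print(df["new_column"].tolist())
```

Because \w counts the underscore as a word character, this pattern keeps underscores; in Python 3 it also keeps non-ASCII letters, since \w is Unicode-aware for str patterns.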

Answer by David C

You can build a regex from the string module's punctuation list:

import re
df["new_column"] = df['review'].str.replace('[{}]'.format(re.escape(string.punctuation)), '', regex=True)
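
string.punctuation contains characters that are special inside a regex character class, such as ], \, ^ and -. Relying on how re interprets them unescaped is fragile; re.escape makes each one a literal. A sketch assuming pandas 0.23 or later (where the regex parameter of str.replace exists), with hypothetical sample text chosen to contain those characters:

```python
import re
import string

import numpy as np
import pandas as pd

# Hypothetical sample data containing -, [, ], ^, a backslash and an underscore, plus a NaN.
df = pd.DataFrame({"review": ["a-b [c] ^d\\e_f", np.nan]})

# re.escape turns ], \, ^, - and the other special characters into literals.
pattern = "[{}]".format(re.escape(string.punctuation))

df["new_column"] = df["review"].str.replace(pattern, "", regex=True)
print(df["new_column"].tolist())
```

Unlike [^\w\s], this removes underscores (which are in string.punctuation) but leaves non-ASCII punctuation such as curly quotes untouched, because string.punctuation covers ASCII characters only.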

Answer by Arthur Gouveia

I solved the problem by looping through string.punctuation:

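def remove_punctuations(text):
    # Missing values are NaN floats; return anything that is not a str unchanged.
    if not isinstance(text, str):
        return text
    for punctuation in string.punctuation:
        text = text.replace(punctuation, '')
    return text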

You can call the function the same way you did, and it should work:

df["new_column"] = df['review'].apply(remove_punctuations)
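
A quick consistency check, assuming Python 3 and pandas 0.23 or later: because the loop and an re.escape-based character class both delete exactly the characters in string.punctuation, they should produce the same column. The sample reviews are hypothetical:

```python
import re
import string

import pandas as pd


def remove_punctuations(text):
    # Missing values are NaN floats; return anything that is not a str unchanged.
    if not isinstance(text, str):
        return text
    # One str.replace pass per character in string.punctuation.
    for punctuation in string.punctuation:
        text = text.replace(punctuation, "")
    return text


# Hypothetical sample reviews with commas, dots, brackets, a backslash and an underscore.
reviews = pd.Series(["OK, but...", "a-b [c] ^d\\e_f", "no punctuation here"])

looped = reviews.apply(remove_punctuations)

pattern = "[{}]".format(re.escape(string.punctuation))
regexed = reviews.str.replace(pattern, "", regex=True)

print(looped.tolist() == regexed.tolist())
```

The loop calls str.replace once per punctuation character (32 passes over each string), so on large columns a single regex pass or a single str.translate call may be faster.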