pandas: Remove punctuation from each row of a pandas DataFrame

Question

Asked by RJL

I am new to Python, so this may be a very basic question. I am trying to use a lambda to remove punctuation from each row of a pandas DataFrame. I used the following, but received an error. I am trying to avoid having to convert the df into a list, append the cleaned results to a new list, and then convert that back into a df.

Any suggestions would be appreciated!

import string

df['cleaned'] = df['old'].apply(lambda x: x.replace(c,'') for c in string.punctuation)

Answer 1

Answered by mechanical_meat

You need to iterate over the string in the dataframe, not over string.punctuation. You also need to build the string back up using .join().

df['cleaned'] = df['old'].apply(lambda x:''.join([i for i in x 
                                                  if i not in string.punctuation]))
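To see why the call in the question raises an error, here is a small sketch that needs no pandas. It assumes the error comes from how Python parses the argument: the whole `lambda ... for c in ...` expression becomes a generator of lambdas, which is not a callable that `.apply()` can use. The names `arg` and `strip_punct` are illustrative only and do not appear in the original posts.

```python
import string

# The argument in the question is parsed as a generator expression that
# yields one lambda per punctuation character, not as a single function.
arg = (lambda x: x.replace(c, '') for c in string.punctuation)
print(callable(arg))  # False

# The fix from the answer: one function that filters the characters of x.
strip_punct = lambda x: ''.join([i for i in x if i not in string.punctuation])
print(strip_punct("c<=d"))  # cd
```

A function like `strip_punct` is what should be passed to `df['old'].apply(...)`.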

When lambda expressions get long like that it can be more readable to write out the function definition separately, e.g. (thanks to @AndyHayden for the optimization tips):

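PUNCTUATION = frozenset(string.punctuation)  # build the lookup set once

def remove_punctuation(s):
    # Keep only the characters that are not punctuation.
    return ''.join([i for i in s if i not in PUNCTUATION])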

df['cleaned'] = df['old'].apply(remove_punctuation)
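A different way to write such a helper, not taken from the answers above, is `str.translate` with a deletion table built by `str.maketrans`. The sketch below is only one possible variant. The names `_PUNCT_TABLE` and `remove_punctuation_translate` are made up for illustration, and it is demonstrated on a plain list of strings rather than on a DataFrame.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def remove_punctuation_translate(s):
    """Return s with all ASCII punctuation characters removed."""
    return s.translate(_PUNCT_TABLE)

samples = ["a..b", "c<=d", "e|}f"]
cleaned = [remove_punctuation_translate(x) for x in samples]
print(cleaned)  # ['ab', 'cd', 'ef']
```

The helper can be used in the same way as `remove_punctuation`, i.e. `df['cleaned'] = df['old'].apply(remove_punctuation_translate)`.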

Answer 2

Answered by Andy Hayden

Using a regex will most likely be faster here:

In [11]: RE_PUNCTUATION = '|'.join([re.escape(x) for x in string.punctuation])  # perhaps this is available in the re/regex library?

In [12]: s = pd.Series(["a..b", "c<=d", "e|}f"])

In [13]: s.str.replace(RE_PUNCTUATION, "", regex=True)  # regex=True is required in pandas 2.0+, where the default is a literal match
Out[13]:
0    ab
1    cd
2    ef
dtype: object
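The same pattern can probably be written more compactly as a single character class instead of a long `|` alternation. The sketch below checks that idea with the standard `re` module only. The name `PUNCT_CLASS` is illustrative, and the claim that it drops into `s.str.replace(PUNCT_CLASS, "", regex=True)` is an assumption that follows from pandas using Python regular expressions there.

```python
import re
import string

# One character class matching any single ASCII punctuation character.
# re.escape protects characters such as ], \, ^ and - inside the brackets.
PUNCT_CLASS = '[' + re.escape(string.punctuation) + ']'

samples = ["a..b", "c<=d", "e|}f"]
cleaned = [re.sub(PUNCT_CLASS, '', x) for x in samples]
print(cleaned)  # ['ab', 'cd', 'ef']
```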
