Python + pandas DataFrame: AttributeError: 'float' object has no attribute 'replace'

Question

Asked by Debbie

I am trying to write a function to do some text processing on the specified columns (description, event_name) of a Pandas dataframe. I wrote this code:

#removal of unreadable chars, unwanted spaces, words of at most length two from 'description' column and lowercase the 'description' column

def data_preprocessing(source):

    return source.replace('[^A-Za-z]',' ')
    #data['description'] = data['description'].str.replace('\W+',' ')
    return source.lower()
    return source.replace("\s\s+" , " ")
    return source.replace('\s+[a-z]{1,2}(?!\S)',' ')
    return source.replace("\s\s+" , " ")

data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))

It is giving the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-94-cb5ec147833f> in <module>()
----> 1 data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
      2 data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
      3 
      4 #df['words']=df['words'].apply(lambda row: eliminate_space(row))
      5 

~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2549             else:
   2550                 values = self.asobject
-> 2551                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2552 
   2553         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-94-cb5ec147833f> in <lambda>(row)
----> 1 data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
      2 data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
<ipython-input-93-fdfec5f52a06> in data_preprocessing(source)
      3 def data_preprocessing(source):
      4 
----> 5     return source.replace('[^A-Za-z]',' ')
      6     #data['description'] = data['description'].str.replace('\W+',' ')
      7     source = source.lower()

AttributeError: 'float' object has no attribute 'replace'

If I write the code the following way, without a function, it works perfectly:

data['description'] = data['description'].str.replace('[^A-Za-z]',' ')

Answer 1

Answered by Peter Leimbigler

Two things to fix:

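First, when you apply a lambda function to a pandas Series, the lambda function is called once for each element of the Series. The error message means that at least one element reached your function as a float rather than a string, most likely a missing value (NaN), and a float has no replace method. What I think you need is to apply your function to the entire Series in a vectorized manner.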
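The difference between the two approaches can be seen on a tiny example. This is only a sketch: the sample Series below is made up, and it assumes the float in your column is a NaN, which the traceback does not confirm.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column with one missing value.
# dtype=object keeps the values as plain Python objects.
s = pd.Series(["Hello, World!", np.nan], dtype=object)

# Element-wise: each value is passed on its own, so NaN arrives as a float.
element_types = s.apply(lambda value: type(value).__name__).tolist()
print(element_types)  # ['str', 'float']

# Vectorized: the .str accessor leaves NaN untouched instead of raising.
cleaned = s.str.replace(r"[^A-Za-z]", " ", regex=True)
print(cleaned.tolist())
```

Calling s.apply(lambda value: value.replace("a", "b")) on the same Series would raise the AttributeError from the question, because the second element is a float.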

Second, your function has multiple return statements. A function exits at the first return it reaches, so only the first statement, return source.replace('[^A-Za-z]',' '), will ever run. What you need to do is make your preprocessing changes to the variable source inside your function, and return the modified source (or an intermediate variable) only once, at the very end.
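If you would rather keep an element-wise function, a single-return version might look like the sketch below. The name preprocess_text and the isinstance guard are my own additions, not part of the original code. It uses re.sub because the built-in str.replace on a plain Python string treats its first argument as literal text, not as a regular expression.

```python
import re

def preprocess_text(text):
    """Hypothetical element-wise variant: clean one value at a time."""
    if not isinstance(text, str):
        return text  # pass NaN and other non-strings through unchanged
    text = re.sub(r"[^A-Za-z]", " ", text)            # keep letters only
    text = text.lower()
    text = re.sub(r"\s\s+", " ", text)                # collapse runs of whitespace
    text = re.sub(r"\s+[a-z]{1,2}(?!\S)", " ", text)  # drop words of 1-2 letters
    text = re.sub(r"\s\s+", " ", text)                # collapse again after removals
    return text  # single return, reached only after every step has run

print(repr(preprocess_text("Big  Event at NYC!!")))
```

Such a function could then be passed directly to apply, as in data['description'].apply(preprocess_text), although the vectorized .str approach is usually the simpler choice.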

To rewrite your function to operate on an entire pandas Series, replace every occurrence of source. with source.str. so that each step goes through the Series string accessor. The new function definition:

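def data_preprocessing(source):
    # source is a pandas Series; the .str methods skip NaN instead of raising.
    # regex=True is needed on pandas 2.0+, where str.replace matches literally by default.
    source = source.str.replace(r'[^A-Za-z]', ' ', regex=True)
    #data['description'] = data['description'].str.replace('\W+',' ')
    source = source.str.lower()
    source = source.str.replace(r'\s\s+', ' ', regex=True)
    source = source.str.replace(r'\s+[a-z]{1,2}(?!\S)', ' ', regex=True)
    source = source.str.replace(r'\s\s+', ' ', regex=True)
    return source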

Then, instead of this:

data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))

Try this:

data['description'] = data_preprocessing(data['description'])
data['event_name'] = data_preprocessing(data['event_name'])
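Put together, a self-contained run could look like the following sketch. The DataFrame contents are invented for illustration, and regex=True assumes a pandas version that accepts that keyword (0.23 or later).

```python
import numpy as np
import pandas as pd

def data_preprocessing(source):
    # Vectorized cleaning of a whole Series through the .str accessor.
    source = source.str.replace(r"[^A-Za-z]", " ", regex=True)
    source = source.str.lower()
    source = source.str.replace(r"\s\s+", " ", regex=True)
    source = source.str.replace(r"\s+[a-z]{1,2}(?!\S)", " ", regex=True)
    source = source.str.replace(r"\s\s+", " ", regex=True)
    return source

# Hypothetical data with a missing description; dtype=object keeps plain
# Python strings, like the object-dtype columns in the question.
data = pd.DataFrame(
    {
        "description": ["Big  Event at NYC!!", np.nan],
        "event_name": ["Jazz-Night 2018", "An Open Day"],
    },
    dtype=object,
)

for column in ["description", "event_name"]:
    data[column] = data_preprocessing(data[column])

print(data)
```

The missing description stays NaN rather than raising an error. Note that the patterns can leave a trailing space behind; adding source.str.strip() as a last step would remove it if that matters for your data.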
