Python + dataframe: AttributeError: 'float' object has no attribute 'replace'
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/52596419/
Asked by Debbie
I am trying to write a function to do some text processing on the specified columns (description, event_name) of a Pandas dataframe. I wrote this code:
#removal of unreadable chars, unwanted spaces, words of at most length two from 'description' column and lowercase the 'description' column
def data_preprocessing(source):
    return source.replace('[^A-Za-z]',' ')
    #data['description'] = data['description'].str.replace('\W+',' ')
    return source.lower()
    return source.replace("\s\s+" , " ")
    return source.replace('\s+[a-z]{1,2}(?!\S)',' ')
    return source.replace("\s\s+" , " ")
data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
It is giving the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-94-cb5ec147833f> in <module>()
----> 1 data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
2 data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
3
4 #df['words']=df['words'].apply(lambda row: eliminate_space(row))
5
~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
2549 else:
2550 values = self.asobject
-> 2551 mapped = lib.map_infer(values, f, convert=convert_dtype)
2552
2553 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()
<ipython-input-94-cb5ec147833f> in <lambda>(row)
----> 1 data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
2 data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
data['description'] = data['description'].str.replace('\W+',' ')
<ipython-input-93-fdfec5f52a06> in data_preprocessing(source)
3 def data_preprocessing(source):
4
----> 5 return source.replace('[^A-Za-z]',' ')
6 #data['description'] = data['description'].str.replace('\W+',' ')
7 source = source.lower()
AttributeError: 'float' object has no attribute 'replace'
If I write the code the following way, without a function, it works perfectly:
data['description'] = data['description'].str.replace('[^A-Za-z]',' ')
Answered by Peter Leimbigler
Two things to fix:
First, when you apply a lambda function to a pandas Series, the lambda function is applied to each element of the Series. Any element that is not a string, such as a missing value (which pandas typically stores as the float NaN), then fails on .replace, which is the likely source of the 'float' error. What you probably need is to apply your function to the entire Series in a vectorized manner.
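As a minimal sketch of this first point, assuming the column contains at least one missing value (the original data is not shown, so the toy Series `s` below is hypothetical), the element-wise call reproduces the error while the `.str` accessor does not:

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: two strings and one missing value.
s = pd.Series(["Hello World!", np.nan, "Big Event 2018"])

# apply() hands the function one element at a time, so the NaN
# arrives as a plain Python float.
element_types = s.apply(lambda value: type(value).__name__).tolist()

# Calling a string method on that float raises AttributeError.
try:
    s.apply(lambda value: value.replace("!", ""))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# The vectorized .str accessor leaves missing values as missing
# instead of failing on them.
cleaned = s.str.replace("!", "", regex=False)
```

This would also explain why the one-line `.str.replace` version in the question runs without complaint on the same column.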
Second, your function has multiple return statements. As a result, only the first one, return source.replace('[^A-Za-z]',' '), will ever run. Instead, make each preprocessing change to the variable source inside your function, and return the modified source (or an intermediate variable) once, at the very end.
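The effect of stacked return statements can be seen on a plain string. This is only an illustration: both function names and the sample text are made up for the demo.

```python
def first_return_only(text):
    # Execution leaves the function at the first return, so the two
    # statements after it are unreachable.
    return text.replace("a", "b")
    return text.lower()
    return text.strip()


def chained(text):
    # Reassign the variable at each step and return once at the end.
    text = text.replace("a", "b")
    text = text.lower()
    text = text.strip()
    return text


only_first = first_return_only("  Banana  ")
all_steps = chained("  Banana  ")
```

Here `only_first` still has its capital letter and surrounding spaces, while `all_steps` has gone through every step. Note also that the built-in `str.replace` treats its first argument as literal text, not as a regular expression, so patterns such as `'[^A-Za-z]'` would not behave as intended on a plain string even if every line ran.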
To rewrite your function to operate on an entire pandas Series, replace every occurrence of source. with source.str. in the method calls. The new function definition:
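def data_preprocessing(source):
    # source is a pandas Series; regex=True is required on pandas 2.0+,
    # where str.replace treats the pattern as literal text by default.
    source = source.str.replace(r'[^A-Za-z]', ' ', regex=True)  # non-letters -> space
    source = source.str.lower()
    source = source.str.replace(r'\s\s+', ' ', regex=True)  # collapse whitespace runs
    source = source.str.replace(r'\s+[a-z]{1,2}(?!\S)', ' ', regex=True)  # drop words of 1-2 letters that follow whitespace
    source = source.str.replace(r'\s\s+', ' ', regex=True)
    return source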
Then, instead of this:
data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
Try this:
data['description'] = data_preprocessing(data['description'])
data['event_name'] = data_preprocessing(data['event_name'])
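Putting the pieces together, here is a self-contained sketch of the vectorized approach. The DataFrame contents are invented for illustration, and `regex=True` is passed explicitly on the assumption that a recent pandas version is in use, where patterns are treated as literal text by default.

```python
import numpy as np
import pandas as pd


def data_preprocessing(source):
    # Same steps as the answer's function, applied to a whole Series.
    source = source.str.replace(r"[^A-Za-z]", " ", regex=True)
    source = source.str.lower()
    source = source.str.replace(r"\s\s+", " ", regex=True)
    source = source.str.replace(r"\s+[a-z]{1,2}(?!\S)", " ", regex=True)
    source = source.str.replace(r"\s\s+", " ", regex=True)
    return source


# Hypothetical sample data, including a missing value.
# dtype=object keeps the cells as plain Python strings.
data = pd.DataFrame(
    {
        "description": ["Live music at 9pm!!", np.nan],
        "event_name": ["Jazz   Night #3", "Open Mic"],
    },
    dtype=object,
)

data["description"] = data_preprocessing(data["description"])
data["event_name"] = data_preprocessing(data["event_name"])
```

The missing description stays missing rather than raising an error. The cleaned strings can keep a trailing space, since the patterns only collapse whitespace runs; chaining a final `.str.strip()` would remove it if that matters.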