pandas TypeError: float() argument must be a string or a number, not 'function' (Python/Sklearn)
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/46269795/
Asked by HMLDude
I have the following code snippet from a program called Flights.py:
...
#Load the Dataset
df = dataset
df.isnull().any()
df = df.fillna(lambda x: x.median())
# Define X and Y
X = df.iloc[:, 2:124].values
y = df.iloc[:, 136].values
X_tolist = X.tolist()
# Splitting the dataset into the Training set and Test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
The second to last line is throwing the following error:
Traceback (most recent call last):
File "<ipython-input-14-d4add2ccf5ab>", line 3, in <module>
X_train = sc.fit_transform(X_train)
File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/base.py", line 494, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 560, in fit
return self.partial_fit(X, y)
File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 583, in partial_fit
estimator=self, dtype=FLOAT_DTYPES)
File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/utils/validation.py", line 382, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
TypeError: float() argument must be a string or a number, not 'function'
My dataframe df has shape (22587, 138).
I was taking a look at the following question for inspiration:
TypeError: float() argument must be a string or a number, not 'method' in Geocoder
I tried the following adjustment:
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train.as_matrix)
X_test = sc.transform(X_test.as_matrix)
Which resulted in the following error:
AttributeError: 'numpy.ndarray' object has no attribute 'as_matrix'
I'm currently at a loss for how to scan through the dataframe and find/convert the offending entries.
Accepted answer by cs95
As this answer explains, fillna isn't designed to work with a callback. If you pass one, it will be taken as the literal fill value, meaning your NaNs will be replaced with lambdas:
df
col1 col2 col3 col4
row1 65.0 24 47.0 NaN
row2 33.0 48 NaN 89.0
row3 NaN 34 67.0 NaN
row4 24.0 12 52.0 17.0
df.fillna(lambda x: x.median())
col1 col2 \
row1 65 24
row2 33 48
row3 <function <lambda> at 0x10bc47730> 34
row4 24 12
col3 col4
row1 47 <function <lambda> at 0x10bc47730>
row2 <function <lambda> at 0x10bc47730> 89
row3 67 <function <lambda> at 0x10bc47730>
row4 52 17
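To find out which cells ended up holding a function, one option is to test every cell with Python's built-in callable. The following is only a minimal sketch on a tiny made-up frame: the column names a and b and the placeholder function are hypothetical stand-ins, not taken from the question's data.

```python
import pandas as pd


def placeholder(x):
    # Hypothetical stand-in for the lambda that was passed to fillna
    return x.median()


# Made-up frame in which one cell holds a function instead of a number
df = pd.DataFrame({
    "a": [1.0, placeholder, 3.0],
    "b": [10, 20, 30],
})

# Boolean frame: True wherever a cell holds a callable object
is_callable = df.apply(lambda col: col.map(callable))
print(is_callable.any())  # per column: does it contain any function?

# Turn the offending cells back into NaN and restore numeric dtypes
cleaned = df.mask(is_callable).apply(pd.to_numeric)

# Fill the gaps with the per-column median
cleaned = cleaned.fillna(cleaned.median())
print(cleaned)
```

Columns flagged by is_callable.any() are the ones that will make StandardScaler fail.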
If you are trying to fill by median, the solution is to compute the per-column medians (df.median() returns a Series indexed by column name) and pass that to fillna. Note that fillna returns a new dataframe, so the result has to be assigned back.
df
col1 col2 col3 col4
row1 65.0 24 47.0 NaN
row2 33.0 48 NaN 89.0
row3 NaN 34 67.0 NaN
row4 24.0 12 52.0 17.0
df = df.fillna(df.median())
df
col1 col2 col3 col4
row1 65.0 24 47.0 53.0
row2 33.0 48 52.0 89.0
row3 33.0 34 67.0 53.0
row4 24.0 12 52.0 17.0
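For reference, the example above can be reproduced as a self-contained script. This is a sketch that rebuilds the four-row frame shown in this answer; the variable names medians and filled are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the small example frame from this answer
df = pd.DataFrame(
    {
        "col1": [65.0, 33.0, np.nan, 24.0],
        "col2": [24, 48, 34, 12],
        "col3": [47.0, np.nan, 67.0, 52.0],
        "col4": [np.nan, 89.0, np.nan, 17.0],
    },
    index=["row1", "row2", "row3", "row4"],
)

# Series of medians, indexed by column name (NaN is skipped by default)
medians = df.median()

# fillna aligns the Series on column names and returns a new frame
filled = df.fillna(medians)
print(filled)
```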
Answer by sol
I had the same trouble using df = df.fillna(lambda x: x.median()). Here is my solution for getting actual values, rather than 'function' objects, into the dataframe:
# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np
I create a dataframe with 10 rows and 3 columns, containing some NaN values:
df = pd.DataFrame(np.random.randint(100, size=(10, 3)).astype(float))  # float columns can hold NaN
df.iloc[3:5,0] = np.nan
df.iloc[4:6,1] = np.nan
df.iloc[5:8,2] = np.nan
Assign silly column labels for convenience later on:
df.columns=['Number_of_Holy_Hand_Grenades_of_Antioch', 'Number_of_knight_fleeings', 'Number_of_rabbits_of_Caerbannog']
print(df.isnull().any())  # shows, per column, whether it contains NaN
For each column, selected by its label, we fill all NaN values with the median computed on that column. The same approach works with mean(), etc.
for i in df.columns:  # use df.columns[w:] if the first w columns are row descriptions
    df[i] = df[i].fillna(df[i].median())
print(df.isnull().any())
Now the NaN values in df have been replaced by the column medians:
print(df)
You can then do, for example:
X = df.iloc[:, :].values  # .ix has been removed from pandas
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
This doesn't work with df = df.fillna(lambda x: x.median()). We can now pass df on to the later steps because all of its values are actual numbers, not function objects, unlike any approach that passes a lambda to DataFrame.fillna().
Answer by Mark Whitfield
df = df.fillna(lambda x: x.median())
This is not really a valid way of using fillna. It expects literal values here, or a mapping from column to literal values. It will not apply the function you've provided; instead the value of NA cells will simply be set to the function itself. This is the function that your estimator is attempting to turn into a float.
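As an illustration of the "mapping from column to literal values" form, here is a small sketch. The frame, its column names (delay, distance, carrier_code) and the chosen fill values are all hypothetical and are not taken from the question's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in two columns
df = pd.DataFrame({
    "delay": [5.0, np.nan, 12.0],
    "distance": [300.0, 450.0, np.nan],
    "carrier_code": [1, 2, 3],
})

# Map each column name to the literal value that should replace its NaNs;
# the values are computed up front, so fillna only ever sees plain numbers
fill_values = {
    "delay": 0.0,
    "distance": df["distance"].median(),
}

filled = df.fillna(fill_values)
print(filled)
```

Columns that are missing from the mapping, such as carrier_code here, are left untouched.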
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html