pandas TypeError: float() argument must be a string or a number, not 'function' (Python/Sklearn)

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original URL: http://stackoverflow.com/questions/46269795/

Date: 2020-09-14 04:28:30  Source: igfitidea

python pandas dataframe scikit-learn

Asked by HMLDude

I have the following code snippet from a program called Flights.py:

...
#Load the Dataset
df = dataset
df.isnull().any()
df = df.fillna(lambda x: x.median())

# Define X and Y
X = df.iloc[:, 2:124].values
y = df.iloc[:, 136].values
X_tolist = X.tolist()

# Splitting the dataset into the Training set and Test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

The second to last line is throwing the following error:

Traceback (most recent call last):

  File "<ipython-input-14-d4add2ccf5ab>", line 3, in <module>
    X_train = sc.fit_transform(X_train)

  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/base.py", line 494, in fit_transform
    return self.fit(X, **fit_params).transform(X)

  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 560, in fit
    return self.partial_fit(X, y)

  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/preprocessing/data.py", line 583, in partial_fit
    estimator=self, dtype=FLOAT_DTYPES)

  File "/Users/<username>/anaconda/lib/python3.6/site-packages/sklearn/utils/validation.py", line 382, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)

TypeError: float() argument must be a string or a number, not 'function'

My dataframe df is of size (22587, 138).

I was taking a look at the following question for inspiration:

TypeError: float() argument must be a string or a number, not 'method' in Geocoder

I tried the following adjustment:

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train.as_matrix)
X_test = sc.transform(X_test.as_matrix)

Which resulted in the following error:

AttributeError: 'numpy.ndarray' object has no attribute 'as_matrix'

I'm currently at a loss for how to scan through the dataframe and find/convert the offending entries.

Accepted answer by cs95

As this answer explains, fillna isn't designed to work with a callback. If you pass one, it will be taken as the literal fill value, meaning your NaNs will be replaced with lambdas:

df

      col1  col2  col3  col4
row1  65.0    24  47.0   NaN
row2  33.0    48   NaN  89.0
row3   NaN    34  67.0   NaN
row4  24.0    12  52.0  17.0

df.fillna(lambda x: x.median())

                                    col1  col2  \
row1                                  65    24   
row2                                  33    48   
row3  <function <lambda> at 0x10bc47730>    34   
row4                                  24    12   

                                    col3                                col4  
row1                                  47  <function <lambda> at 0x10bc47730>  
row2  <function <lambda> at 0x10bc47730>                                  89  
row3                                  67  <function <lambda> at 0x10bc47730>  
row4                                  52                                  17 
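
To check whether a frame has already been affected, one option is to look for callable entries cell by cell. The following is a minimal sketch, assuming a pandas version that stores the callable as the fill value as described above; it rebuilds the toy frame from this answer, and `bad` and `mask` are illustrative names only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [65.0, 33.0, np.nan, 24.0],
        "col2": [24, 48, 34, 12],
        "col3": [47.0, np.nan, 67.0, 52.0],
        "col4": [np.nan, 89.0, np.nan, 17.0],
    },
    index=["row1", "row2", "row3", "row4"],
)

# fillna does not call the lambda; it uses the function object as the fill value
bad = df.fillna(lambda x: x.median())

# Boolean frame: True wherever a cell holds a callable instead of a number
mask = bad.apply(lambda col: col.map(callable))

print(mask.sum())              # number of function objects per column
print(bad[mask.any(axis=1)])   # rows containing at least one function object
```

Any column with a non-zero count would make a later conversion to a float array fail.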


If you are trying to fill by median, the solution is to compute the per-column medians with df.median() (which returns a Series indexed by column label) and pass that to fillna. Note that fillna returns a new dataframe, so the result has to be assigned back.

df
      col1  col2  col3  col4
row1  65.0    24  47.0   NaN
row2  33.0    48   NaN  89.0
row3   NaN    34  67.0   NaN
row4  24.0    12  52.0  17.0

df = df.fillna(df.median())
df
      col1  col2  col3  col4
row1  65.0    24  47.0  53.0
row2  33.0    48  52.0  89.0
row3  33.0    34  67.0  53.0
row4  24.0    12  52.0  17.0
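
The same fill can be written as a self-contained script. This is a sketch that rebuilds the toy frame above; `numeric_only=True` is an added assumption so that non-numeric columns, if a real dataset has any, are skipped rather than causing an error.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [65.0, 33.0, np.nan, 24.0],
        "col2": [24, 48, 34, 12],
        "col3": [47.0, np.nan, 67.0, 52.0],
        "col4": [np.nan, 89.0, np.nan, 17.0],
    },
    index=["row1", "row2", "row3", "row4"],
)

# Series of per-column medians, indexed by column label
medians = df.median(numeric_only=True)

# Each NaN is replaced by the median of its own column
filled = df.fillna(medians)
print(filled)
```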

Answer by sol

I had the same trouble using df = df.fillna(lambda x: x.median()). Here is my solution to get actual values rather than 'function' into the dataframe:

# -*- coding: utf-8 -*-
import pandas as pd
import numpy as np

I create a dataframe with 10 rows and 3 columns, containing NaN values:

df = pd.DataFrame(np.random.randint(100, size=(10, 3))).astype(float)  # float dtype so the columns can hold NaN
df.iloc[3:5,0] = np.nan
df.iloc[4:6,1] = np.nan
df.iloc[5:8,2] = np.nan

Assign silly column labels for convenience later:

df.columns=['Number_of_Holy_Hand_Grenades_of_Antioch', 'Number_of_knight_fleeings', 'Number_of_rabbits_of_Caerbannog']

print(df.isnull().any())  # tells whether each column contains NaN

For each column, selected by its label, we fill all NaN values with the median computed on that column. The same works with mean(), etc.

for i in df.columns:     # use df.columns[w:] to skip the first w descriptive columns
    df[i] = df[i].fillna(df[i].median())
print(df.isnull().any())

Now the NaN values in df have been replaced by the column medians:

print(df)

You can then do, for example:

X = df.iloc[:, :].values  # .ix has been removed from pandas; .iloc selects by position
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

This does not work with df = df.fillna(lambda x: x.median()). We can now pass df on to the steps that follow because all values are real numbers, not functions, unlike any approach that passes a lambda to DataFrame.fillna().
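
Putting the steps of this answer together, a minimal end-to-end sketch: fill NaN with per-column medians, then standardize. The fixed seed and the short column names `a`, `b`, `c` are assumptions for illustration, and the vectorized `fillna(df.median())` is swapped in for the per-column loop.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)  # fixed seed, only for reproducibility
df = pd.DataFrame(
    rng.randint(100, size=(10, 3)).astype(float),
    columns=["a", "b", "c"],
)
df.iloc[3:5, 0] = np.nan
df.iloc[4:6, 1] = np.nan
df.iloc[5:8, 2] = np.nan

# Same effect as the per-column loop: each NaN gets its own column's median
df = df.fillna(df.median())

X = df.values                              # plain float64 array, no function objects
X_std = StandardScaler().fit_transform(X)

print(X.dtype)
print(X_std.mean(axis=0).round(6))         # approximately 0 for every column
```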

Answer by Mark Whitfield

df = df.fillna(lambda x: x.median())

This is not really a valid way of using fillna. It expects literal values here, or a mapping from column to literal values. It will not apply the function you've provided; instead the value of NA cells will simply be set to the function itself. This is the function that your estimator is attempting to turn into a float.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html
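
Both points can be illustrated with a short sketch, assuming a small made-up frame: NumPy raises the error when asked to build a float array from a function object (the conversion check_array attempts in the traceback), while a mapping from column label to literal value is a form fillna does accept. The column names `delay` and `distance` and the fill values are hypothetical.

```python
import numpy as np
import pandas as pd

f = lambda x: x.median()

# Converting a function object to float fails, as in the traceback
try:
    np.array([1.0, f], dtype=float)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)

# fillna with a mapping from column label to literal fill value
df = pd.DataFrame(
    {"delay": [5.0, np.nan, 7.0], "distance": [np.nan, 300.0, 500.0]}
)
filled = df.fillna({"delay": 0.0, "distance": df["distance"].median()})
print(filled)
```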
