Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/34358550/

Date: 2020-09-14 00:24:15  Source: igfitidea

Python - Input contains NaN, infinity or a value too large for dtype('float64')

Tags: python, pandas, machine-learning, scikit-learn, k-means

Asked by Mitch

I am new to Python. I am trying to use sklearn.cluster. Here is my code:

from sklearn.cluster import MiniBatchKMeans

kmeans=MiniBatchKMeans(n_clusters=2)
kmeans.fit(df)

But I get the following error:

     50             and not np.isfinite(X).all()):
     51         raise ValueError("Input contains NaN, infinity"
---> 52                          " or a value too large for %r." % X.dtype)

 ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I checked that there are no NaN or infinity values, so only one option is left: a value too large for float64. However, my df.info() output tells me that all variables are float64, so I don't understand where the problem comes from.

df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 362358 entries, 135 to 4747145
Data columns (total 8 columns):
User         362358 non-null float64
Hour         362352 non-null float64
Minute       362352 non-null float64
Day          362352 non-null float64
Month        362352 non-null float64
Year         362352 non-null float64
Latitude     362352 non-null float64
Longitude    362352 non-null float64
dtypes: float64(8)
memory usage: 24.9 MB

Thanks a lot,

Answered by David Maust

Looking at your df.info() output, the User column has 6 more non-null values than any other column. This indicates that each of the other columns contains 6 nulls, and that is the reason for the error.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 362358 entries, 135 to 4747145
Data columns (total 8 columns):
User         362358 non-null float64
Hour         362352 non-null float64
Minute       362352 non-null float64
Day          362352 non-null float64
Month        362352 non-null float64
Year         362352 non-null float64
Latitude     362352 non-null float64
Longitude    362352 non-null float64
dtypes: float64(8)
memory usage: 24.9 MB
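
To confirm this on your own data, you could count the missing values per column and then drop the incomplete rows before fitting. The sketch below uses a small made-up DataFrame (the column names mirror the question, the values are invented) rather than the original data. Dropping rows is only one option; imputing the missing values is another.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's DataFrame: 'User' is complete,
# the other columns are missing in the same row.
df = pd.DataFrame({
    "User": [1.0, 2.0, 3.0, 4.0],
    "Hour": [10.0, np.nan, 12.0, 13.0],
    "Latitude": [48.1, np.nan, 48.3, 48.4],
})

# Count missing values per column; non-zero counts point at the problem columns.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Drop every row that contains at least one NaN.
clean = df.dropna()
print(len(df), "->", len(clean))
```

With the incomplete rows removed, passing clean to kmeans.fit() should no longer fail because of missing values, assuming there are no infinite entries as well.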

Answered by Fabio Lamanna

I think that fit() accepts only "array-like, shape = [n_samples, n_features]" input, not pandas DataFrames. So try passing the values of the DataFrame to it instead:

from sklearn.cluster import MiniBatchKMeans

kmeans = MiniBatchKMeans(n_clusters=2)
kmeans.fit(df.values)  # df.values is the underlying NumPy array

Or reshape them so that the function runs correctly. Hope that helps.
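
Note that converting to an array does not by itself remove missing values: df.values carries any NaN entries over unchanged. As a minimal sketch on made-up data, the finiteness test that appears in the traceback (np.isfinite(X).all()) can be run directly to see whether the array is safe to pass to fit():

```python
import numpy as np
import pandas as pd

# Made-up data for illustration: one NaN and one infinite value.
df = pd.DataFrame({
    "Hour": [1.0, np.nan, 3.0],
    "Latitude": [48.1, 48.2, np.inf],
})

X = df.values  # plain NumPy array; NaN and inf are carried over unchanged

# Same test as in the traceback: False if any entry is NaN or +/-inf.
all_finite = np.isfinite(X).all()
print(all_finite)

# Keep only the rows in which every value is finite.
finite_rows = np.isfinite(X).all(axis=1)
X_clean = X[finite_rows]
print(X_clean.shape)
```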

Answered by Max Kleiner

Looking at your df.info() output, the User column has 6 more non-null values than any other column. This indicates that each of the other columns contains 6 nulls, and that is the reason for the error.

So you can slice your data down to the rows you need with iloc:

import pandas as pd
df = pd.read_csv(location1, encoding="ISO-8859-1").iloc[2:20]  # keep rows at positions 2 to 19

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 2 to 19
Data columns (total 6 columns):
zip_code     18 non-null int64
latitude     18 non-null float64
longitude    18 non-null float64
city         18 non-null object
state        18 non-null object
county       18 non-null object
dtypes: float64(2), int64(1), object(3)
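
Slicing by position only helps when you know which rows hold the missing values. As a rough sketch on invented data (not the CSV used in this answer), the following takes a positional iloc slice and then checks that the result really is free of nulls:

```python
import numpy as np
import pandas as pd

# Invented example: the first two rows are incomplete.
df = pd.DataFrame({
    "zip_code": [501, 544, 1001, 1002, 1003],
    "latitude": [np.nan, np.nan, 42.07, 42.38, 42.38],
    "longitude": [np.nan, np.nan, -72.62, -72.52, -72.52],
})

# Positional slice: skip the first two rows, keep the rest.
subset = df.iloc[2:]

# Verify that the slice contains no missing values.
has_nulls = subset.isnull().values.any()
print(has_nulls)
print(subset.shape)
```

If the missing rows are scattered rather than contiguous, df.dropna() is likely the simpler choice.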