Pandas series to numpy array conversion error

Disclaimer: This page is a bilingual (Chinese/English) translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/32406265/

Date: 2020-09-13 23:51:29. Source: igfitidea


Tags: python, numpy, pandas

Asked by user308827

I have a pandas Series with the following value_counts() output:


NaN     2741
 197    1891
 127     188
 194      42
 195      24
 122      21

When I perform describe() on this series, I get:


df[col_name].describe()
count    2738.000000
mean      172.182250
std        47.387496
min         0.000000
25%       171.250000
50%       197.000000
75%       197.000000
max       197.000000
Name: SS_D_1, dtype: float64

However, if I try to find the minimum and maximum, I get nan as the answer:


numpy.min(df[col_name].values)
nan

Also, when I try to convert it to a NumPy array, I seem to get an array containing only NaNs:


numpy.array(df[col_name])

Any suggestions on how to successfully convert a pandas Series to a NumPy array?


Answer by ali_m

Both the function np.min and the method np.ndarray.min will always return NaN for any array that contains one or more NaN values (this is standard IEEE 754 floating-point behaviour).


You could use np.nanmin, which ignores NaN values when computing the min, e.g.:


np.nanmin(df[col_name].values)
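To make the difference concrete, here is a minimal sketch. The small Series `s` is a hypothetical stand-in for `df[col_name]`, not the asker's data. It compares the NaN-propagating `np.min` / `ndarray.min` with the NaN-ignoring `np.nanmin` / `np.nanmax`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df[col_name]: a float64 Series containing NaNs
s = pd.Series([np.nan, 197.0, 127.0, np.nan, 0.0, 194.0])
values = s.values  # plain float64 ndarray; the NaNs are still there

# np.min and ndarray.min propagate NaN: a single NaN makes the result NaN
print(np.min(values))    # nan
print(values.min())      # nan

# The nan-aware reductions skip NaN entries
print(np.nanmin(values))  # 0.0
print(np.nanmax(values))  # 197.0
```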

An even simpler option is just to use the pd.Series.min() method, which already ignores NaN values, i.e.:


df[col_name].min()
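A short sketch of the pandas route, again using a hypothetical Series `s`. `skipna=True` is the default for `Series.min` and `Series.max`; passing `skipna=False` brings back the NumPy-style result.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df[col_name]
s = pd.Series([np.nan, 197.0, 127.0, np.nan, 0.0, 194.0])

# Series reductions skip NaN by default (skipna=True)
print(s.min())  # 0.0
print(s.max())  # 197.0

# With skipna=False the NaN propagates, matching np.min on the raw array
print(s.min(skipna=False))  # nan
```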

I have no idea why numpy.array(df[col_name]) would return an array containing only NaNs, unless df[col_name] already contained only NaNs to begin with. I assume this must be due to some other mistake in your code.
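One way to check whether the conversion really lost data is to count the NaNs rather than reading the printed array. A minimal sketch with a hypothetical Series `s`; `dropna()` followed by `to_numpy()` (the latter available since pandas 0.24.0) is one option for getting a NaN-free array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df[col_name]
s = pd.Series([np.nan, 197.0, 127.0, np.nan, 0.0, 194.0])
arr = np.array(s)

# The conversion keeps every value; only entries that were NaN are NaN
print(arr.dtype)            # float64
print(np.isnan(arr).sum())  # 2
print(arr[~np.isnan(arr)])  # the four non-NaN values

# Alternatively, drop the NaNs in pandas before converting
clean = s.dropna().to_numpy()
print(clean.min(), clean.max())  # 0.0 197.0
```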


Answer by mork

As of pandas v0.24.0, you can access the backing array of a pandas Series with .array and .to_numpy().


pandas 0.24.x release notes quote: "Series.array and Index.array have been added for extracting the array backing a Series or Index... We haven't removed or deprecated Series.values or DataFrame.values, but we highly recommend and using .array or .to_numpy() instead


... We recommend using Series.array when you need the array of data stored in the Series, and Series.to_numpy() when you know you need a NumPy array."
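A minimal sketch of the two accessors on a hypothetical Series `s` (requires pandas 0.24.0 or later). The concrete class returned by `.array` differs between pandas versions, so only its base class is relied on here. Neither accessor removes NaNs, so the nan-aware reductions from the first answer still apply.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df[col_name]
s = pd.Series([np.nan, 197.0, 127.0, np.nan, 0.0, 194.0])

# .to_numpy() returns a NumPy ndarray
arr = s.to_numpy()
print(type(arr))  # <class 'numpy.ndarray'>

# .array returns the pandas ExtensionArray that backs the Series
backing = s.array
print(isinstance(backing, pd.api.extensions.ExtensionArray))  # True

# The NaNs are carried over, so nan-aware reductions are still needed
print(np.nanmin(arr), np.nanmax(arr))  # 0.0 197.0
```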
