Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/33448003/

Time: 2020-09-14 00:07:57  Source: igfitidea

Sum across all NaNs in pandas returns zero?

Tags: python, pandas

Asked by dgd

I'm trying to sum across columns of a Pandas dataframe, and when I have NaNs in every column I'm getting sum = zero; I'd expected sum = NaN based on the docs. Here's what I've got:

In [136]: df = pd.DataFrame()

In [137]: df['a'] = [1,2,np.nan,3]

In [138]: df['b'] = [4,5,np.nan,6]

In [139]: df
Out[139]: 
    a   b
0   1   4
1   2   5
2 NaN NaN
3   3   6

In [140]: df['total'] = df.sum(axis=1)

In [141]: df
Out[141]: 
    a   b  total
0   1   4      5
1   2   5      7
2 NaN NaN      0
3   3   6      9

The pandas.DataFrame.sum docs say "If an entire row/column is NA, the result will be NA", so I don't understand why "total" = 0 and not NaN for index 2. What am I missing?

Answered by Vishnudev

From the pandas 0.24.2 documentation (API Reference » DataFrame):

DataFrame.sum(self, axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

min_count: int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

New in version 0.22.0: Added with the default being 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

As the quoted docs say, min_count defaults to 0, so the sum of an all-NA Series (or row) is 0.

If you pass min_count=1, the sum of an all-NA row will be NaN.
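
As a minimal sketch of this, using the question's data (and assuming pandas 0.22.0 or later, where min_count was added), the call might look like:

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({"a": [1, 2, np.nan, 3], "b": [4, 5, np.nan, 6]})

# min_count=1 requires at least one non-NA value per row;
# rows with no valid values get NaN instead of 0.
df["total"] = df[["a", "b"]].sum(axis=1, min_count=1)
print(df)
```

Row 2, where both a and b are NaN, should now get a NaN total, while the other rows sum as before.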

Answered by Martien Lubberink

A solution would be to select all cases where rows are all-nan, then set the sum to nan:

df['total'] = df.sum(axis=1)    
df.loc[df['a'].isnull() & df['b'].isnull(),'total']=np.nan

or

df['total'] = df.sum(axis=1)
df.loc[df[['a','b']].isnull().all(axis=1), 'total'] = np.nan

The latter option is probably more practical, because you can build a list of the columns you want to sum, e.g. ['a','b', ... , 'z'].
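
A short sketch of that column-list variant, using the question's data. The name cols is a hypothetical placeholder for whatever columns you want to sum:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, np.nan, 3], "b": [4, 5, np.nan, 6]})

# Hypothetical list of columns to sum; extend as needed.
cols = ["a", "b"]

# Sum only the listed columns, then overwrite the total with NaN
# wherever every listed column is NaN.
df["total"] = df[cols].sum(axis=1)
df.loc[df[cols].isnull().all(axis=1), "total"] = np.nan
print(df)
```

Restricting the sum to df[cols] also keeps an existing 'total' column from being added into itself if the code is run twice.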

Answered by Izaskun

Great link provided by Jeff.

Here is an example:

df1 = pd.DataFrame()
df1['a'] = [1, 2, np.nan, 3]
df1['b'] = [np.nan, 2, np.nan, 3]

df1
Out[4]: 
     a    b
0  1.0  NaN
1  2.0  2.0
2  NaN  NaN
3  3.0  3.0


df1.sum(axis=1, skipna=False)
Out[6]: 
0    NaN
1    4.0
2    NaN
3    6.0
dtype: float64

df1.sum(axis=1, skipna=True)
Out[7]: 
0    1.0
1    4.0
2    0.0
3    6.0
dtype: float64
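
Note that skipna=False returns NaN for any row containing a NaN, not only for all-NaN rows. The sketch below contrasts it with min_count=1 on the same data (assuming pandas 0.22.0 or later); the names strict and lenient are only illustrative:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, np.nan, 3], "b": [np.nan, 2, np.nan, 3]})

# skipna=False: a single NaN in a row makes that row's sum NaN.
strict = df1.sum(axis=1, skipna=False)

# min_count=1: NaNs are skipped, but a row with no valid values gives NaN.
lenient = df1.sum(axis=1, min_count=1)

print(pd.DataFrame({"skipna=False": strict, "min_count=1": lenient}))
```

Row 0 (one NaN) differs between the two: NaN with skipna=False, 1.0 with min_count=1. Row 2 (all NaN) is NaN in both.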

Answered by chris

I got around this by casting the series to a numpy array, which computes the answer correctly.

import numpy as np
import pandas as pd

print(np.array([np.nan, np.nan, np.nan]).sum())  # nan
print(pd.Series([np.nan, np.nan, np.nan]).sum())  # 0.0
print(pd.Series([np.nan, np.nan, np.nan]).to_numpy().sum())  # nan
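
One caveat with this workaround, sketched below under the assumption that partially missing data should still be summed: a plain NumPy sum returns NaN as soon as any element is NaN, so it behaves like skipna=False rather than like an "all-NaN only" check.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan])

# Plain NumPy sum propagates NaN even though one value is valid.
print(s.to_numpy().sum())  # nan

# pandas with min_count=1 skips the NaN and sums the valid value.
print(s.sum(min_count=1))  # 1.0

# np.nansum skips NaNs too, but returns 0.0 for all-NaN input.
print(np.nansum(np.array([np.nan, np.nan])))  # 0.0
```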