pandas: sum of all NaNs returns zero?

Question

Asked by dgd

I'm trying to sum across columns of a Pandas dataframe, and when I have NaNs in every column I'm getting sum = zero; I'd expected sum = NaN based on the docs. Here's what I've got:

In [136]: df = pd.DataFrame()

In [137]: df['a'] = [1,2,np.nan,3]

In [138]: df['b'] = [4,5,np.nan,6]

In [139]: df
Out[139]: 
    a   b
0   1   4
1   2   5
2 NaN NaN
3   3   6

In [140]: df['total'] = df.sum(axis=1)

In [141]: df
Out[141]: 
    a   b  total
0   1   4      5
1   2   5      7
2 NaN NaN      0
3   3   6      9

The pandas.DataFrame.sum docs say "If an entire row/column is NA, the result will be NA", so I don't understand why "total" = 0 and not NaN for index 2. What am I missing?

Answer 1

Answered by Vishnudev

From the pandas 0.24.2 documentation (API Reference » DataFrame » pandas.DataFrame):

DataFrame.sum(self, axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
min_count: int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
New in version 0.22.0: Added with the default being 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.

As the pandas docs quoted above say, min_count defaults to 0, so the sum of an all-NA Series is 0.

If you pass min_count=1, the sum of an all-NA row or column will be NaN.
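
As a minimal sketch of that suggestion, here is the question's data summed with min_count=1. It assumes pandas 0.22.0 or later (where min_count was added); the variable name `total` is just for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({"a": [1, 2, np.nan, 3], "b": [4, 5, np.nan, 6]})

# Require at least one non-NA value per row; rows with fewer yield NaN.
total = df.sum(axis=1, min_count=1)
print(total)
```

With this, index 2 comes out as NaN while the other rows keep their usual sums (5, 7 and 9).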

Answer 2

Answered by Martien Lubberink

A solution would be to select all cases where rows are all-nan, then set the sum to nan:

df['total'] = df.sum(axis=1)
df.loc[df['a'].isnull() & df['b'].isnull(), 'total'] = np.nan

or

df['total'] = df.sum(axis=1)
df.loc[df[['a', 'b']].isnull().all(axis=1), 'total'] = np.nan

The latter option is probably more practical, because you can build a list of the columns you want to sum, e.g. ['a','b', ... , 'z'].
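
A rough sketch of the column-list variant, using the question's data. The names `cols` and `all_nan` are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, np.nan, 3], "b": [4, 5, np.nan, 6]})

# Hypothetical list of the columns to sum; extend as needed.
cols = ["a", "b"]

# Boolean mask: True for rows where every listed column is NaN.
all_nan = df[cols].isnull().all(axis=1)

# Sum only the listed columns, then overwrite the all-NaN rows with NaN.
df["total"] = df[cols].sum(axis=1)
df.loc[all_nan, "total"] = np.nan
print(df)
```

Summing df[cols] rather than the whole frame also keeps an existing 'total' column out of the sum if the snippet is run a second time.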

Answer 3

Answered by Izaskun

Great link provided by Jeff.

Here is an example:

df1 = pd.DataFrame()
df1['a'] = [1, 2, np.nan, 3]
df1['b'] = [np.nan, 2, np.nan, 3]

df1
Out[4]: 
     a    b
0  1.0  NaN
1  2.0  2.0
2  NaN  NaN
3  3.0  3.0


df1.sum(axis=1, skipna=False)
Out[6]: 
0    NaN
1    4.0
2    NaN
3    6.0
dtype: float64

df1.sum(axis=1, skipna=True)
Out[7]: 
0    1.0
1    4.0
2    0.0
3    6.0
dtype: float64

Answer 4

Answered by chris

I got around this by converting the Series to a NumPy array, whose sum() propagates NaN and so returns NaN here, as I expected.

import numpy as np
import pandas as pd

print(np.array([np.nan, np.nan, np.nan]).sum())  # nan
print(pd.Series([np.nan, np.nan, np.nan]).sum())  # 0.0
print(pd.Series([np.nan, np.nan, np.nan]).to_numpy().sum())  # nan
