Sum across all NaNs in pandas returns zero?
Source: http://stackoverflow.com/questions/33448003/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
Question by dgd
I'm trying to sum across the columns of a pandas DataFrame, and when a row has NaN in every column I get sum = 0; based on the docs I expected sum = NaN. Here's what I've got:
In [136]: df = pd.DataFrame()
In [137]: df['a'] = [1,2,np.nan,3]
In [138]: df['b'] = [4,5,np.nan,6]
In [139]: df
Out[139]:
     a    b
0    1    4
1    2    5
2  NaN  NaN
3    3    6
In [140]: df['total'] = df.sum(axis=1)
In [141]: df
Out[141]:
     a    b  total
0    1    4      5
1    2    5      7
2  NaN  NaN      0
3    3    6      9
The pandas.DataFrame.sum docs say "If an entire row/column is NA, the result will be NA", so I don't understand why "total" = 0 and not NaN for index 2. What am I missing?
Answer by Vishnudev
From the pandas 0.24.2 API reference for pandas.DataFrame.sum:
DataFrame.sum(self, axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
min_count: int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.
New in version 0.22.0: Added with the default being 0. This means the sum of an all-NA or empty Series is 0, and the product of an all-NA or empty Series is 1.
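A small sketch of the `min_count` rule quoted above, assuming pandas 0.22 or later (where the parameter exists). The Series values are made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan])  # one valid value, two NaNs

print(s.sum())             # 1.0: NaNs are skipped; 1 valid value >= default min_count=0
print(s.sum(min_count=2))  # nan: only 1 non-NA value, fewer than min_count=2

all_na = pd.Series([np.nan, np.nan])
print(all_na.sum())             # 0.0: empty-sum identity under the default min_count=0
print(all_na.prod())            # 1.0: empty-product identity
print(all_na.sum(min_count=1))  # nan: no valid values, fewer than min_count=1
```

The threshold counts non-NA values only, so `min_count=1` is the natural choice when "all NaN" should mean "no result".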
Quoting the latest pandas docs: min_count defaults to 0, which means the sum of an all-NA Series is 0.
If you pass min_count=1, the result of the sum will be NaN.
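Applied to the DataFrame from the question, a minimal sketch (assuming pandas 0.22 or later) looks like this:

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'a': [1, 2, np.nan, 3], 'b': [4, 5, np.nan, 6]})

# Require at least one non-NA value per row; otherwise the row sum is NaN
df['total'] = df.sum(axis=1, min_count=1)
print(df)
```

Rows 0, 1 and 3 still get 5.0, 7.0 and 9.0 because NaNs are skipped, while the all-NaN row 2 now gets NaN instead of 0.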
Answer by Martien Lubberink
One solution is to select the rows in which every value is NaN, then set their sum to NaN:
df['total'] = df.sum(axis=1)
df.loc[df['a'].isnull() & df['b'].isnull(), 'total'] = np.nan
or
df['total'] = df.sum(axis=1)
df.loc[df[['a', 'b']].isnull().all(axis=1), 'total'] = np.nan
The latter option is probably more practical, because you can pass whatever list of columns you want to sum, e.g. ['a','b', ... , 'z'].
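The second option generalizes to an arbitrary column list. In this sketch, `cols` and the extra column `c` are hypothetical, added only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, 3],
                   'b': [4, 5, np.nan, 6],
                   'c': [7, np.nan, np.nan, 8]})

cols = ['a', 'b', 'c']  # hypothetical list of columns to sum

# Sum only the selected columns, so an existing 'total' column is never included
df['total'] = df[cols].sum(axis=1)
# Rows where every selected column is NaN get NaN instead of 0
df.loc[df[cols].isnull().all(axis=1), 'total'] = np.nan
print(df)
```

Summing `df[cols]` rather than the whole frame keeps the result correct even if the cell is re-run after `total` already exists.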
Answer by Izaskun
Great link provided by Jeff.
Here is an example:
import numpy as np
import pandas as pd
df1 = pd.DataFrame()
df1['a'] = [1, 2, np.nan, 3]
df1['b'] = [np.nan, 2, np.nan, 3]
df1
Out[4]:
     a    b
0  1.0  NaN
1  2.0  2.0
2  NaN  NaN
3  3.0  3.0
df1.sum(axis=1, skipna=False)
Out[6]:
0    NaN
1    4.0
2    NaN
3    6.0
dtype: float64
df1.sum(axis=1, skipna=True)
Out[7]:
0    1.0
1    4.0
2    0.0
3    6.0
dtype: float64
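Note that `skipna=False` also turns row 0 into NaN, because that row contains one NaN. If only all-NaN rows should become NaN, `min_count=1` (assuming pandas 0.22 or later) is the closer fit. A sketch comparing the two on the same `df1`:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, np.nan, 3], 'b': [np.nan, 2, np.nan, 3]})

# skipna=False: any NaN in a row makes that row's sum NaN (rows 0 and 2)
strict = df1.sum(axis=1, skipna=False)
print(strict.tolist())   # [nan, 4.0, nan, 6.0]

# min_count=1: NaNs are skipped, but an all-NaN row (row 2) stays NaN
lenient = df1.sum(axis=1, min_count=1)
print(lenient.tolist())  # [1.0, 4.0, nan, 6.0]
```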
Answer by chris
I got around this by converting the Series to a NumPy array, whose sum returns NaN for an all-NaN input:
import numpy as np
import pandas as pd

print(np.array([np.nan, np.nan, np.nan]).sum())              # nan
print(pd.Series([np.nan, np.nan, np.nan]).sum())             # 0.0
print(pd.Series([np.nan, np.nan, np.nan]).to_numpy().sum())  # nan
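One caveat with this workaround: a plain NumPy `sum` propagates any NaN, so a row with only some NaNs also sums to NaN (like `skipna=False`, not like `min_count=1`). NumPy's NaN-skipping `np.nansum` instead behaves like the pandas default. A sketch with made-up arrays, assuming a NumPy version newer than 1.9.0 (older versions returned NaN from `nansum` for all-NaN input):

```python
import numpy as np

row_with_gap = np.array([1.0, np.nan])
all_nan = np.array([np.nan, np.nan])

print(row_with_gap.sum())       # nan: ndarray.sum propagates any NaN
print(np.nansum(row_with_gap))  # 1.0: NaNs are treated as zero
print(np.nansum(all_nan))       # 0.0: same as pandas' default skipna=True, min_count=0
```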