Pandas groupby mean() not ignoring NaNs

Question

Asked by Tim Tee

If I calculate the mean of a groupby object and one of the groups contains NaNs, the NaNs are ignored. Even when applying np.mean, the result is still the mean of the valid numbers only. I would expect NaN to be returned as soon as a single NaN is in the group. Here is a simplified example of the behaviour:

>>> import pandas as pd
>>> import numpy as np
>>> c = pd.DataFrame({'a':[1,np.nan,2,3],'b':[1,2,1,2]})
>>> c.groupby('b').mean()
     a
b     
1  1.5
2  3.0
>>> c.groupby('b').agg(np.mean)
     a
b     
1  1.5
2  3.0

I want to get the following result:

     a
b     
1  1.5
2  NaN

I am aware that I can replace the NaNs beforehand, and that I can probably write my own aggregation function that returns NaN as soon as a NaN is in the group. Such a function would not be optimized, though.

Do you know of an argument that achieves the desired behaviour with the optimized functions?

By the way, I think the desired behaviour was implemented in a previous version of pandas.

Answer 1

Answered by Mayank Porwal

By default, pandas skips NaN values. You can make it take NaN into account by specifying skipna=False:

In [215]: c.groupby('b').agg({'a': lambda x: x.mean(skipna=False)})
Out[215]: 
     a
b     
1  1.5
2  NaN

Answer 2

Answered by Mortz

Use the skipna option:

c.groupby('b').apply(lambda g: g.mean(skipna=False))
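Depending on the pandas version, applying the lambda to the whole group may also average the grouping column. A minimal sketch that selects column 'a' first, so only that column is aggregated (the variable name result is only for illustration):

```python
import numpy as np
import pandas as pd

c = pd.DataFrame({'a': [1, np.nan, 2, 3], 'b': [1, 2, 1, 2]})

# Select the value column before applying, so each group is a Series
# and Series.mean(skipna=False) propagates NaN.
result = c.groupby('b')['a'].apply(lambda s: s.mean(skipna=False))
print(result)
```

The result is a Series indexed by 'b', with NaN for any group that contains a NaN.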

Answer 3

Answered by Serge Ballesta

Another approach would be to use a value that is not ignored by default, for example np.inf:

>>> c = pd.DataFrame({'a':[1,np.inf,2,3],'b':[1,2,1,2]})
>>> c.groupby('b').mean()
          a
b          
1  1.500000
2       inf

Answer 4

Answered by KEXIN WANG

There are three different ways to do it:

Slowest:

    c.groupby('b').apply(lambda g: g.mean(skipna=False))

Faster than apply, but slower than the default built-in aggregation:

    c.groupby('b').agg({'a': lambda x: x.mean(skipna=False)})

Fastest, but needs more code:

    # Built-in mean (NaNs skipped), then blank out the groups whose 'a' contains a NaN
    method3 = c.groupby('b').mean()
    nan_groups = c.loc[c['a'].isna(), 'b'].unique()
    method3.loc[method3.index.isin(nan_groups)] = np.nan
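The idea behind the third method (use the optimized aggregation, then overwrite the groups that contained a NaN) could also be sketched with a boolean mask. This is only an illustration on the question's sample data; the names means, has_nan and result are made up here:

```python
import numpy as np
import pandas as pd

c = pd.DataFrame({'a': [1, np.nan, 2, 3], 'b': [1, 2, 1, 2]})

# Built-in, optimized mean; NaNs are skipped at this point.
means = c.groupby('b')['a'].mean()

# Boolean Series indexed by group key: True where the group holds a NaN.
has_nan = c['a'].isna().groupby(c['b']).any()

# Replace the mean with NaN for every group flagged above.
result = means.mask(has_nan)
print(result)
```

Both steps use built-in groupby reductions, so no Python-level lambda is called per group.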

Answer 5

Answered by Dmitriy Work

There is `mean(skipna=False)`, but it's not working

GroupBy aggregation methods (min, max, mean, median, etc.) have the skipna parameter, which is meant for exactly this task, but it seems that currently (May 2020) there is a bug (issue opened in March 2020) that prevents it from working correctly.

Quick workaround

Complete working example based on the comments by @Serge Ballesta and @RoelAdriaans:

>>> import pandas as pd
>>> import numpy as np
>>> c = pd.DataFrame({'a':[1,np.nan,2,3],'b':[1,2,1,2]})
>>> c.fillna(np.inf).groupby('b').mean().replace(np.inf, np.nan)

     a
b     
1  1.5
2  NaN
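One caveat with this workaround is that any genuine inf already present in the data would also be turned into NaN. A hedged sketch of a guarded wrapper follows; groupby_mean_keep_nan is a hypothetical helper name, and it assumes all non-key columns are numeric:

```python
import numpy as np
import pandas as pd


def groupby_mean_keep_nan(df, key):
    """Hypothetical helper: per-group mean that is NaN if the group has a NaN."""
    values = df.drop(columns=key)
    # The sentinel trick is ambiguous if the data already contains +/-inf.
    if np.isinf(values.to_numpy(dtype=float)).any():
        raise ValueError("data already contains inf values")
    # Fill only the value columns, so NaNs in the key column are left alone.
    filled = values.fillna(np.inf)
    return filled.groupby(df[key]).mean().replace(np.inf, np.nan)


c = pd.DataFrame({'a': [1, np.nan, 2, 3], 'b': [1, 2, 1, 2]})
result = groupby_mean_keep_nan(c, 'b')
print(result)
```

Raising an error is just one possible design choice; a different sentinel or a mask-based approach would avoid the restriction.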

For additional information and updates, follow the link above.
