Pandas aggregation ignoring NaNs

Question

Asked by Zhubarb

I aggregate my Pandas dataframe, data. Specifically, I want to get the average and sum of amount for each [origin, type] tuple. For averaging and summing I tried the functions below:

import numpy as np
import pandas as pd
result = data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum, pd.Series.mean]}).reset_index()

My issue is that the amount column includes NaNs, which causes the result of the above code to contain a lot of NaN averages and sums.

I know both pd.Series.sum and pd.Series.mean have skipna=True by default, so why am I still getting NaNs here?

I also tried this, which obviously did not work:

data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum(skipna=True), pd.Series.mean(skipna=True)]}).reset_index()

EDIT: Upon @Korem's suggestion, I also tried to use a partial as below:

from functools import partial
s_na_mean = partial(pd.Series.mean, skipna=True)
data.groupby(groupbyvars).agg({'amount': [np.nansum, s_na_mean]}).reset_index()

but I get this error:

error: 'functools.partial' object has no attribute '__name__'

Answer 1

Answered by Korem

Use numpy's nansum and nanmean:

from numpy import nansum
from numpy import nanmean
data.groupby(groupbyvars).agg({'amount': [ nansum, nanmean]}).reset_index()
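
As a complementary sketch (not part of the original answer), the string aliases "sum" and "mean" can be passed to agg instead of callables; they dispatch to the groupby implementations, which skip NaN values by default. The sample data below is hypothetical and only borrows the column names from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
data = pd.DataFrame({
    "origin": ["a", "a", "b", "b"],
    "type": ["x", "x", "y", "y"],
    "amount": [1.0, np.nan, 3.0, 5.0],
})
groupbyvars = ["origin", "type"]

# "sum" and "mean" use the groupby sum/mean, which ignore NaN by default.
result = data.groupby(groupbyvars).agg({"amount": ["sum", "mean"]}).reset_index()
print(result)
```

The NaN in the first group is ignored rather than propagated, so that group's sum and mean are computed from the single remaining value.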

As a workaround for older versions of numpy, and also a way to fix your last attempt:

When you write pd.Series.sum(skipna=True) you actually call the method instead of passing a function. If you want to fix the argument without calling the method, define a partial. So if you don't have nanmean, let's define s_na_mean and use that:

from functools import partial
s_na_mean = partial(pd.Series.mean, skipna = True)
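
The question's edit reports that passing a functools.partial to agg raised an error about a missing `__name__` attribute. One possible way around that, offered here as an assumption and not taken from the original answer, is to wrap the call in a small named function, since a plain function always carries a `__name__`. The helper names and the sample data below are hypothetical.

```python
import numpy as np
import pandas as pd

def s_na_sum(s):
    """Sum of a Series, skipping NaN values."""
    return s.sum(skipna=True)

def s_na_mean(s):
    """Mean of a Series, skipping NaN values."""
    return s.mean(skipna=True)

# Hypothetical sample data; the column names follow the question.
data = pd.DataFrame({
    "origin": ["a", "a", "b", "b"],
    "type": ["x", "x", "y", "y"],
    "amount": [1.0, np.nan, 3.0, 5.0],
})
groupbyvars = ["origin", "type"]

# Each function is applied to the amount values of one group at a time.
result = data.groupby(groupbyvars).agg({"amount": [s_na_sum, s_na_mean]}).reset_index()
print(result)
```

The resulting columns are labelled with the function names, so the output has ("amount", "s_na_sum") and ("amount", "s_na_mean").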

Answer 2

Answered by Miros

It might be too late, but it may still be useful for others.

Try the apply function:

import numpy as np
import pandas as pd

def nan_agg(x):
    res = {}

    # Select the non-NaN amounts with an elementwise boolean mask
    # (notnull() works per row; Python's `not` cannot negate a Series).
    valid = x.loc[x['amount'].notnull(), 'amount']
    res['nansum'] = valid.sum()
    res['nanmean'] = valid.mean()

    return pd.Series(res, index=['nansum', 'nanmean'])

result = data.groupby(groupbyvars).apply(nan_agg).reset_index()
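
For comparison, a flat result with the same nansum and nanmean column names can probably also be obtained without apply, using named aggregation (available in pandas 0.25 and later). This is a sketch under that assumption, not the answerer's method, and the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
data = pd.DataFrame({
    "origin": ["a", "a", "b", "b"],
    "type": ["x", "x", "y", "y"],
    "amount": [1.0, np.nan, 3.0, 5.0],
})
groupbyvars = ["origin", "type"]

# Named aggregation: output_column=(input_column, aggregation).
# The "sum" and "mean" groupby aggregations skip NaN by default.
result = (
    data.groupby(groupbyvars)
        .agg(nansum=("amount", "sum"), nanmean=("amount", "mean"))
        .reset_index()
)
print(result)
```

Unlike the dict-of-lists form, this yields single-level column names, which can be easier to work with after reset_index().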
