Pandas groupby: applying a custom function to each series

Question

Asked by Naresh Ambati

I am having a hard time applying a custom function to each group of a groupby column in Pandas.

My custom function takes a series of numbers, computes the differences between consecutive elements, and returns the mean of those differences. The code is below:

import numpy as np

def mean_gap(a):
    b = []
    for i in range(0, len(a)-1):
        b.append((a[i+1]-a[i]))
    return np.mean(b)

So if a = [1,3,7], mean_gap(a) will give me ((3-1)+(7-3))/2 = 3.0

DataFrame:
   one  two
    a    1
    a    3
    a    7
    b    8
    b    9

Desired result:
   one  two
    a    3
    b    1

df.groupby(['one'])['two'].???

I am new to pandas. I read that groupby processes values one row at a time, not the full series, so I am not able to use a lambda after groupby. Please help!

Answer 1

Answered by ayhan

With a custom function, you can do:

df.groupby('one')['two'].agg(lambda x: x.diff().mean())
one
a    3
b    1
Name: two, dtype: int64

To get a DataFrame back, reset the index:

df.groupby('one')['two'].agg(lambda x: x.diff().mean()).reset_index(name='two')


    one  two
0   a    3
1   b    1
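
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question, and the `seen` list is a hypothetical helper added only to illustrate that agg hands the function a whole Series per group rather than one row at a time.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"one": ["a", "a", "a", "b", "b"],
                   "two": [1, 3, 7, 8, 9]})

seen = []  # illustration only: records the values each call receives

def mean_gap(s):
    # s is the full Series of 'two' values for one group
    seen.append(list(s))
    return s.diff().mean()

result = df.groupby("one")["two"].agg(mean_gap).reset_index(name="two")
print(result)
print(seen)  # each entry is one group's complete list of values
```

Depending on the pandas version, the function may be called more often than once per group, so the exact contents of `seen` can vary, but every entry is a complete group.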

An alternative would be:

df.groupby('one')['two'].diff().groupby(df['one']).mean()
one
a    3.0
b    1.0
Name: two, dtype: float64
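
This works because diff leaves a NaN in the first row of each group, and mean skips NaN by default, so only the real gaps are averaged. A small sketch on the question's sample data showing the intermediate step:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"one": ["a", "a", "a", "b", "b"],
                   "two": [1, 3, 7, 8, 9]})

# Per-group consecutive differences; the first row of each group has no predecessor
gaps = df.groupby("one")["two"].diff()
print(gaps.tolist())  # [nan, 2.0, 4.0, nan, 1.0]

# Group the gaps by the original key; mean() ignores the NaN entries
gap_means = gaps.groupby(df["one"]).mean()
print(gap_means)
```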

Your approach would also have worked with the following change:

def mean_gap(a):
    b = []
    a = np.asarray(a)
    for i in range(0, len(a)-1):
        b.append((a[i+1]-a[i]))
    return np.mean(b) 

df.groupby('one')['two'].agg(mean_gap)
one
a    3
b    1
Name: two, dtype: int64

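a = np.asarray(a) is necessary because otherwise you would get KeyErrors in b.append((a[i+1]-a[i])): each group is passed as a Series that keeps its original index labels, so a[i] is looked up by label rather than by position.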
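
A short sketch of the failure, again using the question's sample data. The Series for group b keeps the original row labels 3 and 4, so a lookup of label 0 fails, while the NumPy array is indexed by position.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"one": ["a", "a", "a", "b", "b"],
                   "two": [1, 3, 7, 8, 9]})

group_b = df.groupby("one")["two"].get_group("b")
print(list(group_b.index))  # [3, 4]

try:
    group_b[0]      # label lookup: this group has no label 0
    raised = False
except KeyError:
    raised = True
print(raised)       # True

arr = np.asarray(group_b)
print(arr[1] - arr[0])  # 1, positional indexing works on the array
```

Using a.iloc[i] inside the loop would be another way to index by position without converting to an array.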
