Pandas groupby custom function to each series
Original source: http://stackoverflow.com/questions/44348426/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Naresh Ambati
I am having a hard time applying a custom function to each group of a groupby column in Pandas.
My custom function takes a series of numbers, computes the difference between each pair of consecutive values, and returns the mean of those differences. Below is the code:
import numpy as np

def mean_gap(a):
    b = []
    # collect the difference between each pair of consecutive values
    for i in range(0, len(a)-1):
        b.append((a[i+1]-a[i]))
    return np.mean(b)
So if a = [1, 3, 7], mean_gap(a) will give me ((3-1)+(7-3))/2 = 3.0
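The arithmetic above can be checked with NumPy's `np.diff`, which returns the consecutive differences directly. This is only a sketch of the same calculation; the name `mean_gap_np` is hypothetical and does not appear in the original question.

```python
import numpy as np

def mean_gap_np(values):
    """Mean of the differences between consecutive elements."""
    arr = np.asarray(values)
    # np.diff(arr) is arr[1:] - arr[:-1], i.e. [2, 4] for [1, 3, 7]
    return np.diff(arr).mean()

result = mean_gap_np([1, 3, 7])
print(result)  # 3.0
```

For [1, 3, 7] the differences are 2 and 4, so the mean is 3.0, matching the hand calculation.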
Dataframe:
one two
a 1
a 3
a 7
b 8
b 9
Desired result:
Dataframe:
one two
a 3
b 1
df.groupby(['one'])['two'].???
I am new to pandas. I read that groupby passes values one row at a time rather than as a full series, so I have not been able to use a lambda after groupby. Please help!
Answered by ayhan
With a custom function, you can do:
df.groupby('one')['two'].agg(lambda x: x.diff().mean())
one
a 3
b 1
Name: two, dtype: int64
and reset the index:
df.groupby('one')['two'].agg(lambda x: x.diff().mean()).reset_index(name='two')
one two
0 a 3
1 b 1
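To try the two steps above end to end, here is a minimal self-contained sketch. It assumes the DataFrame from the question is built with `pd.DataFrame`; the variable names `mean_gaps` and `result` are illustrative only. Depending on the pandas version, the values may be displayed as floats (3.0 and 1.0) rather than integers.

```python
import pandas as pd

# Rebuild the example DataFrame from the question
df = pd.DataFrame({
    "one": ["a", "a", "a", "b", "b"],
    "two": [1, 3, 7, 8, 9],
})

# Each group is passed to the lambda as a whole Series,
# so Series methods such as diff() and mean() are available.
mean_gaps = df.groupby("one")["two"].agg(lambda s: s.diff().mean())

# Turn the group labels back into an ordinary column
result = mean_gaps.reset_index(name="two")
print(result)
```

Note that the lambda receives the full series for each group, not one row at a time, which is why `diff()` can be called on it.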
An alternative would be:
df.groupby('one')['two'].diff().groupby(df['one']).mean()
one
a 3.0
b 1.0
Name: two, dtype: float64
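The alternative works in two passes, which the following sketch makes visible. It assumes the same example DataFrame as in the question; `gaps` and `result` are illustrative names. The first row of each group has no predecessor, so `diff()` yields NaN there, and `mean()` skips NaN values by default.

```python
import pandas as pd

df = pd.DataFrame({
    "one": ["a", "a", "a", "b", "b"],
    "two": [1, 3, 7, 8, 9],
})

# Pass 1: differences within each group, aligned to the original rows.
# The first row of every group becomes NaN.
gaps = df.groupby("one")["two"].diff()
print(gaps.tolist())  # [nan, 2.0, 4.0, nan, 1.0]

# Pass 2: group the differences by the same key and average them.
# mean() ignores the NaN entries.
result = gaps.groupby(df["one"]).mean()
print(result)
```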
Your approach would also have worked with the following change:
def mean_gap(a):
    b = []
    a = np.asarray(a)
    for i in range(0, len(a)-1):
        b.append((a[i+1]-a[i]))
    return np.mean(b)
df.groupby('one')['two'].agg(mean_gap)
one
a 3
b 1
Name: two, dtype: int64
The line a = np.asarray(a) is necessary because otherwise you would get KeyErrors in b.append((a[i+1]-a[i])). Each group is passed to the function as a Series that keeps its original index labels, so a[i] looks up the label i rather than the position i.
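The KeyError can be reproduced outside of groupby with a short sketch. It assumes the example DataFrame from the question with its default integer index; `group_b`, `raised`, and the other variable names are illustrative. The rows of group "b" carry the index labels 3 and 4, so a lookup with label 0 fails, while positional access does not.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one": ["a", "a", "a", "b", "b"],
    "two": [1, 3, 7, 8, 9],
})

# The Series for group "b" keeps the original index labels 3 and 4
group_b = df.loc[df["one"] == "b", "two"]
print(group_b.index.tolist())  # [3, 4]

# Label-based lookup: there is no label 0 in this Series
try:
    group_b[0]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# Positional access works after converting to a NumPy array ...
first_via_array = np.asarray(group_b)[0]
# ... or by using .iloc on the Series directly
first_via_iloc = group_b.iloc[0]
print(first_via_array, first_via_iloc)  # 8 8
```

Using `.iloc` inside the loop would be another way to avoid the error, though converting once with `np.asarray` keeps the original function body unchanged.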