Original question: http://stackoverflow.com/questions/14246817/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
python pandas custom agg function
Asked by brian_the_bungler
DataFrame:
   one two
a    1   x
b    1   y
c    2   y
d    2   z
e    3   z
grp = df.groupby('one')
grp.agg(lambda x: ???)  # or an equivalent function
Desired output from grp.agg:
one  two
1    x|y
2    y|z
3    z
Before I moved to DataFrames, my agg function was "|".join(sorted(set(x))). Ideally, I want to be able to have any number of columns in the group, with agg returning "|".join(sorted(set(x))) for each column, as with column two above. I also tried np.char.join().
Love Pandas and it has taken me from an 800-line complicated program to a 400-line walk in the park that zooms. Thank you :)
Answered by Zelazny7
You were so close:
In [1]: df.groupby('one').agg(lambda x: "|".join(x.tolist()))
Out[1]:
     two
one
1    x|y
2    y|z
3      z
Expanded answer to handle sorting and take only the set:
In [1]: df = DataFrame({'one':[1,1,2,2,3], 'two':list('xyyzz'), 'three':list('eecba')}, index=list('abcde'), columns=['one','two','three'])
In [2]: df
Out[2]:
   one two three
a    1   x     e
b    1   y     e
c    2   y     c
d    2   z     b
e    3   z     a
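# Series.order() from the original answer has since been removed from pandas; sort_values() is its replacement
In [3]: df.groupby('one').agg(lambda x: "|".join(x.sort_values().unique().tolist()))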
Out[3]:
     two three
one
1    x|y     e
2    y|z   b|c
3      z     a
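For reference, the expression from the question, "|".join(sorted(set(x))), can also be passed to agg directly. The following is a minimal, self-contained sketch of that idea rather than part of the original answer; the helper name join_sorted_unique is made up for illustration.

```python
import pandas as pd

# Same data as in the answer above
df = pd.DataFrame(
    {
        "one": [1, 1, 2, 2, 3],
        "two": list("xyyzz"),
        "three": list("eecba"),
    },
    index=list("abcde"),
)


def join_sorted_unique(values):
    # Hypothetical helper: values holds one column of one group.
    # set() drops duplicates, sorted() orders them, join() builds the string.
    return "|".join(sorted(set(values)))


# agg applies the helper to every non-grouping column of every group
result = df.groupby("one").agg(join_sorted_unique)
print(result)
```

Because the helper only relies on iterating over the values, it should work for any number of string columns.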
Answered by Lahiru Karunaratne
The pandas documentation describes a better way to concatenate strings, Series.str.cat, so I prefer this:
In [1]: df.groupby('one').agg(lambda x: x.str.cat(sep='|'))
Out[1]:
     two
one
1    x|y
2    y|z
3      z
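One thing to keep in mind is that str.cat on its own joins every value in the group, so duplicates are kept and the original order is preserved. The sketch below, which is an illustration and not part of the original answer, shows the difference and one possible way to get the sorted, de-duplicated result the question asks for; cat_sorted_unique is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "one": [1, 1, 2, 2, 3],
        "two": list("xyyzz"),
        "three": list("eecba"),
    }
)

# Plain str.cat: every value is joined, duplicates included
plain = df.groupby("one")["three"].agg(lambda x: x.str.cat(sep="|"))


def cat_sorted_unique(col):
    # Hypothetical helper: drop repeated values, sort, then concatenate
    return col.drop_duplicates().sort_values().str.cat(sep="|")


deduped = df.groupby("one").agg(cat_sorted_unique)
print(plain)
print(deduped)
```

With the three column from the expanded example, group 1 contains the value e twice, so the plain version repeats it while the helper does not.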
Answered by qartal
Just an elaboration on the accepted answer:
df.groupby('one').agg(lambda x: "|".join(x.tolist()))
Note that df.groupby('one') returns a DataFrameGroupBy object, and agg is defined on this type. When agg is given a function, it calls that function on each column of each group, and each of those columns is passed in as a Series. This means that x in the lambda above is a Series.
Another note is that the agg function does not have to be a lambda. If the aggregation function is complex, it can be defined separately as a regular function, as below. The only constraint is that x must be a Series (or compatible with one):
def myfun1(x):
    # x is one column of one group
    return "|".join(x.tolist())
and then:
df.groupby('one').agg(myfun1)
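A quick way to check what agg actually hands to the function is to record type(x) inside it. This is only a small sketch for inspection; the seen_types set is an addition for illustration and is not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 1, 2, 2, 3], "two": list("xyyzz")})

# Illustration only: collects the type names that agg passes to myfun1
seen_types = set()


def myfun1(x):
    seen_types.add(type(x).__name__)
    return "|".join(x.tolist())


grouped = df.groupby("one")
result = grouped.agg(myfun1)

print(type(grouped).__name__)  # the type of the grouped object itself
print(seen_types)              # the type(s) of x seen inside myfun1
print(result)
```

The grouped object and the argument x have different types: the former wraps the whole DataFrame, while the latter is the per-group column.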

