pandas: custom agg function in Python

Question

Asked by brian_the_bungler

Dataframe:
  one two
a  1  x
b  1  y
c  2  y
d  2  z
e  3  z

grp = df.groupby('one')
grp.agg(lambda x: ???) #or equivalent function

Desired output from grp.agg:

one two
1   x|y
2   y|z
3   z

Before I moved to DataFrames, my aggregation function was "|".join(sorted(set(x))). Ideally the group could have any number of columns, and agg would return "|".join(sorted(set(x))) for each column, as for column two above. I also tried np.char.join().

Love Pandas: it has taken me from an 800-line complicated program to a 400-line walk in the park that zooms. Thank you :)

Answer 1

Answered by Zelazny7

You were so close:

In [1]: df.groupby('one').agg(lambda x: "|".join(x.tolist()))
Out[1]:
     two
one
1    x|y
2    y|z
3      z

Expanded answer to handle sorting and take only the set:

In [1]: df = DataFrame({'one':[1,1,2,2,3], 'two':list('xyyzz'), 'three':list('eecba')}, index=list('abcde'), columns=['one','two','three'])

In [2]: df
Out[2]:
   one two three
a    1   x     e
b    1   y     e
c    2   y     c
d    2   z     b
e    3   z     a

In [3]: df.groupby('one').agg(lambda x: "|".join(x.sort_values().unique().tolist()))
Out[3]:
     two three
one
1    x|y     e
2    y|z   b|c
3      z     a
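
For readers on a current pandas release, here is a minimal, self-contained sketch of the expanded answer. It assumes a version of pandas in which Series.order() has been removed in favor of Series.sort_values(), and it compares the pandas-method version against the asker's original sorted(set(x)) expression. The names via_pandas and via_set are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz"), "three": list("eecba")},
    index=list("abcde"),
)

# Series.sort_values() is the replacement for the removed Series.order()
via_pandas = df.groupby("one").agg(
    lambda x: "|".join(x.sort_values().unique().tolist())
)

# The asker's original expression, applied to each column of each group
via_set = df.groupby("one").agg(lambda x: "|".join(sorted(set(x))))

print(via_pandas)
print(via_set)
```

Both calls should give the same table for this data, so either form can be used; the set-based one avoids building an intermediate sorted Series.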

Answer 2

Answered by Lahiru Karunaratne

The pandas documentation describes a better way to concatenate strings, Series.str.cat, so I prefer this:

In [1]: df.groupby('one').agg(lambda x: x.str.cat(sep='|'))
Out[1]:
     two
one
1    x|y
2    y|z
3      z
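
One practical difference worth sketching: with its default na_rep=None and no others argument, str.cat omits missing values from the result, whereas "|".join fails when the list contains a float NaN. The frame below is a made-up variant of the question's data with one missing value, used only to illustrate this.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data with one missing value in 'two'
df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": ["x", np.nan, "y", "z", "z"]},
    index=list("abcde"),
)

# str.cat skips the NaN in group 1 instead of raising
joined = df.groupby("one").agg(lambda x: x.str.cat(sep="|"))
print(joined)

# Plain str.join on the same values raises TypeError because NaN is a float
try:
    "|".join(df["two"].tolist())
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)
```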

Answer 3

Answered by qartal

Just an elaboration on the accepted answer:

df.groupby('one').agg(lambda x: "|".join(x.tolist()))

Note that df.groupby('one') returns a DataFrameGroupBy object (selecting a single column, as in df.groupby('one')['two'], gives a SeriesGroupBy), and agg is defined on that type. When agg is given a single function like this, the function is applied to each column of each group, so x in the lambda above is a Series.
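
A quick way to check this yourself is to record the type of the argument inside the aggregation function. This is only a sketch; join_and_record and seen_types are hypothetical names introduced for the illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz")},
    index=list("abcde"),
)

grouped = df.groupby("one")
print(type(grouped).__name__)         # DataFrameGroupBy
print(type(grouped["two"]).__name__)  # SeriesGroupBy

seen_types = set()

def join_and_record(x):
    # Remember what kind of object agg handed us, then aggregate as usual
    seen_types.add(type(x).__name__)
    return "|".join(x.tolist())

result = grouped.agg(join_and_record)
print(seen_types)
```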

Also, the aggregation function does not have to be a lambda. If it is complex, it can be defined separately as a regular function, as below. The only constraint is that it must accept a Series (or something compatible with one) as x:

def myfun1(x):
    return "|".join(x.tolist())

and then:

df.groupby('one').agg(myfun1)
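
As a self-contained sketch, the same named-function approach can carry the sorting and de-duplication the question asked for, and it works both on every column and on a single selected column. The name join_sorted_unique is made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz"), "three": list("eecba")},
    index=list("abcde"),
)

def join_sorted_unique(x):
    """Join the distinct values of one group's column with '|', in sorted order."""
    return "|".join(sorted(set(x)))

# All non-key columns at once: the result is a DataFrame indexed by 'one'
all_columns = df.groupby("one").agg(join_sorted_unique)

# A single selected column: the result is a Series indexed by 'one'
only_two = df.groupby("one")["two"].agg(join_sorted_unique)

print(all_columns)
print(only_two)
```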
