pandas groupby: concatenate strings in multiple columns

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/32117848/

Date: 2020-09-13 23:47:33  Source: igfitidea

Tags: python, pandas, group-by

Asked by Blue Moon

I have this pandas data frame:

import pandas as pd
df = pd.DataFrame({'id':['a','b','b','b','c','c'], 'category':['z','z','x','y','y','y'], 'category2':['1','2','2','2','1','2']})

which looks like:

  category category2 id
0        z         1  a
1        z         2  b
2        x         2  b
3        y         2  b
4        y         1  c
5        y         2  c

What I'd like to do is group by id and return the other two columns as a concatenation of their unique strings.

The outcome would look like:

  category category2 id
0        z         1  a
1      zxy         2  b
2        y        12  c

Answered by unutbu

Use groupby/agg to aggregate the groups. For each group, apply set to find the unique strings, and ''.join to concatenate the strings:

In [34]: df.groupby('id').agg(lambda x: ''.join(set(x)))
Out[34]: 
   category category2
id                   
a         z         1
b       yxz         2
c         y        12
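Because a Python set has no defined iteration order, the joined letters can come out in any order: the output above shows yxz for id b, not the zxy the question asked for, and the order can even change between interpreter runs because string hashing is randomized. A minimal sketch, assuming you want the letters in order of first appearance as in the expected outcome, swaps set for dict.fromkeys, which drops duplicates while keeping insertion order (guaranteed for dict since Python 3.7):

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'b', 'b', 'c', 'c'],
                   'category': ['z', 'z', 'x', 'y', 'y', 'y'],
                   'category2': ['1', '2', '2', '2', '1', '2']})

# dict.fromkeys(x) keeps the first occurrence of each value, in order;
# joining the dict iterates over its keys, i.e. the unique strings.
result = df.groupby('id').agg(lambda x: ''.join(dict.fromkeys(x)))
print(result)
```

With this data, id b gets zxy in the category column and id c gets 12 in category2, matching the outcome the question describes.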

To move id from the index to a column of the resultant DataFrame, call reset_index:

In [59]: df.groupby('id').agg(lambda x: ''.join(set(x))).reset_index()
Out[59]: 
  id category category2
0  a        z         1
1  b      yxz         2
2  c        y        12
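As an alternative to reset_index, a sketch assuming a reasonably recent pandas: passing as_index=False to groupby keeps id as an ordinary column from the start. The set-based lambda from the answer is kept unchanged here, so the letter order inside each joined string is still arbitrary:

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'b', 'b', 'c', 'c'],
                   'category': ['z', 'z', 'x', 'y', 'y', 'y'],
                   'category2': ['1', '2', '2', '2', '1', '2']})

# as_index=False inserts the grouping key 'id' as the first column
# and gives the result a default 0..n-1 integer index.
result = df.groupby('id', as_index=False).agg(lambda x: ''.join(set(x)))
print(result)
```

Both forms produce the same columns; as_index=False simply avoids building the id index only to move it back out.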