Original question: http://stackoverflow.com/questions/18138693/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Replicating GROUP_CONCAT for pandas.DataFrame
Asked by Mitch Flax
I have a pandas DataFrame df:
+------+---------+  
| team | user    |  
+------+---------+  
| A    | elmer   |  
| A    | daffy   |  
| A    | bugs    |  
| B    | dawg    |  
| A    | foghorn |  
| B    | speedy  |  
| A    | goofy   |  
| A    | marvin  |  
| B    | pepe    |  
| C    | petunia |  
| C    | porky   |  
+------+---------+
I want to find or write a function that returns the DataFrame I would get in MySQL from the following query:
SELECT
  team,
  GROUP_CONCAT(user)
FROM
  df
GROUP BY
  team
for the following result:
+------+---------------------------------------+  
| team | group_concat(user)                    |  
+------+---------------------------------------+  
| A    | elmer,daffy,bugs,foghorn,goofy,marvin |  
| B    | dawg,speedy,pepe                      |  
| C    | petunia,porky                         |  
+------+---------------------------------------+  
I can think of nasty ways to do this by iterating over rows and adding to a dictionary, but there's got to be a better way.
Answered by Phillip Cloud
Do the following:
df.groupby('team').apply(lambda x: ','.join(x.user))
to get a Series of strings, or
df.groupby('team').apply(lambda x: list(x.user))
to get a Series of lists of strings.
Here's what the results look like:
In [33]: df.groupby('team').apply(lambda x: ', '.join(x.user))
Out[33]:
team
a       elmer, daffy, bugs, foghorn, goofy, marvin
b                               dawg, speedy, pepe
c                                   petunia, porky
dtype: object
In [34]: df.groupby('team').apply(lambda x: list(x.user))
Out[34]:
team
a       [elmer, daffy, bugs, foghorn, goofy, marvin]
b                               [dawg, speedy, pepe]
c                                   [petunia, porky]
dtype: object
Note that further operations on these kinds of Series (ones holding joined strings or lists) will generally be slow and are discouraged. If there is another way to aggregate that avoids putting a list inside a Series, you should consider using that approach instead.
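As a rough illustration of that advice, here is a minimal sketch that assumes the real goal is the number of users per team (an assumption made only for this example, not something the question asks for). The small DataFrame is an illustrative subset of the question's data. Counting directly in the groupby avoids building a list per team first.

```python
import pandas as pd

# Illustrative subset of the question's data.
df = pd.DataFrame({
    "team": ["A", "A", "B", "C", "C"],
    "user": ["elmer", "daffy", "dawg", "petunia", "porky"],
})

# Roundabout: build a list per team, then measure each list in Python.
via_lists = df.groupby("team")["user"].apply(list).apply(len)

# Direct: let groupby count the rows per team, with no intermediate lists.
direct = df.groupby("team")["user"].count()

print(direct)
```

Both give the same counts; the direct form simply skips the intermediate Series of Python lists.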
Answered by Kamil Sindi
A more general solution, if you want to use agg:
df.groupby('team').agg({'user' : lambda x: ', '.join(x)})
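For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the DataFrame from the question, selects the user column before aggregating, and calls reset_index() so that the result is a DataFrame with team as an ordinary column, close to the MySQL output. The variable name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "team": ["A", "A", "A", "B", "A", "B", "A", "A", "B", "C", "C"],
    "user": ["elmer", "daffy", "bugs", "dawg", "foghorn", "speedy",
             "goofy", "marvin", "pepe", "petunia", "porky"],
})

# Select the column, join each group's values with commas, then turn
# the group labels back into a regular column.
result = df.groupby("team")["user"].agg(",".join).reset_index()

print(result)
```

Within each group the rows keep their original order, so the joined strings follow the order in which the users appear in df.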

