Python pandas: group by a column and join lists

Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/23794082/


pandas groupby and join lists

Tags: python, pandas

Asked by fast tooth

I have a dataframe df with two columns. I want to group by one column and join the lists that belong to the same group. For example:

column_a, column_b
1,         [1,2,3]
1,         [2,5]
2,         [5,6]

After the process:

column_a, column_b
1,         [1,2,3,2,5]
2,         [5,6]

I want to keep all the duplicates. I have the following questions:

  • The dtypes of the dataframe are object. convert_objects() doesn't convert column_b to a list automatically. How can I do this?
  • What does the function in df.groupby(...).apply(lambda x: ...) apply to? What is the form of x? A list?
  • What is the solution to my main problem?

Thanks in advance.

Accepted answer by TomAugspurger

The object dtype is a catch-all dtype that basically means "not int, float, bool, datetime, or timedelta". So the column is already storing its values as lists. convert_objects tries to convert a column to one of those dtypes.
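
One way to check what an object column really holds is to look at the Python type of its elements. The sketch below assumes a small frame that mirrors the question's data. The second frame, where the lists arrive as strings (for example after reading a CSV), is a hypothetical case that the original post does not describe, and ast.literal_eval is just one possible way to parse it.

```python
import ast
import pandas as pd

# Frame mirroring the question: column_b already holds Python lists.
df = pd.DataFrame({"column_a": [1, 1, 2],
                   "column_b": [[1, 2, 3], [2, 5], [5, 6]]})

# The column dtype is object, but every element is a real list.
element_types = df["column_b"].map(type).unique().tolist()
print(df["column_b"].dtype, element_types)

# Hypothetical case (an assumption, not from the post): lists stored as strings.
df_str = pd.DataFrame({"column_a": [1, 1, 2],
                       "column_b": ["[1,2,3]", "[2,5]", "[5,6]"]})

# ast.literal_eval parses each string literal into a list without running code.
df_str["column_b"] = df_str["column_b"].map(ast.literal_eval)
parsed = df_str["column_b"].tolist()
print(parsed)
```

If the elements already are lists, as in the first frame, no conversion step is needed before grouping.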

You want:

In [63]: df
Out[63]: 
   a          b    c
0  1  [1, 2, 3]  foo
1  1     [2, 5]  bar
2  2     [5, 6]  baz


In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]: 
         c                b
a                          
1  foo bar  [1, 2, 3, 2, 5]
2      baz           [5, 6]

This groups the data frame by the values in column a. Read more about groupby: http://pandas.pydata.org/pandas-docs/stable/groupby.html

The 'sum' aggregation on column b is a regular list sum (concatenation), just like [1, 2, 3] + [2, 5].
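
As an alternative sketch of the same idea, the lists in each group can be flattened explicitly with itertools.chain instead of relying on 'sum'. This is a swapped-in technique, not the answer's own code. The helper name join_lists and the seen_types list are illustrative assumptions; the helper also records what apply passes in, which touches on the question about the form of x.

```python
from itertools import chain
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2],
                   "b": [[1, 2, 3], [2, 5], [5, 6]]})

seen_types = []  # illustrative: records the type of each object apply hands over

def join_lists(group):
    # With a single column selected, apply passes one Series per group.
    seen_types.append(type(group).__name__)
    # Flatten the group's lists, keeping duplicates and original order.
    return list(chain.from_iterable(group))

joined = df.groupby("a")["b"].apply(join_lists)
result = joined.to_dict()
print(result)
print(set(seen_types))
```

Flattening with chain avoids building intermediate lists for every + step, which may matter when a group contains many lists.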

Answer by qwwqwwq

df.groupby('column_a').agg(sum)

This works because of operator overloading: sum concatenates the lists together. The index of the resulting df will be the values from column_a.
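
To illustrate the point about the index, here is a minimal sketch that assumes a frame shaped like the one in the question. It concatenates each group's lists with Python's built-in sum inside apply (a variation on the one-liner above, not the answer's exact call) and then uses reset_index to turn the group keys back into an ordinary column.

```python
import pandas as pd

df = pd.DataFrame({"column_a": [1, 1, 2],
                   "column_b": [[1, 2, 3], [2, 5], [5, 6]]})

# Built-in sum with an empty-list start value concatenates each group's lists.
grouped = df.groupby("column_a")["column_b"].apply(lambda s: sum(s, []))

# The group keys from column_a become the index of the result.
index_name = grouped.index.name
index_values = grouped.index.tolist()
print(index_name, index_values)

# reset_index moves the keys back into a regular column.
flat = grouped.reset_index()
print(flat)
```

Keeping the keys as the index is convenient for lookups by group; reset_index is only needed when the result should look like the original two-column layout.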