Python pandas groupby and join lists
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/23794082/
pandas groupby and join lists
Asked by fast tooth
I have a dataframe df with two columns. I want to group by one column and join the lists that belong to the same group. For example:
column_a, column_b
1, [1,2,3]
1, [2,5]
2, [5,6]
After the process:
column_a, column_b
1, [1,2,3,2,5]
2, [5,6]
I want to keep all the duplicates. I have the following questions:
- The dtypes of the dataframe's columns are object. convert_objects() doesn't convert column_b to list automatically. How can I do this?
- What does the function in df.groupby(...).apply(lambda x: ...) apply to? What is the form of x? A list?
- What is the solution to my main problem?
Thanks in advance.
Accepted answer by TomAugspurger
object dtype is a catch-all dtype that basically means not int, float, bool, datetime, or timedelta, so the column is already storing the values as lists. convert_objects tries to convert a column to one of those dtypes.
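To see that an object column can already hold real Python lists, one can inspect the dtype and the element types. This is a minimal sketch, assuming the frame is built directly from Python lists (the names df, column_a and column_b follow the question). The last part covers a hypothetical case that the question does not confirm: the lists arriving as text such as "[1, 2, 3]", where ast.literal_eval is one way to parse them.

```python
import ast

import pandas as pd

# Example frame mirroring the question (built from real Python lists).
df = pd.DataFrame({
    "column_a": [1, 1, 2],
    "column_b": [[1, 2, 3], [2, 5], [5, 6]],
})

# The column dtype is object, but every element is a genuine list.
col_dtype = df["column_b"].dtype
all_lists = all(isinstance(value, list) for value in df["column_b"])
print(col_dtype, all_lists)

# Hypothetical case: the lists were stored as strings and need parsing.
as_text = pd.Series(["[1, 2, 3]", "[2, 5]", "[5, 6]"])
parsed = as_text.map(ast.literal_eval)
print(parsed.tolist())
```

If all_lists is already True, no conversion step is needed before grouping.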
You want:
In [63]: df
Out[63]:
   a          b    c
0  1  [1, 2, 3]  foo
1  1     [2, 5]  bar
2  2     [5, 6]  baz

In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]:
         c                b
a
1  foo bar  [1, 2, 3, 2, 5]
2      baz           [5, 6]
This groups the data frame by the values in column a. Read more about groupby: http://pandas.pydata.org/pandas-docs/stable/groupby.html
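On the question of what the function passed to apply receives: as far as I know, it is called once per group, with a Series when a single column is selected first and a DataFrame when several columns are selected. The sketch below checks this on the example frame. The columns are selected explicitly, which is my choice here so the result does not depend on whether a given pandas version also passes the grouping column to the function.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 2],
    "b": [[1, 2, 3], [2, 5], [5, 6]],
    "c": ["foo", "bar", "baz"],
})

# One column selected: the function is handed a Series per group.
series_kind = df.groupby("a")["b"].apply(lambda x: type(x).__name__)

# Several columns selected: the function is handed a DataFrame per group.
frame_kind = df.groupby("a")[["b", "c"]].apply(lambda x: type(x).__name__)

# Iterating over the groupby object shows each group's key and rows.
group_sizes = {key: len(group) for key, group in df.groupby("a")}

print(series_kind.tolist(), frame_kind.tolist(), group_sizes)
```

So x is not a list: it is a pandas object holding the rows of one group, and the lists are its elements.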
The 'sum' aggregation on column b is a regular list sum (concatenation), just like [1, 2, 3] + [2, 5].
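The concatenation behaviour can be checked in plain Python, and the same grouped result can be produced with an explicit flattening step. This is a sketch only: itertools.chain is my substitution, not the answer's method, and is shown as an alternative that does not rely on how pandas dispatches 'sum' for object columns.

```python
from itertools import chain

import pandas as pd

# Plain Python: + concatenates lists, and sum() does too when given
# an empty list as its start value.
plus_result = [1, 2, 3] + [2, 5]
sum_result = sum([[1, 2, 3], [2, 5]], [])

df = pd.DataFrame({
    "a": [1, 1, 2],
    "b": [[1, 2, 3], [2, 5], [5, 6]],
})

# Flatten the lists of each group explicitly; duplicates are kept.
joined = df.groupby("a")["b"].apply(
    lambda lists: list(chain.from_iterable(lists))
)
print(joined.tolist())
```

Chaining avoids building a new intermediate list for every addition, which may matter when groups contain many lists.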
Answered by qwwqwwq
df.groupby('column_a').agg(sum)
This works because of operator overloading: sum concatenates the lists together. The index of the resulting df will be the values from column_a:
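A minimal sketch of that result, assuming the question's two-column frame. The string "sum" is used here in place of the builtin sum; to my knowledge, newer pandas releases warn about passing the builtin, so the string form is the assumption made in this example. reset_index is shown as one way to turn the group keys back into a regular column.

```python
import pandas as pd

df = pd.DataFrame({
    "column_a": [1, 1, 2],
    "column_b": [[1, 2, 3], [2, 5], [5, 6]],
})

# Group keys become the index of the aggregated frame.
result = df.groupby("column_a").agg("sum")
print(result.index.tolist())
print(result["column_b"].tolist())

# Move the keys back into an ordinary column if an index is not wanted.
flat = result.reset_index()
print(flat.columns.tolist())
```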