Python pandas: group by and join lists

Question

Asked by fast tooth

I have a dataframe df with two columns. I want to group by one column and join the lists that belong to the same group. For example:

column_a, column_b
1,         [1,2,3]
1,         [2,5]
2,         [5,6]

After the process:

column_a, column_b
1,         [1,2,3,2,5]
2,         [5,6]

I want to keep all the duplicates. I have the following questions:

The dtypes of the dataframe are object. convert_objects() doesn't convert column_b to a list automatically. How can I do this?
What does the function in df.groupby(...).apply(lambda x: ...) apply to? What is the form of x? A list?
What is the solution to my main problem?

Thanks in advance.

Answer 1

Accepted answer by TomAugspurger

object dtype is a catch-all dtype that basically means not int, float, bool, datetime, or timedelta. So pandas is storing your values as lists. convert_objects only tries to convert a column to one of those dtypes, so it leaves a column of lists unchanged.
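
If the object column turns out to hold strings that merely look like lists (for example after reading a CSV), one possible fix is to parse them with ast.literal_eval. This is only a sketch: the question does not say how df was built, so the string-valued column below is an assumption made for illustration.

```python
import ast

import pandas as pd

# Assumption: column_b holds text such as "[1,2,3]" rather than real lists.
df = pd.DataFrame({
    "column_a": [1, 1, 2],
    "column_b": pd.Series(["[1,2,3]", "[2,5]", "[5,6]"], dtype=object),
})

# Check what Python type each element of the object column actually has.
types_before = set(df["column_b"].map(type))

# Parse every string into a real Python list.
df["column_b"] = df["column_b"].map(ast.literal_eval)
types_after = set(df["column_b"].map(type))

print(types_before, types_after)
```

If the elements are already lists, no conversion is needed and the parsing step should be skipped.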

You want:

In [63]: df
Out[63]: 
   a          b    c
0  1  [1, 2, 3]  foo
1  1     [2, 5]  bar
2  2     [5, 6]  baz


In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]: 
         c                b
a                          
1  foo bar  [1, 2, 3, 2, 5]
2      baz           [5, 6]

This groups the data frame by the values in column a. Read more about [groupby](http://pandas.pydata.org/pandas-docs/stable/groupby.html).
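
To see what each group looks like, and what x would be inside groupby(...).apply(lambda x: ...), a small inspection sketch can help. The frame below is rebuilt by hand from the output shown above; the variable names group_types and column_types are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the answer.
df = pd.DataFrame({
    "a": [1, 1, 2],
    "b": [[1, 2, 3], [2, 5], [5, 6]],
    "c": ["foo", "bar", "baz"],
})

# Iterating over a groupby yields (key, sub-frame) pairs, one per group.
group_types = {}
for key, group in df.groupby("a"):
    group_types[key] = type(group).__name__
    print(key, type(group).__name__, len(group))

# After selecting a single column, the function receives one Series per group.
column_types = df.groupby("a")["b"].apply(lambda x: type(x).__name__)
print(column_types.tolist())
```

So x is not a plain list: it is a DataFrame holding the rows of one group, or a Series when a single column has been selected first.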

This does a regular list sum (concatenation), just like [1, 2, 3] + [2, 5].
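
The concatenation can be checked in plain Python, without pandas. The sketch below compares the + operator, the built-in sum with an empty-list start value, and itertools.chain. The chain variant is not part of the answer; it is included as an alternative because repeated + copies the growing list at every step.

```python
from itertools import chain

# The two lists that belong to group 1 in the example.
group_lists = [[1, 2, 3], [2, 5]]

via_plus = [1, 2, 3] + [2, 5]                        # plain list concatenation
via_sum = sum(group_lists, [])                       # sum with an empty list as start
via_chain = list(chain.from_iterable(group_lists))   # single pass over all items

print(via_plus, via_sum, via_chain)
```

All three keep duplicates and preserve the original order of the elements.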

Answer 2

Answered by qwwqwwq

df.groupby('column_a').agg(sum)

This works because of operator overloading: sum concatenates the lists together. The index of the resulting df will be the values from column_a.
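
The point about the index can be illustrated with a small sketch. Note that it swaps in an explicit itertools.chain inside apply instead of the sum used in the answer, so it is a variation rather than the answer's exact method. The names joined and flat are illustrative.

```python
from itertools import chain

import pandas as pd

# The example frame from the question.
df = pd.DataFrame({
    "column_a": [1, 1, 2],
    "column_b": [[1, 2, 3], [2, 5], [5, 6]],
})

# Join the lists of each group; the group keys become the index of the result.
joined = df.groupby("column_a")["column_b"].apply(
    lambda lists: list(chain.from_iterable(lists))
)
print(joined)

# reset_index turns the group keys back into an ordinary column.
flat = joined.reset_index()
print(flat)
```

Calling reset_index is optional and only needed when column_a should be a regular column again.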
