Python: How to group DataFrame rows into lists in pandas groupby?

Question

Asked by Abhishek Thakur

I have a pandas data frame df like:

a b
A 1
A 2
B 5
B 5
B 4
C 6

I want to group by the first column and get the second column as lists in rows:

A [1,2]
B [5,5,4]
C [6]

Is it possible to do something like this using pandas groupby?

Answer 1

Accepted answer by EdChum

You can do this using groupby to group on the column of interest and then apply list to every group:

In [1]: df = pd.DataFrame( {'a':['A','A','B','B','B','C'], 'b':[1,2,5,5,4,6]})
        df

Out[1]: 
   a  b
0  A  1
1  A  2
2  B  5
3  B  5
4  B  4
5  C  6

In [2]: df.groupby('a')['b'].apply(list)
Out[2]: 
a
A       [1, 2]
B    [5, 5, 4]
C          [6]
Name: b, dtype: object

In [3]: df1 = df.groupby('a')['b'].apply(list).reset_index(name='new')
        df1
Out[3]: 
   a        new
0  A     [1, 2]
1  B  [5, 5, 4]
2  C        [6]
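The same approach as a plain script (outside IPython) might look like the sketch below; the column name `new` is the one used in the answer above.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# One list per group, as a Series indexed by 'a'
lists = df.groupby('a')['b'].apply(list)

# Move the group keys back into a column and name the list column 'new'
df1 = lists.reset_index(name='new')
print(df1)
```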

Answer 2

Answer by Acorbe

As you were saying, the groupby method of a pd.DataFrame object can do the job.

Example

 import pandas as pd

 L = ['A','A','B','B','B','C']
 N = [1,2,5,5,4,6]
 # Materialize the zip into a list of (L, N) row tuples
 df = pd.DataFrame(list(zip(L, N)), columns=list('LN'))

 groups = df.groupby(df.L)

 groups.groups
      {'A': [0, 1], 'B': [2, 3, 4], 'C': [5]}

which gives an index-wise description of the groups.

To get the elements of a single group, you can do, for instance:

 groups.get_group('A')

     L  N
  0  A  1
  1  A  2

 groups.get_group('B')

     L  N
  2  B  5
  3  B  5
  4  B  4
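If you need each group's values as lists rather than the sub-frames, the same `groups` object can be reused. A minimal sketch that rebuilds `df` and `groups` as above and collects column `N` for every key listed in `groups.groups`:

```python
import pandas as pd

L = ['A', 'A', 'B', 'B', 'B', 'C']
N = [1, 2, 5, 5, 4, 6]
df = pd.DataFrame(list(zip(L, N)), columns=list('LN'))
groups = df.groupby(df.L)

# groups.groups maps each key to its row labels; get_group returns those rows
lists = {key: groups.get_group(key)['N'].tolist() for key in groups.groups}
print(lists)
```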

Answer 3

Answer by B. M.

If performance is important, go down to the NumPy level:

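import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randint(0, 60, 600), 'b': [1, 2, 5, 5, 4, 6]*100})

def f(df):
    # Stable sort by key so each group's values keep their original row order
    keys, values = df.sort_values('a', kind='mergesort').values.T
    # Unique keys plus the position where each one first appears in the sorted array
    ukeys, index = np.unique(keys, return_index=True)
    # Cut the sorted values at every group boundary
    arrays = np.split(values, index[1:])
    df2 = pd.DataFrame({'a': ukeys, 'b': [list(a) for a in arrays]})
    return df2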

Tests:

In [301]: %timeit f(df)
1000 loops, best of 3: 1.64 ms per loop

In [302]: %timeit df.groupby('a')['b'].apply(list)
100 loops, best of 3: 5.26 ms per loop
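Before relying on the faster version, it is worth checking that it returns the same lists as groupby. A minimal sketch: the function is repeated so the snippet runs on its own, it uses a stable sort (`kind='mergesort'`) so values keep their original order within each group, and the fixed seed is only there for reproducibility.

```python
import numpy as np
import pandas as pd

def f(df):
    # Stable sort keeps each group's values in their original order
    keys, values = df.sort_values('a', kind='mergesort').values.T
    ukeys, index = np.unique(keys, return_index=True)
    arrays = np.split(values, index[1:])
    return pd.DataFrame({'a': ukeys, 'b': [list(a) for a in arrays]})

rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.integers(0, 60, 600), 'b': [1, 2, 5, 5, 4, 6] * 100})

fast = f(df)
slow = df.groupby('a')['b'].apply(list)

# Same keys in the same (sorted) order, and the same list for every key
same_keys = fast['a'].tolist() == slow.index.tolist()
same_lists = all(list(x) == list(y) for x, y in zip(fast['b'], slow))
print(same_keys, same_lists)
```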

Answer 4

Answer by Anamika Modi

A handy way to achieve this would be:

df.groupby('a').agg({'b':lambda x: list(x)})

For more on writing custom aggregations, see: https://www.kaggle.com/akshaysehgal/how-to-group-by-aggregate-using-py
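Since pandas 0.25, a custom aggregation like this can also be written with named aggregation, which names the output column and lets you combine the list with other summaries. A sketch; the output names `b_list` and `b_count` are arbitrary choices for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# Each keyword argument is output_name=(input_column, aggregation)
result = df.groupby('a').agg(
    b_list=('b', list),     # the group's values as a Python list
    b_count=('b', 'size'),  # number of rows in the group
)
print(result)
```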

Answer 5

Answer by Markus Dutschke

To solve this for several columns of a dataframe:

In [5]: df = pd.DataFrame({'a': ['A','A','B','B','B','C'], 'b': [1,2,5,5,4,6], 'c': [3,3,3,4,4,4]})

In [6]: df
Out[6]: 
   a  b  c
0  A  1  3
1  A  2  3
2  B  5  3
3  B  5  4
4  B  4  4
5  C  6  4

In [7]: df.groupby('a').agg(lambda x: list(x))
Out[7]: 
           b          c
a                      
A     [1, 2]     [3, 3]
B  [5, 5, 4]  [3, 4, 4]
C        [6]        [4]

This answer was inspired by Anamika Modi's answer. Thank you!

Answer 6

Answer by YOBEN_S

Let us use df.groupby with tolist and the Series constructor:

pd.Series({x: y.b.tolist() for x, y in df.groupby('a')})
Out[664]: 
A       [1, 2]
B    [5, 5, 4]
C          [6]
dtype: object
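Going the other way, if a plain Python dict of lists is more convenient than a Series, one option (a sketch using the question's data) is to build the Series with agg(list) and call `to_dict()`; it matches the dict comprehension used above.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6]})

# Series of lists indexed by 'a', converted to {key: list}
as_dict = df.groupby('a')['b'].agg(list).to_dict()

# The same mapping built directly from the groupby iterator
via_loop = {x: y.b.tolist() for x, y in df.groupby('a')}
print(as_dict == via_loop)
```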

Answer 7

Answer by cs95

Use any of the following groupby and agg recipes.

# Setup
import pandas as pd

df = pd.DataFrame({
  'a': ['A', 'A', 'B', 'B', 'B', 'C'],
  'b': [1, 2, 5, 5, 4, 6],
  'c': ['x', 'y', 'z', 'x', 'y', 'z']
})
df

   a  b  c
0  A  1  x
1  A  2  y
2  B  5  z
3  B  5  x
4  B  4  y
5  C  6  z

To aggregate multiple columns as lists, use any of the following:

df.groupby('a').agg(list)
df.groupby('a').agg(pd.Series.tolist)

           b          c
a                      
A     [1, 2]     [x, y]
B  [5, 5, 4]  [z, x, y]
C        [6]        [z]

To group-listify a single column only, convert the groupby to a SeriesGroupBy object, then call SeriesGroupBy.agg. Use:

df.groupby('a').agg({'b': list})  # 4.42 ms, returns a DataFrame
df.groupby('a')['b'].agg(list)    # 2.76 ms - faster, returns a Series

a
A       [1, 2]
B    [5, 5, 4]
C          [6]
Name: b, dtype: object
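Timings like the ones above depend on the pandas version, hardware and data size. A minimal sketch for measuring both forms on your own machine with the standard `timeit` module; it prints whatever your setup produces and also keeps one result of each form so you can confirm they hold the same lists.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    'a': ['A', 'A', 'B', 'B', 'B', 'C'],
    'b': [1, 2, 5, 5, 4, 6],
    'c': ['x', 'y', 'z', 'x', 'y', 'z'],
})

frame_result = df.groupby('a').agg({'b': list})  # DataFrame with column 'b'
series_result = df.groupby('a')['b'].agg(list)   # Series named 'b'

runs = 200
candidates = [
    ('dict agg', lambda: df.groupby('a').agg({'b': list})),
    ('SeriesGroupBy.agg', lambda: df.groupby('a')['b'].agg(list)),
]
for label, stmt in candidates:
    # Average wall-clock time per call, in milliseconds
    per_call_ms = timeit.timeit(stmt, number=runs) / runs * 1e3
    print(f'{label}: {per_call_ms:.2f} ms')
```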

Answer 8

Answer by Ganesh Kharad

Here I have joined each group's elements into a single string with "|" as the separator:

    import pandas as pd

    df = pd.read_csv('input.csv')

    df
    Out[1]:
      Area  Keywords
    0  A  1
    1  A  2
    2  B  5
    3  B  5
    4  B  4
    5  C  6

    df.dropna(inplace=True)
    # Normalize the group keys (lowercase, no surrounding whitespace)
    df['Area'] = df['Area'].apply(lambda x: x.lower().strip())
    print(df.columns)
    # str.join needs strings, so convert each value before joining
    df_op = df.groupby('Area').agg({"Keywords": lambda x: "|".join(x.astype(str))})

    df_op.to_csv('output.csv')
    df_op
    Out[2]:
         Keywords
    Area
    a         1|2
    b       5|5|4
    c           6
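A self-contained variant that builds the sample data inline instead of reading `input.csv` (the CSV and its contents are specific to the answer above). The key point is the same: `str.join` only accepts strings, so numeric keywords are cast first.

```python
import pandas as pd

df = pd.DataFrame({'Area': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'Keywords': [1, 2, 5, 5, 4, 6]})

# Cast each group's values to str, then join them with '|'
df_op = df.groupby('Area')['Keywords'].agg(lambda x: '|'.join(x.astype(str)))
print(df_op)
```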

Answer 9

Answer by Vanshika

If you are looking for a list of unique values while aggregating multiple columns, this could help:

df.groupby('a').agg(lambda x: list(set(x))).reset_index()
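A `set` has no defined order, so the unique values may come back in any order. If first-appearance order matters, one option is `dict.fromkeys`, which keeps the first occurrence of each value. A sketch; column `c` is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6],
                   'c': ['x', 'x', 'z', 'y', 'z', 'y']})

# dict.fromkeys drops repeats but keeps first-seen order
result = df.groupby('a').agg(lambda x: list(dict.fromkeys(x))).reset_index()
print(result)
```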

Answer 10

Answer by Mithril

It is time to use agg instead of apply.

Given

df = pd.DataFrame( {'a':['A','A','B','B','B','C'], 'b':[1,2,5,5,4,6], 'c': [1,2,5,5,4,6]})

If you want multiple columns stacked into lists, resulting in a pd.DataFrame:

df.groupby('a')[['b', 'c']].agg(list)
# or 
df.groupby('a').agg(list)

If you want a single column as lists, resulting in a pd.Series:

df.groupby('a')['b'].agg(list)
#or
df.groupby('a')['b'].apply(list)

Note: when you only aggregate a single column, producing a pd.DataFrame is about 10x slower than producing a pd.Series, so use the DataFrame form only in the multi-column case.
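Whether you get a Series or a DataFrame depends on how the column is selected: a single label gives a SeriesGroupBy, while a list of labels (even a list of one) gives a DataFrameGroupBy. A quick sketch to check this on the data above:

```python
import pandas as pd

df = pd.DataFrame({'a': ['A', 'A', 'B', 'B', 'B', 'C'],
                   'b': [1, 2, 5, 5, 4, 6],
                   'c': [1, 2, 5, 5, 4, 6]})

single = df.groupby('a')['b'].agg(list)         # 'b'   -> Series of lists
framed = df.groupby('a')[['b']].agg(list)       # ['b'] -> one-column DataFrame
multi = df.groupby('a')[['b', 'c']].agg(list)   # two list columns

print(type(single).__name__, type(framed).__name__, type(multi).__name__)
```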
