Python Pandas DataFrame Groupby two columns and get counts

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/17679089/

Date: 2020-08-19 08:52:48  Source: igfitidea

python pandas dataframe

Asked by Nilani Algiriyage

I have a pandas dataframe in the following format:

import pandas as pd
df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']

df:

   col1 col2 col3     col4 col5
0   1.1    A  1.1    x/y/z    1
1   1.1    A  1.7      x/y    3
2   1.1    A  2.5  x/y/z/n    3
3   2.6    B  2.6      x/u    2
4   2.5    B  3.3        x    4
5   3.4    B  3.8    x/u/v    2
6   2.6    B    4    x/y/z    5
7   2.6    A  4.2        x    3
8   3.4    B  4.3  x/u/v/b    6
9   3.4    C  4.5        -    3
10  2.6    B  4.6      x/y    5
11  1.1    D  4.7    x/y/z    1
12  1.1    D  4.7        x    1
13  3.3    D  4.8  x/u/v/w    1

Now I want to group this by two columns, like the following:

df.groupby(['col5','col2']).apply(lambda g: g.reset_index())

Output:

             index col1 col2 col3     col4 col5
col5 col2                                      
1    A    0      0  1.1    A  1.1    x/y/z    1
     D    0     11  1.1    D  4.7    x/y/z    1
          1     12  1.1    D  4.7        x    1
          2     13  3.3    D  4.8  x/u/v/w    1
2    B    0      3  2.6    B  2.6      x/u    2
          1      5  3.4    B  3.8    x/u/v    2
3    A    0      1  1.1    A  1.7      x/y    3
          1      2  1.1    A  2.5  x/y/z/n    3
          2      7  2.6    A  4.2        x    3
     C    0      9  3.4    C  4.5        -    3
4    B    0      4  2.5    B  3.3        x    4
5    B    0      6  2.6    B    4    x/y/z    5
          1     10  2.6    B  4.6      x/y    5
6    B    0      8  3.4    B  4.3  x/u/v/b    6

I want to get the count for each (col5, col2) group, like the following. Expected output:

col5 col2 count
1    A      1
     D      3
2    B      2
etc...

How can I get my expected output? I also want to find the largest count for each 'col2' value.

Accepted answer by waitingkuo

Building on @Andy's answer, you can do the following to solve your second question:

In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]: 
      0
col2   
A     3
B     2
C     1
D     3
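
Relying on the default column name 0 that reset_index() gives the unnamed size Series works, but naming the column makes the chain easier to read. A minimal sketch, rebuilding only the question's col2 and col5 columns (the variable names counts and max_per_col2 are just illustrative):

```python
import pandas as pd

# Only the two grouping columns of the question's df are rebuilt here
df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

# Count rows per (col5, col2) pair and call the count column 'count' instead of 0
counts = df.groupby(['col5', 'col2']).size().reset_index(name='count')

# Largest count for each col2 value
max_per_col2 = counts.groupby('col2')['count'].max()
print(max_per_col2)
```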

Answer by Andy Hayden

You are looking for size:

In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5  col2
1     A       1
      D       3
2     B       2
3     A       3
      C       1
4     B       1
5     B       2
6     B       1
dtype: int64
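
To get exactly the layout of the question's expected output, with a count column next to col5 and col2, one option is to turn the size() Series back into a DataFrame with reset_index(name='count'). A sketch, again rebuilding only the two grouping columns of the question's data:

```python
import pandas as pd

# Only the two grouping columns of the question's df are rebuilt here
df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

# size() gives one number per (col5, col2) group; reset_index moves the group
# keys back into columns and stores the sizes in a column named 'count'
counts = df.groupby(['col5', 'col2']).size().reset_index(name='count')
print(counts)
```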


To get the same answer as waitingkuo (the "second question") a little more cleanly, group by the index level:

In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A       3
B       2
C       1
D       3
dtype: int64
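
The level can also be given by its name instead of level=1. If you also want to see which (col5, col2) pairs reach each maximum, one option is to broadcast the per-col2 maximum back with transform('max') and filter; unlike picking a single row, this keeps ties. A sketch using only the question's two grouping columns:

```python
import pandas as pd

# Only the two grouping columns of the question's df are rebuilt here
df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

sizes = df.groupby(['col5', 'col2']).size()

# level='col2' refers to the same index level as level=1, just by name
max_per_col2 = sizes.groupby(level='col2').max()

# Put each col2 group's maximum back on its rows and keep the rows that reach it;
# ties (the two B groups with count 2) are both kept
is_max = sizes == sizes.groupby(level='col2').transform('max')
top_pairs = sizes[is_max]
print(top_pairs)
```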

Answer by The Gr8 Adakron

Insert the data into a pandas DataFrame and provide the column names:

import pandas as pd
df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
df.columns = ['Alphabet', 'Words']   # a flat list of names; a nested [[...]] list would create MultiIndex columns
print(df)   # print the DataFrame

To group the DataFrame and count the rows in each group, you need to provide one more column that does the counting; let's call that column "COUNTER".

Like this:

df['COUNTER'] = 1   # initially, set the counter to 1 for every row
group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum()   # summing the 1s counts the rows in each group
print(group_data)
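
The helper column is not strictly needed: size() counts the rows of each group directly and gives the same numbers as summing a column of 1s. A sketch comparing both on this answer's data (built here from a dict so the column names are set in one step; summed and sized are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Alphabet': ['A', 'C', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'A'],
    'Words': ['ONE', 'TWO', 'ONE', 'ONE', 'ONE', 'TWO', 'ONE', 'TWO', 'ONE', 'THREE'],
})

# Helper-column approach from this answer: sum a column of 1s per group
df['COUNTER'] = 1
summed = df.groupby(['Alphabet', 'Words'])['COUNTER'].sum()

# size() counts the rows per group directly, no helper column required
sized = df.groupby(['Alphabet', 'Words']).size()
print(sized)
```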

Answer by seansio1995

You can just call the built-in count method after groupby (note that it returns the number of non-null values in each remaining column, not a single count column):

df.groupby(['col5','col2']).count()
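
count() and size() answer slightly different questions: count() reports the non-null values of every remaining column, while size() reports the number of rows. The four-row frame below is hypothetical and only exists to include one NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: the second row has no col4 value
df = pd.DataFrame({
    'col5': ['1', '1', '1', '2'],
    'col2': ['D', 'D', 'D', 'B'],
    'col4': ['x/y/z', np.nan, 'x', 'x/u'],
})

# count(): number of non-null values in each remaining column, per group
per_column = df.groupby(['col5', 'col2']).count()

# size(): number of rows per group, NaN or not
rows = df.groupby(['col5', 'col2']).size()
print(per_column)
print(rows)
```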

Answer by Tom

Should you want to add a new column (say 'count_column') containing the group counts to the DataFrame:

df['count_column'] = df.groupby(['col5','col2'])['col5'].transform('count')

(I picked 'col5' because it contains no NaN values; count skips NaNs.)
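
The choice of column matters because transform('count') skips NaNs, while transform('size') counts every row whichever column you select. Note also the df['...'] = ... assignment: setting df.count_column as an attribute does not create a column. A sketch on a hypothetical four-row frame (the column names col4_nonnull and group_size are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: the second row has no col4 value
df = pd.DataFrame({
    'col5': ['1', '1', '1', '2'],
    'col2': ['D', 'D', 'D', 'B'],
    'col4': ['x/y/z', np.nan, 'x', 'x/u'],
})
keys = ['col5', 'col2']

# Assign with [] so that real columns are created
df['count_column'] = df.groupby(keys)['col5'].transform('count')  # col5 has no NaN
df['col4_nonnull'] = df.groupby(keys)['col4'].transform('count')  # skips the NaN in col4
df['group_size'] = df.groupby(keys)['col5'].transform('size')     # counts every row
print(df)
```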

Answer by Ted Petrou

Idiomatic solution that uses only a single groupby

(df.groupby(['col5', 'col2']).size() 
   .sort_values(ascending=False) 
   .reset_index(name='count') 
   .drop_duplicates(subset='col2'))

  col5 col2  count
0    3    A      3
1    1    D      3
2    5    B      2
6    3    C      1

Explanation

The result of the groupby size method is a Series with col5 and col2 in the index. From here, you could use another groupby method to find the maximum count for each value in col2, but that is not necessary. You can simply sort all the values in descending order and then keep only the first occurrence of each col2 value with the drop_duplicates method.
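
One caveat: sort_values uses quicksort by default, which is not stable, so when two (col5, col2) pairs tie (here (2, B) and (5, B) both have count 2), which one survives drop_duplicates is not guaranteed. Passing kind='mergesort' makes the choice deterministic. A sketch, rebuilding only the two grouping columns of the question's data:

```python
import pandas as pd

# Only the two grouping columns of the question's df are rebuilt here
df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

result = (df.groupby(['col5', 'col2']).size()
            # mergesort is stable: tied counts keep their (col5, col2) order
            .sort_values(ascending=False, kind='mergesort')
            .reset_index(name='count')
            # the first remaining row for each col2 value carries its largest count
            .drop_duplicates(subset='col2'))
print(result)
```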
