Python Pandas DataFrame: group by two columns and get counts

Question

Asked by Nilani Algiriyage

I have a pandas dataframe in the following format:

import pandas as pd
df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']

df:

   col1 col2 col3     col4 col5
0   1.1    A  1.1    x/y/z    1
1   1.1    A  1.7      x/y    3
2   1.1    A  2.5  x/y/z/n    3
3   2.6    B  2.6      x/u    2
4   2.5    B  3.3        x    4
5   3.4    B  3.8    x/u/v    2
6   2.6    B    4    x/y/z    5
7   2.6    A  4.2        x    3
8   3.4    B  4.3  x/u/v/b    6
9   3.4    C  4.5        -    3
10  2.6    B  4.6      x/y    5
11  1.1    D  4.7    x/y/z    1
12  1.1    D  4.7        x    1
13  3.3    D  4.8  x/u/v/w    1

Now I want to group this by two columns, as follows:

df.groupby(['col5','col2']).reset_index()

Output:

             index col1 col2 col3     col4 col5
col5 col2                                      
1    A    0      0  1.1    A  1.1    x/y/z    1
     D    0     11  1.1    D  4.7    x/y/z    1
          1     12  1.1    D  4.7        x    1
          2     13  3.3    D  4.8  x/u/v/w    1
2    B    0      3  2.6    B  2.6      x/u    2
          1      5  3.4    B  3.8    x/u/v    2
3    A    0      1  1.1    A  1.7      x/y    3
          1      2  1.1    A  2.5  x/y/z/n    3
          2      7  2.6    A  4.2        x    3
     C    0      9  3.4    C  4.5        -    3
4    B    0      4  2.5    B  3.3        x    4
5    B    0      6  2.6    B    4    x/y/z    5
          1     10  2.6    B  4.6      x/y    5
6    B    0      8  3.4    B  4.3  x/u/v/b    6

I want to get the count for each group, as follows. Expected output:

col5 col2 count
1    A      1
     D      3
2    B      2
etc...

How can I get my expected output? I also want to find the largest count for each 'col2' value.

Answer 1

Accepted answer by waitingkuo

Following on from @Andy's answer, you can do the following to answer your second question:

In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]: 
      0
col2   
A     3
B     2
C     1
D     3
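The column label `0` in the output above is simply the default name that reset_index gives the unnamed size Series. A minimal sketch of the same idea with a named column, assuming only the 'col2' and 'col5' values from the question (the variable names `counts` and `max_per_col2` are mine, not part of the answer):

```python
import pandas as pd

# Only the two grouping columns from the question are needed here.
df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

# Count rows per (col5, col2) pair and give the count column a readable name.
counts = df.groupby(['col5', 'col2']).size().reset_index(name='count')

# Largest count seen for each col2 value.
max_per_col2 = counts.groupby('col2')['count'].max()
print(max_per_col2)
```

Selecting `['count']` instead of `[[0]]` returns a Series rather than a one-column DataFrame; which is preferable depends on what you do with the result next.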

Answer 2

Answer by Andy Hayden

You are looking for size:

In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5  col2
1     A       1
      D       3
2     B       2
3     A       3
      C       1
4     B       1
5     B       2
6     B       1
dtype: int64

To get the same answer as waitingkuo (the "second question"), but slightly more cleanly, group by the index level:

In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A       3
B       2
C       1
D       3
dtype: int64
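Because the size Series keeps the grouping column names as index level names, the level can also be referenced by name rather than by position. A small sketch under that assumption, again using only 'col2' and 'col5' from the question (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

# Series indexed by a (col5, col2) MultiIndex.
sizes = df.groupby(['col5', 'col2']).size()

# Group by the index level called 'col2'; equivalent to level=1 here.
max_by_name = sizes.groupby(level='col2').max()
max_by_position = sizes.groupby(level=1).max()
print(max_by_name)
```

Using the name may be a little more robust if the order of the grouping columns changes later.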

Answer 3

Answer by The Gr8 Adakron

Insert the data into a pandas DataFrame and provide the column names.

import pandas as pd
df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
df.columns = ['Alphabet','Words']   # a flat list; a nested list would create MultiIndex columns
print(df)   # print the DataFrame

This is our printed data:

  Alphabet  Words
0        A    ONE
1        C    TWO
2        A    ONE
3        B    ONE
4        C    ONE
5        A    TWO
6        B    ONE
7        B    TWO
8        A    ONE
9        A  THREE

To group the DataFrame and count the members of each group, add one more column that holds the count. Let's call that column "COUNTER".

Like this:

df['COUNTER'] =1       #initially, set that counter to 1.
group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
print(group_data)

OUTPUT:

Alphabet  Words
A         ONE      3
          THREE    1
          TWO      1
B         ONE      2
          TWO      1
C         ONE      1
          TWO      1
Name: COUNTER, dtype: int64
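Summing a column of ones gives the same numbers as calling size() on the groupby, so the helper column is optional. The sketch below checks that on the same ten rows; the names `by_sum` and `by_size` are mine, and the frame is built from a dict rather than a transposed list purely for brevity:

```python
import pandas as pd

df = pd.DataFrame({
    'Alphabet': ['A', 'C', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'A'],
    'Words': ['ONE', 'TWO', 'ONE', 'ONE', 'ONE', 'TWO', 'ONE', 'TWO', 'ONE', 'THREE'],
})

# Approach from the answer: a helper column of ones, summed per group.
df['COUNTER'] = 1
by_sum = df.groupby(['Alphabet', 'Words'])['COUNTER'].sum()

# Same counts without the helper column.
by_size = df.groupby(['Alphabet', 'Words']).size()

# Both Series share the same (Alphabet, Words) index and values.
print((by_sum == by_size).all())
```

The helper column can still be handy if you later want weighted counts, since the ones can be replaced by weights.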

Answer 4

Answer by seansio1995

You can simply call the built-in count method after groupby. Note that count returns the number of non-null values in each remaining column, rather than a single count per group:

df.groupby(['col5','col2']).count()
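count() and size() can disagree when a column contains missing values. The following sketch uses a made-up three-row frame (not the question's data) with one NaN in 'col1' to illustrate the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows in group ('1', 'A'), one of them with a missing col1.
df = pd.DataFrame({
    'col2': ['A', 'A', 'B'],
    'col5': ['1', '1', '2'],
    'col1': [1.1, np.nan, 2.6],
})

grouped = df.groupby(['col5', 'col2'])

# DataFrame with one column per non-grouping column; NaN values are not counted.
per_column = grouped.count()

# Series with the number of rows in each group, NaN or not.
per_group = grouped.size()

print(per_column)
print(per_group)
```

For group ('1', 'A') the count of 'col1' is 1 while the group size is 2, so size() is usually the safer choice when you want rows per group.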

Answer 5

Answer by Tom

Should you want to add a new column (say 'count_column') containing the groups' counts into the dataframe:

df['count_column'] = df.groupby(['col5','col2'])['col5'].transform('count')

(I picked 'col5' because it contains no NaN values.)
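If you would rather not depend on picking a NaN-free column, transform('size') counts rows per group regardless of missing values. A minimal sketch, assuming the 'col2' and 'col5' values from the question; note the bracket assignment, since attribute-style assignment does not create a new column:

```python
import pandas as pd

df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

# Broadcast each group's row count back onto every row of that group.
df['count_column'] = df.groupby(['col5', 'col2'])['col5'].transform('size')
print(df)
```

Every row now carries the size of its own (col5, col2) group, which keeps the original row order and index intact.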

Answer 6

Answer by Ted Petrou

Idiomatic solution that uses only a single groupby

(df.groupby(['col5', 'col2']).size() 
   .sort_values(ascending=False) 
   .reset_index(name='count') 
   .drop_duplicates(subset='col2'))

  col5 col2  count
0    3    A      3
1    1    D      3
2    5    B      2
6    3    C      1

Explanation

The result of the groupby size method is a Series with col5 and col2 in the index. From here, you could use another groupby to find the maximum count for each value in col2, but that is not necessary. You can simply sort all the values in descending order and then keep only the first occurrence of each col2 value with the drop_duplicates method.
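The steps described above can be sketched on a reduced copy of the question's data (only 'col2' and 'col5'; the variable name `top` is mine). Because 'B' has a count of 2 for both col5 '2' and col5 '5', which of those two rows survives depends on how the sort orders ties, so the sketch only relies on the counts:

```python
import pandas as pd

df = pd.DataFrame({
    'col2': list('AAABBBBABCBDDD'),
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
})

top = (
    df.groupby(['col5', 'col2']).size()   # rows per (col5, col2) pair
      .sort_values(ascending=False)       # largest counts first
      .reset_index(name='count')          # back to a flat DataFrame
      .drop_duplicates(subset='col2')     # keep the first (largest) row per col2
)
print(top)
```

Unlike a second groupby with max(), this keeps the col5 value alongside each maximum, which is useful when you also need to know where the maximum occurs.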
