Python Pandas DataFrame: group by two columns and get counts
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17679089/
Pandas DataFrame Groupby two columns and get counts
Asked by Nilani Algiriyage
I have a pandas dataframe in the following format:
import pandas as pd

# One inner list per column; .T transposes them into columns.
df = pd.DataFrame([
    [1.1, 1.1, 1.1, 2.6, 2.5, 3.4, 2.6, 2.6, 3.4, 3.4, 2.6, 1.1, 1.1, 3.3],
    list('AAABBBBABCBDDD'),
    [1.1, 1.7, 2.5, 2.6, 3.3, 3.8, 4.0, 4.2, 4.3, 4.5, 4.6, 4.7, 4.7, 4.8],
    ['x/y/z', 'x/y', 'x/y/z/n', 'x/u', 'x', 'x/u/v', 'x/y/z', 'x', 'x/u/v/b', '-', 'x/y', 'x/y/z', 'x', 'x/u/v/w'],
    ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
]).T
df.columns = ['col1', 'col2', 'col3', 'col4', 'col5']
df:
   col1 col2 col3     col4 col5
0   1.1    A  1.1    x/y/z    1
1   1.1    A  1.7      x/y    3
2   1.1    A  2.5  x/y/z/n    3
3   2.6    B  2.6      x/u    2
4   2.5    B  3.3        x    4
5   3.4    B  3.8    x/u/v    2
6   2.6    B    4    x/y/z    5
7   2.6    A  4.2        x    3
8   3.4    B  4.3  x/u/v/b    6
9   3.4    C  4.5        -    3
10  2.6    B  4.6      x/y    5
11  1.1    D  4.7    x/y/z    1
12  1.1    D  4.7        x    1
13  3.3    D  4.8  x/u/v/w    1
Now I want to group this by two columns, as follows:
df.groupby(['col5','col2']).reset_index()
Output:
             index col1 col2 col3     col4 col5
col5 col2
1    A    0      0  1.1    A  1.1    x/y/z    1
     D    0     11  1.1    D  4.7    x/y/z    1
          1     12  1.1    D  4.7        x    1
          2     13  3.3    D  4.8  x/u/v/w    1
2    B    0      3  2.6    B  2.6      x/u    2
          1      5  3.4    B  3.8    x/u/v    2
3    A    0      1  1.1    A  1.7      x/y    3
          1      2  1.1    A  2.5  x/y/z/n    3
          2      7  2.6    A  4.2        x    3
     C    0      9  3.4    C  4.5        -    3
4    B    0      4  2.5    B  3.3        x    4
5    B    0      6  2.6    B    4    x/y/z    5
          1     10  2.6    B  4.6      x/y    5
6    B    0      8  3.4    B  4.3  x/u/v/b    6
I want to get the number of rows in each group, as follows. Expected output:
col5 col2 count
1    A    1
     D    3
2    B    2
etc...
How can I get my expected output? I also want to find the largest count for each 'col2' value.
Accepted answer by waitingkuo
Following up on @Andy's answer, you can do the following to answer your second question:
In [56]: df.groupby(['col5','col2']).size().reset_index().groupby('col2')[[0]].max()
Out[56]:
      0
col2
A     3
B     2
C     1
D     3
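The column label 0 in the output above is just the default name that reset_index gives an unnamed Series. As a minimal sketch, assuming a reduced frame that keeps only the question's col5 and col2 values, the count column can be named explicitly (the name 'count' is my choice, taken from the question's expected output):

```python
import pandas as pd

# Reduced frame for illustration: only the two grouping columns from the question.
df = pd.DataFrame({
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    'col2': list('AAABBBBABCBDDD'),
})

# Count rows per (col5, col2) pair and name the resulting column 'count'.
counts = df.groupby(['col5', 'col2']).size().reset_index(name='count')

# Largest pair count for each col2 value.
largest = counts.groupby('col2')['count'].max()
print(largest)
```

Selecting ['count'] with single brackets returns a Series; use [['count']] to keep a one-column DataFrame as in the answer above.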
Answer by Andy Hayden
You are looking for size:
In [11]: df.groupby(['col5', 'col2']).size()
Out[11]:
col5  col2
1     A       1
      D       3
2     B       2
3     A       3
      C       1
4     B       1
5     B       2
6     B       1
dtype: int64
To get the same answer as waitingkuo (the "second question"), but slightly more cleanly, group by the index level:
In [12]: df.groupby(['col5', 'col2']).size().groupby(level=1).max()
Out[12]:
col2
A 3
B 2
C 1
D 3
dtype: int64
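The level can also be referred to by name rather than position, and the size Series can be turned into a frame shaped like the question's expected output. A sketch under the same assumption of a reduced two-column frame (the column name 'count' is again borrowed from the question):

```python
import pandas as pd

# Reduced frame for illustration: only the two grouping columns from the question.
df = pd.DataFrame({
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    'col2': list('AAABBBBABCBDDD'),
})

sizes = df.groupby(['col5', 'col2']).size()

# Same as groupby(level=1), but the level is selected by its name.
largest = sizes.groupby(level='col2').max()

# One-column DataFrame indexed by (col5, col2), as in the expected output.
expected = sizes.to_frame('count')
print(expected)
```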
Answer by The Gr8 Adakron
Insert data into a pandas DataFrame and provide the column names.
import pandas as pd
df = pd.DataFrame([['A','C','A','B','C','A','B','B','A','A'], ['ONE','TWO','ONE','ONE','ONE','TWO','ONE','TWO','ONE','THREE']]).T
df.columns = ['Alphabet', 'Words']  # flat list; a nested list would create MultiIndex columns
print(df)  # print the DataFrame
This is our printed data:
  Alphabet  Words
0        A    ONE
1        C    TWO
2        A    ONE
3        B    ONE
4        C    ONE
5        A    TWO
6        B    ONE
7        B    TWO
8        A    ONE
9        A  THREE
One way to group the DataFrame and count each group is to add a helper column that tallies the rows; let's call that column "COUNTER".
Like this:
df['COUNTER'] =1 #initially, set that counter to 1.
group_data = df.groupby(['Alphabet','Words'])['COUNTER'].sum() #sum function
print(group_data)
OUTPUT:
Alphabet  Words
A         ONE      3
          THREE    1
          TWO      1
B         ONE      2
          TWO      1
C         ONE      1
          TWO      1
Name: COUNTER, dtype: int64
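The helper column is not strictly required. As a rough sketch using the same Alphabet/Words data, size() should give the same per-group totals without adding anything to the frame (the variable names with_counter and with_size are mine):

```python
import pandas as pd

df = pd.DataFrame({
    'Alphabet': ['A', 'C', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'A'],
    'Words': ['ONE', 'TWO', 'ONE', 'ONE', 'ONE', 'TWO', 'ONE', 'TWO', 'ONE', 'THREE'],
})

# Helper-column approach from this answer; assign() leaves df itself unchanged.
with_counter = df.assign(COUNTER=1).groupby(['Alphabet', 'Words'])['COUNTER'].sum()

# size() counts the rows in each group directly.
with_size = df.groupby(['Alphabet', 'Words']).size()

print(with_size)
```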
Answer by seansio1995
You can just call the built-in count method after groupby:
df.groupby(['col5','col2']).count()
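One difference worth keeping in mind: count() reports the number of non-null values in each remaining column, while size() reports the number of rows per group. A small sketch with a made-up three-row frame (not the question's data) shows where the two diverge:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in col1.
df = pd.DataFrame({
    'col5': ['1', '1', '2'],
    'col2': ['A', 'A', 'B'],
    'col1': [1.1, np.nan, 2.6],
})

# DataFrame: one column per non-grouping column, counting non-null entries.
per_column = df.groupby(['col5', 'col2']).count()

# Series: number of rows in each group, NaN or not.
group_sizes = df.groupby(['col5', 'col2']).size()

print(per_column)
print(group_sizes)
```

When no column contains NaN, every column of the count() result matches size().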
Answer by Tom
Should you want to add a new column (say 'count_column') containing the groups' counts to the dataframe:
df['count_column'] = df.groupby(['col5', 'col2'])['col5'].transform('count')  # use bracket assignment; attribute assignment does not create a column
(I picked 'col5' because it contains no NaN values.)
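As a sketch of what this produces, again assuming a reduced frame with only the question's col5 and col2 values: transform returns one value per original row, so the result lines up with the frame's index. The 'size' variant and the column name 'size_column' are my additions; 'size' counts rows regardless of NaN, which sidesteps the choice of column.

```python
import pandas as pd

# Reduced frame for illustration: only the two grouping columns from the question.
df = pd.DataFrame({
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    'col2': list('AAABBBBABCBDDD'),
})

grouped = df.groupby(['col5', 'col2'])['col5']

# Each row receives the count of its own (col5, col2) group.
df['count_column'] = grouped.transform('count')

# Alternative that counts rows rather than non-null values.
df['size_column'] = grouped.transform('size')

print(df)
```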
Answer by Ted Petrou
Idiomatic solution that uses only a single groupby
(df.groupby(['col5', 'col2']).size()
.sort_values(ascending=False)
.reset_index(name='count')
.drop_duplicates(subset='col2'))
   col5 col2  count
0     3    A      3
1     1    D      3
2     5    B      2
6     3    C      1
Explanation
The result of the groupby size method is a Series with col5 and col2 in the index. From here, you can use another groupby method to find the maximum count for each value in col2, but it is not necessary. You can simply sort all the values in descending order and then keep only the rows with the first occurrence of each col2 value by using the drop_duplicates method.
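To check that the sort-and-deduplicate route agrees with a second groupby, here is a rough sketch on a reduced frame holding only the question's col5 and col2 values. The kind='mergesort' argument is my addition, not part of the answer: the default sort is not guaranteed to be stable, so when two groups tie on count (for example the two B groups with count 2), which col5 value survives may otherwise vary.

```python
import pandas as pd

# Reduced frame for illustration: only the two grouping columns from the question.
df = pd.DataFrame({
    'col5': ['1', '3', '3', '2', '4', '2', '5', '3', '6', '3', '5', '1', '1', '1'],
    'col2': list('AAABBBBABCBDDD'),
})

sizes = df.groupby(['col5', 'col2']).size()

# Single-groupby route: sort descending, keep the first row seen for each col2.
top = (sizes.sort_values(ascending=False, kind='mergesort')
            .reset_index(name='count')
            .drop_duplicates(subset='col2'))

# Two-groupby route for comparison.
by_max = sizes.groupby(level='col2').max()

# Re-index the first result by col2 so the two can be compared directly.
top_counts = top.set_index('col2')['count'].sort_index()
print(top_counts)
```

The single-groupby version also keeps the col5 value that produced each maximum, which the plain max() does not.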