Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/35134507/

Time: 2020-08-19 16:02:17  Source: igfitidea

Python pandas: How to group by and count unique values based on multiple columns?

Tags: python, pandas, group-by, unique

Asked by UserYmY

I have a DataFrame df:

id name number
1 sam   76
2 sam    8
2 peter  8 
4 Hyman   2

I would like to group by the 'id' column and count the number of unique (name, number) pairs in each group:

id count(name-number)
1    1
2    2
4    1     

I have tried this, but it does not work:

df.groupby('id')[('number','name')].nunique().reset_index()

Answered by mvd

You can do:

import pandas
df = pandas.DataFrame({"id": [1, 2, 3, 4], "name": ["sam", "sam", "peter", "Hyman"], "number": [8, 8, 8, 2]})
g = df.groupby(["name", "number"])
print(g.groups)

which gives:

{('Hyman', 2): [3], ('peter', 8): [2], ('sam', 8): [0, 1]}

To get the number of entries per pair, you can do:

for p in g.groups:
    print(p, " has ", len(g.groups[p]), " entries")

which gives:

('peter', 8)  has  1  entries
('Hyman', 2)  has  1  entries
('sam', 8)  has  2  entries

Update:

The OP asked for the result as a DataFrame. One way to get this is to use aggregate with the len function, which returns a DataFrame with the number of entries per pair:

d = g.aggregate(len)
print(d.reset_index().rename(columns={"id": "num_entries"}))

gives:

    name  number  num_entries
0  Hyman       2            1
1  peter       8            1
2    sam       8            2
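
The same table can probably be built more directly with size(), which counts the rows in each group. The following is a minimal sketch on the sample data from this answer; the variable name result is an assumption made for illustration. Note that it counts rows per (name, number) pair, not pairs per id.

```python
import pandas as pd

# Sample data copied from this answer (not the data in the question).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["sam", "sam", "peter", "Hyman"],
    "number": [8, 8, 8, 2],
})

# size() counts the rows in each (name, number) group;
# reset_index(name=...) turns the resulting Series into a DataFrame
# and names the count column in one step.
result = df.groupby(["name", "number"]).size().reset_index(name="num_entries")
print(result)
```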

Answered by Shen Huang

Try:

df.groupby('id').apply(lambda x: x.drop('id', axis=1).drop_duplicates().shape[0]).reset_index()
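
A variation on the same idea, sketched here on the data from the question, is to drop duplicate rows first and then count rows per id. This avoids groupby.apply, whose handling of the grouping column has changed between pandas versions. The variable name counts is an assumption made for illustration.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "name": ["sam", "sam", "peter", "Hyman"],
    "number": [76, 8, 8, 2],
})

# Keep one row per distinct (id, name, number) combination,
# then count the remaining rows for each id.
counts = (
    df.drop_duplicates(["id", "name", "number"])
      .groupby("id")
      .size()
      .reset_index(name="count(name-number)")
)
print(counts)
```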

Answered by sparrow

To get a list of unique values for column combinations:

grouped= df.groupby('name').number.unique()
for k,v in grouped.items():
    print(k)
    print(v)

output:

Hyman
[2]
peter
[8]
sam
[76  8]

To count the values of one column grouped by another:

df.groupby('name').number.value_counts().unstack().fillna(0)

output:

number   2    8    76
name
Hyman   1.0  0.0  0.0
peter   0.0  1.0  0.0
sam     0.0  1.0  1.0
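
If integer counts are preferred over floats, one option is to pass fill_value=0 to unstack instead of calling fillna(0) afterwards, so no NaN is introduced. This is a minimal sketch on the data from the question; the variable name table is an assumption made for illustration.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "name": ["sam", "sam", "peter", "Hyman"],
    "number": [76, 8, 8, 2],
})

# Count each number per name, then pivot the numbers into columns.
# fill_value=0 fills the missing combinations directly with 0.
table = df.groupby("name")["number"].value_counts().unstack(fill_value=0)
print(table)
```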

Answered by stedes

You can just combine two groupbys to get the desired result.

import pandas
df = pandas.DataFrame({"id": [1, 2, 2, 4], "name": ["sam", "sam", "peter", "Hyman"], "number": [8, 8, 8, 2]})
group = df.groupby(['id','name','number']).size().groupby(level=0).size()

The first groupby counts each complete combination of the original columns, which makes the combinations you want to count unique. The second groupby then counts the unique combinations per value of the column you want (using the fact that the first groupby put that column in the index).
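
The two steps can be sketched separately to see the intermediate result. The data below is the data from the question plus one repeated row (2, sam, 8); that extra row and the variable names pairs and per_id are assumptions added for illustration, to show that a repeated pair is counted only once.

```python
import pandas as pd

# Question data plus one repeated row (2, sam, 8), added for illustration.
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 4],
    "name": ["sam", "sam", "sam", "peter", "Hyman"],
    "number": [76, 8, 8, 8, 2],
})

# Step 1: one entry per distinct (id, name, number); the value is the row count.
pairs = df.groupby(["id", "name", "number"]).size()

# Step 2: group on the first index level (id) and count the distinct pairs.
per_id = pairs.groupby(level=0).size()

print(pairs)
print(per_id)
```

Here id 2 has three rows but only two distinct (name, number) pairs, so per_id reports 2 for it.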

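The result will be a Series. If you want a DataFrame with the right column name (as shown in your desired result), you can use the aggregate function with a dict that names the output column. This dict-renaming form only works in older pandas; pandas 1.0 and later raise SpecificationError for it, and there reset_index(name=...) gives the same result:

group = df.groupby(['id','name','number']).size().groupby(level=0).agg({'count(name-number)':'size'})
# pandas >= 1.0: name the count column while resetting the index
group = df.groupby(['id','name','number']).size().groupby(level=0).size().reset_index(name='count(name-number)')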