Python pandas: How to group by and count unique values based on multiple columns?

Question

Asked by UserYmY

I have a DataFrame df:

id  name   number
1   sam    76
2   sam    8
2   peter  8
4   Hyman  2

I would like to group by the 'id' column and count the number of unique (name, number) pairs in each group:

id count(name-number)
1    1
2    2
4    1

I have tried this, but it does not work:

df.groupby('id')[('number','name')].nunique().reset_index()

Answer 1

Answered by mvd

You can do:

import pandas
df = pandas.DataFrame({"id": [1, 2, 3, 4], "name": ["sam", "sam", "peter", "Hyman"], "number": [8, 8, 8, 2]})
g = df.groupby(["name", "number"])
# g.groups maps each (name, number) pair to the row labels in that group
print(g.groups)

which gives:

{('Hyman', 2): [3], ('peter', 8): [2], ('sam', 8): [0, 1]}

To get the number of rows for each (name, number) pair, you can do:

for p in g.groups:
    print(p, " has ", len(g.groups[p]), " entries")

which gives:

('peter', 8)  has  1  entries
('Hyman', 2)  has  1  entries
('sam', 8)  has  2  entries

Update:

The OP asked for the result as a DataFrame. One way to get this is to use aggregate with the len function, which returns a DataFrame with the number of entries per pair:

d = g.aggregate(len)
# the remaining 'id' column now holds the group sizes, so rename it
print(d.reset_index().rename(columns={"id": "num_entries"}))

which gives:

    name  number  num_entries
0  Hyman       2            1
1  peter       8            1
2    sam       8            2
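
The same per-pair count can also be sketched with size(), which counts rows per group directly. This is a minimal sketch that assumes the sample frame from this answer; the column name num_entries is reused from the output above.

```python
import pandas as pd

# Sample frame from this answer (not the exact data from the question)
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["sam", "sam", "peter", "Hyman"],
    "number": [8, 8, 8, 2],
})

# size() counts the rows in each (name, number) group;
# reset_index(name=...) turns the result into a DataFrame and names the count column
pair_counts = df.groupby(["name", "number"]).size().reset_index(name="num_entries")
print(pair_counts)
```

Note that this counts rows per (name, number) pair, not unique pairs per id, which is what the question asks for.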

Answer 2

Answered by Shen Huang

Try:

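# select the pair columns explicitly so this does not depend on whether
# apply passes the grouping column 'id' to the function
df.groupby('id').apply(
    lambda x: x[['name', 'number']].drop_duplicates().shape[0]
).reset_index()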
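
An alternative sketch of the same idea without apply, assuming the sample data from the question: drop duplicate (id, name, number) rows first, then count the rows left in each id group. The column name count(name-number) is taken from the desired output in the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "name": ["sam", "sam", "peter", "Hyman"],
    "number": [76, 8, 8, 2],
})

# Keep one row per distinct (id, name, number) combination
unique_rows = df.drop_duplicates(subset=["id", "name", "number"])

# Count the remaining rows per id, i.e. the unique (name, number) pairs
result = unique_rows.groupby("id").size().reset_index(name="count(name-number)")
print(result)
```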

Answer 3

Answered by sparrow

To get a list of unique values for column combinations:

grouped = df.groupby('name').number.unique()
for k,v in grouped.items():
    print(k)
    print(v)

Output:

Hyman
[2]
peter
[8]
sam
[76  8]

To count how often each value of one column occurs for each value of another:

df.groupby('name').number.value_counts().unstack().fillna(0)

Output:

number   2    8    76
name
Hyman   1.0  0.0  0.0
peter   0.0  1.0  0.0
sam     0.0  1.0  1.0

Answer 4

Answered by stedes

You can just combine two groupbys to get the desired result.

import pandas
df = pandas.DataFrame({"id": [1, 2, 2, 4], "name": ["sam", "sam", "peter", "Hyman"], "number": [8, 8, 8, 2]})
group = df.groupby(['id','name','number']).size().groupby(level=0).size()

The first groupby counts every distinct (id, name, number) combination, which makes the combinations you want to count unique. The second groupby then counts those unique combinations per id, using the fact that the first groupby put that column in the index (level 0).

The result will be a Series. If you want a DataFrame with the right column name (as shown in your desired result), you can name the counts when resetting the index:

# passing a renaming dict to agg on a Series groupby raises SpecificationError in pandas >= 1.0,
# so the column is named through reset_index instead
group = df.groupby(['id','name','number']).size().groupby(level=0).size().reset_index(name='count(name-number)')
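
To see why the first groupby matters, here is a small sketch of the two-step count on the question's data with one extra duplicate row. The duplicate row (2, "sam", 8) is a hypothetical addition for illustration and is not part of the original data.

```python
import pandas as pd

# Question's data plus one hypothetical duplicate row (2, "sam", 8)
df = pd.DataFrame({
    "id": [1, 2, 2, 4, 2],
    "name": ["sam", "sam", "peter", "Hyman", "sam"],
    "number": [76, 8, 8, 2, 8],
})

# Step 1: one entry per distinct (id, name, number); the value is the row count
per_combination = df.groupby(["id", "name", "number"]).size()

# Step 2: count the entries per id (index level 0), i.e. the unique pairs
unique_pairs_per_id = per_combination.groupby(level=0).size()
print(unique_pairs_per_id)

# For comparison: a plain row count per id would also count the duplicate
rows_per_id = df.groupby("id").size()
print(rows_per_id)
```

With the duplicate row present, the two-step count for id 2 stays at 2 while the plain row count rises to 3.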
