Original source: http://stackoverflow.com/questions/35134507/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Python pandas: How to group by and count unique values based on multiple columns?
Asked by UserYmY
I have a DataFrame df:
id name number
1 sam 76
2 sam 8
2 peter 8
4 Hyman 2
I would like to group by the 'id' column and count the number of unique (name, number) pairs in each group:
id count(name-number)
1 1
2 2
4 1
I have tried this, but it does not work:
df.groupby('id')[('number','name')].nunique().reset_index()
Answered by mvd
You can do:
import pandas
df = pandas.DataFrame({"id": [1, 2, 3, 4], "name": ["sam", "sam", "peter", "Hyman"], "number": [8, 8, 8, 2]})
g = df.groupby(["name", "number"])
print(g.groups)
which gives:
{('Hyman', 2): [3], ('peter', 8): [2], ('sam', 8): [0, 1]}
To get the number of entries for each (name, number) pair, you can do:
for p in g.groups:
    print(p, "has", len(g.groups[p]), "entries")
which gives:
('peter', 8) has 1 entries
('Hyman', 2) has 1 entries
('sam', 8) has 2 entries
Update:
The OP asked for the result as a DataFrame. One way to get this is to use aggregate with the len function, which returns a DataFrame with the number of entries per pair:
d = g.aggregate(len)
print(d.reset_index().rename(columns={"id": "num_entries"}))
which gives:
name number num_entries
0 Hyman 2 1
1 peter 8 1
2 sam 8 2
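The same per-pair counts can also be obtained without aggregate. What follows is a minimal sketch, assuming the sample data from this answer. `size()` and `reset_index(name=...)` are standard pandas calls, and the column name `num_entries` simply mirrors the output above.

```python
import pandas as pd

# Sample data copied from this answer (not the data in the question).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["sam", "sam", "peter", "Hyman"],
    "number": [8, 8, 8, 2],
})

# size() counts the rows in each (name, number) group; reset_index(name=...)
# turns the resulting Series into a DataFrame and names the count column.
counts = df.groupby(["name", "number"]).size().reset_index(name="num_entries")
print(counts)
```

Note that this counts rows per (name, number) pair; it does not group by id as the question asks.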
Answered by Shen Huang
Try:
df.groupby('id').apply(lambda x: x.drop('id', axis=1).drop_duplicates().shape[0]).reset_index()
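A sketch of an equivalent approach that avoids apply, assuming the DataFrame from the question: drop duplicate (id, name, number) rows first, then count the rows left for each id. The column label `count` is chosen here only for illustration.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "name": ["sam", "sam", "peter", "Hyman"],
    "number": [76, 8, 8, 2],
})

# Keep one row per distinct (id, name, number) combination ...
unique_rows = df.drop_duplicates(subset=["id", "name", "number"])
# ... then the group size per id is the number of unique (name, number) pairs.
result = unique_rows.groupby("id").size().reset_index(name="count")
print(result)
```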
Answered by sparrow
To get the unique values of one column for each value of another column:
grouped = df.groupby('name').number.unique()
for k, v in grouped.items():
    print(k)
    print(v)
Output:
Hyman
[2]
peter
[8]
sam
[76 8]
To count how often each value of one column occurs for each value of another:
df.groupby('name').number.value_counts().unstack().fillna(0)
Output:
number 2 8 76
name
Hyman 1.0 0.0 0.0
peter 0.0 1.0 0.0
sam 0.0 1.0 1.0
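A table like the one above can also be built with `pandas.crosstab`, which tabulates how often each combination of two columns occurs. This is a sketch of an alternative, not part of the original answer, and it assumes the DataFrame from the question.

```python
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    "id": [1, 2, 2, 4],
    "name": ["sam", "sam", "peter", "Hyman"],
    "number": [76, 8, 8, 2],
})

# Rows are names, columns are numbers, cells are occurrence counts.
table = pd.crosstab(df["name"], df["number"])
print(table)
```

Unlike the value_counts().unstack().fillna(0) chain, the counts here stay integers because no missing values are introduced along the way.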
Answered by stedes
You can just combine two groupbys to get the desired result.
import pandas
df = pandas.DataFrame({"id": [1, 2, 2, 4], "name": ["sam", "sam", "peter", "Hyman"], "number": [8, 8, 8, 2]})
group = df.groupby(['id','name','number']).size().groupby(level=0).size()
The first groupby counts each distinct (id, name, number) combination, which makes the combinations you want to count unique. The second groupby then counts the unique occurrences per id, using the fact that the first groupby put that column in the index (as level 0).
The result will be a Series. If you want a DataFrame with the right column name (as shown in your desired result), you can use the aggregate function:
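# Recent pandas versions reject dict-based renaming on a Series groupby, so named aggregation is used here.
group = df.groupby(['id','name','number']).size().groupby(level=0).agg(**{'count(name-number)': 'size'})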
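To see what each of the two groupby steps contributes, they can be run separately. This is a minimal sketch using the question's data plus one duplicated row, (2, sam, 8), added here purely for illustration; the names `step1` and `step2` are hypothetical.

```python
import pandas as pd

# Question's data with an extra duplicate of (2, "sam", 8) added for illustration.
df = pd.DataFrame({
    "id": [1, 2, 2, 2, 4],
    "name": ["sam", "sam", "sam", "peter", "Hyman"],
    "number": [76, 8, 8, 8, 2],
})

# Step 1: one entry per distinct (id, name, number) combination;
# the value is how many rows share that combination.
step1 = df.groupby(["id", "name", "number"]).size()
print(step1)

# Step 2: group on index level 0 (id) and count the entries,
# i.e. the number of distinct (name, number) pairs per id.
step2 = step1.groupby(level=0).size()
print(step2)
```

The duplicated row raises the count for (2, sam, 8) in the first step but does not change the per-id result in the second.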