pandas python: count number of duplicate entries in column
Original question: http://stackoverflow.com/questions/48713347/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
python: count number of duplicate entries in column
Asked by freddy888
I have the following df:
date id
2000 1
2001 1
2002 1
2000 2
2001 2
2002 2
2000 1
2001 1
2002 1
I want to count per date how many duplicates of id there are. The result should look like this because on every date the id 1 exists twice:
date id count
2000 1 2
2001 1 2
2002 1 2
2000 2 2
2001 2 2
2002 2 2
2000 1 2
2001 1 2
2002 1 2
I tried something like this, but this gives me 1s when id is 2.
df["count"] = df.groupby(["date", "id"])["count"].transform("count")
Answered by BrokenRobot
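Your original code needs only a small fix: select a column that already exists (for example id) before calling transform, because the count column has not been created yet.
df['count'] = df.groupby(['date', 'id'])['id'].transform('count')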
Grouping and assigning the transformed result to a new column looks like this:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,3,size=(10, 3)), columns=['A', 'B', 'C'])
df['count'] = df.groupby(['A', 'B'])['C'].transform('count')
print(df)
Resulting in:
A B C count
0 1 2 0 1
1 0 0 0 2
2 2 0 2 4
3 2 0 1 4
4 2 0 2 4
5 2 0 1 4
6 0 0 0 2
7 2 2 0 3
8 2 2 1 3
9 2 2 2 3
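Applied to the question's data, grouping by both columns counts the rows that share each (date, id) pair. The following is a minimal sketch, assuming the frame is rebuilt by hand from the table in the question:

```python
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "date": [2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002],
    "id": [1, 1, 1, 2, 2, 2, 1, 1, 1],
})

# For every row, count the rows that share the same (date, id) pair.
df["count"] = df.groupby(["date", "id"])["id"].transform("count")
print(df)
```

With this data the rows with id 1 get 2 and the rows with id 2 get 1, since each (date, 2) pair occurs only once.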
Answered by YOBEN_S
You can use duplicated:
df.groupby('date').id.transform(lambda x : x.duplicated(keep=False).sum())
Out[208]:
0 2
1 2
2 2
3 2
4 2
5 2
6 2
7 2
8 2
Name: id, dtype: int64
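For a self-contained run of the same idea, here is a sketch that assumes the frame is rebuilt by hand from the question's table. Within each date, duplicated(keep=False) flags every id that occurs more than once, and the sum of those flags is broadcast back to all rows of that date:

```python
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "date": [2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002],
    "id": [1, 1, 1, 2, 2, 2, 1, 1, 1],
})

# Per date: number of rows whose id appears more than once on that date.
df["count"] = df.groupby("date")["id"].transform(
    lambda ids: ids.duplicated(keep=False).sum()
)
print(df)
```

Every row gets 2 here, including the rows with id 2, because the count is taken per date rather than per (date, id) pair.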
Answered by Riley J. Graham
Another simple solution: combine the date and ID columns into a third column, "date"+"ID". You can then use count to find the number of duplicates for each entry in that new column.
>>> dateID = [20001,20011,20021,20002,20012,20022,...]
>>> dateID.count(20001)
2
>>> dateID.count(20002)
2
You can count the occurrences of each item in dateID using:
[[x,dateID.count(x)] for x in set(dateID)]
Perhaps even easier is to use Counter:
>>> dateID = ['x', 'y', 'z', 'x', 'y', 'z', 'z']
>>> from collections import Counter
>>> Counter(dateID)
Counter({'z': 3, 'x': 2, 'y': 2})
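The combined-key idea can also be written back into the DataFrame. This sketch makes two assumptions: the frame is rebuilt by hand from the question's table, and (date, id) tuples are used as the combined key instead of concatenated digits. The column name pair_count is made up for illustration.

```python
from collections import Counter

import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "date": [2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002],
    "id": [1, 1, 1, 2, 2, 2, 1, 1, 1],
})

# One (date, id) tuple per row serves as the combined key.
keys = list(zip(df["date"], df["id"]))
counts = Counter(keys)

# Look up how often each row's key occurs ("pair_count" is a hypothetical name).
df["pair_count"] = [counts[key] for key in keys]
print(df)
```

This counts occurrences per (date, id) pair, so rows with id 2 get 1 rather than the 2 shown in the question's expected output.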