pandas python: count the number of duplicate entries in a column

Question

Asked by freddy888

I have the following df:

date       id 
2000        1
2001        1 
2002        1
2000        2
2001        2
2002        2
2000        1
2001        1
2002        1

I want to count per date how many duplicates of id there are. The result should look like this because on every date the id 1 exists twice:

date       id        count
2000        1          2
2001        1          2
2002        1          2
2000        2          2
2001        2          2
2002        2          2
2000        1          2
2001        1          2
2002        1          2

I tried something like this, but this gives me 1s when id is 2.

df["count"] = df.groupby(["date", "id"])["count"].transform("count")

Answer 1

Answered by BrokenRobot

Your original code needs only a small fix: select a column that already exists before calling transform, because df has no "count" column yet.

df['count'] = df.groupby(['date', 'id'])['id'].transform('count')

Grouping and then transforming the result into a new column looks like this:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,3,size=(10, 3)), columns=['A', 'B', 'C'])
# number of rows in each (A, B) group, broadcast back to every row
df['count'] = df.groupby(['A', 'B'])['C'].transform('count')
print(df)

Example output (the data is random, so your values will differ):

   A  B  C  count
0  1  2  0      1
1  0  0  0      2
2  2  0  2      4
3  2  0  1      4
4  2  0  2      4
5  2  0  1      4
6  0  0  0      2
7  2  2  0      3
8  2  2  1      3
9  2  2  2      3
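
As a minimal sketch, the same group-and-transform pattern can be tried on the data from the question. The DataFrame below is retyped by hand from the question, and the choice of "id" as the column to count is an assumption made for illustration. Note that this counts rows per (date, id) pair, so id 2 gets 1 rather than 2.

```python
import pandas as pd

# Data retyped from the question (assumed to be plain integer columns).
df = pd.DataFrame({
    "date": [2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002],
    "id":   [1, 1, 1, 2, 2, 2, 1, 1, 1],
})

# Count the rows that share the same (date, id) pair and
# broadcast that count back onto every row of the group.
df["count"] = df.groupby(["date", "id"])["id"].transform("count")

print(df)
```

Because transform returns a result aligned with the original index, it can be assigned straight back as a new column without a merge.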

Answer 2

Answered by YOBEN_S

You can use duplicated:

df.groupby('date').id.transform(lambda x : x.duplicated(keep=False).sum())
Out[208]: 
0    2
1    2
2    2
3    2
4    2
5    2
6    2
7    2
8    2
Name: id, dtype: int64
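
For anyone who wants to run this end to end, here is a self-contained sketch of the duplicated approach. The DataFrame is retyped by hand from the question, and assigning the result to a "count" column is an assumption about how the output would be used.

```python
import pandas as pd

# Data retyped from the question (assumed to be plain integer columns).
df = pd.DataFrame({
    "date": [2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002],
    "id":   [1, 1, 1, 2, 2, 2, 1, 1, 1],
})

# Within each date, flag every id that occurs more than once
# (keep=False marks all occurrences, not just the later ones),
# sum the flags, and broadcast that total to all rows of the date.
df["count"] = df.groupby("date")["id"].transform(
    lambda s: s.duplicated(keep=False).sum()
)

print(df)
```

With keep=False both copies of a repeated id are flagged, which is why each date reports 2 here even on the rows where id is 2.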

Answer 3

Answered by Riley J. Graham

Another simple solution: combine the date and ID columns into a third column, "date"+"ID". You can then use count to find the number of duplicates for each entry in that new column.

>>> dateID = [20001, 20011, 20021, 20002, 20012, 20022, ...]
>>> dateID.count(20001)
2
>>> dateID.count(20002)
2

You can count the occurrences of each item in dateID using:

[[x,dateID.count(x)] for x in set(dateID)]

Perhaps even easier is to use Counter:

>>> from collections import Counter
>>> dateID = ['x', 'y', 'z', 'x', 'y', 'z', 'z']
>>> Counter(dateID)
Counter({'z': 3, 'x': 2, 'y': 2})
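
The combined-key idea can also be sketched directly against the DataFrame. This is only an illustration: the data is retyped by hand from the question, and using (date, id) tuples as the combined key instead of concatenated digits is an assumption made here to avoid ambiguous keys.

```python
from collections import Counter

import pandas as pd

# Data retyped from the question (assumed to be plain integer columns).
df = pd.DataFrame({
    "date": [2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002],
    "id":   [1, 1, 1, 2, 2, 2, 1, 1, 1],
})

# Build one combined key per row; tuples keep date and id separate.
keys = list(zip(df["date"], df["id"]))

# Tally how often each combined key occurs.
counts = Counter(keys)

# Look the tally up again for every row.
df["count"] = [counts[k] for k in keys]

print(df)
```

Like the count-based list example, this tallies occurrences of each (date, id) pair, so the rows with id 2 get 1.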
