pandas: How to select duplicate rows with pandas?

Question

Asked by Federico Gentile

I have a dataframe like this:

import pandas as pd
dic = {'A':[100,200,250,300],
       'B':['ci','ci','po','pa'],
       'C':['s','t','p','w']}
df = pd.DataFrame(dic)

My goal is to split the rows into 2 dataframes:

df1 = contains all the rows whose value in column B is not repeated (unique rows).
df2 = contains only the rows whose value in column B is repeated.

The result should look like this:

df1 =      A  B C         df2 =     A  B C
      0  250 po p               0  100 ci s
      1  300 pa w               1  200 ci t

Note:

- The dataframes can in general be very big and have many repeated values in column B, so the answer should be as generic as possible.
- If there are no duplicates, df2 should be empty and all the rows should end up in df1.

Answer 1

Answered by jezrael

You can use Series.duplicated with the parameter keep=False to create a mask that is True for every row whose value in B occurs more than once, then use boolean indexing to select those rows, with ~ to invert the mask:

mask = df.B.duplicated(keep=False)
print (mask)
0     True
1     True
2    False
3    False
Name: B, dtype: bool

print (df[mask])
     A   B  C
0  100  ci  s
1  200  ci  t

print (df[~mask])
     A   B  C
2  250  po  p
3  300  pa  w
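
The output above keeps the original index labels (0, 1 and 2, 3), while the question's expected result numbers each frame from 0. A minimal sketch of how the same mask could be wrapped up to produce that, assuming a helper name split_by_duplicates (hypothetical, not part of pandas) and using reset_index(drop=True) to renumber the rows:

```python
import pandas as pd

def split_by_duplicates(df, column):
    """Hypothetical helper: return (unique_rows, repeated_rows) for `column`."""
    # True for every row whose value in `column` occurs more than once
    mask = df[column].duplicated(keep=False)
    # Renumber both results from 0, as in the question's expected output
    unique_rows = df[~mask].reset_index(drop=True)
    repeated_rows = df[mask].reset_index(drop=True)
    return unique_rows, repeated_rows

df = pd.DataFrame({'A': [100, 200, 250, 300],
                   'B': ['ci', 'ci', 'po', 'pa'],
                   'C': ['s', 't', 'p', 'w']})

df1, df2 = split_by_duplicates(df, 'B')
print(df1)
print(df2)

# Case from the question's note: no repeated values in B
no_dupes = pd.DataFrame({'A': [1, 2], 'B': ['x', 'y'], 'C': ['a', 'b']})
u, r = split_by_duplicates(no_dupes, 'B')
print(r.empty)
```

When no value in B repeats, the mask is all False, so the second frame comes back empty and every row lands in the first, which is the behaviour asked for in the question's note.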
