Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38681340/

How to remove common rows in two dataframes in Pandas?

Tags: python-2.7, pandas, scikit-learn

Asked by user3243499

I have two dataframes, df1 and df2.

df1 has row1,row2,row3,row4,row5
df2 has row2,row5

I want a new dataframe equal to df1 - df2, that is, the rows of df1 that do not appear in df2. The resulting dataframe should contain row1, row3, row4.

Accepted answer by Nickil Maveli

You can use pandas.concat to concatenate the two dataframes row-wise, followed by drop_duplicates with keep=False, which drops every row that occurs more than once in the concatenated frame.

In [1]: import pandas as pd
df_1 = pd.DataFrame({"A":["foo", "foo", "foo", "bar"], "B":[0,1,1,1], "C":["A","A","B","A"]})
df_2 = pd.DataFrame({"A":["foo", "bar", "foo", "bar"], "B":[1,0,1,0], "C":["A","B","A","B"]})

In [2]: df = pd.concat([df_1, df_2])

In [3]: df
Out[3]: 
     A  B  C
0  foo  0  A
1  foo  1  A
2  foo  1  B
3  bar  1  A
0  foo  1  A
1  bar  0  B
2  foo  1  A
3  bar  0  B

In [4]: df.drop_duplicates(keep=False)
Out[4]: 
     A  B  C
0  foo  0  A
2  foo  1  B
3  bar  1  A
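
One caveat with the approach above: keep=False only drops rows that occur at least twice in the concatenated frame, so a row that appears exactly once in df_2 and never in df_1 would survive into the result. A common variation, sketched here as an assumption rather than as part of the original answer, appends df_2 twice so that every df_2 row is guaranteed to be a duplicate. The name only_in_df_1 is illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                     "B": [0, 1, 1, 1],
                     "C": ["A", "A", "B", "A"]})
df_2 = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"],
                     "B": [1, 0, 1, 0],
                     "C": ["A", "B", "A", "B"]})

# Appending df_2 twice means every df_2 row occurs at least twice,
# so keep=False removes all of them plus any matching df_1 row.
only_in_df_1 = pd.concat([df_1, df_2, df_2]).drop_duplicates(keep=False)

print(only_in_df_1)
```

Note that this still drops rows that are duplicated within df_1 itself, so it fits best when df_1 has no repeated rows.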

Answer by Olivier Ma

You can use the index.difference() function, which selects the rows of df1 whose index labels are not present in df2:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(5, 2), index= ['row' + str(i) for i in range(1, 6)])
df1

        0             1
row1    0.249451    -0.107651
row2    1.295390    -1.773707
row3    -0.893647   -0.683306
row4    -1.090551   0.016833
row5    0.864612    0.369138

df2 = pd.DataFrame(np.random.randn(2, 2), index= ['row' + str(i) for i in [2, 5]])
df2

        0           1
row2    0.549396    -0.675574
row5    1.348785    0.942216

df1.loc[df1.index.difference(df2.index), ]

        0           1
row1    0.249451    -0.107651
row3    -0.893647   -0.683306
row4    -1.090551   0.016833
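
Because the example above uses random numbers, here is a deterministic sketch of the same index-based idea. The column name "value" and the numbers are made up for illustration. It also shows a boolean-mask variant with Index.isin, which keeps df1's original row order, whereas Index.difference by default tries to sort the resulting labels.

```python
import pandas as pd

df1 = pd.DataFrame({"value": [10, 20, 30, 40, 50]},
                   index=["row" + str(i) for i in range(1, 6)])
df2 = pd.DataFrame({"value": [200, 500]}, index=["row2", "row5"])

# Labels present in df1's index but not in df2's index.
by_difference = df1.loc[df1.index.difference(df2.index)]

# Same selection with a boolean mask; row order follows df1.
by_mask = df1[~df1.index.isin(df2.index)]

print(by_difference)
print(by_mask)
```

Both variants compare index labels only, so they apply when the rows to remove are identified by the index rather than by column values.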

Answer by Manideep Karthik

For this kind of question, see left joins in pandas.
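
The answer does not spell out how a left join removes common rows. One way to do it, sketched here as an assumption about what was intended, is a left merge with indicator=True followed by keeping only the rows marked "left_only". The sample frames are borrowed from the accepted answer, and the names merged and result are illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                     "B": [0, 1, 1, 1],
                     "C": ["A", "A", "B", "A"]})
df_2 = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"],
                     "B": [1, 0, 1, 0],
                     "C": ["A", "B", "A", "B"]})

# Left-join on all shared columns; indicator=True adds a "_merge" column
# that says whether each row was found in both frames or only in the left one.
# Dropping duplicates from df_2 first avoids multiplying matched rows.
merged = df_1.merge(df_2.drop_duplicates(), how="left", indicator=True)

# Keep rows that exist only in df_1, then discard the helper column.
result = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

print(result)
```

Unlike the concat approach, this keeps rows that are repeated within df_1 as long as they do not occur in df_2.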