Notice: this page is taken from a Chinese-English bilingual translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/44546086/

Time: 2020-09-14 03:47:08  Source: igfitidea

Remove one dataframe from another with Pandas

Tags: python, pandas, dataframe, compare, difference

Asked by Federico Gentile

I have two dataframes of different sizes (df1 and df2). I would like to remove from df1 all the rows that are stored in df2.

So if df2 is:

     A  B
0  wer  6
1  tyu  7

And df1 is:

     A  B  C
0  qwe  5  a
1  wer  6  s
2  wer  6  d
3  rty  9  f
4  tyu  7  g
5  tyu  7  h
6  tyu  7  j
7  iop  1  k

The final result should look like this:

     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k

I was able to achieve my goal with a for loop, but I would like to know if there is a better, more elegant, and more efficient way to perform this operation.

Here is the code I wrote, in case you need it:

import pandas as pd

df1 = pd.DataFrame({'A' : ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B' : [    5,     6,     6,     9,     7,     7,     7,     1],
                    'C' : ['a'  ,   's',   'd',   'f',   'g',   'h',   'j',   'k']})

df2 = pd.DataFrame({'A' : ['wer', 'tyu'],
                    'B' : [    6,     7]})

for i, row in df2.iterrows():
    # Drop only the rows where both A and B match this df2 row
    df1 = df1[(df1['A'] != row['A']) | (df1['B'] != row['B'])].reset_index(drop=True)

Answered by jezrael

Use merge with an outer join, filter the result with query, and finally remove the helper column with drop:

# Parentheses let the method chain span several lines
df = (pd.merge(df1, df2, on=['A','B'], how='outer', indicator=True)
        .query("_merge != 'both'")
        .drop('_merge', axis=1)
        .reset_index(drop=True))
print(df)
     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k
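
With how='outer', a row that exists only in df2 would also pass the `_merge != 'both'` filter. The following is a minimal sketch of a variant that avoids this, assuming a left join is acceptable here; it is a swapped-in variation for illustration, not jezrael's original code. It keeps only the rows marked left_only:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B': [5, 6, 6, 9, 7, 7, 7, 1],
                    'C': ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k']})
df2 = pd.DataFrame({'A': ['wer', 'tyu'], 'B': [6, 7]})

# A left join keeps every row of df1; the indicator column records
# whether that row also matched a row of df2 on (A, B).
merged = df1.merge(df2, on=['A', 'B'], how='left', indicator=True)

# Keep only the rows that had no match in df2, then drop the helper column.
result = (merged[merged['_merge'] == 'left_only']
          .drop(columns='_merge')
          .reset_index(drop=True))
print(result)
```

A left join also preserves the row order of df1, so no extra sorting should be needed afterwards.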

Answered by Allen

You can use np.in1d to check whether the values of each row of df1 appear in df2, and then use the negated result as a mask to select rows from df1.

import numpy as np

df1[~df1[['A','B']].apply(lambda x: np.in1d(x, df2).all(), axis=1)]\
                   .reset_index(drop=True)
Out[115]: 
     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k
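
One caveat: np.in1d tests each value against all the values in df2, not against whole rows, so a row such as ('wer', 7) would also be treated as present in df2. A minimal sketch of a row-wise check, assuming the (A, B) pairs can be compared as tuples through a MultiIndex (a swapped-in technique, not part of Allen's answer; the names keys_to_remove and in_df2 are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B': [5, 6, 6, 9, 7, 7, 7, 1],
                    'C': ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k']})
df2 = pd.DataFrame({'A': ['wer', 'tyu'], 'B': [6, 7]})

# One (A, B) tuple per row of df2.
keys_to_remove = list(df2[['A', 'B']].itertuples(index=False, name=None))

# Treat (A, B) as a MultiIndex so that isin compares whole pairs
# rather than individual values.
in_df2 = df1.set_index(['A', 'B']).index.isin(keys_to_remove)

# Negate the mask to keep only the rows of df1 that are not in df2.
result = df1[~in_df2].reset_index(drop=True)
print(result)
```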

Answered by asongtoruin

pandas has a method called isin; however, this relies on unique indices. We can define a lambda function that builds a key column from the existing 'A' and 'B' columns of df1 and df2, and use isin on that. We then negate the result (as we want the rows not in df2) and reset the index:

import pandas as pd

df1 = pd.DataFrame({'A' : ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B' : [    5,     6,     6,     9,     7,     7,     7,     1],
                    'C' : ['a'  ,   's',   'd',   'f',   'g',   'h',   'j',   'k']})

df2 = pd.DataFrame({'A' : ['wer', 'tyu'],
                    'B' : [    6,     7]})

unique_ind = lambda df: df['A'].astype(str) + '_' + df['B'].astype(str)
print(df1[~unique_ind(df1).isin(unique_ind(df2))].reset_index(drop=True))

This prints:

     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k
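
A joined string key can collide when a value itself contains the separator. The sketch below uses hypothetical data (the frames left and right are made up for illustration) to show such a collision, and compares it with keys built as (A, B) tuples, which keep the two columns separate:

```python
import pandas as pd

# Hypothetical data chosen so that two different (A, B) pairs
# produce the same joined string 'x_1_2'.
left = pd.DataFrame({'A': ['x_1', 'y'], 'B': ['2', '3']})
right = pd.DataFrame({'A': ['x'], 'B': ['1_2']})

# String keys, as in the answer above.
string_key = lambda df: df['A'].astype(str) + '_' + df['B'].astype(str)
by_string = left[~string_key(left).isin(string_key(right))]

# Tuple keys: membership is tested on whole (A, B) pairs.
right_keys = set(zip(right['A'], right['B']))
keep = [key not in right_keys for key in zip(left['A'], left['B'])]
by_tuple = left.loc[keep]

print(len(by_string))  # 1: the row ('x_1', '2') was removed by mistake
print(len(by_tuple))   # 2: no row of left is actually in right
```

For the df1 and df2 in the question this collision does not occur, so the string key gives the expected result there.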

Answered by Elliot Ben

The cleanest way I found was to use drop from pandas, passing the index of the dataframe you want to remove:

df1.drop(df2.index, axis=0,inplace=True)
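
Note that drop removes rows by index label, not by content. With the df2 from the question (index labels 0 and 1) it would drop the first two rows of df1 instead. A minimal sketch of a case where it does give the desired result, assuming df2 was obtained by filtering df1 and therefore kept df1's index labels (this setup is an assumption, not part of the question):

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B': [5, 6, 6, 9, 7, 7, 7, 1],
                    'C': ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k']})

# Assumption: df2 is a filtered slice of df1, so its index labels
# (1, 2, 4, 5, 6) point at exactly the rows to remove.
df2 = df1[df1['A'].isin(['wer', 'tyu'])]

# Drop those labels from df1; a non-inplace call returns a new frame.
result = df1.drop(df2.index, axis=0).reset_index(drop=True)
print(result)
```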