Remove one dataframe from another with Pandas

Question

Asked by Federico Gentile

I have two dataframes of different sizes (df1 and df2). I would like to remove from df1 all the rows which are stored within df2.

So if I have df2 equal to:

     A  B
0  wer  6
1  tyu  7

And df1 equal to:

     A  B  C
0  qwe  5  a
1  wer  6  s
2  wer  6  d
3  rty  9  f
4  tyu  7  g
5  tyu  7  h
6  tyu  7  j
7  iop  1  k

The final result should be like so:

     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k

I was able to achieve my goal by using a for loop, but I would like to know if there is a more elegant and efficient way to perform such an operation.

Here is the code I wrote in case you need it:

import pandas as pd

df1 = pd.DataFrame({'A' : ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B' : [    5,     6,     6,     9,     7,     7,     7,     1],
                    'C' : ['a'  ,   's',   'd',   'f',   'g',   'h',   'j',   'k']})

df2 = pd.DataFrame({'A' : ['wer', 'tyu'],
                    'B' : [    6,     7]})

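# Drop only the rows whose (A, B) pair matches the current df2 row;
# testing A and B separately would also drop rows that match on just one column.
for i, row in df2.iterrows():
    df1 = df1[~((df1['A'] == row['A']) & (df1['B'] == row['B']))].reset_index(drop=True)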

Answer 1

Answered by jezrael

Use merge with an outer join and indicator=True, filter the result with query, and finally remove the helper column with drop:

# Parentheses let the method chain span several lines
df = (pd.merge(df1, df2, on=['A','B'], how='outer', indicator=True)
        .query("_merge != 'both'")
        .drop('_merge', axis=1)
        .reset_index(drop=True))
print(df)
     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k
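
A left-join variant of the same merge idea can be sketched as follows. This is a minimal sketch rather than the answerer's code, assuming only rows of df1 should ever appear in the result; the names merged and result are illustrative. With how='left', rows that exist only in df2 never enter the merge, so the filter keeps the rows marked 'left_only'.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B': [5, 6, 6, 9, 7, 7, 7, 1],
                    'C': ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k']})
df2 = pd.DataFrame({'A': ['wer', 'tyu'],
                    'B': [6, 7]})

# De-duplicating df2 first keeps the left join from multiplying df1 rows
# if the same (A, B) pair were listed in df2 more than once.
merged = df1.merge(df2.drop_duplicates(), on=['A', 'B'], how='left', indicator=True)

# '_merge' is 'both' for rows found in df2 and 'left_only' for the rest.
result = (merged[merged['_merge'] == 'left_only']
          .drop(columns='_merge')
          .reset_index(drop=True))
print(result)
```

A left join keeps the row order of df1, so the remaining rows come out in their original order.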

Answer 2

Answered by Allen

You can use np.in1d to check whether the values of each row in df1 exist in df2, and then use the result as an inverted mask to select rows from df1.

import numpy as np

df1[~df1[['A','B']].apply(lambda x: np.in1d(x, df2).all(), axis=1)]\
                   .reset_index(drop=True)
Out[115]:
     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k
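
The inverted-mask idea can also be sketched with whole (A, B) pairs compared at once instead of individual values. This is a minimal sketch, not the answerer's code; pairs, in_df2 and result are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B': [5, 6, 6, 9, 7, 7, 7, 1],
                    'C': ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k']})
df2 = pd.DataFrame({'A': ['wer', 'tyu'],
                    'B': [6, 7]})

# Collect the (A, B) pairs of df2 as plain tuples for fast membership tests.
pairs = set(df2[['A', 'B']].itertuples(index=False, name=None))

# True where the (A, B) pair of a df1 row also occurs as a row of df2.
in_df2 = pd.Series([pair in pairs for pair in zip(df1['A'], df1['B'])],
                   index=df1.index)

# Invert the mask to keep only the rows that are not in df2.
result = df1[~in_df2].reset_index(drop=True)
print(result)
```

Comparing tuples means a row is removed only when both of its values match the same df2 row.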

Answer 3

Answered by asongtoruin

pandas has a method called isin, however this relies on unique indices. We can define a lambda function that builds a key column for it from the existing 'A' and 'B' columns of df1 and df2. We then negate the result (as we want the values not in df2) and reset the index:

import pandas as pd

df1 = pd.DataFrame({'A' : ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B' : [    5,     6,     6,     9,     7,     7,     7,     1],
                    'C' : ['a'  ,   's',   'd',   'f',   'g',   'h',   'j',   'k']})

df2 = pd.DataFrame({'A' : ['wer', 'tyu'],
                    'B' : [    6,     7]})

unique_ind = lambda df: df['A'].astype(str) + '_' + df['B'].astype(str)
print(df1[~unique_ind(df1).isin(unique_ind(df2))].reset_index(drop=True))

This prints:

     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k
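
The same key-based isin check can be sketched without string concatenation by using a MultiIndex as the key. This is a minimal sketch under the assumption that 'A' and 'B' together identify a row; keys1, keys2 and result are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B': [5, 6, 6, 9, 7, 7, 7, 1],
                    'C': ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k']})
df2 = pd.DataFrame({'A': ['wer', 'tyu'],
                    'B': [6, 7]})

# Build one (A, B) key per row; values keep their original dtypes.
keys1 = pd.MultiIndex.from_frame(df1[['A', 'B']])
keys2 = pd.MultiIndex.from_frame(df2[['A', 'B']])

# isin returns a boolean array; negate it to keep rows whose key is not in df2.
result = df1[~keys1.isin(keys2)].reset_index(drop=True)
print(result)
```

Keeping the key as a tuple avoids accidental collisions that a joined string could produce when the values themselves contain the separator.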

Answer 4

Answered by Elliot Ben

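The cleanest way I found was to use drop from pandas with the index of the dataframe you want to remove. Note that this matches rows by index label, not by value, so it only works when df2 carries the index labels of the matching rows in df1: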

df1.drop(df2.index, axis=0,inplace=True)
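
A case where dropping by index applies can be sketched as follows. This is a minimal sketch that assumes, unlike the question, that df2 was selected from df1 and therefore keeps df1's index labels; the selection used to build df2 here is hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B': [5, 6, 6, 9, 7, 7, 7, 1],
                    'C': ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k']})

# Hypothetical: df2 is a subset of df1, so its index labels are 1, 2, 4, 5, 6.
df2 = df1[df1['A'].isin(['wer', 'tyu'])]

# drop removes the rows of df1 whose index labels appear in df2.index.
result = df1.drop(df2.index).reset_index(drop=True)
print(result)
```

With the df2 from the question, which is built separately and has index labels 0 and 1, the same call would remove the first two rows of df1 regardless of their values.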
