pandas: How to remove a subset of a data frame in Python?

Question

Asked by XUTADO

My dataframe df is 3020x4. I'd like to remove a subset df1 20x4 out of the original. In other words, I just want to get the difference whose shape is 3000x4. I tried the below but it did not work. It returned exactly df. Would you please help? Thanks.

new_df = df.drop(df1)

Answer 1

Answered by EdChum

As you seem to be unable to post a representative example, I will demonstrate one approach using merge with the parameter indicator=True:

So generate some data:

In [116]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df

Out[116]:
          a         b         c
0 -0.134933 -0.664799 -1.611790
1  1.457741  0.652709 -1.154430
2  0.534560 -0.781352  1.978084
3  0.844243 -0.234208 -2.415347
4 -0.118761 -0.287092  1.179237

Take a subset:

In [118]:
df_subset=df.iloc[2:3]
df_subset

Out[118]:
         a         b         c
2  0.53456 -0.781352  1.978084

Now perform a left merge with the parameter indicator=True. This adds a _merge column, which indicates whether each row is left_only, both, or right_only (the latter won't appear in this example). We then filter the merged df to keep only the left_only rows:

In [121]:
df_new = df.merge(df_subset, how='left', indicator=True)
df_new = df_new[df_new['_merge'] == 'left_only']
df_new

Out[121]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only

Here is the merged df before filtering:

In [122]:
df.merge(df_subset, how='left', indicator=True)

Out[122]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
2  0.534560 -0.781352  1.978084       both
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only
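
As a self-contained sketch of the merge approach above, the following uses small made-up integer values instead of random data (an assumption, so that the result is reproducible) and additionally drops the helper _merge column, which the answer leaves in place:

```python
import pandas as pd

# Small deterministic frame standing in for df (values are illustrative only).
df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": [10, 20, 30, 40, 50],
                   "c": [100, 200, 300, 400, 500]})
df_subset = df.iloc[2:3]  # the single row we want to remove

# Left merge on all shared columns; _merge records where each row was found.
merged = df.merge(df_subset, how="left", indicator=True)

# Keep rows found only in df, then drop the helper column.
df_new = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(df_new.shape)  # (4, 3)
```

Note that the merge matches on the values of all shared columns rather than on the index, so every row of df whose values equal a row of the subset is removed, and the index of the result comes from the merged frame rather than from df's original labels.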

Answer 2

Answered by gciriani

The pandas cheat sheet also suggests the following technique:

adf[~adf.x1.isin(bdf.x1)]

where x1 is the column being compared, adf is the dataframe from which the corresponding rows appearing in dataframe bdf are taken out.
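
A minimal sketch of this filter, assuming two small made-up frames that share only the column x1 (the names adf, bdf and x1 follow the line above; the other columns and all values are hypothetical):

```python
import pandas as pd

# Illustrative data: adf and bdf share only the key column x1.
adf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
bdf = pd.DataFrame({"x1": ["A", "B", "D"], "x3": [True, False, True]})

# Boolean mask: True where adf's x1 value also occurs in bdf.x1.
in_bdf = adf["x1"].isin(bdf["x1"])

# ~ inverts the mask, keeping only rows of adf with no match in bdf.
result = adf[~in_bdf]
print(result)
```

Only the single column x1 is compared here, so rows that agree on x1 but differ elsewhere are still removed.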

The particular question asked by the OP can also be solved by

new_df = df.drop(df1.index)
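
A quick sketch of this with the shapes from the question, assuming df1 was sliced from df so that it still carries df's index labels, and that those labels are unique (the column names and values here are made up):

```python
import numpy as np
import pandas as pd

# 3020x4 frame with a default integer index (contents are illustrative).
df = pd.DataFrame(np.arange(3020 * 4).reshape(3020, 4), columns=list("abcd"))

# A 20-row subset taken by position; it keeps df's index labels 100..119.
df1 = df.iloc[100:120]

# Drop the rows of df whose index labels appear in df1.
new_df = df.drop(df1.index)
print(new_df.shape)  # (3000, 4)
```

If df1 was built separately or had its index reset, its labels no longer identify rows of df, and a value-based approach such as merge or isin is the safer choice.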
