Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/39408109/

Date: 2020-09-14 01:59:15  Source: igfitidea

How to remove a subset of a data frame in Python?

Tags: python, pandas, subset

Question by XUTADO

My dataframe df is 3020x4. I'd like to remove a 20x4 subset, df1, from it. In other words, I just want the difference, whose shape is 3000x4. I tried the code below, but it did not work: it returned exactly df. Would you please help? Thanks.

new_df = df.drop(df1)

Answer by EdChum

Since you haven't posted a representative example, I will demonstrate one approach using merge with the parameter indicator=True:

So generate some data:

In [116]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df

Out[116]:
          a         b         c
0 -0.134933 -0.664799 -1.611790
1  1.457741  0.652709 -1.154430
2  0.534560 -0.781352  1.978084
3  0.844243 -0.234208 -2.415347
4 -0.118761 -0.287092  1.179237

Take a subset:

In [118]:
df_subset=df.iloc[2:3]
df_subset

Out[118]:
         a         b         c
2  0.53456 -0.781352  1.978084

Now perform a left merge with indicator=True. This adds a _merge column indicating whether each row is left_only, both or right_only (the latter won't appear in this example). We then filter the merged df to keep only the left_only rows:

In [121]:
df_new = df.merge(df_subset, how='left', indicator=True)
df_new = df_new[df_new['_merge'] == 'left_only']
df_new

Out[121]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only

Here is the full merged df before filtering:

In [122]:
df.merge(df_subset, how='left', indicator=True)

Out[122]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
2  0.534560 -0.781352  1.978084       both
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only
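The merge-and-filter steps above can be wrapped into a small reusable function. This is a minimal sketch, assuming rows should be matched on every column the two frames share; the function name anti_join and the sample data are hypothetical. Note that merge builds a fresh 0..n-1 index, so in general the result does not keep df's original index labels.

```python
import pandas as pd


def anti_join(df: pd.DataFrame, subset: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of df with no exact match in subset (hypothetical helper).

    Rows are compared on all columns the two frames have in common.
    """
    # Left merge keeps every row of df; indicator=True adds the _merge column.
    merged = df.merge(subset, how="left", indicator=True)
    # Keep rows found only in df, then drop the helper column.
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


# Hypothetical sample data standing in for the OP's df / df1.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]})
subset = df.iloc[2:3]

result = anti_join(df, subset)
print(result)
```

For the OP's frames this would be called as anti_join(df, df1), which should yield 3000x4 assuming each row of df1 matches exactly one row of df.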

Answer by gciriani

The pandas cheat sheet also suggests the following technique:

adf[~adf.x1.isin(bdf.x1)]

where x1 is the column being compared, and adf is the DataFrame from which rows whose x1 value also appears in DataFrame bdf are removed.
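A minimal sketch of the cheat-sheet technique, using small hypothetical frames adf and bdf that share a key column x1. Keep in mind that it compares x1 only: any row of adf whose x1 value appears in bdf is removed, whatever its other columns hold.

```python
import pandas as pd

# Hypothetical example data; x1 is the key column being compared.
adf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
bdf = pd.DataFrame({"x1": ["A", "B", "D"], "x3": [True, False, True]})

# Boolean mask: True where adf.x1 appears anywhere in bdf.x1.
in_bdf = adf.x1.isin(bdf.x1)

# ~ negates the mask, keeping only rows of adf whose x1 is absent from bdf.
result = adf[~in_bdf]
print(result)
```

Unlike the merge approach, this keeps adf's original index labels.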

The OP's specific question can also be solved as follows, provided df1 was sliced out of df and therefore still carries df's index labels:

new_df = df.drop(df1.index)
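A minimal sketch of the index-based approach, with hypothetical frames standing in for the OP's. DataFrame.drop removes rows by index label, so it needs df1.index rather than the DataFrame df1 itself. This assumes df1 was taken from df (for example with iloc or a boolean filter) so its labels still point at the right rows; if df's index contains duplicate labels, every row sharing a dropped label is removed.

```python
import pandas as pd

# Hypothetical stand-ins for the OP's frames: df is the full frame and
# df1 is a slice of it, so df1 keeps df's index labels (3 and 4 here).
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})
df1 = df.iloc[3:5]

# drop() works on labels, so pass df1's index labels, not df1 itself.
new_df = df.drop(df1.index)
print(new_df.shape)  # (8, 2)
```

If df1 was built independently (say, loaded from another file), its index will not line up with df's, and the merge-based approach is the safer choice.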