pandas - How to remove a subset of a data frame in Python?
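Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/39408109/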
Question by XUTADO
My dataframe df is 3020x4. I'd like to remove a 20x4 subset, df1, from it. In other words, I just want the difference, whose shape is 3000x4. I tried the code below, but it did not work: it returned exactly df. Could you please help? Thanks.
new_df = df.drop(df1)
Answer by EdChum
As you seem to be unable to post a representative example, I will demonstrate one approach using merge with the parameter indicator=True:
First, generate some data:
In [116]:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(5, 3), columns=list('abc'))
df
Out[116]:
          a         b         c
0 -0.134933 -0.664799 -1.611790
1  1.457741  0.652709 -1.154430
2  0.534560 -0.781352  1.978084
3  0.844243 -0.234208 -2.415347
4 -0.118761 -0.287092  1.179237
Take a subset:
In [118]:
df_subset = df.iloc[2:3]
df_subset
Out[118]:
         a         b         c
2  0.53456 -0.781352  1.978084
Now perform a left merge with the parameter indicator=True. This adds a _merge column indicating whether each row is left_only, both, or right_only (the latter won't appear in this example). We then filter the merged df to show only the left_only rows:
In [121]:
df_new = df.merge(df_subset, how='left', indicator=True)
df_new = df_new[df_new['_merge'] == 'left_only']
df_new
Out[121]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only
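The filtered result above still carries the helper _merge column, so it has one column more than df. A minimal sketch of the full round trip, assuming small hypothetical toy frames in place of the OP's 3020x4 df and 20x4 df1, drops that column at the end so the result has df's columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-ins for the question's df and df1 (sizes shrunk).
df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list('abcd'))
df1 = df.iloc[[1, 3]]

# Left merge on all shared columns; indicator=True adds the _merge column.
merged = df.merge(df1, how='left', indicator=True)

# Keep the rows found only in df, then drop the helper column.
new_df = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

print(new_df.shape)  # (3, 4)
```

Because rows are matched on the values of every shared column, any other row of df that is identical to a row of df1 is removed as well. Also, merge builds a new default index, so df's original row labels are not carried over.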
Here is the merged df before filtering:
In [122]:
df.merge(df_subset, how='left', indicator=True)
Out[122]:
          a         b         c     _merge
0 -0.134933 -0.664799 -1.611790  left_only
1  1.457741  0.652709 -1.154430  left_only
2  0.534560 -0.781352  1.978084       both
3  0.844243 -0.234208 -2.415347  left_only
4 -0.118761 -0.287092  1.179237  left_only
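With how='left', the right_only value never appears. A minimal sketch, using hypothetical one-column frames left and right, switches to an outer merge so that all three _merge values show up:

```python
import pandas as pd

# Hypothetical frames that share only some key values.
left = pd.DataFrame({'key': [1, 2, 3]})
right = pd.DataFrame({'key': [2, 3, 4]})

# how='outer' keeps keys from both sides, so every _merge category can occur.
merged = left.merge(right, how='outer', indicator=True)

print(merged['_merge'].tolist())  # ['left_only', 'both', 'both', 'right_only']
```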
Answer by gciriani
The pandas cheat sheet also suggests the following technique:
adf[~adf.x1.isin(bdf.x1)]
where x1 is the column being compared, and adf is the dataframe from which the rows whose x1 values also appear in dataframe bdf are removed.
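A minimal sketch of the isin filter, assuming hypothetical frames adf and bdf whose shared key column is x1:

```python
import pandas as pd

# Hypothetical frames; only the x1 column is used for matching.
adf = pd.DataFrame({'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
bdf = pd.DataFrame({'x1': ['A', 'B', 'D'], 'x3': [True, False, True]})

# Keep the rows of adf whose x1 value does not occur anywhere in bdf.x1.
result = adf[~adf.x1.isin(bdf.x1)]

print(result)
```

Only the row with x1 == 'C' survives, and it keeps its original index label 2. Because only x1 is compared, a row of adf is dropped whenever its x1 value appears in bdf, even if its other columns differ; the merge approach in the previous answer compares all shared columns instead.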
The OP's particular question can also be solved with:
new_df = df.drop(df1.index)
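DataFrame.drop removes rows by index label, so passing df1.index works when df1 was sliced from df and therefore still carries df's row labels. A minimal sketch, assuming hypothetical random data with the question's shapes and df1 drawn from df with sample:

```python
import numpy as np
import pandas as pd

# Hypothetical data with the question's shapes: df is 3020x4 and df1 is a
# 20-row subset taken from df, so it keeps df's index labels.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3020, 4)), columns=list('abcd'))
df1 = df.sample(n=20, random_state=0)

# drop expects labels, so pass df1's index rather than df1 itself.
new_df = df.drop(df1.index)

print(new_df.shape)  # (3000, 4)
```

If df1 was built independently (for example, read from another file with a fresh 0..n-1 index), its labels no longer identify df's rows, and one of the value-based approaches above is the safer choice. If df's index has duplicate labels, drop removes every row carrying a dropped label.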