Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/40670438/

'subset' not working for drop_duplicates pandas dataframe

Tags: python, pandas, dataframe, duplicates

Asked by unasalusvictis

I have a df that looks like this:

    A                B    C               D     NEW
0   1       Adhoc_Task  WID          WI_DTL      []  
1   1  Arun_adhoc_load  ATT           IXN_1  (IXN,)
2   1  Arun_adhoc_load  ATT          IXN_10  (IXN,)
3   1  Arun_adhoc_load  ATT         IXN_100  (IXN,)
4   1  Arun_adhoc_load  ATT         IXN_101  (IXN,)
5   2    Batch_Support  ATT      CDS_STATUS      []
6   2    Batch_Support  ATT     CDS_CONTROL      []
7   2    Batch_Support  ATT  CDS_ORA_STATUS      []
8   2    Batch_Support  ATT      REP_FILTER      []
9   1      online_load  ATT           TAX_3  (TAX,)
10  1      online_load  ATT           TAX_4  (TAX,)
11  1      online_load  ATT           TAX_8  (TAX,)
12  1      online_load  ATT          TAX_11  (TAX,)

Desired output would look like this:

    A                B    C               D     NEW
0   1       Adhoc_Task  WID          WI_DTL      []  
1   1  Arun_adhoc_load  ATT           IXN_1  (IXN,)
5   2    Batch_Support  ATT      CDS_STATUS      []
9   1      online_load  ATT           TAX_3  (TAX,)

I'm trying to drop duplicate rows based off column B. However, when I run

df.drop_duplicates(subset = ['B'], keep='first', inplace=True)

I get the following error:

TypeError: drop_duplicates() got an unexpected keyword argument 'subset'

I'm running pandas 0.19.1 from python 3, so I took a look at the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html

I haven't the foggiest of what I'm doing wrong with subset. How would I drop duplicates from the DataFrame based off the values in one column?

Answered by Boud

For whatever reason in your code, df became a Series object. Check type(df) just before the failing drop_duplicates call. That function has no subset argument for the Series.
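To make the diagnosis concrete, here is a minimal sketch. The small frame is a hypothetical cut-down version of the one in the question, and the single-column selection is only one assumed way df could have turned into a Series; the question does not show how it actually happened.

```python
import pandas as pd

# Hypothetical miniature of the question's frame (columns A, B, D only).
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 1, 1],
    "B": ["Adhoc_Task", "Arun_adhoc_load", "Arun_adhoc_load",
          "Batch_Support", "Batch_Support", "online_load", "online_load"],
    "D": ["WI_DTL", "IXN_1", "IXN_10", "CDS_STATUS", "CDS_CONTROL",
          "TAX_3", "TAX_4"],
})

# On a DataFrame, subset is accepted: keep the first row for each value of B.
deduped = df.drop_duplicates(subset=["B"], keep="first")
print(deduped.index.tolist())

# Assumed cause: selecting a single column yields a Series, not a DataFrame.
s = df["B"]
print(type(s).__name__)

# Series.drop_duplicates has no subset parameter, so this raises TypeError.
error_message = None
try:
    s.drop_duplicates(subset=["B"], keep="first")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Without subset, the same call works on a Series.
unique_b = s.drop_duplicates(keep="first")
print(unique_b.tolist())
```

If type(df) prints Series in your own code, look for an earlier assignment that replaced the frame with one of its columns, and call drop_duplicates on the original DataFrame instead.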
