Original source: http://stackoverflow.com/questions/40670438/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
'subset' not working for drop_duplicates pandas dataframe
Asked by unasalusvictis
I have a df that looks like this:
    A  B                C    D               NEW
0   1  Adhoc_Task       WID  WI_DTL          []
1   1  Arun_adhoc_load  ATT  IXN_1           (IXN,)
2   1  Arun_adhoc_load  ATT  IXN_10          (IXN,)
3   1  Arun_adhoc_load  ATT  IXN_100         (IXN,)
4   1  Arun_adhoc_load  ATT  IXN_101         (IXN,)
5   2  Batch_Support    ATT  CDS_STATUS      []
6   2  Batch_Support    ATT  CDS_CONTROL     []
7   2  Batch_Support    ATT  CDS_ORA_STATUS  []
8   2  Batch_Support    ATT  REP_FILTER      []
9   1  online_load      ATT  TAX_3           (TAX,)
10  1  online_load      ATT  TAX_4           (TAX,)
11  1  online_load      ATT  TAX_8           (TAX,)
12  1  online_load      ATT  TAX_11          (TAX,)
Desired output would look like this:
   A  B                C    D           NEW
0  1  Adhoc_Task       WID  WI_DTL      []
1  1  Arun_adhoc_load  ATT  IXN_1       (IXN,)
5  2  Batch_Support    ATT  CDS_STATUS  []
9  1  online_load      ATT  TAX_3       (TAX,)
I'm trying to drop duplicate rows based on column B. However, when I run
df.drop_duplicates(subset = ['B'], keep='first', inplace=True)
I get the following error:
TypeError: drop_duplicates() got an unexpected keyword argument 'subset'
I'm running pandas 0.19.1 on Python 3, and I checked the documentation here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html
I haven't the foggiest idea what I'm doing wrong with subset. How would I drop duplicates from the DataFrame based on the values in one column?
Answered by Boud
For whatever reason, somewhere in your code df became a Series object. Check type(df) just before the failing drop_duplicates call. Series.drop_duplicates has no subset argument.
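To illustrate the diagnosis, here is a minimal sketch. The small frame is a hypothetical stand-in for the one in the question (fewer rows, column names reused), and selecting a single column is only one assumed way that df could have turned into a Series; the question does not show how it actually happened.

```python
import pandas as pd

# Hypothetical stand-in for the frame in the question (abbreviated).
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 1],
    "B": ["Adhoc_Task", "Arun_adhoc_load", "Arun_adhoc_load",
          "Batch_Support", "Batch_Support", "online_load"],
    "D": ["WI_DTL", "IXN_1", "IXN_10", "CDS_STATUS", "CDS_CONTROL", "TAX_3"],
})

# On a DataFrame, subset= is accepted: the first row for each value of B is kept.
deduped = df.drop_duplicates(subset=["B"], keep="first")
kept_index = list(deduped.index)

# Selecting one column returns a Series (assumed cause, for illustration only).
col = df["B"]
is_series = isinstance(col, pd.Series)

# Series.drop_duplicates has no subset parameter, so this call raises TypeError.
try:
    col.drop_duplicates(subset=["B"], keep="first")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Without subset=, the Series call works and deduplicates its own values.
unique_b = col.drop_duplicates(keep="first").tolist()
```

If type(df) reports a Series just before the failing call, the fix is usually to find the earlier assignment that replaced the DataFrame and keep the full frame, then call drop_duplicates with subset on that.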