Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/22918212/

Time: 2020-08-19 01:58:49  Source: igfitidea

Fastest Way to Drop Duplicated Index in a Pandas DataFrame

python pandas duplicate-removal

Asked by RukTech

If I want to drop rows with a duplicated index in a DataFrame, the following doesn't work, for obvious reasons:

myDF.drop_duplicates(cols=index)

and

myDF.drop_duplicates(cols='index') 

looks for a column named 'index'

To drop the duplicated index entries I have to do:

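myDF['index'] = myDF.index
myDF = myDF.drop_duplicates(subset='index')  # 'cols' was renamed 'subset' in later pandas versions
myDF.index = myDF['index']  # assign the index; 'myDF.set_index = ...' would only overwrite the method
myDF = myDF.drop('index', axis=1)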

Is there a more efficient way?

Accepted answer by CT Zhu

Simply: DF.groupby(DF.index).first()
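
A minimal sketch of the groupby approach on a small made-up frame (the data and the names df and deduped are illustrative assumptions, not from the original post). Two behaviours are worth knowing: the result comes back sorted by index label, and first() picks the first non-null value per column within each group, which can differ from the first row when the data contains NaN.

```python
import pandas as pd

# Illustrative data (an assumption, not from the original post).
# The first "B" row holds a NaN on purpose.
df = pd.DataFrame(
    {"val": [float("nan"), 2.0, 3.0, 4.0, 5.0]},
    index=["B", "A", "B", "C", "A"],
)

# Group rows that share an index label and take the first
# non-null value of each column within every group.
deduped = df.groupby(df.index).first()

print(deduped)
```

If the literal first row per label is wanted, NaN included, a mask based on the index's duplicated method may be the safer choice.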

Answer by behzad.nouri

You can use numpy.unique to obtain the positions of the unique index values and use iloc to select those rows:

>>> df
        val
A  0.021372
B  1.229482
D -1.571025
D -0.110083
C  0.547076
B -0.824754
A -1.378705
B -0.234095
C -1.559653
B -0.531421

[10 rows x 1 columns]

>>> idx = np.unique(df.index, return_index=True)[1]
>>> df.iloc[idx]
        val
A  0.021372
B  1.229482
C  0.547076
D -1.571025

[4 rows x 1 columns]
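
As the output above shows, numpy.unique returns the labels in sorted order, so the rows come back as A, B, C, D rather than in their original order. The sketch below, on made-up data (the frame and variable names are assumptions for illustration), sorts the returned positions to keep the original row order instead.

```python
import numpy as np
import pandas as pd

# Illustrative data (an assumption, not the frame from the answer).
df = pd.DataFrame(
    {"val": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=["A", "B", "D", "D", "C", "B"],
)

# Positions of the first occurrence of each distinct index label,
# ordered by the sorted labels.
_, first_pos = np.unique(df.index, return_index=True)

# Rows ordered by label, as in the answer.
sorted_by_label = df.iloc[first_pos]

# Sorting the positions restores the original row order.
original_order = df.iloc[np.sort(first_pos)]

print(sorted_by_label)
print(original_order)
```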

Answer by danielstn

The 'duplicated' method works for DataFrames and for Series, and the index has one as well. Just select the rows that aren't marked as having a duplicate index:

df[~df.index.duplicated()]
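
A short sketch of how the keep argument of Index.duplicated changes which rows survive. The data and variable names are made up for illustration. By default (keep='first') the first row for each label is kept and the original row order is preserved.

```python
import pandas as pd

# Illustrative data (an assumption, not from the original post).
df = pd.DataFrame(
    {"val": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=["A", "B", "A", "C", "B"],
)

# Keep the first row seen for each index label (the default).
keep_first = df[~df.index.duplicated(keep="first")]

# Keep the last row seen for each index label.
keep_last = df[~df.index.duplicated(keep="last")]

# Drop every row whose label occurs more than once.
drop_all = df[~df.index.duplicated(keep=False)]

print(keep_first)
print(keep_last)
print(drop_all)
```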