Fastest Way to Drop Duplicated Index in a Pandas DataFrame
Source: http://stackoverflow.com/questions/22918212/ (Stack Overflow). Content is licensed under CC BY-SA 4.0 and must be attributed to the original authors.
Asked by RukTech
If I want to drop duplicated index entries in a DataFrame, the following doesn't work, for obvious reasons:
myDF.drop_duplicates(cols=index)
and
myDF.drop_duplicates(cols='index')
looks for a column named 'index'.
To drop the duplicated index entries I have to do:
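myDF['index'] = myDF.index                 # copy the index into a helper column
myDF = myDF.drop_duplicates(cols='index')  # newer pandas versions name this argument `subset`
myDF = myDF.drop('index', axis=1)          # drop the helper column; the index is already deduplicated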
Is there a more efficient way?
Accepted answer by CT Zhu
Simply: DF.groupby(DF.index).first()
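A minimal sketch of the groupby approach, using a small hypothetical frame (the names df, val and other are illustrative, not from the original post). One caveat worth knowing: first() takes the first non-null value per column within each group and returns the groups sorted by label, so the result can differ from simply keeping the first row of each label.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: labels "A" and "B" each appear twice.
df = pd.DataFrame(
    {"val": [1.0, np.nan, 3.0, 4.0], "other": [10, 20, 30, 40]},
    index=["B", "A", "B", "A"],
)

# One row per index label; groups come back sorted by label.
deduped = df.groupby(df.index).first()
print(deduped)

# For label "A" the first row has val=NaN, so first() takes val from the
# second "A" row while "other" still comes from the first "A" row.
```

If mixing values from different rows is not acceptable, a row-wise method such as the index-based selection in the other answers may be a better fit.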
Answer by behzad.nouri
You can use numpy.unique to obtain the position of the first occurrence of each unique index value, and use iloc to select those rows:
>>> df
        val
A  0.021372
B  1.229482
D -1.571025
D -0.110083
C  0.547076
B -0.824754
A -1.378705
B -0.234095
C -1.559653
B -0.531421

[10 rows x 1 columns]
>>> idx = np.unique(df.index, return_index=True)[1]
>>> df.iloc[idx]
        val
A  0.021372
B  1.229482
C  0.547076
D -1.571025

[4 rows x 1 columns]
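As the output above shows, numpy.unique returns the labels in sorted order, so the rows come back sorted by label rather than in their original order. The following is a self-contained sketch of the same idea, assuming made-up values and illustrative variable names (first_pos, sorted_result, ordered_result); sorting the positions before indexing is one way to keep the original row order.

```python
import numpy as np
import pandas as pd

# Hypothetical data reusing the index labels from the answer above.
df = pd.DataFrame(
    {"val": np.arange(10, dtype=float)},
    index=list("ABDDCBABCB"),
)

# return_index=True yields the position of the first occurrence of each label,
# listed in the sorted order of the labels.
first_pos = np.unique(df.index, return_index=True)[1]

sorted_result = df.iloc[first_pos]            # rows ordered by label
ordered_result = df.iloc[np.sort(first_pos)]  # rows in their original order

print(sorted_result)
print(ordered_result)
```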
Answer by danielstn
The 'duplicated' method works for DataFrames and Series, and an Index has it too. Just select the rows that aren't marked as having a duplicated index:
df[~df.index.duplicated()]
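A short sketch of how the boolean mask behaves, using hypothetical data. The keep argument of Index.duplicated controls which occurrence is left unmarked; the names mask, keep_first and keep_last are illustrative.

```python
import pandas as pd

# Hypothetical example data: labels "A" and "B" each appear twice.
df = pd.DataFrame({"val": [1, 2, 3, 4, 5]}, index=["A", "B", "A", "C", "B"])

# True for every repeat after the first occurrence of a label.
mask = df.index.duplicated(keep="first")

keep_first = df[~mask]                                # first row per label
keep_last = df[~df.index.duplicated(keep="last")]     # last row per label

print(keep_first)
print(keep_last)
```

Unlike the groupby and numpy.unique approaches, this selection keeps the surviving rows in their original order and never mixes values from different rows.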