Python: Fastest way to drop duplicated index in a Pandas DataFrame

Question

Asked by RukTech

If I want to drop duplicated index in a dataframe the following doesn't work for obvious reasons:

myDF.drop_duplicates(cols=index)

and

myDF.drop_duplicates(cols='index')

looks for a column named 'index'

If I want to drop an index I have to do:

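myDF['index'] = myDF.index                    # copy the index into a helper column
myDF = myDF.drop_duplicates(subset='index')   # 'subset' replaces the old 'cols' keyword
myDF.index = myDF['index']                    # assign the index (set_index is a method, not an attribute)
myDF = myDF.drop('index', axis=1)             # remove the helper column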

Is there a more efficient way?

Answer 1

Accepted answer by CT Zhu

Simply: DF.groupby(DF.index).first()
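To make the one-liner concrete, here is a minimal sketch on made-up data (the frame, its labels, and the variable names are assumptions for illustration, not from the answer). It also shows two behaviours worth knowing before relying on this approach: groupby sorts the index labels by default, and first() takes the first non-null value in each column, so it does not necessarily return the first row as a whole.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with repeated index labels.
df = pd.DataFrame(
    {"val": [1.0, 2.0, 3.0, 4.0, 5.0]},
    index=["A", "B", "D", "D", "B"],
)

# One row per distinct index label; groupby sorts the labels by default.
deduped = df.groupby(df.index).first()
print(deduped)

# Caveat: first() returns the first non-null value per column,
# so a leading NaN is skipped rather than kept.
with_nan = pd.DataFrame({"val": [np.nan, 2.0]}, index=["A", "A"])
skipped = with_nan.groupby(with_nan.index).first()
print(skipped)
```

If the original row order or NaN values in the first occurrence matter, one of the other answers may be a better fit.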

Answer 2

Answered by behzad.nouri

You can use numpy.unique to obtain the position of the first occurrence of each unique index value, and use iloc to select those rows:

>>> df
        val
A  0.021372
B  1.229482
D -1.571025
D -0.110083
C  0.547076
B -0.824754
A -1.378705
B -0.234095
C -1.559653
B -0.531421

[10 rows x 1 columns]

>>> idx = np.unique(df.index, return_index=True)[1]
>>> df.iloc[idx]
        val
A  0.021372
B  1.229482
C  0.547076
D -1.571025

[4 rows x 1 columns]
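As the output above shows, the rows come back ordered by index label rather than in their original order, because numpy.unique returns its unique values sorted. The following is a small self-contained sketch (the data and variable names are made up for illustration) that reproduces this and shows one possible way to keep the original row order, by sorting the returned positions before passing them to iloc.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with repeated index labels.
df = pd.DataFrame(
    {"val": [10, 20, 30, 40, 50, 60]},
    index=["A", "B", "D", "D", "C", "B"],
)

# return_index=True yields the position of the first occurrence
# of each unique label, ordered by the (sorted) label.
_, first_pos = np.unique(df.index, return_index=True)

# Rows ordered by label, as in the answer.
sorted_by_label = df.iloc[first_pos]
print(sorted_by_label)

# Sorting the positions restores the original row order.
in_original_order = df.iloc[np.sort(first_pos)]
print(in_original_order)
```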

Answer 3

Answered by danielstn

The 'duplicated' method works for DataFrames and Series, and it is also available on an Index. Just select the rows which aren't marked as having a duplicate index:

df[~df.index.duplicated()]
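The boolean mask keeps rows in their original order, and the keep parameter of Index.duplicated controls which occurrence survives. Below is a short sketch on made-up data (the frame and variable names are assumptions for illustration) covering the three options: keep the first occurrence (the default), keep the last, or drop every label that appears more than once.

```python
import pandas as pd

# Hypothetical sample data with repeated index labels.
df = pd.DataFrame(
    {"val": [10, 20, 30, 40, 50, 60]},
    index=["A", "B", "D", "D", "C", "B"],
)

# Default: mark every occurrence after the first as a duplicate.
keep_first = df[~df.index.duplicated(keep="first")]
print(keep_first)

# Keep the last occurrence of each label instead.
keep_last = df[~df.index.duplicated(keep="last")]
print(keep_last)

# keep=False marks all occurrences of a repeated label,
# so only labels that appear exactly once remain.
only_unique = df[~df.index.duplicated(keep=False)]
print(only_unique)
```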
