Original question: http://stackoverflow.com/questions/30986989/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Reindex a dataframe with duplicate index values
Asked by Justin H
So I imported and merged 4 csv's into one dataframe called data. However, upon inspecting the dataframe's index with:
index_series = pd.Series(data.index.values)
index_series.value_counts()
I see that multiple index entries have 4 counts. I want to completely reindex the data dataframe so each row now has a unique index value. I tried:
data.reindex(np.arange(len(data)))
which gave the error "ValueError: cannot reindex from a duplicate axis." A Google search leads me to think this error occurs because up to 4 rows share the same index value. Any idea how I can do this reindexing without dropping any rows? I don't particularly care about the order of the rows either, as I can always sort them.
UPDATE: So in the end I did find a way to reindex like I wanted.
data['index'] = np.arange(len(data))
data = data.set_index('index')
As I understand it, I just added a new column called 'index' to my data frame, and then set that column as my index. As for my csv's, they were the four csv's under "download loan data" on this page of Lending Club loan stats.
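For comparison, here is a minimal sketch of the approach from the update next to pandas' built-in `reset_index(drop=True)`. The small frame is a hypothetical stand-in for the merged CSV data, and the names `via_column` and `via_reset` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the merged CSV data: the label 0 appears twice.
data = pd.DataFrame({"x": [33, 55, 88, 22]}, index=[0, 0, 1, 2])

# Approach from the update: add a helper column, then promote it to the index.
via_column = data.copy()
via_column["index"] = np.arange(len(via_column))
via_column = via_column.set_index("index")

# Built-in shortcut: discard the old labels and number the rows 0..n-1.
via_reset = data.reset_index(drop=True)

# Both results have unique labels, and the labels are the same values.
print(via_column.index.is_unique, via_reset.index.is_unique)
print(list(via_column.index) == list(via_reset.index))
```

The one visible difference is that the index built from the helper column carries the name 'index', while the one from `reset_index(drop=True)` is unnamed.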
Answered by JohnE
It's pretty easy to replicate your error with this sample data:
In [92]: data = pd.DataFrame( [33,55,88,22], columns=['x'], index=[0,0,1,2] )
In [93]: data.index.is_unique
Out[93]: False
In [94]: data.reindex(np.arange(len(data)))  # same error message
The problem is that reindex requires unique index values. In this case, you don't want to preserve the old index values, you merely want new index values that are unique. The easiest way to do that is:
In [95]: data.reset_index(drop=True)
Out[95]:
    x
0  33
1  55
2  88
3  22
Note that you can leave off drop=True if you want to retain the old index values; they are then kept as a regular column.
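Put together as a script, the answer's steps might look like the sketch below. It reuses the sample frame from the answer; the names `reindex_failed`, `fresh`, and `kept` are made up for illustration, and the exact wording of the error message can differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Sample data from the answer: the index label 0 appears twice.
data = pd.DataFrame([33, 55, 88, 22], columns=["x"], index=[0, 0, 1, 2])

# reindex refuses to work on an index that contains duplicate labels.
try:
    data.reindex(np.arange(len(data)))
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print("reindex failed:", exc)

# drop=True throws the old labels away and numbers the rows 0..n-1.
fresh = data.reset_index(drop=True)

# Without drop=True, the old labels are kept as a column named 'index'.
kept = data.reset_index()

print(fresh)
print(kept)
```

Neither call modifies `data` in place, so the result has to be assigned back (for example `data = data.reset_index(drop=True)`) for the change to stick.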

