pandas: reindexing a DataFrame with duplicate index values

Question

Asked by Justin H

So I imported and merged 4 CSVs into one DataFrame called data. However, upon inspecting the DataFrame's index with:

index_series = pd.Series(data.index.values)
index_series.value_counts()

I see that many index values have a count of 4. I want to completely reindex the data DataFrame so that each row has a unique index value. I tried:

data.reindex(np.arange(len(data)))

which gave the error "ValueError: cannot reindex from a duplicate axis." A Google search leads me to think this error occurs because up to 4 rows share the same index value. Any idea how I can do this reindexing without dropping any rows? I don't particularly care about the order of the rows either, as I can always sort it.
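
A minimal sketch of how this situation can arise. The question doesn't say how the four files were merged, so this assumes pd.concat, and the four small frames (with a hypothetical "amount" column) stand in for the CSVs. Each file read on its own gets a default index starting at 0, and pd.concat keeps those labels by default; passing ignore_index=True builds a fresh 0..n-1 index instead.

```python
import pandas as pd

# Hypothetical stand-ins for the four CSV files. Each one, read on its own,
# gets its own default index 0, 1, 2, ..., so the labels overlap.
frames = [pd.DataFrame({"amount": [100 * i + j for j in range(3)]}) for i in range(4)]

# Assumption: the files were combined with pd.concat, which keeps each
# frame's original index labels by default.
data = pd.concat(frames)
print(data.index.is_unique)        # False
print(data.index.value_counts())   # labels 0, 1 and 2 each appear 4 times

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index.
data_unique = pd.concat(frames, ignore_index=True)
print(data_unique.index.is_unique)  # True
```

No rows are lost either way; only the labels differ.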

UPDATE: In the end I did find a way to reindex the way I wanted:

data['index'] = np.arange(len(data))
data = data.set_index('index')

As I understand it, I just added a new column called 'index' to my DataFrame and then set that column as my index. As for my CSVs, they were the four files under "download loan data" on the Lending Club loan stats page.
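
A runnable sketch of the approach from the update, using a small hypothetical frame with a repeated label. It confirms the resulting index is unique and that set_index keeps the column name ('index') as the index name. For comparison, it also shows assigning the new labels directly to data.index, which gives the same rows and labels without the temporary column (the index is then unnamed).

```python
import numpy as np
import pandas as pd

# Hypothetical sample with duplicate index labels.
original = pd.DataFrame({"x": [33, 55, 88, 22]}, index=[0, 0, 1, 2])

# The approach from the update: add a 0..n-1 column, then make it the index.
data = original.copy()
data["index"] = np.arange(len(data))  # plain array, so no label alignment
data = data.set_index("index")
print(data.index.is_unique)  # True
print(data.index.name)       # index  (set_index keeps the column's name)

# Same labels without the temporary column: assign them directly.
data2 = original.copy()
data2.index = np.arange(len(data2))
print(data2.index.is_unique)  # True
print(data2.index.name)       # None
```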

Answer 1

Answered by JohnE

It's pretty easy to replicate your error with this sample data:

In [92]: data = pd.DataFrame( [33,55,88,22], columns=['x'], index=[0,0,1,2] )

In [93]: data.index.is_unique
Out[93]: False

In [94]: data.reindex(np.arange(len(data)))  # same error message
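
The same reproduction as a self-contained script, with the imports the session above assumes. The error is caught so the script runs to completion; the exact message wording differs between pandas versions, but the exception type is ValueError.

```python
import numpy as np
import pandas as pd

# Same sample data as the session above: the label 0 appears twice.
data = pd.DataFrame([33, 55, 88, 22], columns=["x"], index=[0, 0, 1, 2])
print(data.index.is_unique)  # False

error = None
try:
    # reindex maps each requested label to a position in the old index,
    # which requires the old labels to be unique.
    data.reindex(np.arange(len(data)))
except ValueError as exc:
    error = exc  # keep a reference; 'exc' is cleared after the except block

print("ValueError:", error)
```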

The problem is that reindex requires unique index values. In this case you don't want to preserve the old index values; you merely want new index values that are unique. The easiest way to do that is:

In [95]: data.reset_index(drop=True)
Out[95]: 
    x
0  33
1  55
2  88
3  22

Note that you can leave off drop=True if you want to retain the old index values; they are then kept as a new column.
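
A short sketch contrasting the two forms on the sample data from this answer. With drop=True the old labels are discarded. With the default drop=False they become a regular column, named 'index' here because the original index has no name. Both results get a fresh 0..n-1 index.

```python
import pandas as pd

data = pd.DataFrame([33, 55, 88, 22], columns=["x"], index=[0, 0, 1, 2])

# drop=True: discard the old labels entirely.
dropped = data.reset_index(drop=True)
print(dropped.columns.tolist())  # ['x']

# Default drop=False: the old labels are kept as a new column. Because the
# old index is unnamed, that column is called 'index'.
kept = data.reset_index()
print(kept.columns.tolist())     # ['index', 'x']
print(kept["index"].tolist())    # [0, 0, 1, 2]
```

reset_index returns a new DataFrame rather than modifying data in place, so assign the result back (data = data.reset_index(drop=True)) to keep it.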
