Python Pandas: recalculating the index after concatenation

Question

Asked by Christopher

I have a problem where I produce a pandas dataframe by concatenating along the row axis (stacking vertically).

Each of the constituent dataframes has an autogenerated index (ascending numbers).

After concatenation, my index is screwed up: it counts up to n (where n is the shape[0] of the corresponding dataframe), and restarts at zero at the next dataframe.

I am trying to "re-calculate the index, given the current order", or "re-index" (or so I thought). It turns out that isn't quite what DataFrame.reindex does.

Here is what I tried to do:

train_df = pd.concat(train_class_df_list)
train_df = train_df.reindex(index=[i for i in range(train_df.shape[0])])

It failed with "cannot reindex from a duplicate axis." I don't want to change the order of my data... just need to delete the old index and set up a new one, with the order of rows preserved.

Answer 1

Accepted answer by Ami Tavory

After vertical concatenation, if you get an index of [0, n) followed by [0, m), all you need to do is call reset_index:

train_df.reset_index(drop=True)

(you can do this in place using inplace=True).

>>> import pandas as pd
>>> pd.concat([
...     pd.DataFrame({'a': [1, 2]}),
...     pd.DataFrame({'a': [1, 2]})]).reset_index(drop=True)
   a
0  1
1  2
2  1
3  2
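
For contrast with the reindex attempt in the question, here is a minimal sketch with two made-up frames (the names `parts`, `stacked`, and `renumbered` are illustrative only, not from the original post). It suggests why reindex complains: the stacked frame carries repeated labels, and reindex looks rows up by label. It also shows that reset_index returns a new frame unless you assign the result or pass inplace=True.

```python
import pandas as pd

# Two hypothetical frames, each with its own autogenerated 0..n-1 index.
parts = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]})]
stacked = pd.concat(parts)

# Each part keeps its own labels, so labels repeat after stacking.
labels_before = stacked.index.tolist()

# reindex() selects rows by label, so duplicated labels make it raise.
try:
    stacked.reindex(index=range(len(stacked)))
    reindex_failed = False
except ValueError:
    reindex_failed = True

# reset_index() returns a new frame; assign it to keep the result.
renumbered = stacked.reset_index(drop=True)
labels_after = renumbered.index.tolist()

print(labels_before)   # [0, 1, 0, 1]
print(reindex_failed)  # True
print(labels_after)    # [0, 1, 2, 3]
```

The row order is untouched; only the labels change.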

Answer 2

Answer by Mike Müller

This should work:

train_df.reset_index(inplace=True, drop=True)

Set drop to True to avoid an additional column in your dataframe; without it, the old index is kept as a new column.
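
To see what the drop flag changes, here is a small sketch on a made-up frame (the variable names `df`, `kept`, and `dropped` are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stacked frame with repeated index labels 0, 1, 0, 1.
df = pd.concat([pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3, 4]})])

kept = df.reset_index()              # old labels saved in a column named "index"
dropped = df.reset_index(drop=True)  # old labels discarded

print(kept.columns.tolist())     # ['index', 'a']
print(dropped.columns.tolist())  # ['a']
```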

Answer 3

Answer by ilmarinen

If your index is autogenerated and you don't want to keep it, you can use the ignore_index option.

train_df = pd.concat(train_class_df_list, ignore_index=True)

This will autogenerate a new index for you, and my guess is that this is exactly what you are after.
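
A quick sketch of that, using a stand-in for train_class_df_list (the two small frames are made up for illustration). As far as I can tell, the result matches concatenating first and then calling reset_index(drop=True):

```python
import pandas as pd

# Stand-in for the asker's list of per-class frames (contents are invented).
train_class_df_list = [
    pd.DataFrame({"a": [1, 2]}),
    pd.DataFrame({"a": [3, 4, 5]}),
]

# ignore_index=True discards the old labels and numbers rows 0..len-1.
train_df = pd.concat(train_class_df_list, ignore_index=True)

# Two-step alternative: concatenate, then renumber.
two_step = pd.concat(train_class_df_list).reset_index(drop=True)

print(train_df.index.tolist())    # [0, 1, 2, 3, 4]
print(train_df["a"].tolist())     # [1, 2, 3, 4, 5]
print(train_df.equals(two_step))  # True
```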
