Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/35528119/

Pandas recalculate index after a concatenation

Tags: python, pandas

Asked by Christopher

I have a problem where I produce a pandas dataframe by concatenating along the row axis (stacking vertically).

Each of the constituent dataframes has an autogenerated index (ascending numbers).

After concatenation, my index is screwed up: it counts up to n (where n is the shape[0] of the corresponding dataframe), and restarts at zero at the next dataframe.

I am trying to "re-calculate the index, given the current order", or "re-index" (or so I thought). It turns out that isn't exactly what DataFrame.reindex seems to be doing.

Here is what I tried to do:

train_df = pd.concat(train_class_df_list)
train_df = train_df.reindex(index=[i for i in range(train_df.shape[0])])

It failed with "cannot reindex from a duplicate axis." I don't want to change the order of my data... just need to delete the old index and set up a new one, with the order of rows preserved.
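A minimal sketch of the failure described above. The two small frames are made up for illustration and stand in for train_class_df_list. reindex looks rows up by their existing labels, so it cannot work when those labels repeat:

```python
import pandas as pd

# Two hypothetical frames, each with its own autogenerated 0..n-1 index.
parts = [pd.DataFrame({'a': [1, 2]}), pd.DataFrame({'a': [3, 4]})]
train_df = pd.concat(parts)

# The concatenated index restarts at zero for the second frame.
index_labels = list(train_df.index)

# reindex selects rows by label; duplicate labels make that ambiguous,
# so pandas raises a ValueError instead of renumbering the rows.
try:
    train_df.reindex(index=range(train_df.shape[0]))
    reindex_failed = False
except ValueError:
    reindex_failed = True
```

The exact wording of the error message differs between pandas versions, so the sketch only checks the exception type.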

Accepted answer by Ami Tavory

After vertical concatenation, if you get an index of [0, n) followed by [0, m), all you need to do is call reset_index:

train_df.reset_index(drop=True)

(This returns a new dataframe; you can do it in place instead using inplace=True.)

>>> import pandas as pd
>>> pd.concat([
...     pd.DataFrame({'a': [1, 2]}),
...     pd.DataFrame({'a': [1, 2]})]).reset_index(drop=True)
   a
0  1
1  2
2  1
3  2

Answer by Mike Müller

This should work:

train_df.reset_index(inplace=True, drop=True) 

Set drop to True to avoid an additional column in your dataframe.
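To illustrate what the drop argument changes, here is a small sketch; the example frames are made up for illustration. Without drop=True, the old labels are kept as a regular column named 'index':

```python
import pandas as pd

# Hypothetical stand-in for the concatenated training data.
train_df = pd.concat([pd.DataFrame({'a': [1, 2]}),
                      pd.DataFrame({'a': [3, 4]})])

# Default behaviour: the old index is moved into a new 'index' column.
kept = train_df.reset_index()

# drop=True: the old index is discarded and only the data columns remain.
dropped = train_df.reset_index(drop=True)

kept_columns = list(kept.columns)
dropped_columns = list(dropped.columns)
new_index = list(dropped.index)
```

In both cases the row order is preserved; only the labels change.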

Answer by ilmarinen

If your index is autogenerated and you don't want to keep it, you can use the ignore_index option.

train_df = pd.concat(train_class_df_list, ignore_index=True)

This will autogenerate a new index for you, and my guess is that this is exactly what you are after.
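As a quick check, the sketch below compares ignore_index=True against concatenating first and then calling reset_index(drop=True). The list contents are made up for illustration; for frames like these the two approaches should give the same result:

```python
import pandas as pd

# Hypothetical stand-in for train_class_df_list.
train_class_df_list = [pd.DataFrame({'a': [1, 2]}),
                       pd.DataFrame({'a': [3, 4]})]

# Build the fresh 0..n-1 index during concatenation.
via_concat = pd.concat(train_class_df_list, ignore_index=True)

# Concatenate first, then throw away the old labels.
via_reset = pd.concat(train_class_df_list).reset_index(drop=True)

same_result = via_concat.equals(via_reset)
```

Using ignore_index=True saves the extra step when the old labels carry no meaning.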
