Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/20037966/

Time: 2020-08-18 19:20:41  Source: igfitidea

Pandas reindexing data frame issue

Tags: python, pandas

Asked by Edgar Aroutiounian

Say I have the following data frame,

         A       B
0  1986-87  232131
1  1987-88  564564
2  1988-89  123125
               ...

And so on.

I'm trying to reindex, with <myFrame>.set_index('A'), so that I get

                B
  1986-87  232131
  1987-88  564564
  1988-89  123125

but I keep getting this instead:

               B
       A       
 1986-87  232131
 1987-88  564564
 1988-89  123125

and it's annoying as heck because I tried the other reindexing methods. I'm not sure what the A actually represents, because it doesn't appear in <myFrame>.columns or <myFrame>.index, and doing <myFrame>['B'][0] gives me 232131. So what is A in this reindexed data frame, and how can I index correctly from the beginning or get rid of this strange A in the incorrectly reindexed data frame?

Accepted answer by Andy Hayden

You need to reset the name/names attribute of the index:

df.index.names = [None]

Example:

In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B']).set_index('A')

In [12]: df
Out[12]: 
   B
A   
1  2
3  4

In [13]: df.index.names = [None]

In [14]: df
Out[14]: 
   B
1  2
3  4

The names describe the index and give it some meaning; they also distinguish between the different levels of the index (in a MultiIndex).
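
To make the role of the names concrete, here is a minimal sketch using the values from the question. The extra column C and the variable names are made up for illustration only; they are not part of the original post.

```python
import pandas as pd

# Small frame mirroring the question's layout (values taken from the question).
df = pd.DataFrame({"A": ["1986-87", "1987-88", "1988-89"],
                   "B": [232131, 564564, 123125]})

indexed = df.set_index("A")

# The "A" printed above the rows is the index's name, not a column or a row label.
index_name = indexed.index.name          # 'A'
column_labels = list(indexed.columns)    # ['B']

# With two index levels (hypothetical column C), the names tell the levels apart.
multi = df.assign(C=["x", "y", "z"]).set_index(["A", "C"])
level_names = list(multi.index.names)    # ['A', 'C']
```

This is why A shows up in neither .columns nor the index values: it lives in df.index.name.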

As @DSM points out, do this at your own peril, because it loses information if you later want to reset_index back:

In [15]: df.reset_index()
Out[15]: 
   index  B
0      1  2
1      3  4

However, you can col_fill in the names manually:

In [16]: df.reset_index(col_fill=['A'])
Out[16]: 
   A  B
0  1  2
1  3  4
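
The pandas documentation describes col_fill as applying when the columns have multiple levels, so the result of the call above may differ between versions. As a hedged alternative, the sketch below restores the index name explicitly before resetting; it assumes a pandas version that provides rename_axis, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"]).set_index("A")
df.index.name = None                      # for a single-level index, same effect as df.index.names = [None]

# Without a name, reset_index falls back to the default label "index".
unnamed = df.reset_index()

# Putting the name back first makes reset_index recreate column "A".
restored = df.rename_axis("A").reset_index()
```

Assigning df.index.name = 'A' before calling reset_index() should work equally well.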

Answer by Marius

I think your main problem is that you need to actually save the result of set_index, or use inplace=True, for the index to be set:

# Either
df.set_index('A', inplace=True)
# Or:
# df = df.set_index('A')

The output you were seeing was correct: it was a dataframe indexed by A, but you just hadn't stored it in a variable. Once you have stored it, things should work as you expect:

df.index
Out[6]: Index([u'1986-87', u'1987-88', u'1988-89'], dtype=object)

df.loc[u'1987-88']
Out[8]: 
B    564564
Name: 1987-88, dtype: int64
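
A small sketch of the point above, using the values from the question: set_index returns a new frame and leaves the original untouched unless the result is stored. The variable names still_default and value are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": ["1986-87", "1987-88", "1988-89"],
                   "B": [232131, 564564, 123125]})

df.set_index("A")                 # result discarded: df itself is unchanged
still_default = list(df.index)    # [0, 1, 2]

df = df.set_index("A")            # keep the result
value = df.loc["1987-88", "B"]    # 564564
```

This also explains why <myFrame>['B'][0] still returned 232131 in the question: the frame kept its default integer index.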

Answer by J_yang

I have a dataframe that was generated by appending multiple dataframes together into a long list. As shown in the figure, the default index loops over 0~7 because each original df has this index. The total number of rows is 240. So how can I reindex the new df to 0~239 instead of 30 x 0~7?

I tried df.reset_index(drop=True), but it doesn't seem to work. I also tried df.reindex(np.arange(240)), but it returned an error:

ValueError: cannot reindex from a duplicate axis

[Figure: the appended dataframe with its repeating 0~7 index]
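
One likely explanation, assuming the result of reset_index was not assigned back, is the same as in the answer above: reset_index(drop=True) returns a new frame rather than changing df in place. The sketch below uses made-up stand-in frames (pieces, with a hypothetical value column) in place of the 30 original dataframes.

```python
import pandas as pd

# Hypothetical stand-in for the 30 source frames, each indexed 0..7.
pieces = [pd.DataFrame({"value": range(8)}) for _ in range(30)]

combined = pd.concat(pieces)                      # index repeats 0..7 thirty times
has_duplicates = combined.index.has_duplicates    # True, which is why reindex() raises

# reset_index returns a new frame; assign it back to keep the result.
combined = combined.reset_index(drop=True)

# Alternatively, renumber while concatenating.
renumbered = pd.concat(pieces, ignore_index=True)
```

Either route should give a 0~239 index. reindex fails here because it looks rows up by label, and the labels 0~7 are ambiguous when each occurs 30 times.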