pandas: How to get rid of the multilevel index after using a pivot table?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/38951345/

Time: 2020-09-14 01:49:26  Source: igfitidea

How to get rid of multilevel index after using pivot table pandas?

python pandas dataframe pivot-table data-analysis

Asked by chessosapiens

I had the following data frame (the real data frame is much larger than this one):

sale_user_id    sale_product_id count
1                 1              1
1                 8              1
1                 52             1
1                 312            5
1                 315            1

Then I reshaped it to move the values in sale_product_id into column headers using the following code:

reshaped_df = id_product_count.pivot(index='sale_user_id', columns='sale_product_id', values='count')

The resulting data frame is:

sale_product_id -1057   1   2   3   4   5   6   8   9   10  ... 98  980 981 982 983 984 985 986 987 99
sale_user_id                                                                                    
1                NaN    1.0 NaN NaN NaN NaN NaN 1.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3                NaN    1.0 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4                NaN    NaN 1.0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

As you can see, we have a multilevel index. What I need is to have sale_user_id in the first column, without multilevel indexing.

I took the following approach:

reshaped_df.reset_index()

The result looks like this. I still have the sale_product_id column, but I do not need it anymore:

sale_product_id sale_user_id    -1057   1   2   3   4   5   6   8   9   ... 98  980 981 982 983 984 985 986 987 99
0                          1    NaN 1.0 NaN NaN NaN NaN NaN 1.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1                          3    NaN 1.0 NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2                          4    NaN NaN 1.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 

I can subset this data frame to get rid of sale_product_id, but I don't think that would be efficient. I am looking for an efficient way to get rid of the multilevel indexing while reshaping the original data frame.

Answered by jezrael

You only need to remove the index name. Use rename_axis (new in pandas 0.18.0):

print (reshaped_df)
sale_product_id  1    8    52   312  315
sale_user_id                            
1                  1    1    1    5    1

print (reshaped_df.index.name)
sale_user_id

print (reshaped_df.rename_axis(None))
sale_product_id  1    8    52   312  315
1                  1    1    1    5    1
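For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the five sample rows from the question. The way id_product_count is constructed is my assumption, since the question only shows the printed frame. It also checks that the pivot result has an ordinary named Index on each axis rather than a MultiIndex, which is why removing the name is enough.

```python
import pandas as pd

# Sample rows from the question; building them this way is an assumption.
id_product_count = pd.DataFrame({
    "sale_user_id": [1, 1, 1, 1, 1],
    "sale_product_id": [1, 8, 52, 312, 315],
    "count": [1, 1, 1, 5, 1],
})

reshaped_df = id_product_count.pivot(
    index="sale_user_id", columns="sale_product_id", values="count"
)

# Neither axis is a MultiIndex; each is a plain Index that carries a name.
print(isinstance(reshaped_df.index, pd.MultiIndex))    # False
print(isinstance(reshaped_df.columns, pd.MultiIndex))  # False
print(reshaped_df.index.name, reshaped_df.columns.name)

# rename_axis returns a new frame; reshaped_df keeps its index name.
renamed = reshaped_df.rename_axis(None)
print(renamed.index.name)  # None
```

The extra header row in the printed output is just these two axis names being displayed, not a second index level.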

Another solution, which also works in pandas below 0.18.0:

reshaped_df.index.name = None
print (reshaped_df)

sale_product_id  1    8    52   312  315
1                  1    1    1    5    1


If you also need to remove the columns name:

print (reshaped_df.columns.name)
sale_product_id

print (reshaped_df.rename_axis(None).rename_axis(None, axis=1))
   1    8    52   312  315
1    1    1    1    5    1

Another solution:

reshaped_df.columns.name = None
reshaped_df.index.name = None
print (reshaped_df)
   1    8    52   312  315
1    1    1    1    5    1
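The two variants above differ in one practical respect: rename_axis returns a new frame, while assigning to .name changes the frame you assign on. A small sketch of both, using a hand-built stand-in for the pivoted frame (the construction is hypothetical, for illustration only):

```python
import pandas as pd

# Hypothetical stand-in for the pivoted frame shown above.
reshaped_df = pd.DataFrame(
    [[1, 1, 1, 5, 1]],
    index=pd.Index([1], name="sale_user_id"),
    columns=pd.Index([1, 8, 52, 312, 315], name="sale_product_id"),
)

# Chained rename_axis calls return a new frame with both names removed.
chained = reshaped_df.rename_axis(None).rename_axis(None, axis=1)

# Attribute assignment changes the frame it is applied to,
# so this sketch works on a copy.
assigned = reshaped_df.copy()
assigned.index.name = None
assigned.columns.name = None

print(chained.index.name, chained.columns.name)    # None None
print(assigned.index.name, assigned.columns.name)  # None None
```

The chained form is convenient inside a longer method chain; the attribute form is handy when the frame is already stored in a variable you are happy to modify.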

EDIT by comment:

You need reset_index with the parameter drop=True:

reshaped_df = reshaped_df.reset_index(drop=True)
print (reshaped_df)
sale_product_id  1    8    52   312  315
0                  1    1    1    5    1

# if you need to reset the index and remove the columns name
reshaped_df = reshaped_df.reset_index(drop=True).rename_axis(None, axis=1)
print (reshaped_df)
   1    8    52   312  315
0    1    1    1    5    1
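Be aware that drop=True throws the old index values away instead of moving them into a column. A minimal sketch, assuming a hypothetical two-user pivot result, that contrasts it with a plain reset_index():

```python
import pandas as pd

# Hypothetical two-user pivot result, for illustration only.
reshaped_df = pd.DataFrame(
    [[1.0, None], [None, 1.0]],
    index=pd.Index([3, 4], name="sale_user_id"),
    columns=pd.Index([1, 2], name="sale_product_id"),
)

dropped = reshaped_df.reset_index(drop=True)  # user ids are discarded
kept = reshaped_df.reset_index()              # user ids become a column

print(list(dropped.index))                # [0, 1]
print("sale_user_id" in dropped.columns)  # False
print("sale_user_id" in kept.columns)     # True
```

So drop=True only fits if the sale_user_id values are no longer needed.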

Or, if you need to remove only the columns name:

reshaped_df = reshaped_df.rename_axis(None, axis=1)
print (reshaped_df)
              1    8    52   312  315
sale_user_id                         
1               1    1    1    5    1

Edit1:

So if you need to create a new column from the index and remove the columns name:

reshaped_df = reshaped_df.rename_axis(None, axis=1).reset_index()
print (reshaped_df)
   sale_user_id  1  8  52  312  315
0             1  1  1   1    5    1
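Putting the steps together, the whole reshape can be written as one chain starting from the original frame. This is only a sketch: pivot_flat is a hypothetical helper name (not a pandas function), and the sample frame is rebuilt from the rows in the question.

```python
import pandas as pd


def pivot_flat(df, index, columns, values):
    """Pivot df, then return it with a default index and unnamed columns.

    pivot_flat is a hypothetical helper, not part of pandas.
    """
    return (
        df.pivot(index=index, columns=columns, values=values)
        .rename_axis(None, axis=1)  # remove the columns name
        .reset_index()              # move the index into a regular column
    )


# Sample rows from the question; building them this way is an assumption.
id_product_count = pd.DataFrame({
    "sale_user_id": [1, 1, 1, 1, 1],
    "sale_product_id": [1, 8, 52, 312, 315],
    "count": [1, 1, 1, 5, 1],
})

flat = pivot_flat(id_product_count, "sale_user_id", "sale_product_id", "count")
print(flat)
```

Calling rename_axis before reset_index keeps sale_user_id as the first column and leaves no axis name behind.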

Answered by Yury Wallet

The way that works for me is:

df_cross = pd.DataFrame(pd.crosstab(df[c1], df[c2]).to_dict()).reset_index()
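The answer does not define df, c1 or c2, so the following sketch fills them in with assumed values to show what the one-liner does. Rebuilding the frame from a plain dict is what discards the axis names.

```python
import pandas as pd

# df, c1 and c2 are not defined in the answer; these values are assumptions.
df = pd.DataFrame({
    "sale_user_id": [1, 1, 3, 4],
    "sale_product_id": [1, 8, 1, 2],
})
c1, c2 = "sale_user_id", "sale_product_id"

# crosstab counts how often each (c1, c2) pair occurs in df.
counts = pd.crosstab(df[c1], df[c2])

# A plain dict carries no axis names, so the rebuilt frame has none,
# and reset_index() labels the new first column 'index'.
df_cross = pd.DataFrame(counts.to_dict()).reset_index()
print(df_cross.columns.tolist())  # ['index', 1, 2, 8]
```

Two things to keep in mind with this route: the first column ends up named index rather than sale_user_id, and crosstab counts rows, so an existing count column like the one in the question is ignored unless it is passed through the values and aggfunc parameters.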