pandas: Create a MultiIndex from an existing DataFrame

Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/44442831/

Date: 2020-09-14 03:45:08  Source: igfitidea

python, pandas, dataframe, multi-index, reindex

Question by puifais

I've spent hours searching everywhere, trying to create a MultiIndex from a DataFrame in pandas. This is the DataFrame I have (posted as an Excel sheet mockup; I do have it as a pandas DataFrame):

[Image: the current DataFrame ("have")]

And this is what I want:

[Image: the desired result ("want")]

I have tried:

newmulti = currentDataFrame.set_index(['user_id','account_num'])

But it returns a DataFrame, not a MultiIndex. Also, I could not figure out how to make 'user_id' level 0 and 'account_num' level 1. I think this must be trivial, but I've read so many posts, tutorials, etc. and still could not figure it out, partly because I'm a very visual person and most posts are not. Please help!

Answer by Alexander

You could simply use groupby in this case, which creates the MultiIndex automatically when it sums the sales over the requested columns.

df.groupby(['user_id', 'account_num', 'dates']).sales.sum().to_frame()
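A minimal runnable sketch of the groupby approach. The question's data only appears in screenshots, so the values below are hypothetical; only the column names user_id, account_num, dates and sales come from the thread. The first two rows share the key (1, 100, "2017-01-01") and are summed into a single row.

```python
import pandas as pd

# Hypothetical sample data: only the column names come from the thread.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "account_num": [100, 100, 101, 200],
    "dates": ["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-01"],
    "sales": [10, 5, 7, 3],
})

# Group on the three key columns and sum sales; the group keys become a
# three-level MultiIndex, in the order they are listed.
new_df = df.groupby(["user_id", "account_num", "dates"]).sales.sum().to_frame()

print(list(new_df.index.names))  # ['user_id', 'account_num', 'dates']
print(new_df)
```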

You should also be able to simply do this:

df.set_index(['user_id', 'account_num', 'dates'])

However, you probably want to get rid of any duplicates (e.g. two or more rows with identical user_id, account_num and dates values but different sales figures) by summing them, which is why I recommended using groupby.
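To see what the duplicates do, here is a sketch with hypothetical data in which two rows share the same (user_id, account_num, dates) key. set_index only moves the columns into the index, so both rows survive and the resulting MultiIndex is not unique; groupby with sum collapses them into one row.

```python
import pandas as pd

# Hypothetical data: the first two rows share the same key but differ in sales.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "account_num": [100, 100, 200],
    "dates": ["2017-01-01", "2017-01-01", "2017-01-01"],
    "sales": [10, 5, 3],
})

keys = ["user_id", "account_num", "dates"]

# set_index keeps every row, so the duplicate key appears twice in the index.
indexed = df.set_index(keys)
print(len(indexed), indexed.index.is_unique)  # 3 False

# groupby + sum merges rows with the same key into a single row.
summed = df.groupby(keys).sales.sum().to_frame()
print(len(summed), summed.index.is_unique)  # 2 True
```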

If you need the MultiIndex itself, you can simply access it via new_df.index, where new_df is the new DataFrame created by either of the two operations above.

user_id will be level 0 and account_num will be level 1.
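The level numbers follow the order of the column list passed to set_index (or groupby). A small sketch on hypothetical data that checks this with get_level_values and, if you need the opposite order, swaps the two levels with DataFrame.swaplevel.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the question's data.
df = pd.DataFrame({
    "user_id": [1, 2],
    "account_num": [100, 200],
    "sales": [10, 3],
})

new_df = df.set_index(["user_id", "account_num"])
midx = new_df.index

# Level 0 is the first column in the list, level 1 the second.
print(midx.names[0], midx.names[1])                # user_id account_num
print(list(midx.get_level_values(0)))              # [1, 2]
print(list(midx.get_level_values("account_num")))  # [100, 200]

# swaplevel returns a new DataFrame with the two levels exchanged.
swapped = new_df.swaplevel(0, 1)
print(list(swapped.index.names))                   # ['account_num', 'user_id']
```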

Answer by Eulenfuchswiesel

To clarify for future readers, I would like to add the following:

As said by Alexander,

df.set_index(['user_id', 'account_num', 'dates'])

optionally with inplace=True, does the job.

type(df) gives

pandas.core.frame.DataFrame

whereas type(df.index) is indeed the expected

pandas.core.indexes.multi.MultiIndex
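The same check can be written with isinstance against the public pd.DataFrame and pd.MultiIndex classes, which avoids relying on the internal module path shown above (an implementation detail that may differ between pandas versions). A sketch on hypothetical data; note that with inplace=True, set_index modifies df directly and returns None.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "user_id": [1, 2],
    "account_num": [100, 200],
    "dates": ["2017-01-01", "2017-01-02"],
    "sales": [10, 3],
})

# With inplace=True, set_index changes df itself and returns None.
result = df.set_index(["user_id", "account_num", "dates"], inplace=True)
print(result)  # None

# df is still a DataFrame; only its index has become a MultiIndex.
print(isinstance(df, pd.DataFrame))         # True
print(isinstance(df.index, pd.MultiIndex))  # True
print(df.index.nlevels)                     # 3
```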

Answer by piRSquared

Use pd.MultiIndex.from_arrays

import pandas as pd

lvl0 = currentDataFrame.user_id.values
lvl1 = currentDataFrame.account_num.values

midx = pd.MultiIndex.from_arrays([lvl0, lvl1], names=['level 0', 'level 1'])
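A sketch of what to do with midx once it is built, using a hypothetical stand-in for currentDataFrame. Here the levels are named after their source columns instead of 'level 0' / 'level 1' (a naming choice, not a requirement). The index is attached to the remaining sales column and compared with the index that set_index produces.

```python
import pandas as pd

# Hypothetical stand-in for the question's currentDataFrame.
currentDataFrame = pd.DataFrame({
    "user_id": [1, 1, 2],
    "account_num": [100, 101, 200],
    "sales": [10, 7, 3],
})

lvl0 = currentDataFrame.user_id.values
lvl1 = currentDataFrame.account_num.values

# Name the levels after the columns they came from.
midx = pd.MultiIndex.from_arrays([lvl0, lvl1], names=["user_id", "account_num"])

# A bare MultiIndex holds only the keys; attach it to the data. to_numpy()
# passes raw values, so pandas does not align on the old RangeIndex.
indexed = pd.DataFrame({"sales": currentDataFrame["sales"].to_numpy()}, index=midx)
print(indexed)

# Same labels as the index built by set_index.
print(midx.equals(currentDataFrame.set_index(["user_id", "account_num"]).index))  # True
```

Passing the column as an array matters: passing the Series itself together with index=midx would make pandas align the Series' original RangeIndex against midx, which yields NaN rows.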

Answer by Casey Van Buren

The DataFrame returned by currentDataFrame.set_index(['user_id','account_num']) has its index set to the columns ['user_id','account_num'].

newmulti.index will return the MultiIndex object.
