pandas: create a MultiIndex from an existing DataFrame
Original question: http://stackoverflow.com/questions/44442831/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Create multiindex from existing dataframe
Asked by puifais
I've spent hours browsing everywhere trying to create a multiindex from a dataframe in pandas. This is the dataframe I have (posting an Excel sheet mockup; I do have this in a pandas dataframe):
And this is what I want:
I have tried:
newmulti = currentDataFrame.set_index(['user_id','account_num'])
But it returns a dataframe, not a multiindex. Also, I could not figure out how to make 'user_id' level 0 and 'account_num' level 1. I think this must be trivial but I've read so many posts, tutorials, etc. and still could not figure it out. Partly because I'm a very visual person and most posts are not. Please help!
Answered by Alexander
You could simply use groupby in this case, which will create the multi-index automatically when it sums the sales along the requested columns.
df.groupby(['user_id', 'account_num', 'dates']).sales.sum().to_frame()
You should also be able to simply do this:
df.set_index(['user_id', 'account_num', 'dates'])
However, you probably want to avoid any duplicates (e.g. two or more rows with identical user_id, account_num and dates values but different sales figures) by summing them, which is why I recommended using groupby.
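To make the difference concrete, here is a minimal sketch using made-up data (the values below are hypothetical; only the column names come from the question). It suggests that set_index keeps duplicated keys as repeated index entries, while groupby followed by sum collapses them into one row.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "account_num": [10, 10, 11, 20],
    "dates": ["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-01"],
    "sales": [100, 50, 30, 70],
})

keys = ["user_id", "account_num", "dates"]

# set_index keeps every row, so the duplicated key appears twice in the index.
indexed = df.set_index(keys)
print(len(indexed), indexed.index.is_unique)

# groupby + sum merges rows that share the same key into a single row.
summed = df.groupby(keys).sales.sum().to_frame()
print(len(summed), summed.index.is_unique)

# The two rows for (1, 10, "2017-01-01") are summed: 100 + 50.
print(summed.loc[(1, 10, "2017-01-01"), "sales"])
```

Whether summing is the right way to resolve duplicates depends on the data; it matches the sales example in the question.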
If you need the multi-index itself, you can simply access it via new_df.index, where new_df is the new dataframe created from either of the two operations above.
And user_id will be level 0 and account_num will be level 1.
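As a quick check of the level ordering, the sketch below (with a hypothetical frame; only the column names come from the question) inspects the index of the frame returned by set_index. The level numbers follow the order of the column names passed in.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "account_num": [10, 11, 20],
    "dates": ["2017-01-01", "2017-01-02", "2017-01-01"],
    "sales": [100, 30, 70],
})

new_df = df.set_index(["user_id", "account_num", "dates"])
midx = new_df.index  # the MultiIndex itself

# Level numbers follow the order of the list passed to set_index.
print(list(midx.names))
print(midx.nlevels)
print(list(midx.get_level_values(0)))  # values of the user_id level
print(list(midx.get_level_values(1)))  # values of the account_num level
```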
Answered by Eulenfuchswiesel
To clarify this for future users, I would like to add the following:
As said by Alexander,
df.set_index(['user_id', 'account_num', 'dates'])
with a possible inplace=True does the job.
type(df) gives
pandas.core.frame.DataFrame
whereas type(df.index) is indeed the expected
pandas.core.indexes.multi.MultiIndex
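The following sketch, using a hypothetical frame, illustrates both points: with inplace=True, set_index modifies df and returns None, and afterwards the frame is still a DataFrame while its index is a MultiIndex.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "account_num": [10, 11, 20],
    "dates": ["2017-01-01", "2017-01-02", "2017-01-01"],
    "sales": [100, 30, 70],
})

# With inplace=True the frame is changed directly and nothing is returned.
result = df.set_index(["user_id", "account_num", "dates"], inplace=True)

print(result is None)
print(type(df).__name__)        # the container is still a DataFrame
print(type(df.index).__name__)  # its index is now a MultiIndex
```

Without inplace=True, the call returns a new DataFrame and leaves df unchanged, so the result has to be assigned to a variable.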
Answered by piRSquared
import pandas as pd

# Build the MultiIndex directly from the arrays behind the two columns
lvl0 = currentDataFrame.user_id.values
lvl1 = currentDataFrame.account_num.values
midx = pd.MultiIndex.from_arrays([lvl0, lvl1], names=['level 0', 'level 1'])
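Note that from_arrays builds only the index object; it does not change the dataframe. The sketch below uses a hypothetical currentDataFrame, reuses the column names as level names (an assumption, the answer uses 'level 0' and 'level 1'), and attaches the index to the remaining column. It also shows pd.MultiIndex.from_frame as an alternative, which is not part of the original answer and is available in reasonably recent pandas versions.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
currentDataFrame = pd.DataFrame({
    "user_id": [1, 1, 2],
    "account_num": [10, 11, 20],
    "sales": [100, 30, 70],
})

# Same approach as the answer, but reusing the column names as level names.
midx = pd.MultiIndex.from_arrays(
    [currentDataFrame["user_id"].values, currentDataFrame["account_num"].values],
    names=["user_id", "account_num"],
)

# Alternative (not from the answer): build the index from the key columns.
midx_from_frame = pd.MultiIndex.from_frame(
    currentDataFrame[["user_id", "account_num"]]
)
print(midx.equals(midx_from_frame))

# Attach the standalone index to the remaining data column.
sales = currentDataFrame[["sales"]].copy()
sales.index = midx
print(sales.index.nlevels)
```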
Answered by Casey Van Buren
The DataFrame returned by currentDataFrame.set_index(['user_id','account_num']) has its index set to ['user_id','account_num'].
newmulti.index will return the MultiIndex object.