pandas: create a MultiIndex from an existing DataFrame

Question

Asked by puifais

I've spent hours searching everywhere for how to create a MultiIndex from a DataFrame in pandas. This is the DataFrame I have (posting an Excel sheet mockup; I do have this in a pandas DataFrame):

And this is what I want:

I have tried

newmulti = currentDataFrame.set_index(['user_id','account_num'])

But it returns a DataFrame, not a MultiIndex. Also, I could not figure out how to make 'user_id' level 0 and 'account_num' level 1. I think this must be trivial, but I've read so many posts, tutorials, etc. and still could not figure it out, partly because I'm a very visual person and most posts are not. Please help!

Answer 1

Answered by Alexander

You could simply use groupby in this case, which will create the multi-index automatically when it sums the sales along the requested columns.

df.groupby(['user_id', 'account_num', 'dates']).sales.sum().to_frame()

You should also be able to simply do this:

df.set_index(['user_id', 'account_num', 'dates'])

You probably want to avoid any duplicates, though (e.g. two or more rows with identical user_id, account_num and dates values but different sales figures), by summing them, which is why I recommended using groupby.

If you need the multi-index itself, you can simply access it via new_df.index, where new_df is the new dataframe created by either of the two operations above.

user_id will be level 0 and account_num will be level 1.
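To make the difference between the two approaches concrete, here is a minimal sketch. The sample rows are made up for illustration (the question's actual data is not shown); only the column names come from the question. It assumes one key combination appears twice with different sales figures.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "account_num": [10, 10, 11, 20],
    "dates": ["2017-01-01", "2017-01-01", "2017-01-02", "2017-01-01"],
    "sales": [5.0, 7.0, 3.0, 4.0],
})

keys = ["user_id", "account_num", "dates"]

# set_index keeps every row, so the repeated key stays repeated in the index.
indexed = df.set_index(keys)

# groupby collapses repeated keys by summing their sales.
summed = df.groupby(keys).sales.sum().to_frame()

print(indexed.index.is_unique)   # False
print(summed.index.is_unique)    # True
print(list(summed.index.names))  # ['user_id', 'account_num', 'dates']
```

In this sketch the two rows for user 1, account 10 on 2017-01-01 end up as a single row with sales of 12.0 in the groupby result, while set_index leaves both rows in place.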

Answer 2

Answered by Eulenfuchswiesel

For the benefit of future readers, I would like to add the following:

As said by Alexander,

df.set_index(['user_id', 'account_num', 'dates'])

optionally with inplace=True, does the job.

type(df) gives

pandas.core.frame.DataFrame

whereas type(df.index) is indeed the expected

pandas.core.indexes.multi.MultiIndex
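The type checks above only show a MultiIndex on df if the result was assigned back or inplace=True was used. A small sketch of both variants, using made-up sample rows (only the column names come from the question):

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "account_num": [10, 11, 20],
    "dates": ["2017-01-01", "2017-01-02", "2017-01-01"],
    "sales": [5.0, 3.0, 4.0],
})
keys = ["user_id", "account_num", "dates"]

# Without inplace, set_index returns a new DataFrame and leaves df untouched.
result = df.set_index(keys)
print(type(result).__name__)        # DataFrame
print(type(result.index).__name__)  # MultiIndex
print(type(df.index).__name__)      # RangeIndex

# With inplace=True, the call returns None and modifies the frame itself.
df_inplace = df.copy()
returned = df_inplace.set_index(keys, inplace=True)
print(returned)                         # None
print(type(df_inplace.index).__name__)  # MultiIndex
```

So the frame stays a DataFrame either way; the MultiIndex lives on its .index attribute.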

Answer 3

Answered by piRSquared

Use pd.MultiIndex.from_arrays

lvl0 = currentDataFrame.user_id.values
lvl1 = currentDataFrame.account_num.values

midx = pd.MultiIndex.from_arrays([lvl0, lvl1], names=['level 0', 'level 1'])
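A self-contained sketch of the same idea, with made-up rows standing in for currentDataFrame. Here the levels are named after the source columns rather than 'level 0' and 'level 1', which is just a choice for this example. from_arrays builds a standalone index object, so it still has to be attached to something to be useful; one way is shown at the end.

```python
import pandas as pd

# Hypothetical stand-in for currentDataFrame from the question.
currentDataFrame = pd.DataFrame({
    "user_id": [1, 1, 2],
    "account_num": [10, 11, 20],
    "sales": [5.0, 3.0, 4.0],
})

# Build the MultiIndex directly from the two columns.
midx = pd.MultiIndex.from_arrays(
    [
        currentDataFrame["user_id"].to_numpy(),
        currentDataFrame["account_num"].to_numpy(),
    ],
    names=["user_id", "account_num"],
)

# Attach the index to the remaining data to get an indexed frame.
sales_only = pd.DataFrame(
    {"sales": currentDataFrame["sales"].to_numpy()},
    index=midx,
)

print(midx.nlevels)                       # 2
print(sales_only.loc[(2, 20), "sales"])   # 4.0
```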

Answer 4

Answered by Casey Van Buren

The DataFrame returned by currentDataFrame.set_index(['user_id','account_num']) has its index set to ['user_id','account_num'].

newmulti.index will return the MultiIndex object.
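On the question of which column becomes which level: the order of the list passed to set_index decides it. A brief sketch with made-up rows (only the column names come from the question):

```python
import pandas as pd

# Hypothetical stand-in for currentDataFrame from the question.
currentDataFrame = pd.DataFrame({
    "user_id": [1, 1, 2],
    "account_num": [10, 11, 20],
    "sales": [5.0, 3.0, 4.0],
})

newmulti = currentDataFrame.set_index(["user_id", "account_num"])
index = newmulti.index

# The first name in the list is level 0, the second is level 1.
print(list(index.names))                    # ['user_id', 'account_num']
print(index.get_level_values(0).tolist())   # [1, 1, 2]
print(index.get_level_values(1).tolist())   # [10, 11, 20]

# Reversing the list reverses the level order.
swapped = currentDataFrame.set_index(["account_num", "user_id"])
print(list(swapped.index.names))            # ['account_num', 'user_id']
```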
