Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/21846163/

Time: 2020-09-13 21:42:33  Source: igfitidea

Pandas: Use multiple columns of a dataframe as index of another

Tags: python, numpy, pandas, scipy, scikit-learn

Asked by choldgraf

I've got a large dataframe with my data in it, and another dataframe of the same first dimension that contains metadata about each point in time (e.g., what trial number it was, what trial type it was).

What I want to do is slice the large dataframe using the values of the "metadataframe". I want to keep these separate (rather than storing the metadataframe as a multi-index of the larger one).

Right now, I am trying to do something like this:

def my_func(container):
   container.big_df.set_index(container.meta_df[['col1', 'col2']])
   container.big_df.loc['col1val', 'col2val'].plot()

However, this returns the following error:

ValueError: Must pass DataFrame with boolean values only

Note that this works fine if I only pass a single column to set_index.

Can anyone figure out what's going wrong here? Alternatively, can someone tell me that I'm doing this in a totally stupid and hacky way, and that there's a much better way to go about it? :)

MY SOLUTION

Thanks for the ideas. I played around with the indexing a bit, and this seems to be the easiest and fastest approach. I didn't like having to strip the index of its name, and transposing the values etc. seemed cumbersome. I realized something interesting (and probably easy to fix):

dfa.set_index(dfb[['col1', 'col2']]) 

doesn't work, but

dfa.set_index([dfb.col1, dfb.col2])

does.

So you can make set_index work by turning dfb into a list of columns, like this:

dfa.set_index([dfb[col] for col in ['col1', 'col2']])
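
To make that concrete, here is a minimal sketch of the list-of-columns approach from start to finish. The frames, the "signal" column and the values are made up for illustration; only the set_index call comes from the post. Note that set_index returns a new frame by default, so the result has to be kept.

```python
import pandas as pd

# Hypothetical stand-ins for the data frame and the metadata frame.
dfa = pd.DataFrame({"signal": [0.1, 0.2, 0.3, 0.4]})
dfb = pd.DataFrame({"col1": ["a", "a", "b", "b"], "col2": [1, 1, 2, 2]})

# Pass a list of Series, one per index level. set_index does not modify
# dfa in place by default, so keep the frame it returns.
indexed = dfa.set_index([dfb[col] for col in ["col1", "col2"]])

# Select every row whose metadata equal ("a", 1); the trailing ":" makes
# it explicit that the tuple is a row key, not a (row, column) pair.
subset = indexed.loc[("a", 1), :]
print(subset)
```

Because the index is built on a copy, dfa and dfb themselves stay separate and unchanged.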

Answered by HYRY

Use MultiIndex.from_arrays() to create the index object:

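import pandas as pd

# df1 supplies the index values; df2 holds the data to be re-indexed.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
df2 = pd.DataFrame({"C": [100, 200, 300]})
# Transpose so that each column of df1 becomes one level of the MultiIndex.
df2.index = pd.MultiIndex.from_arrays(df1.values.T)

print(df2)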

The result:

       C
1 a  100
2 b  200
3 c  300
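
One thing to be aware of: when the columns have mixed types, as A and B do here, df1.values is a single object-dtype array, so the dtype of each index level is left to pandas' inference. A possible variation, sketched below as an assumption rather than as part of the original answer, is to pass the columns one by one and name the levels after them.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
df2 = pd.DataFrame({"C": [100, 200, 300]})

# Pass each column separately so every level is built from its own
# column, and name the levels after the source columns.
df2.index = pd.MultiIndex.from_arrays(
    [df1[col] for col in df1.columns], names=list(df1.columns)
)

# Look up a single value by its (A, B) pair.
value = df2.loc[(2, "b"), "C"]
print(value)
```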

Answered by CT Zhu

Change the first line of your function to:

container.big_df.index=pd.MultiIndex.from_arrays(container.meta_df[['col1', 'col2']].values.T, names=['i1','i2'])
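
If your pandas version is 0.24 or newer, MultiIndex.from_frame should do the same job without the .values.T step. The sketch below assumes that; big_df and meta_df are hypothetical stand-ins for the container attributes, and the "signal" and "other" columns are invented for the example.

```python
import pandas as pd

# Hypothetical stand-ins for container.big_df and container.meta_df.
big_df = pd.DataFrame({"signal": [0.1, 0.2, 0.3, 0.4]})
meta_df = pd.DataFrame(
    {"col1": ["a", "a", "b", "b"], "col2": [1, 1, 2, 2], "other": [9, 8, 7, 6]}
)

# Build the index directly from the two metadata columns.
# Assumes pandas >= 0.24, where MultiIndex.from_frame was added.
big_df.index = pd.MultiIndex.from_frame(
    meta_df[["col1", "col2"]], names=["i1", "i2"]
)

# Rows belonging to the ("b", 2) combination.
trial = big_df.loc[("b", 2), :]
print(trial)
```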