pandas: Creating an empty MultiIndex

License: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/28289440/

Date: 2020-09-13 22:54:20  Source: igfitidea

Creating an empty MultiIndex

Tags: python, pandas, multi-index

Asked by dmvianna

I would like to create an empty DataFrame with a MultiIndex before assigning rows to it. I have already found that empty DataFrames don't like being assigned a MultiIndex on the fly, so I'm setting the MultiIndex names during creation. However, I don't want to assign levels, as this will be done later. This is the best code I have so far:

from pandas import DataFrame, MultiIndex

def empty_multiindex(names):
    """
    Creates empty MultiIndex from a list of level names.
    """
    # One all-None tuple serves as a placeholder entry.
    return MultiIndex.from_tuples(tuples=[(None,) * len(names)], names=names)

Which gives me

In [2]:

empty_multiindex(['one','two', 'three'])

Out[2]:

MultiIndex(levels=[[], [], []],
           labels=[[-1, -1, -1], [-1, -1, -1], [-1, -1, -1]],
           names=[u'one', u'two', u'three'])

and

In [3]:
DataFrame(index=empty_multiindex(['one','two', 'three']))

Out[3]:
one two three
NaN NaN NaN

Well, I have no use for these NaNs. I can easily drop them later, but this is obviously a hackish solution. Does anyone have a better one?
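
To make the problem concrete, here is a minimal sketch of why the placeholder approach is not really empty. It assumes a reasonably recent pandas, and the variable names are only illustrative: the all-None tuple counts as an actual index entry, so the DataFrame starts with a row that has to be dropped later.

```python
import pandas as pd

names = ["one", "two", "three"]

# Same idea as the helper above: a single all-None tuple as a placeholder.
placeholder_idx = pd.MultiIndex.from_tuples([(None,) * len(names)], names=names)

# The placeholder is a real entry, so the index has a nonzero length.
print(len(placeholder_idx))

# A frame built on it inherits that placeholder row.
df = pd.DataFrame(index=placeholder_idx)
print(df.shape)
```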

Answer by RoG

The solution is to leave the labels empty. This works fine for me:

>>> my_index = pd.MultiIndex(levels=[[],[],[]],
                             labels=[[],[],[]],
                             names=[u'one', u'two', u'three'])
>>> my_index
MultiIndex(levels=[[], [], []],
           labels=[[], [], []],
           names=[u'one', u'two', u'three'])
>>> my_columns = [u'alpha', u'beta']
>>> df = pd.DataFrame(index=my_index, columns=my_columns)
>>> df
Empty DataFrame
Columns: [alpha, beta]
Index: []
>>> df.loc[('apple','banana','cherry'),:] = [0.1, 0.2]
>>> df
                    alpha beta
one   two    three            
apple banana cherry   0.1  0.2

Hope that helps!
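
A note for newer installations: pandas 0.24 renamed the `labels` argument of the `MultiIndex` constructor to `codes`, and later releases no longer accept `labels`. The sketch below shows the same idea with the newer keyword, assuming pandas 0.24 or later. The printed representation of the index also differs between versions, so no output is shown.

```python
import pandas as pd

# Assumes pandas 0.24 or later, where the keyword is `codes` rather than `labels`.
my_index = pd.MultiIndex(
    levels=[[], [], []],
    codes=[[], [], []],
    names=["one", "two", "three"],
)
df = pd.DataFrame(index=my_index, columns=["alpha", "beta"])

print(len(df))           # number of rows in the new frame
print(df.index.nlevels)  # number of index levels
```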

Answer by Jean Paul

Another solution, which is maybe a little simpler, is to use the set_index function:

>>> import pandas as pd
>>> df = pd.DataFrame(columns=['one', 'two', 'three', 'alpha', 'beta'])
>>> df = df.set_index(['one', 'two', 'three'])
>>> df
Empty DataFrame
Columns: [alpha, beta]
Index: []
>>> df.loc[('apple','banana','cherry'),:] = [0.1, 0.2]
>>> df
                    alpha beta
one   two    three            
apple banana cherry   0.1  0.2
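
If this pattern is needed in several places, the two steps above could be wrapped in a small function. This is only a sketch: `empty_frame` and its parameter names are hypothetical, not part of pandas.

```python
import pandas as pd

def empty_frame(index_names, value_columns):
    """Hypothetical helper: build an empty DataFrame, then move the
    index columns into a MultiIndex with set_index."""
    df = pd.DataFrame(columns=list(index_names) + list(value_columns))
    return df.set_index(list(index_names))

df = empty_frame(["one", "two", "three"], ["alpha", "beta"])
print(df.index.names)
print(list(df.columns))
```

Passing the index names and value columns separately keeps the caller from having to repeat the index names in both the constructor and the set_index call.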

Answer by mcsoini

Using pd.MultiIndex.from_arrays allows for a slightly more concise solution when defining the index explicitly:

import pandas as pd
ind = pd.MultiIndex.from_arrays([[]] * 3, names=(u'one', u'two', u'three'))
df = pd.DataFrame(columns=['alpha', 'beta'], index=ind)
df.loc[('apple','banana','cherry'), :] = [4, 3]
print(df)

                     alpha  beta
one   two    three              
apple banana cherry      4     3
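
Tying this back to the question, the `empty_multiindex(names)` helper could be rewritten on top of `from_arrays`, so that it takes only a list of level names and needs no placeholder row. This is a minimal sketch under that assumption, not a tested drop-in replacement.

```python
import pandas as pd

def empty_multiindex(names):
    """Create an empty MultiIndex with one level per name (sketch)."""
    # One empty array per level; no placeholder tuple is needed.
    return pd.MultiIndex.from_arrays([[] for _ in names], names=names)

idx = empty_multiindex(["one", "two", "three"])
df = pd.DataFrame(index=idx, columns=["alpha", "beta"])

print(len(idx))   # number of entries in the index
print(df.empty)   # whether the frame has no rows
```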