在 Pandas 中合并多索引与单索引数据框

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/14149156/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-13 20:34:02  来源:igfitidea点击:

Merge multi-indexed with single-indexed data frames in pandas

pythonpandas

提问by

I have two dataframes. df1 is multi-indexed:

我有两个数据框。df1 是多索引的:

                value
first second    
a     x         0.471780
      y         0.774908
      z         0.563634
b     x         -0.353756
      y         0.368062
      z         -1.721840

and df2:

和 df2:

      value
first   
a     10
b     20

How can I merge the two data frames with only one of the multi-indexes, in this case the 'first' index? The desired output would be:

如何仅将两个数据框与多索引中的一个合并,在这种情况下为“第一个”索引?所需的输出是:

                value1      value2
first second    
a     x         0.471780    10
      y         0.774908    10
      z         0.563634    10
b     x         -0.353756   20
      y         0.368062    20
      z         -1.721840   20

采纳答案by Andy Hayden

You could use get_level_values:

你可以使用get_level_values

firsts = df1.index.get_level_values('first')
df1['value2'] = df2.loc[firsts].values

Note: you are almostdoing a joinhere (except the df1 is MultiIndex)... so there may be a neater way to describe this...

注意:你几乎join这里做一个(除了 df1 是 MultiIndex)......所以可能有一种更简洁的方式来描述这个......

.

.

In an example (similar to what you have):

在一个例子中(类似于你所拥有的):

df1 = pd.DataFrame([['a', 'x', 0.123], ['a','x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10],['b', 20]],
                   columns=['first', 'value']).set_index(['first'])

firsts = df1.index.get_level_values('first')
df1['value2'] = df2.loc[firsts].values

In [5]: df1
Out[5]: 
              value1  value2
first second                
a     x        0.123      10
      x        0.234      10
      y        0.451      10
b     x        0.453      20

回答by Matt M

According to the documentation, as of pandas 0.14, you can simply join single-index and multiindex dataframes. It will match on the common index name. The howargument works as expected with 'inner'and 'outer', though interestingly it seems to be reversed for 'left'and 'right'(could this be a bug?).

根据文档,从 pandas 0.14 开始,您可以简单地加入单索引和多索引数据帧。它将匹配公共索引名称。该how参数按预期工作与'inner''outer',但有趣的是它似乎是颠倒的'left''right'(可能这是一个错误?)。

df1 = pd.DataFrame([['a', 'x', 0.471780], ['a','y', 0.774908], ['a', 'z', 0.563634],
                    ['b', 'x', -0.353756], ['b', 'y', 0.368062], ['b', 'z', -1.721840],
                    ['c', 'x', 1], ['c', 'y', 2], ['c', 'z', 3],
                   ],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10], ['b', 20]],
                   columns=['first', 'value2']).set_index(['first'])

print(df1.join(df2, how='inner'))
                value1  value2
first second                  
a     x       0.471780      10
      y       0.774908      10
      z       0.563634      10
b     x      -0.353756      20
      y       0.368062      20
      z      -1.721840      20

回答by K.-Michael Aye

As the .ixsyntax is a powerful shortcut to reindexing, but in this case you are actually not doing any combined rows/column reindexing, this can be done a bit more elegantly (for my humble taste buds) with just using reindexing:

由于.ix语法是重新索引的强大快捷方式,但在这种情况下,您实际上没有进行任何组合的行/列重新索引,只需使用重新索引就可以更优雅地完成(对于我卑微的味蕾):

Preparation from hayden:

海登的准备:

df1 = pd.DataFrame([['a', 'x', 0.123], ['a','x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10],['b', 20]],
                   columns=['first', 'value']).set_index(['first'])

Then this looks like this in iPython:

然后在 iPython 中看起来像这样:

In [4]: df1
Out[4]: 
              value1
first second        
a     x        0.123
      x        0.234
      y        0.451
b     x        0.453

In [5]: df2
Out[5]: 
       value
first       
a         10
b         20

In [7]: df2.reindex(df1.index, level=0)
Out[7]: 
              value
first second       
a     x          10
      x          10
      y          10
b     x          20

In [8]: df1['value2'] = df2.reindex(df1.index, level=0)

In [9]: df1
Out[9]: 
              value1  value2
first second                
a     x        0.123      10
      x        0.234      10
      y        0.451      10
b     x        0.453      20

The mnemotechnic for what level you have to use in the reindex method: It states for the level that you already covered in the bigger index. So, in this case df2 already had level 0 covered of the df1.index.

在 reindex 方法中必须使用什么级别的助记符:它说明您已经在更大的索引中涵盖的级别。因此,在这种情况下,df2 已经覆盖了 df1.index 的 0 级。