在 Pandas 中合并多索引与单索引数据框

Question

提问by

I have two dataframes. df1 is multi-indexed:

我有两个数据框。df1 是多索引的：

                value
first second    
a     x         0.471780
      y         0.774908
      z         0.563634
b     x         -0.353756
      y         0.368062
      z         -1.721840

and df2:

和 df2：

      value
first   
a     10
b     20

How can I merge the two data frames with only one of the multi-indexes, in this case the 'first' index? The desired output would be:

如何仅将两个数据框与多索引中的一个合并，在这种情况下为“第一个”索引？所需的输出是：

                value1      value2
first second    
a     x         0.471780    10
      y         0.774908    10
      z         0.563634    10
b     x         -0.353756   20
      y         0.368062    20
      z         -1.721840   20

Answer 1

采纳答案by Andy Hayden

You could use get_level_values:

你可以使用get_level_values：

firsts = df1.index.get_level_values('first')
df1['value2'] = df2.loc[firsts].values

Note: you are almostdoing a joinhere (except the df1 is MultiIndex)... so there may be a neater way to describe this...

注意：你几乎在join这里做一个（除了 df1 是 MultiIndex）......所以可能有一种更简洁的方式来描述这个......

.

In an example (similar to what you have):

在一个例子中（类似于你所拥有的）：

df1 = pd.DataFrame([['a', 'x', 0.123], ['a','x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10],['b', 20]],
                   columns=['first', 'value']).set_index(['first'])

firsts = df1.index.get_level_values('first')
df1['value2'] = df2.loc[firsts].values

In [5]: df1
Out[5]: 
              value1  value2
first second                
a     x        0.123      10
      x        0.234      10
      y        0.451      10
b     x        0.453      20

Answer 2

回答by Matt M

According to the documentation, as of pandas 0.14, you can simply join single-index and multiindex dataframes. It will match on the common index name. The howargument works as expected with 'inner'and 'outer', though interestingly it seems to be reversed for 'left'and 'right'(could this be a bug?).

根据文档，从 pandas 0.14 开始，您可以简单地加入单索引和多索引数据帧。它将匹配公共索引名称。该how参数按预期工作与'inner'和'outer'，但有趣的是它似乎是颠倒的'left'和'right'（可能这是一个错误？）。

df1 = pd.DataFrame([['a', 'x', 0.471780], ['a','y', 0.774908], ['a', 'z', 0.563634],
                    ['b', 'x', -0.353756], ['b', 'y', 0.368062], ['b', 'z', -1.721840],
                    ['c', 'x', 1], ['c', 'y', 2], ['c', 'z', 3],
                   ],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10], ['b', 20]],
                   columns=['first', 'value2']).set_index(['first'])

print(df1.join(df2, how='inner'))
                value1  value2
first second                  
a     x       0.471780      10
      y       0.774908      10
      z       0.563634      10
b     x      -0.353756      20
      y       0.368062      20
      z      -1.721840      20

Answer 3

回答by K.-Michael Aye

As the .ixsyntax is a powerful shortcut to reindexing, but in this case you are actually not doing any combined rows/column reindexing, this can be done a bit more elegantly (for my humble taste buds) with just using reindexing:

由于.ix语法是重新索引的强大快捷方式，但在这种情况下，您实际上没有进行任何组合的行/列重新索引，只需使用重新索引就可以更优雅地完成（对于我卑微的味蕾）：

Preparation from hayden:

海登的准备：

df1 = pd.DataFrame([['a', 'x', 0.123], ['a','x', 0.234],
                    ['a', 'y', 0.451], ['b', 'x', 0.453]],
                   columns=['first', 'second', 'value1']
                   ).set_index(['first', 'second'])
df2 = pd.DataFrame([['a', 10],['b', 20]],
                   columns=['first', 'value']).set_index(['first'])

Then this looks like this in iPython:

然后在 iPython 中看起来像这样：

In [4]: df1
Out[4]: 
              value1
first second        
a     x        0.123
      x        0.234
      y        0.451
b     x        0.453

In [5]: df2
Out[5]: 
       value
first       
a         10
b         20

In [7]: df2.reindex(df1.index, level=0)
Out[7]: 
              value
first second       
a     x          10
      x          10
      y          10
b     x          20

In [8]: df1['value2'] = df2.reindex(df1.index, level=0)

In [9]: df1
Out[9]: 
              value1  value2
first second                
a     x        0.123      10
      x        0.234      10
      y        0.451      10
b     x        0.453      20

The mnemotechnic for what level you have to use in the reindex method: It states for the level that you already covered in the bigger index. So, in this case df2 already had level 0 covered of the df1.index.

在 reindex 方法中必须使用什么级别的助记符：它说明您已经在更大的索引中涵盖的级别。因此，在这种情况下，df2 已经覆盖了 df1.index 的 0 级。

在 Pandas 中合并多索引与单索引数据框

提问by

采纳答案by Andy Hayden

回答by Matt M

回答by K.-Michael Aye

相关推荐

最近更新

标签

在 Pandas 中合并多索引与单索引数据框

提问by

采纳答案by Andy Hayden

回答by Matt M

回答by K.-Michael Aye

相关推荐

绘制包含 NaN 的 Pandas 数据框

Pandas：计算所有仅在两列中不同的条目的平均值

pandas 熊猫 read_csv 和 UTF-16

pandas 大熊猫系列上的 numpy 差异

相关推荐

最近更新

标签