Merging MultiIndex DataFrames in pandas

Question

Asked by learningToCode

I have a question about merging MultiIndex DataFrames in pandas. Here is a hypothetical scenario:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
# Same labels in both indexes, but different level names
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

s1 = pd.DataFrame(np.random.randn(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(np.random.randn(8), index=index2, columns=['s2'])

Then either

s1.merge(s2, how='left', left_index=True, right_index=True)

or

s1.merge(s2, how='left', left_on=['first', 'second'], right_on=['third', 'fourth'])

will result in an error.

Do I have to call reset_index() on s1 or s2 to make this work?

Thanks

Answer 1

Answered by ALollz

It seems you need to use a combination of the two: the index on one side and the index level names on the other.

s1.merge(s2, left_index=True, right_on=['third', 'fourth'])
#s1.merge(s2, right_index=True, left_on=['first', 'second'])

Output:

               s1        s2
bar one  0.765385 -0.365508
    two  1.462860  0.751862
baz one  0.304163  0.761663
    two -0.816658 -1.810634
foo one  1.891434  1.450081
    two  0.571294  1.116862
qux one  1.056516 -0.052927
    two -0.574916 -1.197596
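
For a self-contained check of this approach, here is a minimal sketch. It rebuilds the two frames from the question with a seeded generator; the seed and the names `rng`, `merged`, and `aligned` are assumptions made for reproducibility and are not part of the original answer. The names of the resulting index levels may differ between pandas versions, so the sketch does not rely on them.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index1 = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
index2 = pd.MultiIndex.from_tuples(tuples, names=['third', 'fourth'])

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
s1 = pd.DataFrame(rng.standard_normal(8), index=index1, columns=['s1'])
s2 = pd.DataFrame(rng.standard_normal(8), index=index2, columns=['s2'])

# Use s1's index as the left key and s2's index level names as the right key.
merged = s1.merge(s2, left_index=True, right_on=['third', 'fourth'])

# Both frames hold the same labels in the same order, so the values
# should line up row by row with the inputs.
aligned = (np.allclose(merged['s1'].to_numpy(), s1['s1'].to_numpy())
           and np.allclose(merged['s2'].to_numpy(), s2['s2'].to_numpy()))
print(merged)
print("rows aligned:", aligned)
```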

Answer 2

Answered by YOBEN_S

You can combine them with combine_first:

s1.combine_first(s2)
Out[19]: 
                    s1        s2
first second                    
bar   one     0.039203  0.795963
      two     0.454782 -0.222806
baz   one     3.101120 -0.645474
      two    -1.174929 -0.875561
foo   one    -0.887226  1.078218
      two     1.507546 -1.078564
qux   one     0.028048  0.042462
      two     0.826544 -0.375351

# s2.combine_first(s1)
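
Note that combine_first is a patching operation rather than a join: it keeps the caller's non-null values and fills the gaps from the other frame. It produces the merged result above because s1 and s2 share no column names. A minimal sketch with made-up values (the frames `a` and `b`, their columns `x` and `y`, and the name `patched` are hypothetical):

```python
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'two')], names=['first', 'second'])

# Hypothetical frames: `a` has a gap in column x; `b` has x and an extra column y.
a = pd.DataFrame({'x': [1.0, np.nan]}, index=idx)
b = pd.DataFrame({'x': [10.0, 20.0], 'y': [5.0, 6.0]}, index=idx)

patched = a.combine_first(b)

# x keeps a's value where present and takes b's value where a is NaN;
# y exists only in b, so it is carried over unchanged.
print(patched)
```

If the two frames did share a column, the caller's values would win wherever they are not null, which is worth keeping in mind before using this as a general merge.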

Answer 3

Answered by rafaelc

Other than using the index names as pointed out by @ALollz, you can simply use loc, which will match the indexes automatically:

s1.loc[:, 's2'] = s2   # Or explicitly, s2['s2']

                s1           s2
first   second      
bar     one     -0.111384   -2.341803
        two     -1.226569    1.308240
baz     one      1.880835    0.697946
        two     -0.008979   -0.247896
foo     one      0.103864   -1.039990
        two      0.836931    0.000811
qux     one     -0.859005   -1.199615
        two     -0.321341   -1.098691

A general formula would be

s1.loc[:, s2.columns] = s2

Answer 4

Answered by piRSquared

`rename_axis`

You can rename the index levels of one frame and let join do its thing:

s1.join(s2.rename_axis(s1.index.names))

                    s1        s2
first second                    
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013
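
To see that the join matches rows by label and not by position, here is a small sketch with made-up values. The data, the row order of s2, and the name `joined` are assumptions for illustration; s2 deliberately lists its rows in a different order and lacks one of s1's rows.

```python
import numpy as np
import pandas as pd

left_index = pd.MultiIndex.from_tuples(
    [('bar', 'one'), ('bar', 'two'), ('baz', 'one')],
    names=['first', 'second'])
# Different level names, different row order, and ('bar', 'two') is missing.
right_index = pd.MultiIndex.from_tuples(
    [('baz', 'one'), ('bar', 'one')],
    names=['third', 'fourth'])

s1 = pd.DataFrame({'s1': [1.0, 2.0, 3.0]}, index=left_index)
s2 = pd.DataFrame({'s2': [30.0, 10.0]}, index=right_index)

# Give s2 the level names of s1, then join on the identically named index.
joined = s1.join(s2.rename_axis(s1.index.names))

# join defaults to a left join: every row of s1 is kept, and the row
# with no match in s2 gets NaN in column s2.
print(joined)
```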

`concat`

pd.concat([s1, s2], axis=1)

                    s1        s2
first second                    
bar   one    -0.696420 -1.040463
      two     0.640891  1.483262
baz   one     1.598837  0.097424
      two     0.003994 -0.948419
foo   one    -0.717401  1.190019
      two    -1.201237 -0.000738
qux   one     0.559684 -0.505640
      two     1.979700  0.186013
