Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/43223615/

Join dataframes - one with multiindex columns and the other without

Tags: python, pandas, join, multi-index

Asked by Eyal S.

I'm trying to join two dataframes - one with MultiIndex columns and the other with a single column name. They have a similar index.

I get the following warning: "UserWarning: merging between different levels can give an unintended result (3 levels on the left, 1 on the right)"

For example:

import numpy as np
import pandas as pd

# df has two-level (MultiIndex) columns; df2 has a single plain column 'w'
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 8), index=['A', 'B', 'C'], columns=index)
df2 = pd.DataFrame(np.random.randn(3), index=['A', 'B', 'C'], columns=['w'])
df3 = df.join(df2)  # this join is what produces the warning quoted above

What is the best way to join these two dataframes?

Answered by piRSquared

It depends on what you want! Do you want the column from df2 to be aligned with the first or the second level of columns from df?
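To make the two alignment choices concrete, here is a minimal sketch. The smaller frames, the top-level label 'a', and the empty-string second-level label are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small stand-ins for the frames in the question (values are arbitrary).
cols = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']],
                                  names=['first', 'second'])
df = pd.DataFrame(np.arange(12).reshape(3, 4), index=['A', 'B', 'C'], columns=cols)
df2 = pd.DataFrame({'w': [0.1, 0.2, 0.3]}, index=['A', 'B', 'C'])

# Option 1: 'w' lives on the second level, under an assumed top-level label 'a'.
on_second = df2.copy()
on_second.columns = pd.MultiIndex.from_product([['a'], df2.columns])

# Option 2: 'w' lives on the first level, with an assumed empty second-level label.
on_first = df2.copy()
on_first.columns = pd.MultiIndex.from_product([df2.columns, ['']])

# Both frames now have two column levels, so the join no longer mixes levels.
joined_second = df.join(on_second)
joined_first = df.join(on_first)

print(joined_second.columns.tolist()[-1])
print(joined_first.columns.tolist()[-1])
```

Which option fits depends on how you later select columns: with option 1 the new column is reached as `joined_second[('a', 'w')]`, with option 2 as `joined_first[('w', '')]`.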

You have to add a level to the columns of df2.

Super cheesy with pd.concat

df.join(pd.concat([df2], axis=1, keys=['a']))
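As a rough illustration of what the `pd.concat` trick does on its own, the sketch below wraps a small stand-in `df2` (the values are made up) and inspects the resulting columns. Passing a one-element list with `keys=['a']` and `axis=1` prepends 'a' as an outer column level and leaves the original frame untouched.

```python
import pandas as pd

# Stand-in for df2 from the question; the values are arbitrary.
df2 = pd.DataFrame({'w': [1.0, 2.0, 3.0]}, index=['A', 'B', 'C'])

# keys=['a'] becomes the new outer level of the column index.
wrapped = pd.concat([df2], axis=1, keys=['a'])

print(wrapped.columns.tolist())  # [('a', 'w')]
print(df2.columns.tolist())      # ['w'], df2 itself is unchanged
```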

Better way

df2.columns = pd.MultiIndex.from_product([['a'], df2.columns])

df.join(df2)

Answered by jezrael

I think the simplest way is to create a MultiIndex in df2 and then use concat or join:

df2.columns = pd.MultiIndex.from_tuples([('a','w')])
print (df2)
          a
          w
A -0.562729
B -0.212032
C  0.102451
# Alternative, starting again from the original single-level df2:
df2.columns = [['a'], df2.columns]
print (df2)
          a
          w
A -1.253881
B -0.637752
C  0.907105


df3 = pd.concat([df, df2], axis=1)

Or:

df3 = df.join(df2)

print (df3)
first        bar                 baz                 foo                 qux  \
second       one       two       one       two       one       two       one   
A      -0.269667  0.221566  1.138393  0.871762 -0.063132 -1.995682 -0.797885   
B      -0.456878  0.293350 -1.040748 -1.307871  0.002462  1.580711 -0.198943   
C      -0.691755 -0.279445 -0.809215 -0.006658  1.452484  0.516414 -0.295961   

first                    a  
second       two         w  
A       1.068843 -0.562729  
B       1.247057 -0.212032  
C      -0.345300  0.102451
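A quick way to check that the two routes agree is to build both results and compare them. This is a sketch on smaller, made-up frames; the names `via_concat` and `via_join` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([['bar', 'baz'], ['one', 'two']],
                                  names=['first', 'second'])
df = pd.DataFrame(np.random.randn(3, 4), index=['A', 'B', 'C'], columns=cols)

# df2 already carries a two-level column index, as in the answer above.
df2 = pd.DataFrame(np.random.randn(3, 1), index=['A', 'B', 'C'],
                   columns=pd.MultiIndex.from_tuples([('a', 'w')]))

via_concat = pd.concat([df, df2], axis=1)
via_join = df.join(df2)

# Compare column labels and cell values of the two results.
same_columns = via_concat.columns.tolist() == via_join.columns.tolist()
same_values = np.array_equal(via_concat.to_numpy(), via_join.to_numpy())
print(same_columns, same_values)
```

The two only coincide here because both frames share the same row index. By default `pd.concat` aligns rows with an outer join while `DataFrame.join` uses a left join, so the results can differ when the indexes do not match.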