Original question: http://stackoverflow.com/questions/24684441/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Pandas: Concatenate dataframe and keep duplicate indices
Asked by andbeonetraveler
I have two dataframes that I would like to concatenate column-wise (axis=1) with an inner join. One of the dataframes has some duplicate indices, but the rows are not duplicates, and I don't want to lose the data from those rows:
import pandas as pd

df1 = pd.DataFrame([{'a':1,'b':2},{'a':1,'b':3},{'a':2,'b':4}],
                   columns=['a','b']).set_index('a')
df2 = pd.DataFrame([{'a':1,'c':5},{'a':2,'c':6}], columns=['a','c']).set_index('a')
>>> df1
   b
a
1  2
1  3
2  4
>>> df2
   c
a
1  5
2  6
The default concat behavior is to fill missing values with NaNs:
>>> pd.concat([df1,df2])
     b    c
a
1    2  NaN
1    3  NaN
2    4  NaN
1  NaN    5
2  NaN    6
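The row-wise stacking described above can be reproduced with a small self-contained sketch. It assumes a reasonably recent pandas; the variable name `stacked` is only illustrative, and the exact printed formatting (for example `2.0` instead of `2`) may differ from the output quoted in the question.

```python
import pandas as pd

# Same frames as in the question: df1 has a duplicated index label (1).
df1 = pd.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}).set_index("a")
df2 = pd.DataFrame({"a": [1, 2], "c": [5, 6]}).set_index("a")

# Default concat: axis=0 (stack rows) with join="outer" (union of columns).
# Cells that a frame has no column for are filled with NaN.
stacked = pd.concat([df1, df2])

print(stacked)
print(stacked.isna().sum())  # number of NaN cells per column
```

Nothing is aligned on the index here: the rows of df2 are simply appended below those of df1, which is why the duplicated labels are harmless in this mode but the result is not the column-wise join the question asks for.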
I want to keep the duplicate indices from df1 and fill them with duplicated values from df2, but in pandas 0.13.1 an inner join on the columns produces an error. In more recent versions of pandas, concat does what I want:
>>> pd.concat([df1, df2], axis=1, join='inner')
   b  c
a
1  2  5
1  3  5
2  4  6
What's the best way to achieve the result I want? Is there a groupby solution? Or maybe I shouldn't be using concat at all?
Accepted answer by EdChum
You can perform a merge and set the parameters to use the index from both the left-hand and right-hand sides:
In [4]:
df1.merge(df2, left_index=True, right_index=True)
Out[4]:
   b  c
a
1  2  5
1  3  5
2  4  6
[3 rows x 2 columns]
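For readers who want to run the merge end to end, here is a minimal sketch using the frames from the question. The explicit `how="inner"` and the `validate="many_to_one"` check are additions for illustration, not part of the answer; `validate` assumes a pandas version that supports that parameter.

```python
import pandas as pd

# Frames from the question: the label 1 appears twice in df1's index.
df1 = pd.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}).set_index("a")
df2 = pd.DataFrame({"a": [1, 2], "c": [5, 6]}).set_index("a")

# Inner join on the index of both frames. Each df1 row is matched with the
# df2 row carrying the same label, so duplicated labels in df1 are kept.
merged = df1.merge(
    df2,
    how="inner",             # the default, written out for clarity
    left_index=True,
    right_index=True,
    validate="many_to_one",  # optional check: df2's index must be unique
)

print(merged)
```

The `validate` argument makes the merge raise if df2 ever gains duplicate labels, which would otherwise silently multiply rows in the result.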
Concat should have worked; it worked for me:
In [5]:
pd.concat([df1,df2], join='inner', axis=1)
Out[5]:
   b  c
a
1  2  5
1  3  5
2  4  6
[3 rows x 2 columns]
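Since the question reports that concat handled this case differently across pandas versions, one way to avoid depending on that behavior is `DataFrame.join`, which aligns on the index by default. This is a swapped-in alternative rather than something the answer proposes; the sketch below only checks that it yields the same rows as an index-based merge on the question's data.

```python
import pandas as pd

# Frames from the question: the label 1 appears twice in df1's index.
df1 = pd.DataFrame({"a": [1, 1, 2], "b": [2, 3, 4]}).set_index("a")
df2 = pd.DataFrame({"a": [1, 2], "c": [5, 6]}).set_index("a")

# DataFrame.join aligns on the index; how="inner" keeps only labels that
# are present in both frames.
joined = df1.join(df2, how="inner")

# Compare with an index-based merge, ignoring row order.
merged = df1.merge(df2, left_index=True, right_index=True)
joined_rows = sorted(zip(joined["b"].tolist(), joined["c"].tolist()))
merged_rows = sorted(zip(merged["b"].tolist(), merged["c"].tolist()))
same = joined_rows == merged_rows

print(joined)
print(same)
```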

