Original URL: http://stackoverflow.com/questions/16583668/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share them, but you must attribute them to the original authors (not me):
StackOverFlow
merge 2 dataframes in Pandas: join on some columns, sum up others
Question by Laurie
I want to merge two dataframes on specific columns (key1, key2) and sum up the values for another column (value).
>>> import pandas as pd
>>> df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
>>> df1
   key1  key2  value
0     0     0      0
1     1     1      1
2     2     2      2
3     3     3      3
>>> df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6), 'noise': range(2, 6), 'value': range(10, 14)})
>>> df2
   key1  key2  noise  value
0     2     2      2     10
1     3     3      3     11
2     4     4      4     12
3     5     5      5     13
I want this result:
   key1  key2  value
0     0     0      0
1     1     1      1
2     2     2     12
3     3     3     14
4     4     4     12
5     5     5     13
In SQL terms, I want:
SELECT key1, key2, COALESCE(df1.value, 0) + COALESCE(df2.value, 0) AS value
FROM df1 FULL OUTER JOIN df2 USING (key1, key2)
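For comparison, pandas can express the same full-outer-join-and-add through index alignment: move (key1, key2) into the index and call `Series.add` with `fill_value=0`, which fills in the missing side for a key before adding. This is a minimal sketch of that alternative technique, not one of the approaches discussed on this page; the helper `outer_sum` is a hypothetical name.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6),
                    'noise': range(2, 6), 'value': range(10, 14)})


def outer_sum(left, right, keys, col):
    """Hypothetical helper: full-outer-align two frames on `keys`, add `col`.

    A key pair present in only one frame gets 0 from the other side.
    """
    a = left.set_index(keys)[col]
    b = right.set_index(keys)[col]
    # Series.add aligns on the union of both indexes; fill_value=0 replaces
    # the missing side before adding (like treating NULL as 0 in SQL).
    summed = a.add(b, fill_value=0).sort_index()
    # Alignment may upcast to float64, so cast back to the column's dtype.
    return summed.astype(left[col].dtype).reset_index()


result = outer_sum(df1, df2, ['key1', 'key2'], 'value')
print(result)
```

Unlike concatenation, this never touches the `noise` column, because only `col` is pulled out of each frame.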
I tried two approaches:
approach 1
concatenated = pd.concat([df1, df2])
grouped = concatenated.groupby(['key1', 'key2'], as_index=False)
# The 'sum' string alias avoids needing numpy; passing np.sum warns in recent pandas
summed = grouped.agg('sum')
result = summed[['key1', 'key2', 'value']]
approach 2
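joined = pd.merge(df1, df2, how='outer', on=['key1', 'key2'], suffixes=['_1', '_2'])
# Keys found in only one frame have NaN on the other side; treat those as 0
joined = joined.fillna(0.0)
joined['value'] = joined['value_1'] + joined['value_2']
# Note: 'value' is float64 here, since the NaNs forced a float dtype
result = joined[['key1', 'key2', 'value']]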
Both approaches give the result I want, but I wonder if there is a simpler way.
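Running the two approaches side by side is a quick way to check that they agree. In this sketch, approach 1 uses the `'sum'` string alias, and approach 2 returns `value` as float64 rather than int64: the outer merge leaves NaN in `value_1` or `value_2` for keys found in only one frame, and NaN forces a float dtype before `fillna` runs. Comparing with `check_dtype=False` confirms that the numbers themselves match.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6),
                    'noise': range(2, 6), 'value': range(10, 14)})

# Approach 1: stack the frames and sum per (key1, key2) pair.
approach1 = (pd.concat([df1, df2])
             .groupby(['key1', 'key2'], as_index=False)
             .agg('sum')[['key1', 'key2', 'value']])

# Approach 2: outer merge, fill the gaps with 0, add the two value columns.
joined = pd.merge(df1, df2, how='outer', on=['key1', 'key2'],
                  suffixes=('_1', '_2'))
joined = joined.fillna(0.0)
joined['value'] = joined['value_1'] + joined['value_2']
approach2 = joined[['key1', 'key2', 'value']]

print(approach1['value'].dtype, approach2['value'].dtype)

# Same numbers, different dtypes: compare the values while ignoring dtype.
pd.testing.assert_frame_equal(approach1, approach2, check_dtype=False)
```

If an integer result matters, `approach2['value'].astype(int)` restores it after the fill.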
Answer by DSM
I don't know about simpler, but you can get a little more concise:
>>> pd.concat([df1, df2]).groupby(["key1", "key2"], as_index=False)["value"].sum()
   key1  key2  value
0     0     0      0
1     1     1      1
2     2     2     12
3     3     3     14
4     4     4     12
5     5     5     13
Depending on your tolerance for chaining operations, though, you might want to break this onto multiple lines anyway (four chained operations tends to be close to my upper limit; here they are concat, groupby, select and sum).
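For readers who prefer shorter chains, here is a sketch of the same concat-groupby-select-sum pipeline split into named steps. The intermediate names `stacked` and `by_key` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'key1': range(4), 'key2': range(4), 'value': range(4)})
df2 = pd.DataFrame({'key1': range(2, 6), 'key2': range(2, 6),
                    'noise': range(2, 6), 'value': range(10, 14)})

# 1. concat: stack the rows of both frames (df1's rows get NaN for 'noise').
stacked = pd.concat([df1, df2])

# 2. groupby: one group per (key1, key2) pair; as_index=False keeps the keys
#    as ordinary columns instead of moving them into the index.
by_key = stacked.groupby(['key1', 'key2'], as_index=False)

# 3. select and 4. sum: only 'value' is aggregated, so 'noise' is dropped.
result = by_key['value'].sum()
print(result)
```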

