Original question: http://stackoverflow.com/questions/25762902/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Python pandas - particular merge/replacement
Asked by Colonel Beauvel
I'm new to pandas operations, and I have these two DataFrames:
import pandas as pd
df = pd.DataFrame({'name': ['a','a','b','b','c','c'], 'id':[1,2,1,2,1,2], 'val1':[0,0,0,0,0,0],'val2':[0,0,0,0,0,0],'val3':[0,0,0,0,0,0]})
   id name  val1  val2  val3
0   1    a     0     0     0
1   2    a     0     0     0
2   1    b     0     0     0
3   2    b     0     0     0
4   1    c     0     0     0
5   2    c     0     0     0
subdf = pd.DataFrame({'name': ['a','b','c'], 'id':[1,1,2],'val1':[0.3,0.4,0.7], 'val2':[4,5,4]})
   id name  val1  val2
0   1    a   0.3     4
1   1    b   0.4     5
2   2    c   0.7     4
I would like to obtain this output:
   id name  val1  val2  val3
0   1    a   0.3     4     0
1   2    a   0.0     0     0
2   1    b   0.4     5     0
3   2    b   0.0     0     0
4   1    c   0.0     0     0
5   2    c   0.7     4     0
But in the tutorials I saw, I could not find an example of replacement, only additions of columns/rows.
Answered by EdChum
This takes a couple of steps. First, do a left merge on the columns that match; this creates '_x' and '_y' suffixed columns where there are clashes:
In [25]:
merged = df.merge(subdf, on=['id', 'name'], how='left')
merged
Out[25]:
   id name  val1_x  val2_x  val3  val1_y  val2_y
0   1    a       0       0     0     0.3       4
1   2    a       0       0     0     NaN     NaN
2   1    b       0       0     0     0.4       5
3   2    b       0       0     0     NaN     NaN
4   1    c       0       0     0     NaN     NaN
5   2    c       0       0     0     0.7       4
In [26]:
# take the values of interest from the clashes: the row-wise max skips NaN,
# and picks the subdf value here only because every original value is 0
merged['val1'] = merged[['val1_x', 'val1_y']].max(axis=1)
merged['val2'] = merged[['val2_x', 'val2_y']].max(axis=1)
merged
Out[26]:
   id name  val1_x  val2_x  val3  val1_y  val2_y  val1  val2
0   1    a       0       0     0     0.3       4   0.3     4
1   2    a       0       0     0     NaN     NaN   0.0     0
2   1    b       0       0     0     0.4       5   0.4     5
3   2    b       0       0     0     NaN     NaN   0.0     0
4   1    c       0       0     0     NaN     NaN   0.0     0
5   2    c       0       0     0     0.7       4   0.7     4
In [27]:
# drop the additional columns
merged = merged.drop(labels=['val1_x', 'val1_y','val2_x', 'val2_y'], axis=1)
merged
Out[27]:
   id name  val3  val1  val2
0   1    a     0   0.3     4
1   2    a     0   0.0     0
2   1    b     0   0.4     5
3   2    b     0   0.0     0
4   1    c     0   0.0     0
5   2    c     0   0.7     4
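Taking the row-wise maximum gives the right answer here only because every original value is 0 and every replacement is positive. As a minimal sketch of a variant that prefers the value from subdf whenever the merge found one, using the question's data (with val1 created as float, and with the suffixes '_old' and '_new' chosen here purely for illustration):

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'b', 'b', 'c', 'c'],
                   'id': [1, 2, 1, 2, 1, 2],
                   'val1': [0.0] * 6, 'val2': [0] * 6, 'val3': [0] * 6})
subdf = pd.DataFrame({'name': ['a', 'b', 'c'], 'id': [1, 1, 2],
                      'val1': [0.3, 0.4, 0.7], 'val2': [4, 5, 4]})

# Left merge keeps every row of df; clashing columns get the two suffixes.
merged = df.merge(subdf, on=['id', 'name'], how='left',
                  suffixes=('_old', '_new'))

for col in ['val1', 'val2']:
    # Use the new value where the merge found one, otherwise keep the old one.
    merged[col] = merged[col + '_new'].fillna(merged[col + '_old'])
    merged = merged.drop(columns=[col + '_old', col + '_new'])

# Restore the original column order.
merged = merged[df.columns.tolist()]
print(merged)
```

Because the choice is based on which side is missing rather than which value is larger, this should also behave sensibly when the original frame holds non-zero values.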
Another method would be to sort both DataFrames on 'id' and 'name' and then call update:
In [30]:
df = df.sort(columns=['id','name'])
subdf = subdf.sort(columns=['id','name'])
df.update(subdf)
df
Out[30]:
   id name  val1  val2  val3
0   1    a   0.3     4     0
2   2    c   0.7     4     0
4   1    c   0.0     0     0
1   1    b   0.4     5     0
3   2    b   0.0     0     0
5   2    c   0.0     0     0
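One caveat is worth checking before relying on the sort-then-update approach. DataFrame.update aligns rows on index labels rather than on row position, and it overwrites every column the two frames share, including 'id' and 'name'. Sorting therefore does not pair up matching keys, which is consistent with the duplicated "2 c" row in the output above. The following sketch, using a cut-down version of the question's data (the names by_label and by_key are illustrative), contrasts label-based matching with matching on the key columns:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'b', 'b', 'c', 'c'],
                   'id': [1, 2, 1, 2, 1, 2],
                   'val1': [0.0] * 6})
subdf = pd.DataFrame({'name': ['a', 'b', 'c'], 'id': [1, 1, 2],
                      'val1': [0.3, 0.4, 0.7]})

by_label = df.copy()
# update matches index labels 0, 1, 2 of subdf to labels 0, 1, 2 of df,
# regardless of the 'name'/'id' values stored in those rows.
by_label.update(subdf)
print(by_label)

by_key = df.set_index(['name', 'id'])
# With the keys in the index, rows are matched on (name, id) instead.
by_key.update(subdf.set_index(['name', 'id']))
by_key = by_key.reset_index()
print(by_key)
```

In by_label the first three rows are overwritten wholesale, keys included, while by_key changes only the rows whose (name, id) pair appears in subdf.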
Answered by famaral42
Updated version with the update method. Inspired by Nic.
I managed to do it with concat, but that is not as elegant as the version below with update, and the DataFrames are copied, which I believe could cause memory and/or speed problems with bigger tables.
df = pd.DataFrame({'name': list('aabbcc'), 'id':[1,2]*3, 'val1':[0]*6,'val2':[0]*6,'val3':[0]*6})
subdf = pd.DataFrame({'name': list('abc'), 'id':[1,1,2],'val1':[0.3,0.4,0.7], 'val2':[4,5,4]})
df.set_index(['name','id'], inplace=True)
df.update(subdf.set_index(['name','id']))
df.reset_index(inplace=True)
df
Result:
  name  id  val1  val2  val3
0    a   1   0.3   4.0     0
1    a   2   0.0   0.0     0
2    b   1   0.4   5.0     0
3    b   2   0.0   0.0     0
4    c   1   0.0   0.0     0
5    c   2   0.7   4.0     0
A minor drawback is that pandas.DataFrame.update changes the dtypes, as pointed out by JAB.
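If the dtype change matters, one option is to record the dtypes before calling update and cast back afterwards. This is a minimal sketch, assuming the updated values fit the original dtypes; val1 is created as float here so that the fractional replacements are representable, and whether update changes a dtype at all may depend on the pandas version. The name original_dtypes is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': list('aabbcc'), 'id': [1, 2] * 3,
                   'val1': [0.0] * 6, 'val2': [0] * 6, 'val3': [0] * 6})
subdf = pd.DataFrame({'name': list('abc'), 'id': [1, 1, 2],
                      'val1': [0.3, 0.4, 0.7], 'val2': [4, 5, 4]})

# Remember the dtype of every column before updating.
original_dtypes = df.dtypes.to_dict()

df = df.set_index(['name', 'id'])
df.update(subdf.set_index(['name', 'id']))
df = df.reset_index()

# Cast back to the recorded dtypes. Note that casting fractional floats
# to an integer dtype would truncate them, so check the data first.
df = df.astype(original_dtypes)
print(df.dtypes)
```

This variant also reassigns df instead of using inplace=True, which is a style choice and not required for the dtype fix.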
Answered by KuboAndTwoStrings
The sort function used in the second part of the above answer has been deprecated. For users of pandas 0.20+, the code to achieve the same effect is:
df1 = pd.DataFrame(columns=['A', 'B']) # You want to merge TO this
df2 = pd.DataFrame(columns=['A', 'B']) # You want to merge FROM this
df1 = df1.sort_values(by=['A', 'B'])
df2 = df2.sort_values(by=['A', 'B'])
df1.update(df2)
Refer to: Pandas Documentation

