Python pandas - specific merge/replace

Question

Asked by Colonel Beauvel

I am new to pandas operations and have these two DataFrames:

import pandas as pd 

df = pd.DataFrame({'name': ['a','a','b','b','c','c'], 'id':[1,2,1,2,1,2], 'val1':[0,0,0,0,0,0],'val2':[0,0,0,0,0,0],'val3':[0,0,0,0,0,0]})

   id name  val1  val2  val3
0   1    a     0     0     0
1   2    a     0     0     0
2   1    b     0     0     0
3   2    b     0     0     0
4   1    c     0     0     0
5   2    c     0     0     0

subdf = pd.DataFrame({'name': ['a','b','c'], 'id':[1,1,2],'val1':[0.3,0.4,0.7], 'val2':[4,5,4]})

   id name  val1  val2
0   1    a   0.3     4
1   1    b   0.4     5
2   2    c   0.7     4

I would like to obtain this output:

   id name  val1  val2  val3
0   1    a   0.3     4     0
1   2    a   0.0     0     0
2   1    b   0.4     5     0
3   2    b   0.0     0     0
4   1    c   0.0     0     0
5   2    c   0.7     4     0

But the tutorials I saw only show how to add columns or rows; I did not find an example of replacing values!

Answer 1

Answered by EdChum

This takes a couple of steps. First, left merge on the columns that match; this creates columns suffixed with '_x' and '_y' where the names clash:

In [25]:

merged = df.merge(subdf, on=['id', 'name'], how='left')
merged
Out[25]:
   id name  val1_x  val2_x  val3  val1_y  val2_y
0   1    a       0       0     0     0.3       4
1   2    a       0       0     0     NaN     NaN
2   1    b       0       0     0     0.4       5
3   2    b       0       0     0     NaN     NaN
4   1    c       0       0     0     NaN     NaN
5   2    c       0       0     0     0.7       4
In [26]:
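# take the values of interest from the clashing columns;
# the row-wise max skips NaN, and it works here only because the original
# values are 0 and the replacement values are larger
merged['val1'] = merged[['val1_x', 'val1_y']].max(axis=1)
merged['val2'] = merged[['val2_x', 'val2_y']].max(axis=1)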
merged
Out[26]:
   id name  val1_x  val2_x  val3  val1_y  val2_y  val1  val2
0   1    a       0       0     0     0.3       4   0.3     4
1   2    a       0       0     0     NaN     NaN   0.0     0
2   1    b       0       0     0     0.4       5   0.4     5
3   2    b       0       0     0     NaN     NaN   0.0     0
4   1    c       0       0     0     NaN     NaN   0.0     0
5   2    c       0       0     0     0.7       4   0.7     4
In [27]:
# drop the additional columns
merged = merged.drop(labels=['val1_x', 'val1_y','val2_x', 'val2_y'], axis=1)
merged
Out[27]:
   id name  val3  val1  val2
0   1    a     0   0.3     4
1   2    a     0   0.0     0
2   1    b     0   0.4     5
3   2    b     0   0.0     0
4   1    c     0   0.0     0
5   2    c     0   0.7     4
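
The row-wise max above relies on the replacement values being larger than the originals. A minimal sketch of a variant that does not depend on that, assuming the same df and subdf as in the question: the '_new' suffix and the loop over column names are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'b', 'b', 'c', 'c'],
                   'id': [1, 2, 1, 2, 1, 2],
                   'val1': [0] * 6, 'val2': [0] * 6, 'val3': [0] * 6})
subdf = pd.DataFrame({'name': ['a', 'b', 'c'], 'id': [1, 1, 2],
                      'val1': [0.3, 0.4, 0.7], 'val2': [4, 5, 4]})

# Left merge; columns coming from subdf get the (hypothetical) '_new' suffix,
# columns from df keep their original names.
merged = df.merge(subdf, on=['id', 'name'], how='left', suffixes=('', '_new'))

for col in ['val1', 'val2']:
    # Prefer the value from subdf; fall back to the original where no row matched.
    merged[col] = merged[col + '_new'].fillna(merged[col])

# Drop the helper columns.
merged = merged.drop(columns=['val1_new', 'val2_new'])
print(merged)
```

As with the max version, val2 ends up as a float column because the unmatched rows pass through NaN.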

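Another method would be to sort both DataFrames on 'id' and 'name' and then call update. Note that update aligns rows on index labels, not on position, so in the output below the rows with index 1 and 2 also had their 'id' and 'name' overwritten by subdf, and the result does not match the desired output:

In [30]:

# DataFrame.sort(columns=...) is the old API; later pandas versions use sort_values(by=...)
df = df.sort(columns=['id','name'])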
subdf = subdf.sort(columns=['id','name'])
df.update(subdf)
df
Out[30]:
   id name  val1  val2  val3
0   1    a   0.3     4     0
2   2    c   0.7     4     0
4   1    c   0.0     0     0
1   1    b   0.4     5     0
3   2    b   0.0     0     0
5   2    c   0.0     0     0
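
A small sketch of how DataFrame.update matches rows may help here: it aligns on index labels, not on row position. The frames target and source and their labels are made up for illustration.

```python
import pandas as pd

# Hypothetical frames: 'source' has a single row labelled 'z',
# which sits at position 0 in 'source' but at position 2 in 'target'.
target = pd.DataFrame({'val': [0.0, 0.0, 0.0]}, index=['x', 'y', 'z'])
source = pd.DataFrame({'val': [9.0]}, index=['z'])

# update modifies 'target' in place, matching rows by index label.
target.update(source)
print(target)
```

The row labelled 'z' is the one that changes, which is why the key columns need to be in the index for update to match on them.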

Answer 2

Answered by famaral42

Updated version using the update method, inspired by Nic.

I managed to do it with concat, but that is not as elegant as the update version below, and it copies the DataFrames, which I believe could cause memory and/or speed problems with bigger tables.

df = pd.DataFrame({'name': list('aabbcc'), 'id':[1,2]*3, 'val1':[0]*6,'val2':[0]*6,'val3':[0]*6})

subdf = pd.DataFrame({'name': list('abc'), 'id':[1,1,2],'val1':[0.3,0.4,0.7], 'val2':[4,5,4]})

df.set_index(['name','id'], inplace=True)
df.update(subdf.set_index(['name','id']))
df.reset_index(inplace=True)
df

Result:

    name    id  val1    val2    val3
0   a       1   0.3     4.0     0
1   a       2   0.0     0.0     0
2   b       1   0.4     5.0     0
3   b       2   0.0     0.0     0
4   c       1   0.0     0.0     0
5   c       2   0.7     4.0     0

A minor drawback, pointed out by JAB, is that pandas.DataFrame.update changes the dtypes (val2 becomes float in the result above).
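
One possible way around the dtype change, sketched under assumptions: record the dtypes before the update and cast back afterwards. Here val1 is created as float from the start (a change from the original frame) so that casting back does not truncate 0.3 to 0; the names original_dtypes, keys, indexed and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'name': list('aabbcc'), 'id': [1, 2] * 3,
                   'val1': [0.0] * 6,  # float on purpose, see the note above
                   'val2': [0] * 6, 'val3': [0] * 6})
subdf = pd.DataFrame({'name': list('abc'), 'id': [1, 1, 2],
                      'val1': [0.3, 0.4, 0.7], 'val2': [4, 5, 4]})

# Remember the dtype of every column before updating.
original_dtypes = df.dtypes

keys = ['name', 'id']
indexed = df.set_index(keys)
indexed.update(subdf.set_index(keys))  # aligns on the (name, id) index

# Cast each column back to its original dtype.
result = indexed.reset_index().astype(original_dtypes)
print(result.dtypes)
```

Casting back is only safe when the updated values fit the original dtype, as the integer-valued val2 does here.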

Answer 3

Answered by KuboAndTwoStrings

The sort function used in the second part of the above answer has been deprecated. For users on Pandas 0.20+, the code that achieves the same effect is:

# df1 and df2 are placeholders with columns 'A' and 'B'; substitute your own frames
df1 = pd.DataFrame(columns=['A', 'B'])  # You want to merge TO this
df2 = pd.DataFrame(columns=['A', 'B'])  # You want to merge FROM this
df1 = df1.sort_values(by=['A', 'B'])
df2 = df2.sort_values(by=['A', 'B'])
df1.update(df2)

Refer to: Pandas Documentation
