pandas: Updating a DataFrame based on another DataFrame
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors on Stack Overflow (not to this page).
Original question: http://stackoverflow.com/questions/22027533/
Asked by nutship
Given a DataFrame df:
    Id Sex  Group  Time  Time!
0  21   M      2  2.31    NaN
1   2   F      2  2.29    NaN
and a second DataFrame, update:
    Id Sex  Group  Time
0  21   M      2  2.36
1   2   F      2  2.09
2   3   F      1  1.79
I want to match on Id, Sex and Group, and either update Time! with the Time value (from the update df) if there is a match, or insert the row if it is a new record.
Here is how I do it:
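import numpy as np
import pandas as pd

df = df.set_index(['Id', 'Sex', 'Group'])
update = update.set_index(['Id', 'Sex', 'Group'])
for i, row in update.iterrows():
    if i in df.index:  # update the existing record
        df.loc[i, 'Time!'] = row['Time']  # .loc replaces the removed .ix
    else:              # insert a new record
        cols = update.columns.values
        row = np.array(row).reshape(1, len(row))
        # keep the level names so the new row lines up with df's index
        idx = pd.MultiIndex.from_tuples([i], names=df.index.names)
        _ = pd.DataFrame(row, index=idx, columns=cols)
        df = pd.concat([df, _])  # pd.concat replaces the removed DataFrame.append
print(df)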
              Time  Time!
Id Sex Group             
21 M   2      2.31   2.36
2  F   2      2.29   2.09
3  F   1      1.79    NaN
The code seems to work, and the result above matches what I want. However, I have noticed it misbehaving on a large data set, with the conditional
if i in df.index:
    ...
else:
    ...
clearly working incorrectly (it would take the else branch when it should not, and vice versa; I guess the MultiIndex may somehow be the cause).
So my question is: do you know any other way, or a more robust version of mine, to update one DataFrame based on another?
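One way to inspect the conditional described above is to compute the membership test for all rows at once instead of inside the loop. The sketch below rebuilds the two small frames from the question and uses MultiIndex.isin; the names keys and exists are introduced here for illustration. The comment about dtypes describes one possible cause only; the question does not confirm what actually went wrong on the large data set.

```python
import pandas as pd

keys = ["Id", "Sex", "Group"]
df = pd.DataFrame({
    "Id": [21, 2], "Sex": ["M", "F"], "Group": [2, 2],
    "Time": [2.31, 2.29], "Time!": [float("nan")] * 2,
}).set_index(keys)
update = pd.DataFrame({
    "Id": [21, 2, 3], "Sex": ["M", "F", "F"], "Group": [2, 2, 1],
    "Time": [2.36, 2.09, 1.79],
}).set_index(keys)

# Boolean mask: True where an update row's (Id, Sex, Group) key already exists in df.
exists = update.index.isin(df.index)
print(update[exists])   # rows that would be updated
print(update[~exists])  # rows that would be inserted

# Assumption, not confirmed by the question: if a key level has different dtypes
# in the two frames (say Id as int in one and str in the other), no key matches,
# so comparing the level dtypes is a cheap sanity check.
print([level.dtype for level in df.index.levels])
print([level.dtype for level in update.index.levels])
```

If the mask disagrees with what the loop does, the problem is more likely in the loop (for example, df changing while it is iterated) than in the index lookup itself.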
Answered by Andy Hayden
I think I would do this with a merge, and then update the columns with where. First remove the Time column from up:
In [11]: times = up.pop('Time')  # up = the update DataFrame
In [12]: df1 = df.merge(up, how='outer')
In [13]: df1
Out[13]:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31    NaN
1   2   F      2  2.29    NaN
2   3   F      1   NaN    NaN
Set Time! to the new value where Time is not NaN (an existing record), and fill Time with the new value where it is NaN (a new record):
In [14]: df1['Time!'] = df1['Time'].where(df1['Time'].isnull(), times)
In [15]: df1['Time'] = df1['Time'].where(df1['Time'].notnull(), times)
In [16]: df1
Out[16]:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31   2.36
1   2   F      2  2.29   2.09
2   3   F      1  1.79    NaN
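The session above lines up times with df1 by row position, which works when the merged rows come out in the same order as up. As a minimal sketch of a variant that does not depend on row order, the new values can be carried through the merge under a different column name and assigned by match status. The names keys, Time_new, merged and result are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

keys = ["Id", "Sex", "Group"]
df = pd.DataFrame({
    "Id": [21, 2], "Sex": ["M", "F"], "Group": [2, 2],
    "Time": [2.31, 2.29], "Time!": [float("nan")] * 2,
})
up = pd.DataFrame({
    "Id": [21, 2, 3], "Sex": ["M", "F", "F"], "Group": [2, 2, 1],
    "Time": [2.36, 2.09, 1.79],
})

# Rename the incoming column so it does not collide with df's own "Time".
new = up.rename(columns={"Time": "Time_new"})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
merged = df.merge(new, on=keys, how="outer", indicator=True)

matched = merged["_merge"] == "both"         # key present in both frames
inserted = merged["_merge"] == "right_only"  # key present only in up

# Existing records get the new value in "Time!"; new records get it in "Time".
merged.loc[matched, "Time!"] = merged.loc[matched, "Time_new"]
merged.loc[inserted, "Time"] = merged.loc[inserted, "Time_new"]

result = merged.drop(columns=["Time_new", "_merge"])
print(result)
```

Rows that exist only in df are left untouched, and the row order of result may differ from the output shown above depending on the pandas version.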

