Add new rows to a pandas dataframe
Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/22648591/
Asked by msakya
I have two dataframes, df1 and df2, that were computed from the same source but with different methods, so most of the values are the same, with some differences. Now I want to update df1 based on the values in df2.
For example:
import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'deb', 'john', 'deb'],
                    'col1': [490, 500, 425, 678],
                    'col2': [456, 625, 578, 789],
                    'col3': ['TN', 'OK', 'OK', 'NY']})
 name  col1  col2  col3
 john   490   456    TN
  deb   500   625    OK
 john   425   578    OK
  deb   678   789    NY
df2 = pd.DataFrame({'name': ['deb', 'john', 'deb', 'john', 'deb'],
                    'col1': [400, 490, 500, 425, 678],
                    'col2': [225, 456, 625, 578, 789],
                    'col3': ['TN', 'TN', 'OK', 'OK', 'NY']})
 name  col1  col2  col3
  deb   400   225    TN
 john   490   456    TN
  deb   500   625    OK
 john   425   578    OK
  deb   678   789    NY
So, in this case .append should append only the first row from df2 to df1. That is, a row from df2 should be added/updated only if it is not already present in df1 (based on name and col3); otherwise it should not be.
This almost seems like something that concat should do.
Answered by firelynx
There are two ways of achieving your result.
- Concat both dataframes, then drop duplicates
- Using an outer join/merge, then drop duplicates
I will show you both.
Concat then Drop
This should be the more CPU-friendly option.
df3 = pd.concat([df1,df2])
df3.drop_duplicates(subset=['name', 'col3'], inplace=True, keep='last')
This method is possibly more memory intensive than an outer join, because at one point you are holding df1, df2 and the result of concatenating them (df3) in memory.
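As a rough illustration of the concat-then-drop approach on the question's sample data, here is a minimal sketch. The names combined and concat_result, and the use of ignore_index=True and reset_index, are choices made for this example and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'deb', 'john', 'deb'],
                    'col1': [490, 500, 425, 678],
                    'col2': [456, 625, 578, 789],
                    'col3': ['TN', 'OK', 'OK', 'NY']})
df2 = pd.DataFrame({'name': ['deb', 'john', 'deb', 'john', 'deb'],
                    'col1': [400, 490, 500, 425, 678],
                    'col2': [225, 456, 625, 578, 789],
                    'col3': ['TN', 'TN', 'OK', 'OK', 'NY']})

# Stack df1 on top of df2 (4 + 5 = 9 rows); ignore_index=True renumbers the rows.
combined = pd.concat([df1, df2], ignore_index=True)

# For each (name, col3) pair keep only the last occurrence, i.e. the row from df2
# whenever the pair exists in both frames.
concat_result = (combined
                 .drop_duplicates(subset=['name', 'col3'], keep='last')
                 .reset_index(drop=True))

print(len(combined))
print(concat_result)
```

With this sample data every (name, col3) pair in df1 also occurs in df2, so the result has five rows: the four shared pairs plus the new deb/TN row. If df1's values should win when a pair exists in both frames, keep='first' would be the setting to try instead.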
Outer join then Drop
This should be more memory friendly.
df3 = df1.merge(df2, on=list(df1), how='outer')
df3.drop_duplicates(subset=['name', 'col3'], inplace=True, keep='last')
Doing an outer join makes sure you get all entries from both dataframes, but df3 will be smaller than in the case where we use concat, because rows that are identical in both dataframes appear only once in the merge result.
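The outer-join variant can be sketched the same way on the question's sample data. The names merged and merge_result are chosen for this example only, and the sketch assumes neither frame contains duplicate rows of its own.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'deb', 'john', 'deb'],
                    'col1': [490, 500, 425, 678],
                    'col2': [456, 625, 578, 789],
                    'col3': ['TN', 'OK', 'OK', 'NY']})
df2 = pd.DataFrame({'name': ['deb', 'john', 'deb', 'john', 'deb'],
                    'col1': [400, 490, 500, 425, 678],
                    'col2': [225, 456, 625, 578, 789],
                    'col3': ['TN', 'TN', 'OK', 'OK', 'NY']})

# list(df1) is the list of column names, so the merge key is the full row.
# Rows identical in both frames collapse into one; rows found in only one
# frame are kept because how='outer'.
merged = df1.merge(df2, on=list(df1), how='outer')

# Remove any remaining rows that share a (name, col3) pair.
merge_result = merged.drop_duplicates(subset=['name', 'col3'], keep='last')

print(len(merged))
print(merge_result)
```

Here the intermediate frame has five rows rather than the nine produced by concat. One caveat worth checking on real data: if a (name, col3) pair exists in both frames with different values, both rows survive the merge, and their order is decided by the merge rather than by which frame they came from, so keep='last' is not guaranteed to pick the df2 version.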
Note for version 0.15 and older:
The keyword keep='last' used to be take_last=True.

