Add new rows to a Pandas dataframe

Question

Question by msakya

I have two dataframes, df1 and df2, that were computed from the same source but with different methods, so most of the values are the same, with some differences. Now I want to update df1 based on the values in df2.

For example:

df1 = pd.DataFrame({'name':['john','deb','john','deb'], 'col1':[490,500,425,678], 'col2':[456,625,578,789],'col3':['TN','OK','OK','NY']})
 name col1 col2 col3
 john  490  456  TN
 deb   500  625  OK
 john  425  578  OK
 deb   678  789  NY

df2 = pd.DataFrame({'name':['deb','john','deb','john','deb'], 'col1':[400,490,500,425,678], 'col2':[225,456,625,578,789],'col3':['TN','TN','OK','OK','NY']})
 name col1 col2 col3
  deb  400  225  TN
 john  490  456  TN
  deb  500  625  OK
 john  425  578  OK
 deb   678  789  NY

So, in this case, .append should append only the first row of df2 to df1. In other words, a row of df2 should be added/updated only if it is not already present in df1 (judged by name and col3); otherwise it should not be.
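
The selection rule described above (add a row of df2 only when its (name, col3) pair does not already occur in df1) is commonly called an anti-join. The sketch below is one way to express it directly; it is not the approach used in the answer, and it assumes a pandas version where DataFrame.append may no longer be available (it was deprecated in 1.4 and removed in 2.0), so pd.concat does the final appending.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'deb', 'john', 'deb'],
                    'col1': [490, 500, 425, 678],
                    'col2': [456, 625, 578, 789],
                    'col3': ['TN', 'OK', 'OK', 'NY']})
df2 = pd.DataFrame({'name': ['deb', 'john', 'deb', 'john', 'deb'],
                    'col1': [400, 490, 500, 425, 678],
                    'col2': [225, 456, 625, 578, 789],
                    'col3': ['TN', 'TN', 'OK', 'OK', 'NY']})

keys = ['name', 'col3']  # columns that decide whether a row is "already present"

# Left-merge df2 against the distinct key pairs of df1. indicator=True adds a
# '_merge' column that is 'left_only' for df2 rows with no matching key in df1.
marked = df2.merge(df1[keys].drop_duplicates(), on=keys, how='left', indicator=True)
new_rows = marked.loc[marked['_merge'] == 'left_only', list(df2.columns)]

# Append only the new rows to df1 and renumber the index.
result = pd.concat([df1, new_rows], ignore_index=True)
print(result)
```

On this data only the (deb, TN) row of df2 is new, so result is df1 followed by that single row.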

This almost seems like something that concat should do.
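
concat is indeed the natural starting point, but on its own it only stacks the frames; it does not compare rows. A minimal check on the question's data (assuming a recent pandas; the name stacked is only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'deb', 'john', 'deb'],
                    'col1': [490, 500, 425, 678],
                    'col2': [456, 625, 578, 789],
                    'col3': ['TN', 'OK', 'OK', 'NY']})
df2 = pd.DataFrame({'name': ['deb', 'john', 'deb', 'john', 'deb'],
                    'col1': [400, 490, 500, 425, 678],
                    'col2': [225, 456, 625, 578, 789],
                    'col3': ['TN', 'TN', 'OK', 'OK', 'NY']})

# pd.concat simply places df2's rows under df1's rows.
stacked = pd.concat([df1, df2])

print(len(stacked))            # 9: 4 rows from df1 + 5 from df2
print(stacked.index.tolist())  # [0, 1, 2, 3, 0, 1, 2, 3, 4]: original labels are kept
# Four rows coming from df2 repeat a (name, col3) pair already present in df1.
print(stacked.duplicated(subset=['name', 'col3']).sum())  # 4
```

The repeated (name, col3) pairs are exactly what a deduplication step afterwards has to remove.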

Answer 1

Answer by firelynx

There are two ways of achieving your result.

1. Concat both dataframes, then drop duplicates
2. Use an outer join/merge, then drop duplicates

I will show you both.

Concat then Drop

This should be more CPU-friendly.

df3 = pd.concat([df1,df2])
df3.drop_duplicates(subset=['name', 'col3'], inplace=True, keep='last')

This method is possibly more memory-intensive than an outer join, because at one point you are holding df1, df2, and the result of concatenating them (df3) in memory.
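
To see what the concat-then-drop approach produces on the question's data, here is a self-contained sketch of the same idea. It assumes a recent pandas; unlike the answer, it chains the calls instead of using inplace=True and adds reset_index(drop=True) to clear the repeated index labels that concat leaves behind. With keep='last' the row from df2 wins whenever a (name, col3) key appears in both frames; keep='first' keeps the df1 version and only adds keys that df1 lacks.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'deb', 'john', 'deb'],
                    'col1': [490, 500, 425, 678],
                    'col2': [456, 625, 578, 789],
                    'col3': ['TN', 'OK', 'OK', 'NY']})
df2 = pd.DataFrame({'name': ['deb', 'john', 'deb', 'john', 'deb'],
                    'col1': [400, 490, 500, 425, 678],
                    'col2': [225, 456, 625, 578, 789],
                    'col3': ['TN', 'TN', 'OK', 'OK', 'NY']})

stacked = pd.concat([df1, df2])

# keep='last': for each (name, col3) key, the later row (here the one from df2) survives.
df3_last = (stacked.drop_duplicates(subset=['name', 'col3'], keep='last')
                   .reset_index(drop=True))

# keep='first': df1's rows are kept and only keys missing from df1 are added.
df3_first = (stacked.drop_duplicates(subset=['name', 'col3'], keep='first')
                    .reset_index(drop=True))

print(df3_last)
print(df3_first)
```

On this data both variants have five rows and contain the same rows in a different order; the choice of keep only matters once df1 and df2 disagree on col1 or col2 for the same key.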

Outer join then Drop

This should be more memory friendly.

df3 = df1.merge(df2, on=list(df1), how='outer')
df3.drop_duplicates(subset=['name', 'col3'], inplace=True, keep='last')

Doing an outer join will make sure you get all entries from both dataframes, but df3 will be smaller than when we use concat, because rows that are identical in both dataframes are matched on every column and appear only once.
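
A self-contained sketch of the outer-join variant on the question's data, assuming a recent pandas. It adds indicator=True (not in the answer) so you can see which rows were collapsed: 'both' marks rows identical in df1 and df2, 'right_only' marks rows found only in df2. Per the pandas documentation, how='outer' sorts the result by the join keys, so keep='last' here refers to that sorted order rather than meaning "the row from df2".

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['john', 'deb', 'john', 'deb'],
                    'col1': [490, 500, 425, 678],
                    'col2': [456, 625, 578, 789],
                    'col3': ['TN', 'OK', 'OK', 'NY']})
df2 = pd.DataFrame({'name': ['deb', 'john', 'deb', 'john', 'deb'],
                    'col1': [400, 490, 500, 425, 678],
                    'col2': [225, 456, 625, 578, 789],
                    'col3': ['TN', 'TN', 'OK', 'OK', 'NY']})

# Merge on every column (list(df1) is the list of column names), so only rows
# that are identical in both frames are matched and collapsed into one row.
merged = df1.merge(df2, on=list(df1), how='outer', indicator=True)
print(merged['_merge'].value_counts())

# Rows sharing a (name, col3) key but differing in col1/col2 would both still be
# present after the merge, so deduplicate on the key columns as in the answer.
df3 = (merged.drop(columns='_merge')
             .drop_duplicates(subset=['name', 'col3'], keep='last')
             .reset_index(drop=True))
print(df3)
```

Here the merge already yields only five rows (four matched, one from df2 alone), compared with nine rows after a plain concat.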

Version 0.15 and older note:

The keyword keep='last' used to be take_last=True.
