Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/30296798/

Time: 2020-09-13 23:21:48  Source: igfitidea

Merge Only When Value is Empty/Null in Pandas

Tags: python, pandas, merge

Asked by Jonathan Kennedy

I have two DataFrames in pandas, df.A and df.B, which I am merging together. df.A is the original, and df.B has the new data I want to bring over. The merge works fine and, as expected, I get two columns, col_x and col_y, in the merged DataFrame.

However, in some rows the original df.A has values where df.B does not (and in others, df.B has values where df.A does not). How can I selectively take the values from col_x and col_y and place them into a new column such as col_z?

Here's what I mean. How can I merge df.A:

date   impressions    spend    col
1/1/15 100000         3.00     ABC123456
1/2/15 145000         5.00     ABCD00000
1/3/15 300000         15.00    (null)

with df.B

date    col
1/1/15  (null)
1/2/15  (null)
1/3/15  DEF123456

To get:

date   impressions    spend    col_z
1/1/15 100000         3.00     ABC123456
1/2/15 145000         5.00     ABCD00000
1/3/15 300000         15.00    DEF123456

Any help or point in the right direction would be really appreciated!

Thanks

Accepted answer by EdChum

OK, assuming that your (null) values are in fact NaN values and not the literal string "(null)", the following works:

In [10]:
# create the merged df
merged = dfA.merge(dfB, on='date')
merged

Out[10]:
        date  impressions  spend      col_x      col_y
0 2015-01-01       100000      3  ABC123456        NaN
1 2015-01-02       145000      5  ABCD00000        NaN
2 2015-01-03       300000     15        NaN  DEF123456
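The col_x and col_y names in Out[10] come from merge's default suffixes parameter, ("_x", "_y"), which pandas appends to non-key columns that exist in both frames. A minimal sketch, assuming the question's data with (null) stored as NaN, dates kept as plain strings, and the frame names dfA/dfB from the answer; the "_old"/"_new" suffixes are purely illustrative:

```python
import numpy as np
import pandas as pd

# Question data; (null) is assumed to be a real missing value (NaN).
dfA = pd.DataFrame({
    "date": ["1/1/15", "1/2/15", "1/3/15"],
    "impressions": [100000, 145000, 300000],
    "spend": [3.00, 5.00, 15.00],
    "col": ["ABC123456", "ABCD00000", np.nan],
})
dfB = pd.DataFrame({
    "date": ["1/1/15", "1/2/15", "1/3/15"],
    "col": [np.nan, np.nan, "DEF123456"],
})

# Both frames have a non-key column called "col", so merge disambiguates
# them with the default suffixes: "_x" for the left, "_y" for the right.
merged = dfA.merge(dfB, on="date")
print(list(merged.columns))
# ['date', 'impressions', 'spend', 'col_x', 'col_y']

# The suffixes parameter lets you choose the names explicitly.
merged_named = dfA.merge(dfB, on="date", suffixes=("_old", "_new"))
print(list(merged_named.columns))
# ['date', 'impressions', 'spend', 'col_old', 'col_new']
```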

You can use where to conditionally assign a value from the _x and _y columns:

In [11]:
# now create col_z using where
merged['col_z'] = merged['col_x'].where(merged['col_x'].notnull(), merged['col_y'])
merged

Out[11]:
        date  impressions  spend      col_x      col_y      col_z
0 2015-01-01       100000      3  ABC123456        NaN  ABC123456
1 2015-01-02       145000      5  ABCD00000        NaN  ABCD00000
2 2015-01-03       300000     15        NaN  DEF123456  DEF123456
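The where(notnull(), other) call is one of several equivalent ways to express "take col_x, fall back to col_y". A sketch, assuming a merged frame shaped like Out[10] (rebuilt by hand with only the relevant columns), that compares it with Series.fillna given a Series and with Series.combine_first:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the merged frame from Out[10].
merged = pd.DataFrame({
    "date": ["2015-01-01", "2015-01-02", "2015-01-03"],
    "col_x": ["ABC123456", "ABCD00000", np.nan],
    "col_y": [np.nan, np.nan, "DEF123456"],
})

# where(): keep col_x where it is not null, otherwise take col_y.
via_where = merged["col_x"].where(merged["col_x"].notnull(), merged["col_y"])

# fillna(): fill the holes in col_x with the index-aligned values of col_y.
via_fillna = merged["col_x"].fillna(merged["col_y"])

# combine_first(): prefer col_x, fall back to col_y where col_x is null.
via_combine = merged["col_x"].combine_first(merged["col_y"])

print(via_fillna.tolist())  # ['ABC123456', 'ABCD00000', 'DEF123456']
```

All three align on the index, so they give the same result here; fillna is arguably the most direct spelling of the intent.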

You can then drop the extraneous columns:

In [13]:
merged = merged.drop(['col_x', 'col_y'], axis=1)
merged

Out[13]:
        date  impressions  spend      col_z
0 2015-01-01       100000      3  ABC123456
1 2015-01-02       145000      5  ABCD00000
2 2015-01-03       300000     15  DEF123456
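If the (null) entries really are the literal string rather than NaN, notnull() treats them as ordinary values, so col_z would end up holding "(null)" for 1/3/15. A minimal sketch, assuming the blanks arrive as that exact string, that converts them with DataFrame.replace before running the same merge, where, and drop steps (drop(columns=...) is equivalent to drop([...], axis=1)):

```python
import numpy as np
import pandas as pd

# Assumed case: the blanks are the literal text "(null)", not NaN.
dfA = pd.DataFrame({
    "date": ["1/1/15", "1/2/15", "1/3/15"],
    "impressions": [100000, 145000, 300000],
    "spend": [3.00, 5.00, 15.00],
    "col": ["ABC123456", "ABCD00000", "(null)"],
})
dfB = pd.DataFrame({
    "date": ["1/1/15", "1/2/15", "1/3/15"],
    "col": ["(null)", "(null)", "DEF123456"],
})

# Turn the placeholder string into NaN so notnull()/isna() recognise it.
dfA = dfA.replace("(null)", np.nan)
dfB = dfB.replace("(null)", np.nan)

merged = dfA.merge(dfB, on="date")
merged["col_z"] = merged["col_x"].where(merged["col_x"].notnull(), merged["col_y"])
# Keyword form of dropping the helper columns.
merged = merged.drop(columns=["col_x", "col_y"])
print(merged)
```

When the data is read from a file, pd.read_csv(..., na_values=["(null)"]) can do the same conversion at load time.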

Answer by Oskar_U

IMO the shortest yet still readable solution is something like this:

df.A.loc[df.A['col'].isna(), 'col'] = df.A.merge(df.B, how='left', on='date')['col_y']

What it basically does is assign values from the merged table's col_y column to the primary df.A table, for those rows whose col column is empty (the .isna() condition).
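The one-liner relies on index alignment: a merge on columns returns a fresh 0..n-1 index, and the loc assignment matches the col_y Series to df.A's rows by label. That lines up when df.A has a default RangeIndex and each date appears only once in df.B, but not otherwise. A sketch, assuming a hypothetical non-default index on dfA (with dfA/dfB standing in for df.A/df.B), that shows the mismatch and then a Series.map lookup (a different technique from the answer's) that stays on dfA's own index:

```python
import numpy as np
import pandas as pd

# Question data, but dfA gets a hypothetical non-default index.
dfA = pd.DataFrame(
    {
        "date": ["1/1/15", "1/2/15", "1/3/15"],
        "impressions": [100000, 145000, 300000],
        "spend": [3.00, 5.00, 15.00],
        "col": ["ABC123456", "ABCD00000", np.nan],
    },
    index=[10, 11, 12],
)
dfB = pd.DataFrame({
    "date": ["1/1/15", "1/2/15", "1/3/15"],
    "col": [np.nan, np.nan, "DEF123456"],
})

# The one-liner: merge() returns index 0..2, so the assigned Series is
# aligned against label 12 and finds nothing to copy.
one_liner = dfA.copy()
one_liner.loc[one_liner["col"].isna(), "col"] = (
    one_liner.merge(dfB, how="left", on="date")["col_y"]
)
print(one_liner["col"].tolist())  # the last entry is still missing

# Alternative: look each date up in dfB; map() returns a Series on dfA's
# own index. Assumes the dates in dfB are unique.
lookup = dfA["date"].map(dfB.set_index("date")["col"])
fixed = dfA.copy()
fixed["col"] = fixed["col"].fillna(lookup)
print(fixed["col"].tolist())  # ['ABC123456', 'ABCD00000', 'DEF123456']
```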
