Pandas: merge only where the value is empty/null

Question

Asked by Jonathan Kennedy

I have two dataframes in Pandas, df.A and df.B, which are being merged together. df.A is the original, and df.B has the new data I want to bring over. The merge works fine and, as expected, I get two columns, col_x and col_y, in the merged df.

However, in some rows the original df.A has a value where df.B does not, and in others it is the other way round. My question is: how can I selectively take the non-null value from col_x or col_y and place it into a new column such as col_z?

Here's what I mean. How can I merge df.A:

date   impressions    spend    col
1/1/15 100000         3.00     ABC123456
1/2/15 145000         5.00     ABCD00000
1/3/15 300000         15.00    (null)

with df.B:

date    col
1/1/15  (null)
1/2/15  (null)
1/3/15  DEF123456

To get:

date   impressions    spend    col_z
1/1/15 100000         3.00     ABC123456
1/2/15 145000         5.00     ABCD00000
1/3/15 300000         15.00    DEF123456

Any help or a pointer in the right direction would be really appreciated!

Thanks

Answer 1

Accepted answer by EdChum

OK, assuming that your (null) values are in fact NaN values and not the literal string "(null)", the following works:

In [10]:
# create the merged df
merged = dfA.merge(dfB, on='date')
merged

Out[10]:
        date  impressions  spend      col_x      col_y
0 2015-01-01       100000      3  ABC123456        NaN
1 2015-01-02       145000      5  ABCD00000        NaN
2 2015-01-03       300000     15        NaN  DEF123456

You can use where to conditionally assign a value from the _x and _y columns: it keeps col_x wherever col_x is not null and takes col_y otherwise.

In [11]:
# now create col_z using where
merged['col_z'] = merged['col_x'].where(merged['col_x'].notnull(), merged['col_y'])
merged

Out[11]:
        date  impressions  spend      col_x      col_y      col_z
0 2015-01-01       100000      3  ABC123456        NaN  ABC123456
1 2015-01-02       145000      5  ABCD00000        NaN  ABCD00000
2 2015-01-03       300000     15        NaN  DEF123456  DEF123456
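
For this particular pattern (keep col_x, fall back to col_y where col_x is NaN), fillna with a Series argument should give the same result as the where call above. A minimal sketch, assuming a small stand-in frame that holds only the two merged columns; the names via_where and via_fillna are illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd

# Stand-in for the merged frame above (only the two columns of interest).
merged = pd.DataFrame({
    "col_x": ["ABC123456", "ABCD00000", np.nan],
    "col_y": [np.nan, np.nan, "DEF123456"],
})

# The approach from the answer: keep col_x where it is not null, else use col_y.
via_where = merged["col_x"].where(merged["col_x"].notnull(), merged["col_y"])

# Alternative spelling: fill the gaps in col_x from col_y (aligned on the index).
via_fillna = merged["col_x"].fillna(merged["col_y"])

print(via_where.equals(via_fillna))
```

Both forms leave rows NaN when col_x and col_y are both missing, so the choice between them is mostly a matter of readability.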

You can then drop the extraneous columns:

In [13]:
# drop the source columns that are no longer needed
merged = merged.drop(['col_x', 'col_y'], axis=1)
merged

Out[13]:
        date  impressions  spend      col_z
0 2015-01-01       100000      3  ABC123456
1 2015-01-02       145000      5  ABCD00000
2 2015-01-03       300000     15  DEF123456
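
The steps of this answer can be put together as one self-contained script. This is a sketch under stated assumptions: the frames are rebuilt by hand from the tables in the question, and the missing values are assumed to have arrived as the literal string "(null)", which is converted to NaN first. If your data already holds NaN, the replace step is unnecessary.

```python
import numpy as np
import pandas as pd

# Frames rebuilt from the question's tables (assumed contents).
dfA = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "impressions": [100000, 145000, 300000],
    "spend": [3.00, 5.00, 15.00],
    "col": ["ABC123456", "ABCD00000", "(null)"],
})
dfB = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "col": ["(null)", "(null)", "DEF123456"],
})

# Assumption: nulls are stored as the string "(null)"; turn them into real NaN.
dfA["col"] = dfA["col"].replace("(null)", np.nan)
dfB["col"] = dfB["col"].replace("(null)", np.nan)

# Merge on date; the clashing 'col' columns become col_x and col_y.
merged = dfA.merge(dfB, on="date")

# Keep col_x where present, otherwise fall back to col_y.
merged["col_z"] = merged["col_x"].where(merged["col_x"].notnull(), merged["col_y"])

# Drop the two source columns.
merged = merged.drop(["col_x", "col_y"], axis=1)
print(merged)
```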

Answer 2

Answered by Oskar_U

IMO the shortest solution that is still readable is something like this:

# dfA and dfB stand for the question's df.A and df.B
dfA.loc[dfA['col'].isna(), 'col'] = dfA.merge(dfB, how='left', on='date')['col_y']

What it basically does is assign values from the merged table's col_y column to the primary df.A table, but only for those rows whose col column is empty (the .isna() condition). The assignment is aligned on the index, so it relies on the merged result having the same row index as df.A.
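
A runnable sketch of this one-liner, under two assumptions that are not stated in the answer: the primary frame has a default 0..n-1 index, and date is unique in the second frame, so the left merge returns one row per original row in the same order. The frames here are trimmed to the two relevant columns, and the name filled is illustrative.

```python
import numpy as np
import pandas as pd

# Trimmed versions of the question's tables (assumed contents).
dfA = pd.DataFrame({
    "date": ["1/1/15", "1/2/15", "1/3/15"],
    "col": ["ABC123456", "ABCD00000", np.nan],
})
dfB = pd.DataFrame({
    "date": ["1/1/15", "1/2/15", "1/3/15"],
    "col": [np.nan, np.nan, "DEF123456"],
})

# Left merge keeps every row of dfA; dfB's 'col' shows up as col_y.
filled = dfA.merge(dfB, how="left", on="date")["col_y"]

# Only rows where dfA['col'] is missing are overwritten; the values are
# matched by index label, which is why the index assumption matters.
dfA.loc[dfA["col"].isna(), "col"] = filled
print(dfA)
```

If the primary frame has a non-default index, or dates repeat in the second frame, the index labels of the merged result no longer line up with the original rows and this shortcut may fill the wrong values.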
