How to merge two DataFrames in pandas to replace NaN

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/25095971/

Date: 2020-09-13 22:19:27  Source: igfitidea

Tags: python, pandas, merge, nan

Asked by Camilo Avella

I want to do this in pandas:

I have two DataFrames, A and B, and I want to replace only the NaN values in A with the corresponding values from B.

A                                                
2014-04-17 12:59:00  146.06250  146.0625  145.93750  145.93750
2014-04-17 13:00:00  145.90625  145.9375  145.87500  145.90625
2014-04-17 13:01:00  145.90625       NaN  145.90625        NaN
2014-04-17 13:02:00        NaN       NaN  145.93750  145.96875

B
2014-04-17 12:59:00   146 2/32   146 2/32  145 30/32  145 30/32
2014-04-17 13:00:00  145 29/32  145 30/32  145 28/32  145 29/32
2014-04-17 13:01:00  145 29/32        146  145 29/32        147
2014-04-17 13:02:00        146        146  145 30/32  145 31/32

Result:
2014-04-17 12:59:00  146.06250  146.0625  145.93750  145.93750
2014-04-17 13:00:00  145.90625  145.9375  145.87500  145.90625
2014-04-17 13:01:00  145.90625       146  145.90625        147
2014-04-17 13:02:00        146       146  145.93750  145.96875

Thanks in advance.

Answered by FooBar

The officially recommended way to do exactly this is A.combine_first(B). Further information is in the official documentation.
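
As a minimal sketch of what combine_first does, the frames below are hypothetical sample data loosely modeled on the question (the column names "open" and "high" are assumptions, not part of the original post):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are made up for illustration.
idx = pd.to_datetime(["2014-04-17 13:01:00", "2014-04-17 13:02:00"])
A = pd.DataFrame(
    {"open": [145.90625, np.nan], "high": [np.nan, np.nan]},
    index=idx,
)
B = pd.DataFrame(
    {"open": [145.0, 146.0], "high": [146.0, 146.0]},
    index=idx,
)

# Values already present in A are kept; B only fills the cells where A is NaN.
result = A.combine_first(B)
print(result)
```

Here the existing value 145.90625 in A survives, while every NaN cell takes its value from the same row and column of B.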

However, on large datasets it is massively outperformed by A.fillna(B) (tested with 25,000 elements):

In[891]: %timeit df.fillna(df2)
1000 loops, best of 3: 333 μs per loop
In[892]: %timeit df.combine_first(df2)
100 loops, best of 3: 2.15 ms per loop
In[894]: (df.fillna(df2) == df.combine_first(df2)).all().all()
Out[890]: True
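
The equality check above holds when both frames share the same labels. As a hedged sketch of where the two methods can differ, the hypothetical frames below give B one extra row: fillna keeps A's index, while combine_first returns the union of both indexes.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: B has an extra row (label 2) that A lacks.
A = pd.DataFrame({"x": [1.0, np.nan]}, index=[0, 1])
B = pd.DataFrame({"x": [10.0, 20.0, 30.0]}, index=[0, 1, 2])

# fillna aligns B to A and only fills A's NaN cells, so A's index is kept.
filled = A.fillna(B)

# combine_first uses the union of both indexes, so row 2 comes from B.
combined = A.combine_first(B)

print(filled)
print(combined)
```

So the faster fillna is a drop-in replacement only if rows or columns that exist solely in B are not wanted in the result.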

Answered by wwii

  • Get the numpy arrays for A and B.
  • Make a mask of A that marks where A is NaN (use numpy.isnan, because a comparison such as A == numpy.nan is always False).
  • Assign B to A using the mask as a boolean index for both.

Similar to this:

>>> a
array([[  0.,   1.,   2.],
       [  3.,  nan,   5.],
       [  6.,   7.,   8.]], dtype=float16)
>>> b
array([[ 1000.,  1000.,  1000.],
       [ 1000.,  1000.,  1000.],
       [ 1000.,  1000.,  1000.]])
>>> mask = np.isnan(a)
>>> mask
array([[False, False, False],
       [False,  True, False],
       [False, False, False]], dtype=bool)
>>> a[mask] = b[mask]
>>> a
array([[    0.,     1.,     2.],
       [    3.,  1000.,     5.],
       [    6.,     7.,     8.]], dtype=float16)

Alternatively, use numpy.where():

>>> a
array([[  0.,   1.,   2.],
       [  3.,  nan,   5.],
       [  6.,   7.,   8.]], dtype=float16)
>>> a = np.where(np.isnan(a), b, a)
>>> a
array([[    0.,     1.,     2.],
       [    3.,  1000.,     5.],
       [    6.,     7.,     8.]])
>>>
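
When the inputs are DataFrames rather than arrays, numpy.where returns a plain ndarray, so the row and column labels have to be reattached. A minimal sketch, assuming both frames have the same shape and labels (the column names "p" and "q" are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frames with identical shape and labels.
a = pd.DataFrame([[0.0, 1.0], [np.nan, 3.0]], columns=["p", "q"])
b = pd.DataFrame([[1000.0, 1000.0], [1000.0, 1000.0]], columns=["p", "q"])

# np.where works positionally on the underlying arrays and returns an ndarray,
# so the index and columns of `a` are reattached by hand.
merged = pd.DataFrame(
    np.where(a.isna().to_numpy(), b.to_numpy(), a.to_numpy()),
    index=a.index,
    columns=a.columns,
)
print(merged)
```

Note that this approach pairs cells by position, not by label, so it is only appropriate when the two frames are already aligned.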

https://stackoverflow.com/a/13062410/2823755 suggests the first (boolean indexing) method may work with the DataFrame itself ... and it does (I wasn't satisfied with guessing, so I installed pandas):

>>> a = pandas.DataFrame(np.arange(25, dtype = np.float16).reshape(5,5))
>>> a.iloc[3, 2] = np.nan
>>> b = pandas.DataFrame(np.arange(1000, 1025, dtype = np.float16).reshape(5,5))
>>> a[np.isnan(a)] = b[np.isnan(a)]
>>> a
    0   1     2   3   4
0   0   1     2   3   4
1   5   6     7   8   9
2  10  11    12  13  14
3  15  16  1017  18  19
4  20  21    22  23  24
>>> 

pandas.DataFrame.where also works:

a.where(~np.isnan(a), other = b, inplace = True)
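
If modifying `a` in place is not desired, the same replacement can be sketched without `inplace`, using either DataFrame.where or its inverse, DataFrame.mask. The frames below are hypothetical sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frames of the same shape.
a = pd.DataFrame(np.arange(6, dtype=float).reshape(2, 3))
a.iloc[1, 1] = np.nan
b = pd.DataFrame(np.arange(1000, 1006, dtype=float).reshape(2, 3))

# where keeps cells where the condition is True and takes the rest from `other`.
via_where = a.where(a.notna(), other=b)

# mask is the inverse: it replaces the cells where the condition is True.
via_mask = a.mask(a.isna(), other=b)

print(via_where)
```

Both calls return a new DataFrame and leave `a` unchanged.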