pandas - Merging with missing values

Question

Asked by aensm

There appears to be a quirk in the pandas merge function: it considers NaN values to be equal, and will merge NaNs with other NaNs:

>>> import numpy as np
>>> import pandas as pd
>>> from pandas import DataFrame

>>> foo = DataFrame([
    ['a',1,2],
    ['b',4,5],
    ['c',7,8],
    [np.nan,10,11]
], columns=['id','x','y'])

>>> bar = DataFrame([
    ['a',3],
    ['c',9],
    [np.nan,12]
], columns=['id','z'])

>>> pd.merge(foo, bar, how='left', on='id')
Out[428]: 
    id   x   y   z
0    a   1   2   3
1    b   4   5 NaN
2    c   7   8   9
3  NaN  10  11  12

[4 rows x 4 columns]

This is unlike any RDB I've seen. Normally, missing values are treated as unknown and are not matched to one another as if they were equal. This is especially problematic for datasets with sparse data: every NaN is merged with every other NaN, resulting in a huge DataFrame!
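To make the size problem concrete, here is a minimal sketch. The frames `left` and `right` are made up for illustration and are not from the question. Each has three missing keys, so the NaN rows pair up many-to-many:

```python
import numpy as np
import pandas as pd

# Hypothetical sparse frames: one real key and three missing keys on each side.
left = pd.DataFrame({"id": ["a", np.nan, np.nan, np.nan], "x": [1, 2, 3, 4]})
right = pd.DataFrame({"id": ["a", np.nan, np.nan, np.nan], "z": [10, 20, 30, 40]})

merged = pd.merge(left, right, how="left", on="id")

# One row for the "a" match, plus 3 * 3 rows from NaN keys matching each other.
print(len(merged))  # 10
```

With n missing keys on the left and m on the right, the NaN rows alone contribute n * m rows to the result.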

Is there a way to ignore missing values during a merge without first slicing them out?

Answer 1

Answered by meloncholy

You could exclude the rows of bar (and indeed of foo, if you wanted) where id is null during the merge. I'm not sure it's what you're after, though, as those rows are still sliced out.

(I've assumed from your left join that you're interested in retaining all of foo, but only want to merge in the parts of bar that match and are not null.)

foo.merge(bar[pd.notnull(bar.id)], how='left', on='id')

Out[11]: 
    id   x   y   z
0    a   1   2   3
1    b   4   5 NaN
2    c   7   8   9
3  NaN  10  11 NaN
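As a self-contained check of the approach above, the following sketch rebuilds the question's frames and filters bar with `Series.notna()` (equivalent here to `pd.notnull`). The name `result` is only for illustration:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(
    [["a", 1, 2], ["b", 4, 5], ["c", 7, 8], [np.nan, 10, 11]],
    columns=["id", "x", "y"],
)
bar = pd.DataFrame([["a", 3], ["c", 9], [np.nan, 12]], columns=["id", "z"])

# Drop the NaN-keyed rows from the right side only, then left-join.
result = foo.merge(bar[bar["id"].notna()], how="left", on="id")

# All four rows of foo survive; the NaN-keyed row finds no partner, so z is NaN.
print(result)
```

The filter only touches bar, so foo is returned in full and in its original row order.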

Answer 2

Answered by Liang

If you do not need NaN in either the left or the right DF, use

pd.merge(foo.dropna(), bar.dropna(), how='left', on='id')

Otherwise, if you need to keep NaN in the left DF, use

pd.merge(foo, bar.dropna(), how='left', on='id')
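One caveat worth checking: `dropna()` with no arguments drops a row if any column is missing, not just the join key. A hedged variation is to pass `subset=["id"]` so that only rows with a missing key are removed. The frame below is a made-up variant of bar in which a valid id has a missing z:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(
    [["a", 1, 2], ["b", 4, 5], ["c", 7, 8], [np.nan, 10, 11]],
    columns=["id", "x", "y"],
)
# Hypothetical variant of bar: the row for "c" has a valid key but a missing value.
bar = pd.DataFrame({"id": ["a", "c", np.nan], "z": [3.0, np.nan, 12.0]})

all_cols = bar.dropna()               # also drops the "c" row, since z is NaN
key_only = bar.dropna(subset=["id"])  # drops only the row whose id is NaN

result = pd.merge(foo, key_only, how="left", on="id")
print(len(all_cols), len(key_only), len(result))  # 1 2 4
```

If the other columns never contain NaN, the two calls behave the same and the plain `dropna()` is fine.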

Answer 3

Answered by yosemite_k

If you want to preserve the NaNs from both tables without slicing them out, you could use the outer join method as follows:

pd.merge(foo, bar.dropna(), how='outer', on='id')

It basically returns the union of foo and bar.
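A small sketch of what the outer join returns. The extra row `['d', 15]` in bar is an assumption added here so that the "union" is visible; it is not in the question's data:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(
    [["a", 1, 2], ["b", 4, 5], ["c", 7, 8], [np.nan, 10, 11]],
    columns=["id", "x", "y"],
)
# bar with one hypothetical extra key "d" that does not appear in foo.
bar = pd.DataFrame(
    [["a", 3], ["c", 9], ["d", 15], [np.nan, 12]], columns=["id", "z"]
)

result = pd.merge(foo, bar.dropna(), how="outer", on="id")

# Keys from both sides appear once: a, b, c, d, plus foo's NaN-keyed row.
print(result)
```

Note that `bar.dropna()` still removes bar's NaN-keyed row before the merge, so only the NaN row from foo is preserved in the result.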
