Notice: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18000019/

Time: 2020-09-13 21:03:21  Source: igfitidea

Using pandas fillna() on multiple columns

python pandas

Asked by Blas

I'm a new pandas user (as of yesterday), and have found it at times both convenient and frustrating.

My current frustration is in trying to use df.fillna() on multiple columns of a dataframe. For example, I've got two sets of data (a newer set and an older set) which partially overlap. For the cases where we have new data, I just use that, but I also want to use the older data if there isn't anything newer. It seems I should be able to use fillna() to fill the newer columns with the older ones, but I'm having trouble getting that to work.

Attempt at a specific example:

df.ix[:,['newcolumn1','newcolumn2']].fillna(df.ix[:,['oldcolumn1','oldcolumn2']], inplace=True)

But this doesn't work as expected - numbers show up in the new columns that had been NaNs, but not the ones that were in the old columns (in fact, looking through the data, I have no idea where the numbers it picked came from, as they don't exist in either the new or old data anywhere).

Is there a way to fill in NaNs of specific columns in a DataFrame with values from other specific columns of the DataFrame?

Accepted answer by TomAugspurger

To answer your question: yes. Look at using the value argument of fillna, along with the to_dict() method on the other dataframe.

But to really solve your problem, have a look at the update() method of the DataFrame. Assuming your two dataframes are similarly indexed, I think it's exactly what you want.

In [36]: df = pd.DataFrame({'A': [0, np.nan, 2, 3, np.nan, 5], 'B': [1, 0, 1, np.nan, np.nan, 1]})

In [37]: df
Out[37]: 
    A   B
0   0   1
1 NaN   0
2   2   1
3   3 NaN
4 NaN NaN
5   5   1

In [38]: df2 = pd.DataFrame({'A': [0, np.nan, 2, 3, 4, 5], 'B': [1, 0, 1, 1, 0, 0]})

In [40]: df2
Out[40]: 
    A  B
0   0  1
1 NaN  0
2   2  1
3   3  1
4   4  0
5   5  0

In [52]: df.update(df2, overwrite=False)

In [53]: df
Out[53]: 
    A  B
0   0  1
1 NaN  0
2   2  1
3   3  1
4   4  0
5   5  1

Notice that all the NaNs in df were replaced except for (1, A), since that was also NaN in df2. Also, some of the values, like (5, B), differed between df and df2. With overwrite=False, update() keeps the value from df.
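
To make the effect of overwrite concrete, here is a minimal sketch that runs update() both ways on the same data as above. The helper make_frames and the names kept and replaced are only for illustration, and the columns are created as floats to sidestep dtype questions.

```python
import numpy as np
import pandas as pd


def make_frames():
    """Rebuild the two example frames so each run starts from fresh data."""
    df = pd.DataFrame({"A": [0, np.nan, 2, 3, np.nan, 5],
                       "B": [1, 0, 1, np.nan, np.nan, 1]}, dtype="float64")
    df2 = pd.DataFrame({"A": [0, np.nan, 2, 3, 4, 5],
                        "B": [1, 0, 1, 1, 0, 0]}, dtype="float64")
    return df, df2


# overwrite=False: only cells that are NaN in the target are filled from df2.
kept, other = make_frames()
kept.update(other, overwrite=False)

# overwrite=True (the default): every non-NaN cell of df2 replaces the target's value.
replaced, other = make_frames()
replaced.update(other)

# Cell (5, B) shows the difference: the original 1 survives only with overwrite=False.
print(kept.loc[5, "B"], replaced.loc[5, "B"])
```

Note that update() modifies the DataFrame in place and returns None, so there is nothing to assign back.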

EDIT: Based on the comments, it seems you're looking for a solution where the column names don't match between the two DataFrames (it would be helpful if you posted sample data). Let's try that, replacing column A with C and B with D.

In [33]: df = pd.DataFrame({'A': [0, np.nan, 2, 3, np.nan, 5], 'B': [1, 0, 1, np.nan, np.nan, 1]})

In [34]: df2 = pd.DataFrame({'C': [0, np.nan, 2, 3, 4, 5], 'D': [1, 0, 1, 1, 0, 0]})

In [35]: df
Out[35]: 
    A   B
0   0   1
1 NaN   0
2   2   1
3   3 NaN
4 NaN NaN
5   5   1

In [36]: df2
Out[36]: 
    C  D
0   0  1
1 NaN  0
2   2  1
3   3  1
4   4  0
5   5  0

In [37]: d = {'A': df2.C, 'B': df2.D}  # pass these values to fillna

In [38]: df
Out[38]: 
    A   B
0   0   1
1 NaN   0
2   2   1
3   3 NaN
4 NaN NaN
5   5   1

In [40]: df.fillna(value=d)
Out[40]: 
    A  B
0   0  1
1 NaN  0
2   2  1
3   3  1
4   4  0
5   5  1
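
For the layout described in the question, where the old and new columns live in the same DataFrame, the same idea can be sketched one column at a time with Series.fillna. The column names come from the question, but the data and the fill_from mapping are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the "new" columns have gaps, the "old" columns are the fallback.
df = pd.DataFrame({
    "newcolumn1": [1.0, np.nan, 3.0],
    "newcolumn2": [np.nan, np.nan, 6.0],
    "oldcolumn1": [10.0, 20.0, 30.0],
    "oldcolumn2": [40.0, np.nan, 60.0],
})

# Map each column to fill onto the column that supplies the replacement values.
fill_from = {"newcolumn1": "oldcolumn1", "newcolumn2": "oldcolumn2"}

for new_col, old_col in fill_from.items():
    # Series.fillna aligns on the index, so each NaN takes the value from the same row.
    df[new_col] = df[new_col].fillna(df[old_col])

print(df[["newcolumn1", "newcolumn2"]])
```

Assigning the result back to the column avoids relying on inplace=True against a selection, which may operate on a copy. A cell stays NaN only when both the new and the old column are missing a value for that row.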

I think if you invest the time to learn pandas you'll hit fewer moments of frustration. It's a massive library though, so it takes time.

Answer by Justin

fillna is generally for carrying an observation forward or backward. Instead, I'd use np.where... If I understand what you're asking.

import numpy as np
np.where(np.isnan(df['newcolumn1']), df['oldcolumn1'], df['newcolumn1'])
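
As a self-contained sketch of the np.where approach, the snippet below uses made-up data and assigns the result back to the column. It uses Series.isna() for the missing-value test, which is a swap from np.isnan in the line above; both pick out the NaN rows for float data.

```python
import numpy as np
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "newcolumn1": [1.0, np.nan, 3.0, np.nan],
    "oldcolumn1": [10.0, 20.0, np.nan, np.nan],
})

# Where the new value is missing take the old one, otherwise keep the new one.
df["newcolumn1"] = np.where(df["newcolumn1"].isna(),
                            df["oldcolumn1"],
                            df["newcolumn1"])

print(df)
```

np.where returns a plain NumPy array and works by position, so this assumes both columns come from the same DataFrame and therefore line up row by row.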