Python: How to pass another entire column as argument to pandas fillna()

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/30357276/

Time: 2020-08-19 08:19:05  Source: igfitidea

How to pass another entire column as argument to pandas fillna()

Tags: python, pandas, fillna

Asked by xav

I would like to fill missing values in one column with values from another column, using the fillna method.

(I read that looping through each row is very bad practice and that it is better to do everything in one go, but I could not work out how to do that with fillna.)

Data before:

Day  Cat1  Cat2
1    cat   mouse
2    dog   elephant
3    cat   giraf
4    NaN   ant

Data after:

Day  Cat1  Cat2
1    cat   mouse
2    dog   elephant
3    cat   giraf
4    ant   ant

Accepted answer by joris

You can pass this column to fillna (see the docs); it uses the values at matching index labels to fill the gaps:

In [17]: df['Cat1'].fillna(df['Cat2'])
Out[17]:
0    cat
1    dog
2    cat
3    ant
Name: Cat1, dtype: object
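
As a minimal, self-contained sketch of the call above: the DataFrame here is rebuilt from the question's sample data (the construction is an assumption, not part of the original answer). It shows that fillna returns a new Series, so the result has to be assigned back to the column for the change to stick.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed construction)
df = pd.DataFrame({
    "Day": [1, 2, 3, 4],
    "Cat1": ["cat", "dog", "cat", np.nan],
    "Cat2": ["mouse", "elephant", "giraf", "ant"],
})

# fillna returns a new Series; the original column is left untouched
filled = df["Cat1"].fillna(df["Cat2"])
missing_before_assignment = int(df["Cat1"].isnull().sum())

# Assign the result back to persist it in the DataFrame
df["Cat1"] = filled
print(df)
```

Only row 3, where Cat1 was missing, picks up a value from Cat2; the other rows keep their original Cat1 entries.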

Answer by Ami Tavory

You could do

df.Cat1 = np.where(df.Cat1.isnull(), df.Cat2, df.Cat1)

The overall construct on the right-hand side uses the ternary pattern from the pandas cookbook (which is worth reading in any case). It is a vectorized version of a ? b : c.
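
To make the ternary analogy concrete, here is a small sketch under assumed sample data (rebuilt from the question). The list comprehension is a hypothetical element-by-element equivalent, included only for comparison with the vectorized np.where call.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed construction)
df = pd.DataFrame({
    "Cat1": ["cat", "dog", "cat", np.nan],
    "Cat2": ["mouse", "elephant", "giraf", "ant"],
})

# Element-by-element ternary, for comparison only: b if condition else a
scalar_version = [b if pd.isnull(a) else a
                  for a, b in zip(df["Cat1"], df["Cat2"])]

# Vectorized form: where the condition is True take Cat2, otherwise Cat1
condition = df["Cat1"].isnull()
df["Cat1"] = np.where(condition, df["Cat2"], df["Cat1"])
print(df)
```

Note that np.where returns a plain NumPy array, so assigning it back to the column relies on positional order rather than index alignment.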

Answer by chrisaycock

Just use the value parameter instead of method:

In [20]: df
Out[20]:
  Cat1      Cat2  Day
0  cat     mouse    1
1  dog  elephant    2
2  cat     giraf    3
3  NaN       ant    4

In [21]: df.Cat1 = df.Cat1.fillna(value=df.Cat2)

In [22]: df
Out[22]:
  Cat1      Cat2  Day
0  cat     mouse    1
1  dog  elephant    2
2  cat     giraf    3
3  ant       ant    4

Answer by sparrow

Here is a more general approach (though the fillna method is probably better):

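import pandas as pd

def is_missing(Cat1, Cat2):
    # pd.isnull handles strings and NaN alike; np.isnan raises TypeError on string values
    if pd.isnull(Cat1):
        return Cat2
    else:
        return Cat1

# Apply row by row (axis=1), taking Cat2 wherever Cat1 is missing
df['Cat1'] = df.apply(lambda x: is_missing(x['Cat1'], x['Cat2']), axis=1)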

Answer by Jeremy Z

pandas.DataFrame.combine_first also works.

(Note: since "Result index columns will be the union of the respective indexes and columns", you should check that the indexes and columns of the two objects match.)

import numpy as np
import pandas as pd
df = pd.DataFrame([["1","cat","mouse"],
    ["2","dog","elephant"],
    ["3","cat","giraf"],
    ["4",np.nan,"ant"]],columns=["Day","Cat1","Cat2"])

In: df["Cat1"].combine_first(df["Cat2"])
Out: 
0    cat
1    dog
2    cat
3    ant
Name: Cat1, dtype: object
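
The index-union caveat mentioned above can be seen in a small sketch. The two Series here are hypothetical and deliberately have different indexes; they are not part of the original answer. With mismatched indexes, fillna keeps only the caller's index, while combine_first returns the union of both.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with deliberately mismatched indexes
cat1 = pd.Series(["cat", np.nan], index=[0, 1], name="Cat1")
cat2 = pd.Series(["ant", "bee", "cow"], index=[1, 2, 3], name="Cat2")

# fillna aligns on index labels and keeps only the caller's index (0, 1)
via_fillna = cat1.fillna(cat2)

# combine_first also aligns on labels, but the result covers the union (0, 1, 2, 3)
via_combine_first = cat1.combine_first(cat2)

print(via_fillna)
print(via_combine_first)
```

When both columns come from the same DataFrame, as in the question, the indexes are identical and the two methods give the same values.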

Timings compared with the other answers:

%timeit df["Cat1"].combine_first(df["Cat2"])
181 μs ± 11.3 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit df['Cat1'].fillna(df['Cat2'])
253 μs ± 10.3 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.where(df.Cat1.isnull(), df.Cat2, df.Cat1)
88.1 μs ± 793 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I did not use the method below as written:

def is_missing(Cat1,Cat2):    
    if np.isnan(Cat1):        
        return Cat2
    else:
        return Cat1

df['Cat1'] = df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)

because it raises an exception:

TypeError: ("ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''", 'occurred at index 0')

This means np.isnan can be applied to NumPy arrays of a native dtype (such as np.float64), but raises a TypeError when applied to object arrays.
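
A short sketch of that difference, using two hypothetical arrays (the variable names are illustrative only): np.isnan works on a float64 array, fails on an object array holding strings, and pd.isnull handles both.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: one native float array, one object array with a string
float_values = np.array([1.0, np.nan])
object_values = np.array(["cat", np.nan], dtype=object)

# Works: float64 is a native dtype that np.isnan supports
float_mask = np.isnan(float_values)

# Fails: np.isnan has no implementation for object arrays
try:
    np.isnan(object_values)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# pd.isnull accepts object arrays and flags only the missing entry
object_mask = pd.isnull(object_values)

print(float_mask, isnan_failed, object_mask)
```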

So I revised the method:

def is_missing(Cat1,Cat2):    
    if pd.isnull(Cat1):        
        return Cat2
    else:
        return Cat1

%timeit df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)
701 μs ± 7.38 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
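
As a sanity check alongside the timings, the sketch below confirms that the four approaches discussed in this thread give the same values on the question's sample data. The DataFrame construction and variable names are assumptions for illustration; timings will vary with machine, pandas version, and data size, so none are asserted here.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumed construction)
df = pd.DataFrame({
    "Day": [1, 2, 3, 4],
    "Cat1": ["cat", "dog", "cat", np.nan],
    "Cat2": ["mouse", "elephant", "giraf", "ant"],
})

# Vectorized approaches
via_fillna = df["Cat1"].fillna(df["Cat2"])
via_combine_first = df["Cat1"].combine_first(df["Cat2"])
via_where = np.where(df["Cat1"].isnull(), df["Cat2"], df["Cat1"])

# Row-wise approach with pd.isnull, as in the revised method
via_apply = df.apply(
    lambda row: row["Cat2"] if pd.isnull(row["Cat1"]) else row["Cat1"],
    axis=1,
)

results = [list(via_fillna), list(via_combine_first),
           list(via_where), list(via_apply)]
print(all(r == results[0] for r in results))
```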