Python: How to pass another entire column as argument to pandas fillna()

Question

Question by xav

I would like to fill missing values in one column with values from another column, using the fillna method.

(I read that looping through each row would be very bad practice and that it would be better to do everything in one go, but I could not find out how to do that with fillna.)

Data before:

Day  Cat1  Cat2
1    cat   mouse
2    dog   elephant
3    cat   giraf
4    NaN   ant

Data after:

Day  Cat1  Cat2
1    cat   mouse
2    dog   elephant
3    cat   giraf
4    ant   ant

Answer 1

Accepted answer by joris

You can pass this column to fillna (see the docs); it will use the values at matching indexes to fill:

In [17]: df['Cat1'].fillna(df['Cat2'])
Out[17]:
0    cat
1    dog
2    cat
3    ant
Name: Cat1, dtype: object
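A minimal sketch, assuming the question's sample data, of writing the filled column back into the DataFrame and of the index matching mentioned above: fillna matches the filler Series by index label, not by position, so even a reversed copy of Cat2 fills the right row. The names original and shuffled are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data from the question (default index 0..3)
df = pd.DataFrame({
    "Day": [1, 2, 3, 4],
    "Cat1": ["cat", "dog", "cat", np.nan],
    "Cat2": ["mouse", "elephant", "giraf", "ant"],
})

# Fill missing Cat1 values from Cat2 and assign the result back
df["Cat1"] = df["Cat1"].fillna(df["Cat2"])
print(df["Cat1"].tolist())  # ['cat', 'dog', 'cat', 'ant']

# fillna aligns on index labels: reversing the filler Series
# (index becomes 3, 2, 1, 0) still puts "ant" into row 3
original = pd.Series(["cat", "dog", "cat", np.nan])
shuffled = pd.Series(["mouse", "elephant", "giraf", "ant"]).iloc[::-1]
print(original.fillna(shuffled).tolist())  # ['cat', 'dog', 'cat', 'ant']
```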

Answer 2

Answer by Ami Tavory

You could do

import numpy as np

df.Cat1 = np.where(df.Cat1.isnull(), df.Cat2, df.Cat1)

The overall construct on the right-hand side uses the ternary (if-then-else) idiom from the pandas cookbook (which is worth reading in any case). It is a vectorized version of a ? b : c.
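The ternary idiom can be sketched on the question's data as follows. np.where(cond, a, b) picks each element from a where cond is true and from b otherwise, and returns a plain NumPy array, so it works by position rather than by index label. Series.where(cond, other) is a pandas-native counterpart: it keeps the original values where cond is true and takes other (aligned by index) elsewhere. This is a sketch for comparison, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Cat1": ["cat", "dog", "cat", np.nan],
    "Cat2": ["mouse", "elephant", "giraf", "ant"],
})

# Vectorized "cond ? a : b": take Cat2 where Cat1 is missing, else keep Cat1
via_numpy = np.where(df["Cat1"].isnull(), df["Cat2"], df["Cat1"])  # ndarray

# pandas equivalent: keep Cat1 where it is present, otherwise use Cat2
via_pandas = df["Cat1"].where(df["Cat1"].notnull(), df["Cat2"])  # Series

print(list(via_numpy))      # ['cat', 'dog', 'cat', 'ant']
print(via_pandas.tolist())  # ['cat', 'dog', 'cat', 'ant']
```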

Answer 3

Answer by chrisaycock

Just use the value parameter instead of method:

In [20]: df
Out[20]:
  Cat1      Cat2  Day
0  cat     mouse    1
1  dog  elephant    2
2  cat     giraf    3
3  NaN       ant    4

In [21]: df.Cat1 = df.Cat1.fillna(value=df.Cat2)

In [22]: df
Out[22]:
  Cat1      Cat2  Day
0  cat     mouse    1
1  dog  elephant    2
2  cat     giraf    3
3  ant       ant    4
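To make the value-versus-method distinction concrete, a small sketch on the same data: a fill method such as forward fill propagates the previous value of the same column, whereas passing Cat2 as value fills each gap from the other column at the same index. It uses Series.ffill(), the method form of fillna(method='ffill'); the method keyword of fillna is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Cat1": ["cat", "dog", "cat", np.nan],
    "Cat2": ["mouse", "elephant", "giraf", "ant"],
})

# Method-style fill: row 3 copies the previous Cat1 value ("cat")
forward = df["Cat1"].ffill()

# Value-style fill: row 3 takes Cat2 from the same row ("ant")
from_other = df["Cat1"].fillna(value=df["Cat2"])

print(forward.tolist())     # ['cat', 'dog', 'cat', 'cat']
print(from_other.tolist())  # ['cat', 'dog', 'cat', 'ant']
```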

Answer 4

Answer by sparrow

Here is a more general approach (the fillna method is probably better):

def is_missing(Cat1,Cat2):    
    if np.isnan(Cat1):        
        return Cat2
    else:
        return Cat1

df['Cat1'] = df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)

Answer 5

Answer by Jeremy Z

pandas.DataFrame.combine_first also works (the example below uses its Series counterpart, pandas.Series.combine_first).

(Note: since "Result index columns will be the union of the respective indexes and columns", you should check that the indexes and columns match.)

import numpy as np
import pandas as pd
df = pd.DataFrame([["1","cat","mouse"],
    ["2","dog","elephant"],
    ["3","cat","giraf"],
    ["4",np.nan,"ant"]],columns=["Day","Cat1","Cat2"])

In: df["Cat1"].combine_first(df["Cat2"])
Out: 
0    cat
1    dog
2    cat
3    ant
Name: Cat1, dtype: object
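The index caveat can be illustrated with two hypothetical Series whose indexes differ: combine_first returns the union of both indexes, so an extra label in the filler shows up in the result, while fillna keeps only the caller's index. The names left and other are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: 'other' carries an extra label (2) that 'left' lacks
left = pd.Series(["cat", np.nan], index=[0, 1])
other = pd.Series(["mouse", "ant", "extra"], index=[0, 1, 2])

combined = left.combine_first(other)  # index is the union: 0, 1, 2
filled = left.fillna(other)           # index stays 0, 1

print(combined.tolist(), combined.index.tolist())  # ['cat', 'ant', 'extra'] [0, 1, 2]
print(filled.tolist(), filled.index.tolist())      # ['cat', 'ant'] [0, 1]
```

When both columns come from the same DataFrame, as in the question, the indexes are identical and the two calls give the same result.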

Compare with other answers:

%timeit df["Cat1"].combine_first(df["Cat2"])
181 μs ± 11.3 μs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit df['Cat1'].fillna(df['Cat2'])
253 μs ± 10.3 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.where(df.Cat1.isnull(), df.Cat2, df.Cat1)
88.1 μs ± 793 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
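%timeit is an IPython magic. A rough equivalent for a plain Python script, assuming the same df as above, uses the standard timeit module; it also checks that the three approaches return the same values. Absolute timings depend on the machine and on the pandas/NumPy versions, so expect different numbers.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame([["1", "cat", "mouse"],
                   ["2", "dog", "elephant"],
                   ["3", "cat", "giraf"],
                   ["4", np.nan, "ant"]], columns=["Day", "Cat1", "Cat2"])

candidates = {
    "combine_first": lambda: df["Cat1"].combine_first(df["Cat2"]),
    "fillna": lambda: df["Cat1"].fillna(df["Cat2"]),
    "np.where": lambda: np.where(df["Cat1"].isnull(), df["Cat2"], df["Cat1"]),
}

# All three approaches should produce the same filled values
results = {name: list(fn()) for name, fn in candidates.items()}
assert len({tuple(r) for r in results.values()}) == 1

for name, fn in candidates.items():
    # Best of 3 repeats of 200 calls each, reported per call in microseconds
    best = min(timeit.repeat(fn, number=200, repeat=3)) / 200
    print(f"{name:>13}: {best * 1e6:.1f} us per call")
```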

I did not include the following method (from another answer) in the comparison:

def is_missing(Cat1,Cat2):    
    if np.isnan(Cat1):        
        return Cat2
    else:
        return Cat1

df['Cat1'] = df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)

because it raises an exception on this data:

TypeError: ("ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''", 'occurred at index 0')

This means np.isnan can be applied to NumPy arrays of a native numeric dtype (such as np.float64), but raises a TypeError when applied to object arrays or, as here, to string values.
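A quick check of that explanation on individual values, which is what the row-wise apply passes to the helper: np.isnan only accepts numeric input, so a string such as 'cat' triggers the TypeError, whereas pd.isnull handles strings, np.nan and None alike. The flag name raised is illustrative.

```python
import numpy as np
import pandas as pd

# pd.isnull accepts any scalar: strings are simply "not missing"
print(pd.isnull("cat"), pd.isnull(np.nan), pd.isnull(None))  # False True True

# np.isnan works for floats ...
print(np.isnan(np.nan))  # True

# ... but has no loop for string input, so it raises TypeError
raised = False
try:
    np.isnan("cat")
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```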

So I revised the method:

def is_missing(Cat1,Cat2):    
    if pd.isnull(Cat1):        
        return Cat2
    else:
        return Cat1

%timeit df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)
701 μs ± 7.38 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)