What is the difference between pandas combine_first and fillna?

Question

Asked by kjmerf

These two functions seem equivalent to me. You can see that they accomplish the same goal in the code below, as columns c and d are equal. So when should I use one over the other?

Here is an example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=list('ab'))
df.loc[::2, 'a'] = np.nan

Returns:

     a  b
0  NaN  4
1  2.0  6
2  NaN  8
3  0.0  4
4  NaN  4
5  0.0  8
6  NaN  7
7  2.0  2
8  NaN  9
9  7.0  2

This is my starting point. Now I will add two columns, one using combine_first and one using fillna, and they will produce the same result:

df['c'] = df.a.combine_first(df.b)
df['d'] = df['a'].fillna(df['b'])

Returns (from a separate run, so the random values differ from the table above):

     a  b    c    d
0  NaN  4  4.0  4.0
1  8.0  7  8.0  8.0
2  NaN  2  2.0  2.0
3  3.0  0  3.0  3.0
4  NaN  0  0.0  0.0
5  2.0  4  2.0  2.0
6  NaN  0  0.0  0.0
7  2.0  6  2.0  2.0
8  NaN  4  4.0  4.0
9  4.0  6  4.0  4.0

Credit to this question for the data set: Combine Pandas data frame column values into new column

Answer 1

Answered by piRSquared

combine_first is intended to be used when the two objects have non-overlapping indices. It fills in nulls and also supplies values for indices and columns that didn't exist in the first object.

dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])

     w    x    y  
a  1.0  2.0  3.0  
b  4.0  NaN  5.0  

dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])

     x    y    z
b  1.0  2.0  3.0
c  3.0  4.0  5.0

dfa.combine_first(dfb)

     w    x    y    z
a  1.0  2.0  3.0  NaN
b  4.0  1.0  5.0  3.0  # 1.0 filled from `dfb`; 5.0 was in `dfa`; 3.0 new column
c  NaN  3.0  4.0  5.0  # whole new index

Notice that all indices and columns from both dfa and dfb are included in the result.

Now if we use fillna instead:

dfa.fillna(dfb)

   w    x  y
a  1  2.0  3
b  4  1.0  5  # 1.0 filled in from `dfb`

Notice that no new columns or indices from dfb are included. We only filled in the null value where dfa and dfb shared index and column information.
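
As a quick check of that difference, here is a minimal sketch that rebuilds dfa and dfb from this answer and compares the labels of the two results. The names combined and filled are introduced here for illustration only and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same frames as in the answer above
dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])
dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])

combined = dfa.combine_first(dfb)  # result carries the union of row and column labels
filled = dfa.fillna(dfb)           # result keeps only dfa's row and column labels

print(list(combined.index), list(combined.columns))
print(list(filled.index), list(filled.columns))
```

Both calls fill the null at row b, column x with the value from dfb; only combine_first also brings in row c and column z.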

In your case, you use fillna and combine_first on a single column, with a second column that has the same index. Under those conditions the two calls produce effectively the same result.
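
To illustrate the single-column case, the sketch below uses three small hypothetical Series (s1, same, and other are made-up names and values, not taken from the question). With a matching index the two methods agree; once the indices differ, only combine_first adds the extra label.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([np.nan, 2.0, np.nan], index=[0, 1, 2])
same = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2])   # same index as s1
other = pd.Series([20.0, 30.0, 40.0], index=[1, 2, 3])  # shifted index

# Same index: both methods give the same Series
same_result = s1.combine_first(same).equals(s1.fillna(same))
print(same_result)

# Different index: fillna keeps s1's labels, so label 0 stays NaN
filled = s1.fillna(other)
# combine_first uses the union of labels, so label 3 is added from `other`
combined = s1.combine_first(other)

print(filled)
print(combined)
```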
