pandas: what is the difference between combine_first and fillna?
Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/46676134/
What is the difference between combine_first and fillna?
Asked by kjmerf
These two functions seem equivalent to me. You can see that they accomplish the same goal in the code below, as columns c and d are equal. So when should I use one over the other?
Here is an example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 10, size=(10, 2)), columns=list('ab'))
df.loc[::2, 'a'] = np.nan
Returns:
     a  b
0  NaN  4
1  2.0  6
2  NaN  8
3  0.0  4
4  NaN  4
5  0.0  8
6  NaN  7
7  2.0  2
8  NaN  9
9  7.0  2
This is my starting point. Now I will add two columns, one using combine_first and one using fillna, and they will produce the same result:
df['c'] = df.a.combine_first(df.b)
df['d'] = df['a'].fillna(df['b'])
Returns:
     a  b    c    d
0  NaN  4  4.0  4.0
1  8.0  7  8.0  8.0
2  NaN  2  2.0  2.0
3  3.0  0  3.0  3.0
4  NaN  0  0.0  0.0
5  2.0  4  2.0  2.0
6  NaN  0  0.0  0.0
7  2.0  6  2.0  2.0
8  NaN  4  4.0  4.0
9  4.0  6  4.0  4.0
Credit to this question for the data set: Combine Pandas data frame column values into new column
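The claim that columns c and d match can be checked directly. The following is a minimal sketch, not part of the original question: the seeded generator and the upfront cast of column a to float are assumptions added so the example is reproducible and does not rely on implicit upcasting when NaN is assigned.

```python
import numpy as np
import pandas as pd

# Seeded generator so the frame is reproducible (the seed value is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(10, 2)), columns=list("ab"))

# Cast column a to float first so that assigning NaN needs no implicit upcast.
df = df.astype({"a": "float64"})
df.loc[::2, "a"] = np.nan  # blank out every other row of column a

df["c"] = df["a"].combine_first(df["b"])
df["d"] = df["a"].fillna(df["b"])

# Column b has no missing values, so neither c nor d contains NaN
# and an element-wise comparison is enough.
print((df["c"] == df["d"]).all())
```

With a shared index and no missing values in b, the comparison should come out true for any seed.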
Answered by piRSquared
combine_first is intended to be used when there are non-overlapping indices. It will effectively fill in nulls as well as supply values for indices and columns that didn't exist in the first DataFrame.
dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ['a', 'b'], ['w', 'x', 'y'])
     w    x    y
a  1.0  2.0  3.0
b  4.0  NaN  5.0
dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ['b', 'c'], ['x', 'y', 'z'])
     x    y    z
b  1.0  2.0  3.0
c  3.0  4.0  5.0
dfa.combine_first(dfb)
     w    x    y    z
a  1.0  2.0  3.0  NaN
b  4.0  1.0  5.0  3.0  # 1.0 filled from `dfb`; 5.0 was in `dfa`; 3.0 new column
c  NaN  3.0  4.0  5.0  # whole new index
Notice that all indices and columns are included in the result.
Now if we use fillna:
dfa.fillna(dfb)
   w    x  y
a  1  2.0  3
b  4  1.0  5  # 1.0 filled in from `dfb`
Notice that no new columns or indices from dfb are included. We only filled in the null value where dfa shared index and column information with dfb.
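To make the contrast concrete, here is a small sketch that runs both calls on the dfa and dfb frames from above and compares the labels of the results. The variable names combined and filled are illustrative only.

```python
import numpy as np
import pandas as pd

dfa = pd.DataFrame([[1, 2, 3], [4, np.nan, 5]], ["a", "b"], ["w", "x", "y"])
dfb = pd.DataFrame([[1, 2, 3], [3, 4, 5]], ["b", "c"], ["x", "y", "z"])

combined = dfa.combine_first(dfb)  # union of the labels of both frames
filled = dfa.fillna(dfb)           # labels of dfa only

# combine_first picks up row 'c' and column 'z' from dfb.
print(list(combined.index), list(combined.columns))

# fillna keeps the shape of dfa and only patches the NaN at ('b', 'x').
print(list(filled.index), list(filled.columns))
print(filled.loc["b", "x"])
```

Both calls fill the hole at ('b', 'x') with the value from dfb; they differ only in whether labels that exist solely in dfb end up in the result.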
In your case, you use fillna and combine_first on one column with the same index. These translate to effectively the same thing.
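The equivalence holds only while the two Series share an index. As a hedged illustration (s1 and s2 are made-up Series, not taken from the question), the two calls diverge as soon as the second Series carries a label the first one lacks:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1.0, np.nan], index=["a", "b"])
s2 = pd.Series([20.0, 30.0], index=["b", "c"])

via_fillna = s1.fillna(s2)          # keeps the index of s1: a, b
via_combine = s1.combine_first(s2)  # union of both indices: a, b, c

print(via_fillna)
print(via_combine)
```

So fillna fits when the caller's labels should be preserved as-is, while combine_first fits when the result should also cover labels found only in the other object.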