Original question: http://stackoverflow.com/questions/35530640/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) on Stack Overflow.
Pandas Use Value if Not Null, Else Use Value From Next Column
Asked by Dance Party
Given the following dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'COL1': ['A', np.nan, 'A'],
                   'COL2': [np.nan, 'A', 'A']})
df
COL1 COL2
0 A NaN
1 NaN A
2 A A
I would like to create a column ('COL3') that uses the value from COL1 per row unless that value is null (or NaN). If the value is null (or NaN), I'd like for it to use the value from COL2.
The desired result is:
COL1 COL2 COL3
0 A NaN A
1 NaN A A
2 A A A
Thanks in advance!
Answered by Randy
In [8]: df
Out[8]:
COL1 COL2
0 A NaN
1 NaN B
2 A B
In [9]: df["COL3"] = df["COL1"].fillna(df["COL2"])
In [10]: df
Out[10]:
COL1 COL2 COL3
0 A NaN A
1 NaN B B
2 A B A
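As a rough sketch of how the fillna approach above might extend to more than two columns: the calls can be chained, with each one filling only the gaps left by the previous one. COL0 and FIRST are hypothetical names added for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"COL1": ["A", np.nan, "A"],
                   "COL2": [np.nan, "B", "B"]})

# Hypothetical extra column, used only to show a three-column fallback.
df["COL0"] = [np.nan, np.nan, "C"]

# Each fillna matches rows by index label and fills only the entries
# that are still missing, so the column order sets the priority.
df["FIRST"] = df["COL0"].fillna(df["COL1"]).fillna(df["COL2"])

print(df["FIRST"].tolist())  # ['A', 'B', 'C']
```

Row 2 takes 'C' from COL0, row 0 falls back to COL1, and row 1 falls all the way back to COL2.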
Answered by Alexander
You can use np.where to conditionally set column values.
df = df.assign(COL3=np.where(df.COL1.isnull(), df.COL2, df.COL1))
>>> df
COL1 COL2 COL3
0 A NaN A
1 NaN A A
2 A A A
If you don't mind mutating the values in COL2, you can update them directly to get your desired result.
df = pd.DataFrame({'COL1': ['A', np.nan,'A'],
'COL2' : [np.nan,'B','B']})
>>> df
COL1 COL2
0 A NaN
1 NaN B
2 A B
df.COL2.update(df.COL1)
>>> df
COL1 COL2
0 A A
1 NaN B
2 A A
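The in-place update above depends on df.COL2 sharing its data with df. With pandas Copy-on-Write enabled, that change may not reach the frame. A minimal sketch of an assignment that should give the same COL2 without relying on in-place mutation, assuming the same example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"COL1": ["A", np.nan, "A"],
                   "COL2": [np.nan, "B", "B"]})

# Series.update overwrites COL2 with every non-NaN value of COL1.
# Filling the gaps in COL1 from COL2 yields the same values, and the
# explicit assignment writes the result back to the frame.
df["COL2"] = df["COL1"].fillna(df["COL2"])

print(df["COL2"].tolist())  # ['A', 'B', 'A']
```

COL1 is left unchanged, so row 1 still holds NaN there.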
Answered by ALollz
Use .combine_first, which gives precedence to non-null values in the Series or DataFrame calling it:
import pandas as pd
import numpy as np
df = pd.DataFrame({'COL1': ['A', np.nan,'A'],
'COL2' : [np.nan,'B','B']})
df['COL3'] = df.COL1.combine_first(df.COL2)
Output:
COL1 COL2 COL3
0 A NaN A
1 NaN B B
2 A B A
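To make the precedence rule concrete, here is a small sketch that swaps the caller and the argument. The standalone Series names col1 and col2 are chosen here for illustration.

```python
import numpy as np
import pandas as pd

col1 = pd.Series(["A", np.nan, "A"])
col2 = pd.Series([np.nan, "B", "B"])

# The calling Series wins wherever it has a value; the argument only
# fills the caller's missing entries.
col1_first = col1.combine_first(col2)
col2_first = col2.combine_first(col1)

print(col1_first.tolist())  # ['A', 'B', 'A']
print(col2_first.tolist())  # ['A', 'B', 'B']
```

The two results differ only in row 2, the one row where both columns hold a value.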
Answered by EdChum
If we modify your df slightly, you will see that this works, and in fact it works for any number of columns so long as each row has at least one valid value:
In [5]:
df = pd.DataFrame({'COL1': ['B', np.nan,'B'],
'COL2' : [np.nan,'A','A']})
df
Out[5]:
COL1 COL2
0 B NaN
1 NaN A
2 B A
In [6]:
df.apply(lambda x: x[x.first_valid_index()], axis=1)
Out[6]:
0 B
1 A
2 B
dtype: object
first_valid_index will return the index label (in this case, the column name) of the first non-NaN value:
In [7]:
df.apply(lambda x: x.first_valid_index(), axis=1)
Out[7]:
0 COL1
1 COL2
2 COL1
dtype: object
So we can use this label to index into the Series for each row.
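One caveat: first_valid_index returns None for a row with no valid value, so the lambda above would fail on such a row. The sketch below guards against that and compares the result with a back-fill along the columns, which is a different technique from the one in this answer. COL3, the all-NaN row, and the helper name first_valid are hypothetical additions for illustration.

```python
import numpy as np
import pandas as pd

# COL3 and the all-NaN last row are hypothetical additions.
df = pd.DataFrame({"COL1": ["B", np.nan, "B", np.nan],
                   "COL2": [np.nan, "A", "A", np.nan],
                   "COL3": ["C", "C", np.nan, np.nan]})

def first_valid(row):
    """Return the first non-NaN value in a row, or NaN if there is none."""
    label = row.first_valid_index()  # None when every entry is NaN
    return row[label] if label is not None else np.nan

by_apply = df.apply(first_valid, axis=1)

# Alternative: back-fill across the columns, then take the first column.
by_bfill = df.bfill(axis=1).iloc[:, 0]

print(by_apply.tolist()[:3])  # ['B', 'A', 'B']
```

Both variants leave NaN in the last row. The back-fill version avoids a Python-level function call per row, which may matter on large frames.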