pandas: use the value if it is not null, otherwise use the value from the next column

Question

Asked by Dance Party

Given the following dataframe:

import pandas as pd
import numpy as np
df = pd.DataFrame({'COL1': ['A', np.nan, 'A'],
                   'COL2': [np.nan, 'A', 'A']})
df
    COL1    COL2
0    A      NaN
1    NaN    A
2    A      A

I would like to create a column ('COL3') that uses the value from COL1 per row unless that value is null (or NaN). If the value is null (or NaN), I'd like for it to use the value from COL2.

The desired result is:

    COL1    COL2   COL3
0    A      NaN    A
1    NaN    A      A
2    A      A      A

Thanks in advance!

Answer 1

Answered by Randy

In [8]: df
Out[8]:
  COL1 COL2
0    A  NaN
1  NaN    B
2    A    B

In [9]: df["COL3"] = df["COL1"].fillna(df["COL2"])

In [10]: df
Out[10]:
  COL1 COL2 COL3
0    A  NaN    A
1  NaN    B    B
2    A    B    A
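
As a self-contained sketch of the same fillna call: passing a Series to fillna fills each missing entry with the value at the same index label in that Series. The extra fourth row is an assumption added here (it is not in the question) to show that a row that is null in both columns stays null.

```python
import numpy as np
import pandas as pd

# Fourth row (null in both columns) is a hypothetical addition for illustration.
df = pd.DataFrame({'COL1': ['A', np.nan, 'A', np.nan],
                   'COL2': [np.nan, 'B', 'B', np.nan]})

# Keep COL1 where it is present; otherwise take the value on the same row of COL2.
df['COL3'] = df['COL1'].fillna(df['COL2'])

print(df)
```

Because the fill is aligned on the index, the two columns need to share row labels, which is always the case for columns of one DataFrame.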

Answer 2

Answered by Alexander

You can use np.where to conditionally set column values.

df = df.assign(COL3=np.where(df.COL1.isnull(), df.COL2, df.COL1))

>>> df
  COL1 COL2 COL3
0    A  NaN    A
1  NaN    A    A
2    A    A    A

If you don't mind overwriting the values in COL2, you can update that column in place so that COL2 itself holds the desired result.

df = pd.DataFrame({'COL1': ['A', np.nan,'A'], 
                   'COL2' : [np.nan,'B','B']})

>>> df
  COL1 COL2
0    A  NaN
1  NaN    B
2    A    B

df.COL2.update(df.COL1)

>>> df
  COL1 COL2
0    A    A
1  NaN    B
2    A    A
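
One caveat, stated as an assumption rather than something from the answer: df.COL2.update(...) calls update on a Series pulled out of the frame, and with pandas' Copy-on-Write behavior that Series may be a copy, in which case df is left unchanged. A minimal sketch that reaches the same COL2 values through explicit assignment:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'COL1': ['A', np.nan, 'A'],
                   'COL2': [np.nan, 'B', 'B']})

# Assign the result back explicitly instead of relying on df.COL2 being a view of df.
# COL1 wins where it is non-null; the existing COL2 value is kept otherwise.
df['COL2'] = df['COL1'].fillna(df['COL2'])

print(df)
```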

Answer 3

Answered by ALollz

Using .combine_first, which gives precedence to non-null values in the Series or DataFrame calling it:

import pandas as pd
import numpy as np

df = pd.DataFrame({'COL1': ['A', np.nan,'A'], 
                   'COL2' : [np.nan,'B','B']})

df['COL3'] = df.COL1.combine_first(df.COL2)

Output:

  COL1 COL2 COL3
0    A  NaN    A
1  NaN    B    B
2    A    B    A

Answer 4

Answered by EdChum

If we modify your df slightly, you can see that this works, and in fact it works for any number of columns as long as each row has at least one valid value:

In [5]:
df = pd.DataFrame({'COL1': ['B', np.nan,'B'], 
                   'COL2' : [np.nan,'A','A']})
df

Out[5]:
  COL1 COL2
0    B  NaN
1  NaN    A
2    B    A

In [6]:    
df.apply(lambda x: x[x.first_valid_index()], axis=1)

Out[6]:
0    B
1    A
2    B
dtype: object

first_valid_index will return the index label (in this case, the column name) of the first non-NaN value:

In [7]:
df.apply(lambda x: x.first_valid_index(), axis=1)

Out[7]:
0    COL1
1    COL2
2    COL1
dtype: object

So we can use this label to index into each row's Series.
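
To illustrate the "any number of columns" claim, here is a small sketch on a three-column frame. The frame `wide` and its values are hypothetical, chosen so that every row has at least one non-null entry; a row with no valid value is not handled by this approach. The bfill line is an alternative that does not come from the answer and is shown only for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical three-column frame; every row has at least one non-null value.
wide = pd.DataFrame({'COL1': ['B', np.nan, np.nan],
                     'COL2': [np.nan, 'A', np.nan],
                     'COL3': ['C', 'C', 'D']})

# Row-wise: find the label of the first non-null entry, then index the row with it.
first_by_apply = wide.apply(lambda row: row[row.first_valid_index()], axis=1)

# Alternative (not from the answer): back-fill across columns, keep the first column.
first_by_bfill = wide.bfill(axis=1).iloc[:, 0]

print(first_by_apply.tolist())  # ['B', 'A', 'D']
```

The apply version runs a Python function per row, so on large frames the back-fill variant is likely to be faster.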
