Pandas: Check if column exists in df from a list of columns

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/52934960/

Time: 2020-09-14 06:06:22  Source: igfitidea

python pandas

Asked by Raj

The goal here is to find the columns that do not exist in df and create them with null values.

I have a list of column names like the one below:

column_list = ('column_1', 'column_2', 'column_3')

When I try to check whether each column exists, the check prints True for the columns that exist but never prints False for the ones that are missing.

for column in column_list:
    print(df.columns.isin(column_list).any())
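
A likely reason for the missing False values: `df.columns.isin(column_list).any()` never uses the loop variable, so every iteration asks the same question ("is any column of df in the list?") and prints the same answer. A minimal sketch of a per-name check follows; the sample `df` and the names `exists` and `missing` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frame: only column_1 exists.
df = pd.DataFrame({"column_1": [1, 2]})
column_list = ("column_1", "column_2", "column_3")

# Test each name individually against the column index.
exists = {column: column in df.columns for column in column_list}

# Collect the names that are not columns of df yet.
missing = [column for column, present in exists.items() if not present]

print(exists)
print(missing)
```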

In PySpark, I can achieve this using the code below:

from pyspark.sql.functions import lit

for column in column_list:
    if column not in df.columns:
        df = df.withColumn(column, lit(''))

How can I achieve the same using Pandas?

Answered by quest

Here is how I would approach it:

import numpy as np

for col in column_list:
    if col not in df.columns:
        df[col] = np.nan
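
For anyone who wants to run the loop end to end, here is a minimal self-contained sketch. The sample `df` is an assumption made up for illustration; `column_list` is taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only column_1 is present at the start.
df = pd.DataFrame({"column_1": [10, 20]})
column_list = ("column_1", "column_2", "column_3")

for col in column_list:
    if col not in df.columns:
        # Assigning a scalar broadcasts NaN to every row of the new column.
        df[col] = np.nan

print(df.columns.tolist())
```

Existing columns are left untouched; only the names that are absent from `df.columns` get a new all-NaN column.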

Answered by rafaelc

Using np.isin, assign, and unpacking kwargs:

import numpy as np

s = np.isin(column_list, df.columns)
df = df.assign(**{k: None for k in np.array(column_list)[~s]})
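
To see what each step produces, here is a minimal self-contained sketch of the same idea. The sample `df` and the intermediate name `missing` are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only column_1 is present at the start.
df = pd.DataFrame({"column_1": [10, 20]})
column_list = ("column_1", "column_2", "column_3")

# Boolean mask: True where the name is already a column of df.
s = np.isin(column_list, df.columns)

# Inverting the mask selects the names that are missing from df.
missing = np.array(column_list)[~s]

# assign accepts keyword arguments, so the dict is unpacked into
# column_2=None, column_3=None; each becomes a new all-null column.
df = df.assign(**{k: None for k in missing})

print(df.columns.tolist())
```

Note that `assign` returns a new DataFrame rather than modifying `df` in place, which is why the result is bound back to `df`.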