Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/25025621/

Time: 2020-09-13 22:18:18  Source: igfitidea

Check for a value in a list of pandas DataFrame columns

python pandas

Asked by Vincent

I have a pandas DataFrame that looks like this:

    MED1    MED2    MED3    MED4    MED5
0   60735   24355   33843   16475   9995
1   10126   5789    17165   90000   90000
2   5789    19675   30553   90000   90000
3   60735   17865   34495   90000   90000
4   19675   5810    90000   90000   90000

I want to create a new bool column "med" that is True or False depending on whether 60735 appears in any of the columns MED1...MED5. I am trying this and am not sure how to make it work:

DF['med'] = (60735 in [DF['MED1'], DF['MED2']])

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

MED1..MED5 represent drugs being taken by a patient at a hospital visit. I have a list of about 20 drugs for which I need to know whether the patient was taking them. Each drug is coded with a number but also has a name. A nice solution would look something like the following, but how do I do this with pandas?

drugs = {'drug1':60735, 'drug2':5789}  
for n in drugs.keys():
    DF[n] = drugs[n] in DF[['MED1', 'MED2', 'MED3', 'MED4', 'MED5']]

Answered by chrisb

@Mai's answer will of course work, but it may be a bit more standard to write it like this, with the | operator:

df['med'] = (df['MED1'] == 60735) | (df['MED2'] == 60735)

If you want to check for a value in all (or many) columns, you could also use isin as below. isin checks, for each cell, whether the cell's value is in the given list, and any(axis=1) (often written any(1) in older code) returns True for a row if any element in that row is True.

df['med'] = df.isin([60735]).any(axis=1)
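
As a minimal, self-contained sketch of the isin approach, the snippet below rebuilds the sample frame from the question and flags the rows where 60735 appears. The name med_cols is introduced here only for illustration. Restricting the search to the MED columns is an assumption on my part, meant to keep any other columns in the frame out of the check.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "MED1": [60735, 10126, 5789, 60735, 19675],
    "MED2": [24355, 5789, 19675, 17865, 5810],
    "MED3": [33843, 17165, 30553, 34495, 90000],
    "MED4": [16475, 90000, 90000, 90000, 90000],
    "MED5": [9995, 90000, 90000, 90000, 90000],
})

# Illustrative helper name: the columns to search.
med_cols = ["MED1", "MED2", "MED3", "MED4", "MED5"]

# isin gives a boolean frame; any(axis=1) collapses it to one bool per row.
df["med"] = df[med_cols].isin([60735]).any(axis=1)

print(df["med"].tolist())  # [True, False, False, True, False]
```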

Edit: Based on your edited question, would this work?

for n in drugs:
    df[n] = df[['MED1','MED2','MED3','MED4','MED5']].isin([drugs[n]]).any(axis=1)
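
For reference, here is a runnable sketch of that loop using the drugs dictionary and sample data from the question. The variable names meds, name and code are my own. Taking the MED columns once before the loop is just one way to make sure the newly added flag columns are never searched.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "MED1": [60735, 10126, 5789, 60735, 19675],
    "MED2": [24355, 5789, 19675, 17865, 5810],
    "MED3": [33843, 17165, 30553, 34495, 90000],
    "MED4": [16475, 90000, 90000, 90000, 90000],
    "MED5": [9995, 90000, 90000, 90000, 90000],
})

# Drug name -> numeric code, as given in the question.
drugs = {"drug1": 60735, "drug2": 5789}

# Select the MED columns once; this is a separate frame, so the
# flag columns added to df below do not affect the search.
meds = df[["MED1", "MED2", "MED3", "MED4", "MED5"]]

for name, code in drugs.items():
    # One boolean column per drug: True if the code appears in any MED column.
    df[name] = meds.isin([code]).any(axis=1)

print(df[["drug1", "drug2"]])
```

With about 20 drugs this adds 20 boolean columns, one per dictionary key.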

Answered by Mai

I am still confused. But part of what you want may be this:

import numpy as np
DF['med'] = np.logical_or(DF['MED1'] == 60735, DF['MED2'] == 60735)
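
If more than two columns need checking, one possible extension of this idea (my own sketch, not part of the answer above) is np.logical_or.reduce, which ORs together a whole list of comparisons. The list med_cols is an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
DF = pd.DataFrame({
    "MED1": [60735, 10126, 5789, 60735, 19675],
    "MED2": [24355, 5789, 19675, 17865, 5810],
    "MED3": [33843, 17165, 30553, 34495, 90000],
    "MED4": [16475, 90000, 90000, 90000, 90000],
    "MED5": [9995, 90000, 90000, 90000, 90000],
})

med_cols = ["MED1", "MED2", "MED3", "MED4", "MED5"]

# Build one boolean comparison per column, then OR them all together
# element-wise; the result has one bool per row.
DF["med"] = np.logical_or.reduce([DF[c] == 60735 for c in med_cols])

print(DF["med"].tolist())  # [True, False, False, True, False]
```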

Answered by cheekybastard

Here are a few %timeit comparisons of some methods to return bools from a dataframe column.

In [2]: %timeit df['med'] = [bool(x) if int(60735) in x else False for x in enumerate(df['MED1'])]
1000 loops, best of 3: 379 μs per loop

In [3]: %timeit df['med'] = (df['MED1'] == 60735)
1000 loops, best of 3: 649 μs per loop

In [4]: %timeit df['med'] = df['MED1'].isin([60735])
1000 loops, best of 3: 404 μs per loop
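
The timings above were presumably taken on the five-row frame, where fixed overhead likely dominates. As a rough sketch for repeating the comparison on a larger column outside IPython, the standard-library timeit module can be used as below. The 100,000-row random Series and the function names are assumptions made purely for illustration, and no timings are claimed here since they depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger column: 100,000 random codes, for illustration only.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(0, 100_000, size=100_000), name="MED1")


def with_eq():
    # Vectorised equality comparison.
    return s == 60735


def with_isin():
    # Vectorised membership test against a one-element list.
    return s.isin([60735])


def with_listcomp():
    # Plain Python loop over the values, for contrast.
    return [x == 60735 for x in s]


runs = 5
for fn in (with_eq, with_isin, with_listcomp):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call")
```

All three functions produce the same booleans, so only the speed differs.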