Check for a value in a list of pandas DataFrame columns
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/25025621/
Asked by Vincent
I have pandas DataFrame data that looks like this:
    MED1   MED2   MED3   MED4   MED5
0  60735  24355  33843  16475   9995
1  10126   5789  17165  90000  90000
2   5789  19675  30553  90000  90000
3  60735  17865  34495  90000  90000
4  19675   5810  90000  90000  90000
I want to create a new bool column "med" that is True or False depending on whether 60735 appears in any of the columns MED1...MED5. I am trying this and am not sure how to make it work:
DF['med'] = (60735 in [DF['MED1'], DF['MED2']])
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
MED1..MED5 represent drugs being taken by a patient at a hospital visit. I have a list of about 20 drugs for which I need to know whether the patient was taking them. Each drug is coded with a number but also has a name. A nice solution would look something like the code below, but how do I do this with pandas?
drugs = {'drug1':60735, 'drug2':5789}
for n in drugs.keys():
    DF[n] = drugs[n] in DF[['MED1', 'MED2', 'MED3', 'MED4', 'MED5']]
Answered by chrisb
@Mai's answer will of course work. It may be a bit more standard to write it like this, with the | operator:
df['med'] = (df['MED1'] == 60735) | (df['MED2'] == 60735)
If you want to check for a value in all (or many) columns, you could also use isin as below. isin checks, cell by cell, whether the cell's value is in the given list, and any(axis=1) returns True for a row if any element in that row is True.
df['med'] = df.isin([60735]).any(axis=1)
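As a self-contained sketch of the isin approach, the snippet below rebuilds the sample frame from the question and restricts the check to the MED columns, so that an unrelated column that happens to hold 60735 would not count as a match. The name med_cols is illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "MED1": [60735, 10126, 5789, 60735, 19675],
    "MED2": [24355, 5789, 19675, 17865, 5810],
    "MED3": [33843, 17165, 30553, 34495, 90000],
    "MED4": [16475, 90000, 90000, 90000, 90000],
    "MED5": [9995, 90000, 90000, 90000, 90000],
})

# Hypothetical helper list: only these columns are searched.
med_cols = ["MED1", "MED2", "MED3", "MED4", "MED5"]

# isin builds a boolean frame; any(axis=1) collapses it to one bool per row.
df["med"] = df[med_cols].isin([60735]).any(axis=1)
print(df["med"].tolist())  # [True, False, False, True, False]
```

Passing the axis by keyword keeps the call explicit about which direction is being reduced.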
Edit: Based on your edited question, would this work?
for n in drugs:
    df[n] = df[['MED1','MED2','MED3','MED4','MED5']].isin([drugs[n]]).any(axis=1)
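Put together with the question's sample data and its two-entry drugs dictionary, the loop might look like the sketch below. Iterating over drugs.items() and the med_cols list are stylistic choices made here, not something the original answer prescribes.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "MED1": [60735, 10126, 5789, 60735, 19675],
    "MED2": [24355, 5789, 19675, 17865, 5810],
    "MED3": [33843, 17165, 30553, 34495, 90000],
    "MED4": [16475, 90000, 90000, 90000, 90000],
    "MED5": [9995, 90000, 90000, 90000, 90000],
})

# Drug name -> numeric code, as in the question.
drugs = {"drug1": 60735, "drug2": 5789}

# Fixed list of columns to search, so the boolean columns added
# inside the loop are never searched themselves.
med_cols = ["MED1", "MED2", "MED3", "MED4", "MED5"]

for name, code in drugs.items():
    # One boolean column per drug: True if the code appears in any MED column.
    df[name] = df[med_cols].isin([code]).any(axis=1)

print(df[["drug1", "drug2"]])
```

With roughly 20 drugs the same loop adds 20 boolean columns, one per dictionary key.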
Answered by Mai
I am still confused. But part of what you want may be this:
import numpy as np
DF['med'] = np.logical_or(DF['MED1'] == 60735, DF['MED2'] == 60735)
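np.logical_or takes exactly two inputs, so covering all five columns needs a different call. One possible extension, which is an assumption about what the asker needs rather than part of this answer, is np.logical_or.reduce over a list of per-column comparisons:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
DF = pd.DataFrame({
    "MED1": [60735, 10126, 5789, 60735, 19675],
    "MED2": [24355, 5789, 19675, 17865, 5810],
    "MED3": [33843, 17165, 30553, 34495, 90000],
    "MED4": [16475, 90000, 90000, 90000, 90000],
    "MED5": [9995, 90000, 90000, 90000, 90000],
})

# Hypothetical helper list naming the columns to compare.
med_cols = ["MED1", "MED2", "MED3", "MED4", "MED5"]

# One boolean array per column, then OR them all together element-wise.
masks = [(DF[col] == 60735).to_numpy() for col in med_cols]
DF["med"] = np.logical_or.reduce(masks)
print(DF["med"].tolist())  # [True, False, False, True, False]
```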
Answered by cheekybastard
Here are a few %timeit comparisons of some methods to return bools from a dataframe column.
In [2]: %timeit df['med'] = [bool(x) if int(60735) in x else False for x in enumerate(df['MED1'])]
1000 loops, best of 3: 379 μs per loop
In [3]: %timeit df['med'] = (df['MED1'] == 60735)
1000 loops, best of 3: 649 μs per loop
In [4]: %timeit df['med'] = df['MED1'].isin([60735])
1000 loops, best of 3: 404 μs per loop
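Timings like these vary with frame size, pandas version and hardware, so it can be worth rerunning the comparison locally. The sketch below uses the standard-library timeit module instead of the IPython magic; the 100,000-row random column, the seed and the function names are all assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 100,000 random drug codes in a single column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"MED1": rng.integers(0, 100_000, size=100_000)})


def with_eq():
    # Element-wise equality comparison.
    return df["MED1"] == 60735


def with_isin():
    # Membership test against a one-element list.
    return df["MED1"].isin([60735])


runs = 100
for fn in (with_eq, with_isin):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.1f} microseconds per call")
```

Both functions return the same boolean Series, so any difference is purely one of speed.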

