COUNTIF on multiple columns with multiple conditions in pandas (Python)

Question

Asked by VictorHenry

I have a dataset wherein I am trying to determine the number of risk factors per person. So I have the following data:

Person_ID  Age  Smoker  Diabetes
      001   30       Y         N
      002   45       N         N
      003   27       N         Y
      004   18       Y         Y
      005   55       Y         Y
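To try out the answers below, the sample table can be loaded into a DataFrame. This is a minimal sketch, assuming the data arrives as whitespace-separated text like the table above (the name RAW is chosen for this sketch). Reading Person_ID as a string keeps the leading zeros that a numeric parse would drop.

```python
import io

import pandas as pd

# The sample table from the question, as whitespace-separated text
RAW = """Person_ID  Age  Smoker  Diabetes
001   30       Y         N
002   45       N         N
003   27       N         Y
004   18       Y         Y
005   55       Y         Y
"""

# sep=r"\s+" splits on runs of whitespace; Person_ID stays text ("001", not 1)
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+", dtype={"Person_ID": str})
print(df)
```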

Each attribute (Age, Smoker, Diabetes) has its own condition to determine whether it is a risk factor. So if Age >= 45, it's a risk factor. Smoker and Diabetes are risk factors if they are "Y". What I would like is to add a column that adds up the number of risk factors for each person based on those conditions. So the data would look like this:

Person_ID  Age  Smoker  Diabetes  Risk_Factors
      001   30       Y         N             1
      002   25       N         N             0
      003   27       N         Y             1
      004   18       Y         Y             2
      005   55       Y         Y             3

I have a sample dataset that I was fooling around with in Excel, and the way I did it there was to use the COUNTIF formula like so:

=COUNTIF(B2,">45") + COUNTIF(C2,"=Y") + COUNTIF(D2,"=Y")
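Each COUNTIF term in that formula tests one cell against one condition and yields 1 or 0. In pandas the same test can be applied to a whole column at once: a comparison produces a boolean Series, and converting it to int turns True/False into 1/0. This is a minimal sketch, assuming a DataFrame shaped like the sample table. It uses > 45 to mirror the formula above; use >= 45 to follow the rule as stated in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Person_ID": ["001", "002", "003", "004", "005"],
    "Age": [30, 45, 27, 18, 55],
    "Smoker": ["Y", "N", "N", "Y", "Y"],
    "Diabetes": ["N", "N", "Y", "Y", "Y"],
})

# One 0/1 Series per COUNTIF term
age_risk = (df["Age"] > 45).astype(int)              # COUNTIF(B2,">45")
smoker_risk = (df["Smoker"] == "Y").astype(int)      # COUNTIF(C2,"=Y")
diabetes_risk = (df["Diabetes"] == "Y").astype(int)  # COUNTIF(D2,"=Y")

# Adding the Series sums the three terms row by row
df["Risk_Factors"] = age_risk + smoker_risk + diabetes_risk
print(df)
```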

However, the actual dataset that I will be using is way too large for Excel, so I'm learning pandas for Python. I wish I could provide examples of what I've already tried, but frankly I don't even know where to start. I looked at this question, but it doesn't really address what to do about applying it to an entire new column using different conditions from multiple columns. Any suggestions?

Answer 1

Accepted answer by ZJS

If you want to stick with pandas, you can use the following:

Solution

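# isY: 1 if the cell holds "Y", otherwise 0
isY = lambda x: int(x == 'Y')
# countRiskFactors: number of risk factors in one row
countRiskFactors = lambda row: isY(row['Smoker']) + isY(row['Diabetes']) + int(row['Age'] > 45)

# axis=1 applies countRiskFactors to each row
df['Risk_Factors'] = df.apply(countRiskFactors, axis=1)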

How it works

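isY - a stored lambda function that checks whether a cell's value is "Y"; it returns 1 if it is, otherwise 0.
countRiskFactors - adds up the risk factors for one row. The age test, row['Age'] > 45, mirrors the Excel formula in the question, so a person aged exactly 45 is not counted; use >= 45 to follow the rule as stated in the question.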

The final line uses the apply method with the axis parameter set to 1, which calls the function passed as the first argument once for each row of the DataFrame and returns a Series; that Series is assigned to the DataFrame as the new Risk_Factors column.

Output of print(df):

   Person_ID  Age Smoker Diabetes  Risk_Factors
0          1   30      Y        N             1
1          2   45      N        N             0
2          3   27      N        Y             1
3          4   18      Y        Y             2
4          5   55      Y        Y             3
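For anyone who wants to run this answer end to end, here is a self-contained version. It is a sketch that builds the sample data inline and rewrites the lambdas as named functions (is_yes and count_risk_factors are names chosen for this sketch); the logic, including the > 45 threshold, is unchanged. Because apply with axis=1 calls a Python function once per row, it is usually much slower on large data than vectorized column comparisons.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [30, 45, 27, 18, 55],
    "Smoker": ["Y", "N", "N", "Y", "Y"],
    "Diabetes": ["N", "N", "Y", "Y", "Y"],
})


def is_yes(value):
    """Return 1 if the cell holds "Y", otherwise 0."""
    return int(value == "Y")


def count_risk_factors(row):
    """Add up the risk factors for one row (a Series indexed by column name)."""
    return is_yes(row["Smoker"]) + is_yes(row["Diabetes"]) + int(row["Age"] > 45)


# axis=1: call count_risk_factors once per row; the results form a new Series
df["Risk_Factors"] = df.apply(count_risk_factors, axis=1)
print(df)
```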

Answer 2

Answer by user3846155

If you are starting from Excel and want to take the next step, then I would recommend MS Access. It will be a lot easier than learning pandas for Python. Just replace the COUNTIF() with:

Risk Factor: IIF(Age>45, 1, 0) + IIF(Smoker="Y", 1, 0) + IIF(Diabetes="Y", 1, 0)
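If you would rather stay in pandas, the IIF(condition, 1, 0) pattern maps directly onto numpy.where(condition, 1, 0), which evaluates the condition for every row at once. This is a minimal sketch, assuming the same column names as in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [30, 45, 27, 18, 55],
    "Smoker": ["Y", "N", "N", "Y", "Y"],
    "Diabetes": ["N", "N", "Y", "Y", "Y"],
})

# np.where(cond, 1, 0) plays the role of IIF(cond, 1, 0), one value per row
df["Risk_Factors"] = (
    np.where(df["Age"] > 45, 1, 0)
    + np.where(df["Smoker"] == "Y", 1, 0)
    + np.where(df["Diabetes"] == "Y", 1, 0)
)
print(df)
```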

Answer 3

Answer by exp1orer

I would do this the following way.

1. For each column, create a new boolean Series using that column's condition.
2. Add those Series row-wise.

(Note that this is simpler if your Smoker and Diabetes columns are already boolean (True/False) rather than strings.)

It might look like this:

import pandas as pd

df = pd.DataFrame({'Age': [30, 45, 27, 18, 55],
                   'Smoker': ['Y', 'N', 'N', 'Y', 'Y'],
                   'Diabetes': ['N', 'N', 'Y', 'Y', 'Y']})

   Age Diabetes Smoker
0   30        N      Y
1   45        N      N
2   27        Y      N
3   18        Y      Y
4   55        Y      Y

# Step 1: one boolean Series per condition
risk1 = df.Age > 45
risk2 = df.Smoker == "Y"
risk3 = df.Diabetes == "Y"
# Place the three Series side by side in one DataFrame
risk_df = pd.concat([risk1, risk2, risk3], axis=1)

     Age Smoker Diabetes
0  False   True    False
1  False  False    False
2  False  False     True
3  False   True     True
4   True   True     True

# Step 2: add the booleans across each row (True counts as 1)
df['Risk_Factors'] = risk_df.sum(axis=1)

   Age Diabetes Smoker  Risk_Factors
0   30        N      Y             1
1   45        N      N             0
2   27        Y      N             1
3   18        Y      Y             2
4   55        Y      Y             3
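Following the note above about boolean columns: if the Y/N columns are converted to booleans once, for example with Series.eq("Y"), the counting step needs no further string comparisons. This is a minimal sketch, assuming every value in those columns is either "Y" or "N" (flag_cols is a name chosen for this sketch). Summing boolean columns with sum(axis=1) counts the True values in each row, and adding the boolean Age test adds one more for anyone over 45.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [30, 45, 27, 18, 55],
    "Smoker": ["Y", "N", "N", "Y", "Y"],
    "Diabetes": ["N", "N", "Y", "Y", "Y"],
})

# Convert the Y/N text columns to True/False in place
flag_cols = ["Smoker", "Diabetes"]
df[flag_cols] = df[flag_cols].eq("Y")

# Boolean columns sum directly; the Age condition is added as another boolean
df["Risk_Factors"] = df[flag_cols].sum(axis=1) + (df["Age"] > 45)
print(df)
```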
