Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36749741/

Date: 2020-09-14 01:05:18  Source: igfitidea

How do you add a calculated column to a pandas dataframe using set-based logic?

python pandas

Asked by Sevyns

I'm new to Python (background in T-SQL and R) and looking for a set-based method for adding a calculated column to a pandas DataFrame. R and T-SQL have simple implementations for this concept, but I've yet to find a solution to this in Python.

This question is an iterative approach to what I'm looking for. I'm looking for something more set-based, and have yet to find a solution.

Here is an example from R:

# New column that shows if the value in column A is greater than the value in column B

myDataFrame$CalculatedColumn = ifelse(myDataFrame$columnA > myDataFrame$columnB,TRUE,FALSE)

This statement will add a new calculated column without requiring row-by-row evaluation code.

Does Python (or any Python package) support a concept like this? Or is the most practical solution to call iterrows() in a for loop?

Let me know if any clarifications are needed - and thanks for the help!

Answered by ayhan

You can use np.where for that:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(10, size=(10,2)), columns = ["A", "B"])
df
Out[46]: 
   A  B
0  2  8
1  9  5
2  4  4
3  6  0
4  5  5
5  0  8
6  7  9
7  6  3
8  0  9
9  0  9


df["C"] = np.where(df["A"] > df["B"], True, False)
df
Out[48]: 
   A  B      C
0  2  8  False
1  9  5   True
2  4  4  False
3  6  0   True
4  5  5  False
5  0  8  False
6  7  9  False
7  6  3   True
8  0  9  False
9  0  9  False
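
As a further sketch (not part of the original answer), np.where can also return values other than True/False, which is closer to the general form of R's ifelse. The column values and label strings below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Small fixed frame so the result is reproducible (values are illustrative).
df = pd.DataFrame({"A": [2, 9, 4, 6], "B": [8, 5, 4, 0]})

# np.where(condition, value_if_true, value_if_false) evaluates the whole
# column at once; here it produces labels rather than booleans.
df["C"] = np.where(df["A"] > df["B"], "A is larger", "A is not larger")
print(df)
```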

Answered by Alexander

You should simply be able to do a direct comparison.

df['C'] = df.A > df.B
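
A minimal self-contained version of the same idea, assuming a small hand-made frame (the data and the extra "diff" column are illustrative, not from the original answer). Comparison and arithmetic operators both work column against column without an explicit loop.

```python
import pandas as pd

# Illustrative data; any two numeric columns would do.
df = pd.DataFrame({"A": [2, 9, 4, 6], "B": [8, 5, 4, 0]})

# Comparing two columns yields a boolean Series aligned on the index.
df["C"] = df["A"] > df["B"]

# Arithmetic is applied the same way, element by element across the columns.
df["diff"] = df["A"] - df["B"]
print(df)
```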

Answered by flyingmeatball

There are a couple of ways to do this in pandas. If indeed there are only 2 values (True or False) then you might be best off just splitting it into two lines like so:

df['newCol'] = False
df.loc[df['colA'] > df['colB'],'newCol'] = True
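
The same pattern extends to more than two outcomes by adding one .loc assignment per condition. The sketch below assumes a small made-up frame and made-up labels; it only illustrates the technique.

```python
import pandas as pd

# Illustrative data using the column names from the snippet above.
df = pd.DataFrame({"colA": [2, 9, 4], "colB": [8, 5, 4]})

# Start with a default, then overwrite only the rows matching each mask.
df["newCol"] = "equal"
df.loc[df["colA"] > df["colB"], "newCol"] = "A greater"
df.loc[df["colA"] < df["colB"], "newCol"] = "B greater"
print(df)
```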

Typically, I try to avoid iterrows any way I can. It's very slow.
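
To get a rough feel for the difference, here is a small timing sketch. The frame size and random seed are arbitrary assumptions, and the actual timings will vary by machine and pandas version, so no specific numbers are claimed.

```python
import time

import numpy as np
import pandas as pd

# Arbitrary size and seed, chosen only to make the sketch repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(5_000, 2)), columns=["A", "B"])

# Row-by-row evaluation with iterrows.
start = time.perf_counter()
looped = [row["A"] > row["B"] for _, row in df.iterrows()]
loop_seconds = time.perf_counter() - start

# Whole-column (set-based) evaluation.
start = time.perf_counter()
vectorized = df["A"] > df["B"]
vector_seconds = time.perf_counter() - start

print(f"iterrows: {loop_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```

Both approaches produce the same values; only the way they are computed differs.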

Hope that helps.
