How to add a calculated column to a Pandas DataFrame using set-based logic?

Question

Asked by Sevyns

I'm new to Python (background in T-SQL and R) and looking for a set-based method for adding a calculated column to a Pandas DataFrame. R and T-SQL have simple implementations of this concept, but I have yet to find an equivalent in Python.

This question describes an iterative approach to what I'm looking for. I'd prefer something more set-based, and have yet to find a solution.

Here is an example from R:

# New column that shows if the value in column A is greater than the value in column B
myDataFrame$CalculatedColumn = ifelse(myDataFrame$columnA > myDataFrame$columnB, TRUE, FALSE)

This statement adds a new calculated column without requiring any row-by-row evaluation code.

Does Python (or any Python package) support a concept like this? Or is the most practical solution to call iterrows() in a for loop?

Let me know if any clarifications are needed, and thanks for the help!

Answer 1

Answered by ayhan

You can use np.where for that:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(10, size=(10,2)), columns = ["A", "B"])
df
Out[46]: 
   A  B
0  2  8
1  9  5
2  4  4
3  6  0
4  5  5
5  0  8
6  7  9
7  6  3
8  0  9
9  0  9

df["C"] = np.where(df["A"] > df["B"], True, False)
df
Out[48]: 
   A  B      C
0  2  8  False
1  9  5   True
2  4  4  False
3  6  0   True
4  5  5  False
5  0  8  False
6  7  9  False
7  6  3   True
8  0  9  False
9  0  9  False
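The same call also covers non-boolean results, which is closer to how ifelse is often used in R. A minimal sketch, assuming a small hand-built frame; the column name "label" and the two strings are made up for illustration and are not part of the answer above:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values chosen by hand, not the random data above)
df = pd.DataFrame({"A": [2, 9, 4], "B": [8, 5, 4]})

# np.where(condition, value_if_true, value_if_false) is evaluated element-wise
df["label"] = np.where(df["A"] > df["B"], "A bigger", "A not bigger")
print(df)
```

When the two outcomes are just True and False, the condition itself is already the result, so np.where mainly pays off when the outputs are other values.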

Answer 2

Answered by Alexander

You should be able to do a direct comparison; comparing two columns returns a boolean Series that can be assigned as the new column.

df['C'] = df.A > df.B
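To make the one-liner concrete, here is a small sketch on a hand-built frame (the data and the extra column "D" are assumptions for illustration). It also shows how several conditions might be combined, which is not part of the original answer:

```python
import pandas as pd

# Illustrative data, chosen by hand
df = pd.DataFrame({"A": [2, 9, 4], "B": [8, 5, 4]})

# The comparison is element-wise and returns a boolean Series aligned on the index
df["C"] = df.A > df.B

# Conditions combine with & and |; each comparison needs its own parentheses
df["D"] = (df.A > df.B) & (df.A > 5)
print(df)
```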

Answer 3

Answered by flyingmeatball

There are a couple of ways to do this in Pandas. If there are indeed only two values (True or False), you might be best off splitting it into two lines, like so:

df['newCol'] = False
df.loc[df['colA'] > df['colB'],'newCol'] = True

Typically, I do whatever I can to avoid iterrows. It's very slow.
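For comparison, a rough sketch of the row-by-row version next to the column-wise one, assuming a tiny hand-built frame with the same colA/colB names used above. It only checks that both produce the same values; it makes no timing measurement:

```python
import pandas as pd

# Illustrative data, chosen by hand
df = pd.DataFrame({"colA": [2, 9, 4, 6], "colB": [8, 5, 4, 0]})

# Row-by-row version: builds a Python list one row at a time
looped = []
for _, row in df.iterrows():
    looped.append(bool(row["colA"] > row["colB"]))

# Column-wise version: a single comparison over whole columns
vectorized = (df["colA"] > df["colB"]).tolist()

# Both approaches yield the same values
same = vectorized == looped
print(same)
```

The loop does the same work through a Python-level iteration that constructs a Series per row, which is where the slowdown on larger frames comes from.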

Hope that helps.
