Original question: http://stackoverflow.com/questions/36749741/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
How do you add a calculated column to a pandas dataframe using set-based logic?
Asked by Sevyns
I'm new to Python (background in T-SQL and R) and looking for a set-based method for adding a calculated column to a pandas DataFrame. R and T-SQL have simple implementations for this concept, but I've yet to find a solution to this in Python.
This question is an iterative approach to what I'm looking for. I'm looking for something more set-based, and have yet to find a solution.
Here is an example from R:
# New column that shows if the value in column A is greater than the value in column B
myDataFrame$CalculatedColumn = ifelse(myDataFrame$columnA > myDataFrame$columnB,TRUE,FALSE)
This statement will add a new calculated column without requiring row-by-row evaluation code.
Does Python (or any Python package) support a concept like this? Or is the most practical solution to call iterrows() in a for loop?
Let me know if any clarifications are needed - and thanks for the help!
Answered by ayhan
You can use np.where for that:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(10, size=(10,2)), columns = ["A", "B"])
df
Out[46]:
   A  B
0  2  8
1  9  5
2  4  4
3  6  0
4  5  5
5  0  8
6  7  9
7  6  3
8  0  9
9  0  9
df["C"] = np.where(df["A"] > df["B"], True, False)
df
Out[48]:
   A  B      C
0  2  8  False
1  9  5   True
2  4  4  False
3  6  0   True
4  5  5  False
5  0  8  False
6  7  9  False
7  6  3   True
8  0  9  False
9  0  9  False
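As a side note, np.where is not limited to True/False: its second and third arguments can be any scalars or arrays, which makes it the closest match to R's ifelse. Here is a minimal sketch on a small hand-made frame; the values, the "larger" column name, and the label strings are made up for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Small fixed frame so the result is reproducible (illustrative values only).
df = pd.DataFrame({"A": [2, 9, 4], "B": [8, 5, 4]})

# Choose a label per row without looping: the first value is used where the
# condition holds, the second value everywhere else.
df["larger"] = np.where(df["A"] > df["B"], "A", "B or tie")

print(df)
```

When the two outcomes are just True and False, the comparison itself already yields the boolean column, so np.where mainly pays off when the outcomes are other values.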
Answered by Alexander
You should simply be able to do a direct comparison.
df['C'] = df.A > df.B
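To expand on the direct comparison a little: it returns a boolean Series aligned to the frame's index, and several such Series can be combined element-wise. The sketch below assumes a small hand-made frame, and the extra column "D" is hypothetical. Note that df.A is attribute-style access, which only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute; bracket access works for any name.

```python
import pandas as pd

# Illustrative data, not taken from the question.
df = pd.DataFrame({"A": [2, 9, 4, 6], "B": [8, 5, 4, 0]})

# A comparison between two columns is evaluated for all rows at once.
df["C"] = df["A"] > df["B"]

# Combine conditions with & (and), | (or), ~ (not). The parentheses are
# required because these operators bind more tightly than > and <.
df["D"] = (df["A"] > df["B"]) & (df["B"] > 0)

print(df)
```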
Answered by flyingmeatball
There are a couple of ways to do this in pandas. If there are indeed only two values (True or False), then you might be best off just splitting it into two lines like so:
df['newCol'] = False
df.loc[df['colA'] > df['colB'],'newCol'] = True
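If there turn out to be more than two possible values, one option (not mentioned in the answer, so treat it as a suggestion) is np.select, which takes a list of conditions and a matching list of results. This is a minimal sketch; the data and the "greater"/"less"/"equal" labels are invented for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data using the same column names as the snippet above.
df = pd.DataFrame({"colA": [2, 9, 4], "colB": [8, 5, 4]})

# Conditions are checked in order; each row takes the choice that belongs to
# the first condition it satisfies, or the default if none match.
conditions = [df["colA"] > df["colB"], df["colA"] < df["colB"]]
choices = ["greater", "less"]
df["newCol"] = np.select(conditions, choices, default="equal")

print(df)
```

The same result could be built with one df.loc assignment per condition, in the style shown above; np.select just keeps the conditions and their results side by side.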
I generally avoid iterrows whenever I can. It's very slow.
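For comparison, here is a small sketch showing that a row-by-row loop and the set-based comparison produce the same column. The data is made up, and the example deliberately makes no timing measurements; it only shows the two styles side by side.

```python
import pandas as pd

# Illustrative data, not taken from the question.
df = pd.DataFrame({"colA": [2, 9, 4], "colB": [8, 5, 4]})

# Row-by-row version: iterrows yields (index, row) pairs, and the comparison
# runs once per row in a Python-level loop.
looped = []
for _, row in df.iterrows():
    looped.append(row["colA"] > row["colB"])

# Set-based version: a single comparison over the whole columns.
vectorized = df["colA"] > df["colB"]

print(looped == vectorized.tolist())
```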
Hope that helps.