Original question: http://stackoverflow.com/questions/43160484/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Making a new column in pandas based on conditions of other columns
Asked by nickm
I would like to make a new column based on an if statement that has conditionals of two or more other columns in a dataframe.
For example, column3 = True if (column1 < 10.0) and (column2 > 0.0).
I have looked around and it seems that others have used the apply method with a lambda function, but I am a bit of a novice with these.
I suppose I could make two additional columns that each hold a 1 in a row when that column's condition is met, then sum them to check whether all conditions are met, but this seems a bit inelegant.
If you provide an answer with apply/lambda, let's suppose the dataframe is called sample_df and the columns are col1, col2, and col3.
Thanks so much!
Accepted answer by pansen
You can use eval here for a short solution:
import numpy as np
import pandas as pd
# create some dummy data
df = pd.DataFrame(np.random.randint(0, 10, size=(5, 2)),
                  columns=["col1", "col2"])
print(df)
   col1  col2
0     1     7
1     2     3
2     4     6
3     2     5
4     5     4
df["col3"] = df.eval("col1 < 5 and col2 > 5")
print(df)
   col1  col2   col3
0     1     7   True
1     2     3  False
2     4     6   True
3     2     5  False
4     5     4  False
You can also write it without eval via (df["col1"] < 5) & (df["col2"] > 5).
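As a minimal sketch of that mask-based form, here it is with the names from the question (sample_df, col1, col2, col3) and the thresholds 10.0 and 0.0. The data values are made up for illustration:

```python
import pandas as pd

# Hypothetical values; only the names sample_df, col1, col2, col3
# and the thresholds 10.0 / 0.0 come from the question.
sample_df = pd.DataFrame({"col1": [5.0, 12.0, 9.5, 3.0],
                          "col2": [1.0, 2.0, -1.0, 0.0]})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are needed because & binds tighter than < and >.
sample_df["col3"] = (sample_df["col1"] < 10.0) & (sample_df["col2"] > 0.0)
print(sample_df)
```

Only the first row satisfies both conditions here, so col3 is True for that row and False for the others.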
You may also enhance the example with np.where to explicitly set the values for the positive and negative cases right away:
df["col4"] = np.where(df.eval("col1 < 5 and col2 > 5"), "Positive Value", "Negative Value")
print(df)
   col1  col2   col3            col4
0     1     7   True  Positive Value
1     2     3  False  Negative Value
2     4     6   True  Positive Value
3     2     5  False  Negative Value
4     5     4  False  Negative Value
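Since the question mentions apply with a lambda, here is a rough sketch of that row-wise form next to the mask form, for comparison. The data is fixed to the values printed above instead of being random, and the column names col3_apply and col3_mask are made up for this illustration:

```python
import pandas as pd

# Same values as the printed example above, fixed so the result is reproducible.
df = pd.DataFrame({"col1": [1, 2, 4, 2, 5],
                   "col2": [7, 3, 6, 5, 4]})

# Row-wise version: axis=1 passes each row to the lambda as a Series,
# so plain Python `and` works on the two scalar comparisons.
df["col3_apply"] = df.apply(
    lambda row: bool(row["col1"] < 5 and row["col2"] > 5), axis=1)

# Vectorized version: operates on whole columns at once.
df["col3_mask"] = (df["col1"] < 5) & (df["col2"] > 5)

print(df)
```

Both columns should come out identical. The apply version calls a Python function once per row, so on large frames the mask or eval form is usually the faster choice.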