Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36637011/

How can I create a dummy variable in Python with a condition below or above median?

Tags: python, pandas, dummy-variable

Asked by jul094

How can I create a binary dummy variable in Python that takes the value 0 when a person's salary is below the median salary and 1 otherwise? I don't understand how to express the "above or below the median" condition.

I tried this

df['Salary'] = (df['Salary'] > df['Salary'].median()) & (df['Salary'] < df['Salary'].median())

But there is no output.

Before that I tried this:

df['Salary'].median()
df_Salary = pd.get_dummies(df['Salary'].median())
df_new = pd.concat([df, df_Salary], axis=1)
df_new

And got this

    Gender  Exp  Salary  74000.0
0   Female   15   78200        1
1   Female   12   66400      NaN
2   Female    3    6000      NaN
...

Answered by Alexander

You can coerce a boolean to an int by just multiplying it by one:

df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()) * 1
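
As a self-contained illustration of the multiply-by-one trick, here is a minimal sketch. The three-row frame is hypothetical sample data modeled on the rows shown in the question, and the name at_or_above is introduced only for this example.

```python
import pandas as pd

# Hypothetical sample data modeled on the rows shown in the question.
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Female"],
    "Exp": [15, 12, 3],
    "Salary": [78200, 66400, 6000],
})

# Boolean mask: True where the salary is at or above the median (66400.0 here).
at_or_above = df["Salary"] >= df["Salary"].median()

# Multiplying a boolean Series by 1 yields integers (True -> 1, False -> 0).
df["Median_Compare"] = at_or_above * 1

flags = df["Median_Compare"].tolist()
print(flags)
```

With >= the salary that equals the median is flagged 1, which matches the "0 when below the median, 1 otherwise" rule in the question.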

Answered by DSM

You can do a vectorized comparison and convert the result to an int:

>>> df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()).astype(int)
>>> df
   Gender  Exp  Salary  Median_Compare
0  Female   15   78200               1
1  Female   12   66400               1
2  Female    3    6000               0

This works because we have

>>> df["Salary"].median()
66400.0
>>> df["Salary"] >= df["Salary"].median()
0     True
1     True
2    False
Name: Salary, dtype: bool
>>> (df["Salary"] >= df["Salary"].median()).astype(int)
0    1
1    1
2    0
Name: Salary, dtype: int32

To make the ternary approaches work (X if (condition) else Y), you'd need to apply them element by element (for example with Series.apply), because they don't play nicely with arrays, which don't have an unambiguous truth value.
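
The point about truth values can be checked directly. The sketch below assumes a small hypothetical Series of salaries; it shows that a bare conditional on the whole Series raises a ValueError, while the same conditional passed to apply runs once per element and agrees with the vectorized comparison.

```python
import pandas as pd

salaries = pd.Series([78200, 66400, 6000], name="Salary")  # hypothetical data
median = salaries.median()

# A bare conditional on the whole Series fails: the comparison is a Series of
# booleans, and Python cannot reduce it to a single True/False.
try:
    flag = 1 if salaries >= median else 0
    raised = False
except ValueError:
    raised = True

# apply runs the conditional once per element, so each test is a plain scalar.
via_apply = salaries.apply(lambda s: 1 if s >= median else 0)

# The vectorized comparison gives the same flags without a Python-level loop.
via_astype = (salaries >= median).astype(int)

print(raised, via_apply.tolist(), via_astype.tolist())
```

The vectorized form is usually the better choice on large frames, since apply calls a Python function for every row.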

Answered by zephyr

I think you want something like this (using your notation and variable names).

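median = df['Salary'].median()
df['Salary'] = df['Salary'].apply(lambda s: 0 if s < median else 1)

This reads the way it is written: each salary becomes zero if it is less than the median, and one otherwise. The conditional has to go through apply so that it is evaluated once per row; written against the whole column (0 if df['Salary'] < df['Salary'].median() else 1), it raises a ValueError because a Series has no single truth value. For reference, this type of expression is known as a ternary operator. Note that the assignment overwrites the Salary column, so use a new column name if the original values are still needed.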

Answered by Daniel Gale

This just uses a basic conditional on a single salary value and stores the result in a variable.

median = 30500
salary = 50000
# Conditional expression on plain scalars: 1 if above the median, else 0
median_flag = 1 if salary > median else 0
print(median_flag)  # prints 1
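
The scalar conditional above handles one salary at a time. To apply the same rule to a whole column, one option (not part of the original answer) is numpy.where, sketched here on a hypothetical three-row frame. It keeps the strict > from the answer, so a salary exactly equal to the median gets 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Salary": [78200, 66400, 6000]})  # hypothetical data
median = df["Salary"].median()

# np.where picks 1 where the condition holds and 0 elsewhere, element by element.
df["median_flag"] = np.where(df["Salary"] > median, 1, 0)

median_flags = df["median_flag"].tolist()
print(median_flags)
```

Switching the comparison to >= would instead match the question's rule of 0 below the median and 1 otherwise.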