pandas: How to create a dummy variable in Python with a condition below or above the median?
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/36637011/
How can I create a dummy variable in Python with a condition below or above median?
Asked by jul094
How can I create a binary dummy variable in Python that takes the value 0
when a person's salary is below the median salary level and 1
otherwise? I don't understand how to handle the above-or-below condition.
I tried this:
df['Salary'] = (df['Salary'] > df['Salary'].median()) & (df['Salary'] < df['Salary'].median())
But there is no output.
Before that I tried this:
df['Salary'].median()
df_Salary = pd.get_dummies(df['Salary'].median())
df_new = pd.concat([df, df_Salary], axis=1)
df_new
And got this:
   Gender  Exp  Salary  74000.0
0  Female   15   78200        1
1  Female   12   66400      NaN
2  Female    3    6000      NaN
...
Answered by Alexander
You can coerce a boolean to an int by just multiplying it by one:
df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()) * 1
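A minimal, self-contained sketch of the multiplication trick, assuming only the three sample rows visible in the question (the rest of the asker's data is not shown, so the median here is the median of these three rows, not the 74000.0 from the question):

```python
import pandas as pd

# Three sample rows copied from the question; the full dataset is not shown.
df = pd.DataFrame({
    "Gender": ["Female", "Female", "Female"],
    "Exp": [15, 12, 3],
    "Salary": [78200, 66400, 6000],
})

# The comparison yields a boolean Series; multiplying by 1 turns True/False into 1/0.
df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()) * 1
print(df["Median_Compare"].tolist())  # [1, 1, 0]
```

Because the comparison is `>=`, a salary exactly equal to the median lands in the 1 group.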
Answered by DSM
You can do a vectorized comparison and convert the result to an int:
>>> df["Median_Compare"] = (df["Salary"] >= df["Salary"].median()).astype(int)
>>> df
   Gender  Exp  Salary  Median_Compare
0  Female   15   78200               1
1  Female   12   66400               1
2  Female    3    6000               0
This works because we have
>>> df["Salary"].median()
66400.0
>>> df["Salary"] >= df["Salary"].median()
0     True
1     True
2    False
Name: Salary, dtype: bool
>>> (df["Salary"] >= df["Salary"].median()).astype(int)
0    1
1    1
2    0
Name: Salary, dtype: int32
To make the ternary approaches work (X if (condition) else Y), you'd need to use apply
to evaluate them one value at a time, because they don't play nicely with arrays, which don't have an unambiguous truth value.
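As a rough illustration of that point, the sketch below first shows the ambiguity error and then the apply-based version. It assumes the three sample salaries visible in the question; the lambda and the variable names median and flag are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample salaries taken from the question's visible rows.
df = pd.DataFrame({"Salary": [78200, 66400, 6000]})
median = df["Salary"].median()

# A bare conditional expression on the whole Series is ambiguous and raises ValueError.
try:
    flag = 1 if df["Salary"] >= median else 0
except ValueError as err:
    print("ValueError:", err)

# apply evaluates the conditional expression once per scalar value instead.
df["Median_Compare"] = df["Salary"].apply(lambda s: 1 if s >= median else 0)
print(df["Median_Compare"].tolist())  # [1, 1, 0]
```

On large frames the vectorized comparison with astype(int) is usually the faster choice, since apply calls the Python function once per row.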
Answered by zephyr
I think you want something like this (using your notation and variable names).
median = df['Salary'].median()
df['Salary'] = df['Salary'].apply(lambda s: 0 if s < median else 1)
This works much like it reads: each value in df['Salary'] becomes zero if the salary is less than the median and one otherwise. The expression has to go through apply, because writing 0 if df['Salary'] < df['Salary'].median() else 1 directly raises a ValueError (a whole Series has no single truth value). For reference, this type of expression is known as a ternary operator.
Answered by Daniel Gale
This is just using a basic conditional and storing the variable.
median = 30500
salary = 50000
median_flag = 1 if salary > median else 0
print(median_flag)
1
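The same scalar conditional can be run over several salaries without pandas. This is only a sketch: it assumes the three salaries visible in the question and uses the standard library's statistics.median. The names salaries, median_salary, and flags are illustrative.

```python
from statistics import median

# Salaries taken from the question's visible rows.
salaries = [78200, 66400, 6000]
median_salary = median(salaries)

# Apply the same scalar conditional to each salary in turn.
flags = [1 if salary > median_salary else 0 for salary in salaries]
print(flags)  # [1, 0, 0]
```

Note that the strict `>` used here puts a salary equal to the median in the 0 group, whereas the `>=` comparison in the other answers puts it in the 1 group.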