Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/22621716/

Date: 2020-09-13 21:50:51  Source: igfitidea

Creating new binary columns from single string column in pandas

python pandas

Asked by user1610719

I've seen this before and simply can't remember the function.

Say I have a column "Speed" and each row has 1 of these values:

'Slow', 'Normal', 'Fast'

How do I create a new dataframe that keeps all my rows but replaces the "Speed" column with 3 columns, "Slow", "Normal" and "Fast", where each row has a 1 in the column matching its old "Speed" value and a 0 in the other two? So if I had:

print(df['Speed'].iloc[0])
> 'Normal'

I would then expect this:

print(df['Normal'].iloc[0])
> 1

print(df['Slow'].iloc[0])
> 0

Answered by joris

You can do this easily with pd.get_dummies:

In [37]: df = pd.DataFrame(['Slow', 'Normal', 'Fast', 'Slow'], columns=['Speed'])

In [38]: df
Out[38]:
    Speed
0    Slow
1  Normal
2    Fast
3    Slow

In [39]: pd.get_dummies(df['Speed'])
Out[39]:
   Fast  Normal  Slow
0     0       0     1
1     0       1     0
2     1       0     0
3     0       0     1
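
The question asks for a full dataframe in which "Speed" is replaced by the indicator columns, so here is a minimal sketch of that last step. The "Distance" column is a hypothetical extra column added only for illustration, and dtype=int is passed because recent pandas versions return boolean indicator columns by default.

```python
import pandas as pd

# "Distance" is a made-up column, included only to show that other columns survive.
df = pd.DataFrame({
    "Speed": ["Slow", "Normal", "Fast", "Slow"],
    "Distance": [1.0, 2.5, 4.0, 0.5],
})

# One indicator column per distinct value of "Speed", stored as 0/1 integers.
dummies = pd.get_dummies(df["Speed"], dtype=int)

# Drop the original string column and attach the indicator columns side by side.
result = pd.concat([df.drop(columns="Speed"), dummies], axis=1)
print(result)
```

Because pd.concat aligns on the index, this works as long as df and dummies share the same index, which is the case here since dummies is derived directly from df.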

Answered by aha

Here is one solution:

df['Normal'] = df.Speed.apply(lambda x: 1 if x == "Normal" else 0)
df['Slow'] = df.Speed.apply(lambda x: 1 if x == "Slow" else 0)
df['Fast'] = df.Speed.apply(lambda x: 1 if x == "Fast" else 0)
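
The same idea can be written without a per-row lambda. This is only a sketch of an alternative, not part of the answer above: it compares the whole column at once and casts the boolean result to integers, looping over a hand-written list of labels.

```python
import pandas as pd

df = pd.DataFrame({"Speed": ["Slow", "Normal", "Fast", "Slow"]})

# The label list is written out by hand here; df["Speed"].unique() would also work.
for label in ["Slow", "Normal", "Fast"]:
    # The comparison yields a boolean Series; astype(int) turns True/False into 1/0.
    df[label] = (df["Speed"] == label).astype(int)

print(df)
```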

Answered by sun

Here is another method:

import numpy as np
import pandas as pd

df = pd.DataFrame(['Slow', 'Fast', 'Normal', 'Normal'], columns=['Speed'])
# np.where yields 1 where the condition holds and 0 elsewhere
df['Normal'] = np.where(df['Speed'] == 'Normal', 1, 0)
df['Fast']   = np.where(df['Speed'] == 'Fast', 1, 0)
df['Slow']   = np.where(df['Speed'] == 'Slow', 1, 0)

df 
     Speed  Normal  Fast  Slow
0    Slow       0     0     1
1    Fast       0     1     0
2  Normal       1     0     0
3  Normal       1     0     0
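
When indicator columns are built by hand like this, a quick sanity check can catch mistakes. The sketch below is an addition, not part of the answer: it rebuilds the columns with np.where, confirms that every row has exactly one indicator set, and compares the result against pd.get_dummies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(['Slow', 'Fast', 'Normal', 'Normal'], columns=['Speed'])
indicator_cols = ['Normal', 'Fast', 'Slow']
for label in indicator_cols:
    df[label] = np.where(df['Speed'] == label, 1, 0)

# Each row holds exactly one speed, so its indicators should sum to 1.
row_sums = df[indicator_cols].sum(axis=1)

# Cross-check against get_dummies, reordered to the same column order.
expected = pd.get_dummies(df['Speed'], dtype=int)[indicator_cols]
matches = bool((df[indicator_cols].to_numpy() == expected.to_numpy()).all())

print(row_sums.tolist(), matches)
```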