Python: how to create dummies for certain columns with the pandas get_dummies() method?

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/37265312/

Date: 2020-08-19 19:08:47  Source: igfitidea

how to create dummies for certain columns by pandas get_dummies() method?

python pandas

Asked by Hyman

import pandas as pd

df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

I only want columns A and D to be converted to dummies, not column B. If I use pd.get_dummies(df), all columns are turned into dummies.

I want the final result to contain all of the columns, meaning columns B and C still exist unchanged, like 'A_x','A_y','B','C','D_j','D_l'.

Answered by knagaev

It can be done without concatenation, by calling get_dummies() with the required parameters:

In [294]: pd.get_dummies(df, prefix=['A', 'D'], columns=['A', 'D'])
Out[294]: 
   B  C  A_x  A_y  D_j  D_l
0  z  1  1.0  0.0  1.0  0.0
1  u  2  0.0  1.0  0.0  1.0
2  z  3  1.0  0.0  1.0  0.0
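
As a self-contained sketch of the same call: for a DataFrame, get_dummies() uses the column names as prefixes by default, so `prefix` can be left out when the prefixes match the column names. The `dtype=int` argument here is an addition not in the original answer; the default dtype of the dummy columns differs between pandas versions, so it is set explicitly to get integer 0/1 values.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

# Encode only A and D; B and C pass through unchanged.
# dtype=int is an assumption made here so the result does not
# depend on the pandas version's default dummy dtype.
result = pd.get_dummies(df, columns=['A', 'D'], dtype=int)
print(result)
```

The untouched columns come first in the result, followed by the dummy columns in the order given in `columns`.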

Answered by Patric Fulop

Adding to the answers above: if you have a big dataset with lots of attributes and don't want to list by hand every column to turn into dummies, you can use a set difference:

# Suppose df has 50 columns, i.e. len(df.columns) == 50
non_dummy_cols = ['A', 'B', 'C']
# Takes all 47 other columns
dummy_cols = list(set(df.columns) - set(non_dummy_cols))
df = pd.get_dummies(df, columns=dummy_cols)
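
One thing to keep in mind is that a Python set has no defined order, so the dummy columns produced from a set difference may come out in a different order from run to run. A possible variant, sketched here on the small DataFrame from the question (with `non_dummy_cols = ['B', 'C']` chosen for illustration), filters the columns with a list comprehension so the original order is kept:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

# Columns to leave untouched (illustrative choice for this sample)
non_dummy_cols = ['B', 'C']

# Keep every other column, preserving the DataFrame's column order
dummy_cols = [col for col in df.columns if col not in non_dummy_cols]

# dtype=int is an assumption to get integer 0/1 values on any recent pandas
encoded = pd.get_dummies(df, columns=dummy_cols, dtype=int)
print(encoded.columns.tolist())
```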

Answered by Stefan

Just select the two columns you want to call .get_dummies() on (the resulting column names indicate the source column and the value label, each represented as a binary variable), then pd.concat() them with the original columns you want to keep unchanged:

pd.concat([pd.get_dummies(df[['A', 'D']]), df[['B', 'C']]], axis=1)

   A_x  A_y  D_j  D_l  B  C
0  1.0  0.0  1.0  0.0  z  1
1  0.0  1.0  0.0  1.0  u  2
2  1.0  0.0  1.0  0.0  z  3
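
If the column order requested in the question ('A_x','A_y','B','C','D_j','D_l') matters, one possible approach is to build the pieces column by column and concatenate them in the original order. This is only a sketch: the names `dummy_cols` and `pieces` are introduced here for illustration, and `dtype=int` is an assumption to get integer 0/1 values regardless of the pandas version's default.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'A': ['x', 'y', 'x'], 'B': ['z', 'u', 'z'],
                   'C': ['1', '2', '3'],
                   'D': ['j', 'l', 'j']})

dummy_cols = ['A', 'D']  # columns to encode (hypothetical helper name)

# Walk the columns in their original order: encode the selected ones,
# pass the others through as single-column DataFrames.
pieces = [
    pd.get_dummies(df[[col]], columns=[col], dtype=int)
    if col in dummy_cols else df[[col]]
    for col in df.columns
]

result = pd.concat(pieces, axis=1)
print(result)
```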