Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/28465633/

Time: 2020-09-13 22:56:27  Source: igfitidea

Easy way to apply transformation from `pandas.get_dummies` to new data?

python, pandas

Asked by Ellis Valentiner

Suppose I have a data frame `data` with strings that I want converted to indicators. I use `pandas.get_dummies(data)` to convert this to a dataset that I can then use for building a model.

Now I have a single new observation that I want to run through my model. Obviously I can't use `pandas.get_dummies(new_data)`, because the new data doesn't contain all of the classes and so won't produce the same indicator columns. Is there a good way to do this?
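To make the mismatch concrete, here is a minimal sketch. The frames `data` and `new_data` below are made-up stand-ins for the ones in the question, not the asker's actual data; the point is only that encoding them separately yields different column sets.

```python
import pandas as pd

# Hypothetical stand-ins for the question's `data` and `new_data`
data = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
new_data = pd.DataFrame({'cat': ['a'], 'val': [1]})

# Encode each frame on its own and compare the resulting columns
train_cols = list(pd.get_dummies(data).columns)
new_cols = list(pd.get_dummies(new_data).columns)

print(train_cols)  # ['val', 'cat_a', 'cat_b', 'cat_c', 'cat_d']
print(new_cols)    # ['val', 'cat_a']
```

A model fitted on the five training columns cannot take the two-column frame as input directly.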

Answered by JAB

You can create the dummies from the single new observation, and then reindex that frame's columns using the columns from the original indicator matrix:

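import pandas as pd

# Original data and its indicator (dummy) matrix
df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
dummies_frame = pd.get_dummies(df)

# Encode the new observation, then align it to the original columns (missing ones become 0)
df1 = pd.get_dummies(pd.DataFrame({'cat': ['a'], 'val': [1]}))
df1.reindex(columns=dummies_frame.columns, fill_value=0)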

returns:

        val     cat_a   cat_b   cat_c   cat_d
  0     1       1       0       0       0
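If this has to be done repeatedly, the same reindex approach could be wrapped in a small function. The sketch below is only one way to do it: `encode_like` is a hypothetical helper name, and `dtype=int` is an added assumption so that the indicators are integers regardless of pandas version. Note that `reindex` also silently drops indicator columns for categories that never appeared in the original data.

```python
import pandas as pd

def encode_like(new_data, train_columns):
    """Hypothetical helper: dummy-encode new_data and align it to train_columns."""
    dummies = pd.get_dummies(new_data, dtype=int)
    # Columns missing from new_data are added as 0; columns unknown to
    # train_columns are dropped.
    return dummies.reindex(columns=train_columns, fill_value=0)

train = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
train_columns = pd.get_dummies(train, dtype=int).columns

# 'z' never appeared in the training data, so no 'cat_z' column survives
new_obs = pd.DataFrame({'cat': ['a', 'z'], 'val': [1, 3]})
encoded = encode_like(new_obs, train_columns)
print(encoded)
```

Whether dropping an unseen category is acceptable depends on the model; raising an error on unknown columns before the reindex may be the safer choice in some pipelines.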