Original source: http://stackoverflow.com/questions/28465633/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to me).
Easy way to apply transformation from `pandas.get_dummies` to new data?
Asked by Ellis Valentiner
Suppose I have a data frame `data` with strings that I want converted to indicators. I use `pandas.get_dummies(data)` to convert this to a dataset that I can now use for building a model.
Now I have a single new observation that I want to run through my model. Obviously I can't use `pandas.get_dummies(new_data)` because it doesn't contain all of the classes and won't produce the same indicator matrix. Is there a good way to do this?
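To make the mismatch concrete, here is a minimal sketch using made-up toy data (the frames `train` and `new_data` and their values are assumptions for illustration, not from the question). It compares the columns `get_dummies` produces for the full data with those it produces for a single row.

```python
import pandas as pd

# Toy training data and a single new observation (illustrative values only)
train = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
new_data = pd.DataFrame({'cat': ['a'], 'val': [1]})

train_cols = list(pd.get_dummies(train).columns)
new_cols = list(pd.get_dummies(new_data).columns)

# Indicator columns the model expects but the new observation never produces
missing = [c for c in train_cols if c not in new_cols]

print(train_cols)
print(new_cols)
print(missing)
```

Only the category present in the single row gets an indicator column, so the two frames cannot be fed to the same model without aligning them first.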
Answered by JAB
You can create the dummies from the single new observation, and then reindex that frame's columns using the columns from the original indicator matrix:
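import pandas as pd
# Original data and the indicator matrix the model was built on
df = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
dummies_frame = pd.get_dummies(df)
# Dummies for the single new observation (only cat_a is produced)
df1 = pd.get_dummies(pd.DataFrame({'cat': ['a'], 'val': [1]}))
# Align to the original columns; indicators missing from df1 are filled with 0
df1.reindex(columns=dummies_frame.columns, fill_value=0)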
returns:
   val  cat_a  cat_b  cat_c  cat_d
0    1      1      0      0      0
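If this needs to be done repeatedly, the reindex step can be wrapped in a small function. The following is only a sketch: `encode_like_training` is a hypothetical helper name, not part of pandas, and the data is made up. It passes `dtype=int` to `get_dummies` so the indicators are integers regardless of the default in the installed pandas version. Note one side effect of `reindex`: a category that never appeared in the original data (here `'z'`) has its indicator column silently dropped.

```python
import pandas as pd


def encode_like_training(new_data, training_columns):
    """Dummy-encode new_data and align it to the training columns.

    Hypothetical helper for illustration; not part of pandas.
    """
    encoded = pd.get_dummies(new_data, dtype=int)
    # Indicators absent from new_data are added and filled with 0;
    # columns for categories never seen in training are dropped.
    return encoded.reindex(columns=training_columns, fill_value=0)


# Illustrative training data; keep its dummy columns for later reuse
train = pd.DataFrame({'cat': ['a', 'b', 'c', 'd'], 'val': [1, 2, 5, 10]})
training_columns = pd.get_dummies(train, dtype=int).columns

# New observations, one of them with an unseen category 'z'
new_obs = pd.DataFrame({'cat': ['b', 'z'], 'val': [3, 7]})
aligned = encode_like_training(new_obs, training_columns)
print(aligned)
```

Storing `training_columns` alongside the fitted model is what makes this work at prediction time, since the original data frame may no longer be available then.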

