Convert multiple columns of a pandas data frame to dummy variables - Python
Original URL: http://stackoverflow.com/questions/26092132/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by CreamStat
I have this dataframe:


As far as I know, categorical variables should be converted to dummy variables before you use the scikit-learn package in Python for machine learning tasks. For example, I tried to convert the values of the third column to dummy values using a class from scikit-learn, but my code didn't work:
from sklearn.preprocessing import LabelEncoder
x[:, 2] = LabelEncoder().fit_transform(x[:,2])
So what's wrong with my code? And how can I convert all the categorical variables in my data frame to dummy variables?
Edit: The full traceback is:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-73-c0d726db979e> in <module>()
1 from sklearn.preprocessing import LabelEncoder
2
----> 3 x[:, 2] = LabelEncoder().fit_transform(x[:,2])
C:\Users\toshiba\Anaconda\lib\site-packages\pandas\core\frame.pyc in __getitem__(self, key)
2001 # get column
2002 if self.columns.is_unique:
-> 2003 return self._get_item_cache(key)
2004
2005 # duplicate columns
C:\Users\toshiba\Anaconda\lib\site-packages\pandas\core\generic.pyc in _get_item_cache(self, item)
665 return cache[item]
666 except Exception:
--> 667 values = self._data.get(item)
668 res = self._box_item_values(item, values)
669 cache[item] = res
C:\Users\toshiba\Anaconda\lib\site-packages\pandas\core\internals.pyc in get(self, item)
1653 def get(self, item):
1654 if self.items.is_unique:
-> 1655 _, block = self._find_block(item)
1656 return block.get(item)
1657 else:
C:\Users\toshiba\Anaconda\lib\site-packages\pandas\core\internals.pyc in _find_block(self, item)
1933
1934 def _find_block(self, item):
-> 1935 self._check_have(item)
1936 for i, block in enumerate(self.blocks):
1937 if item in block:
C:\Users\toshiba\Anaconda\lib\site-packages\pandas\core\internals.pyc in _check_have(self, item)
1939
1940 def _check_have(self, item):
-> 1941 if item not in self.items:
1942 raise KeyError('no item named %s' % com.pprint_thing(item))
1943
C:\Users\toshiba\Anaconda\lib\site-packages\pandas\core\index.pyc in __contains__(self, key)
317
318 def __contains__(self, key):
--> 319 hash(key)
320 # work around some kind of odd cython bug
321 try:
TypeError: unhashable type
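The traceback runs through `pandas\core\frame.pyc in __getitem__`, which suggests that `x` is a pandas DataFrame rather than a NumPy array. NumPy-style `x[:, 2]` hands the tuple `(slice(None), 2)` to `DataFrame.__getitem__`, which tries to look it up as a column label. Hashing that tuple fails because slice objects were not hashable in the Python version used here, and that produces `TypeError: unhashable type`. Selecting the column by position with `.iloc` avoids the problem. The sketch below is a minimal example that assumes a made-up three-column frame, because the question's data is not shown. Note that the result is still a single column of integer labels, not dummy variables.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the question's data frame (the real data is not shown).
x = pd.DataFrame({
    "num1": [1.5, 2.0, 3.1],
    "num2": [10, 20, 30],
    "color": ["red", "blue", "red"],
})

# x[:, 2] is NumPy syntax; on a DataFrame, select the third column by position with .iloc
# and assign the encoded values back by column name.
col = x.columns[2]
x[col] = LabelEncoder().fit_transform(x.iloc[:, 2])

print(x)
```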
Answer by omun
I don't think LabelEncoder transforms your data into dummy variables (see scikit-learn.org/LabelEncoder). Instead, it replaces each category with a numerical label.
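The following minimal sketch makes the difference concrete. It uses a made-up `colors` Series that does not come from the question. `LabelEncoder` returns one column of integers, assigned in sorted order of the categories. `pd.get_dummies` returns one indicator column per category. The `dtype=int` argument requests 0/1 values, because recent pandas versions return booleans by default.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up example data, not from the question.
colors = pd.Series(["red", "blue", "green", "blue"], name="color")

# LabelEncoder: a single integer column; classes are sorted, so blue=0, green=1, red=2.
labels = LabelEncoder().fit_transform(colors)
print(labels)  # [2 0 1 0]

# get_dummies: one 0/1 column per category (blue, green, red).
dummies = pd.get_dummies(colors, dtype=int)
print(dummies)
```

A model fed `labels` would treat the categories as ordered numbers (red > green > blue), which is usually not intended. The dummy columns avoid that.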
I use the get_dummies function from pandas to do this (see pandas.pydata.org/dummies). Below is a simple example.
Create a simple DataFrame with categorical and numerical data:
import pandas as pd
# Create every column with the "category" dtype, then turn Var3 back into integers
X = pd.DataFrame({"Var1": ["a", "a", "b"],
                  "Var2": ["a", "b", "c"],
                  "Var3": [1, 2, 3]},
                 dtype="category")
X["Var3"] = X["Var3"].astype(int)
Transform data to dummy variables
pd.get_dummies(X)
Out[4]:
Var3 Var1_a Var1_b Var2_a Var2_b Var2_c
0 1 1 0 1 0 0
1 2 1 0 0 1 0
2 3 0 1 0 0 1
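By default `get_dummies` encodes only the columns with `object`, `string` or `category` dtype, which is why `Var3` is left unchanged above. To choose the columns explicitly, pass them through the `columns` parameter. A numeric column listed there is encoded as well. Passing `dtype=int` keeps the 0/1 output on pandas versions that default to booleans. The following minimal sketch uses the same values, here built as plain string columns:

```python
import pandas as pd

X = pd.DataFrame({"Var1": ["a", "a", "b"],
                  "Var2": ["a", "b", "c"],
                  "Var3": [1, 2, 3]})

# Encode only the listed columns; the remaining columns come first in the result.
dummies = pd.get_dummies(X, columns=["Var1", "Var2"], dtype=int)
print(dummies)
```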
Notice that Var1 was transformed into only two dummy variables, because only the categories a and b occur in it. You might want all three categories [a, b, c] instead. In that case, you need to add the missing category:
# Assign the result back; the inplace=True argument of add_categories was removed in pandas 2.0
X["Var1"] = X["Var1"].cat.add_categories("c")
And the result:
pd.get_dummies(X)
Out[6]:
Var3 Var1_a Var1_b Var1_c Var2_a Var2_b Var2_c
0 1 1 0 0 1 0 0
1 2 1 0 0 0 1 0
2 3 0 1 0 0 0 1
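Instead of adding categories afterwards, you can declare the full set of categories up front with `pd.CategoricalDtype`. Every encoded column then gets the same dummy columns, even for values that do not occur in the data. This can help when, for example, a training set and a test set must end up with matching columns. The following sketch assumes that `Var1` and `Var2` should both share the categories `a`, `b` and `c`:

```python
import pandas as pd

# Declare the complete category list once and apply it to both columns.
cats = pd.CategoricalDtype(categories=["a", "b", "c"])

X = pd.DataFrame({"Var1": ["a", "a", "b"],
                  "Var2": ["a", "b", "c"],
                  "Var3": [1, 2, 3]})
X = X.astype({"Var1": cats, "Var2": cats})

# Unobserved categories (here Var1 == "c") still get an all-zero dummy column.
dummies = pd.get_dummies(X, dtype=int)
print(dummies)
```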
Hope this helps

