Converting a Pandas Dataframe column into one hot labels
Original question: http://stackoverflow.com/questions/47127388/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Nir_J
I have a pandas dataframe similar to this:
  Col1 ABC
0  XYZ   A
1  XYZ   B
2  XYZ   C
By using the pandas get_dummies() function on column ABC, I can get this:
  Col1  A  B  C
0  XYZ  1  0  0
1  XYZ  0  1  0
2  XYZ  0  0  1
What I need, however, is something like this, where each value in the ABC column is a list / array:
  Col1      ABC
0  XYZ  [1,0,0]
1  XYZ  [0,1,0]
2  XYZ  [0,0,1]
I tried using the get_dummies function and then combining all the columns into the column I wanted. I found a lot of answers explaining how to combine multiple columns as strings, like this one: Combine two columns of text in dataframe in pandas/python. But I cannot figure out a way to combine them as a list.
This question introduced the idea of using sklearn's OneHotEncoder, but I couldn't get it to work: How do I one-hot encode one column of a pandas dataframe?
One more thing: all the answers I came across required typing the column names manually when combining them. Is there a way to use DataFrame.iloc or a slicing mechanism to combine columns into a list?
Accepted answer by MaxU
Here is an example of using sklearn.preprocessing.LabelBinarizer:
In [361]: from sklearn.preprocessing import LabelBinarizer
In [362]: lb = LabelBinarizer()
In [363]: df['new'] = lb.fit_transform(df['ABC']).tolist()
In [364]: df
Out[364]:
  Col1 ABC        new
0  XYZ   A  [1, 0, 0]
1  XYZ   B  [0, 1, 0]
2  XYZ   C  [0, 0, 1]
Pandas alternative:
In [370]: df['new'] = df['ABC'].str.get_dummies().values.tolist()
In [371]: df
Out[371]:
  Col1 ABC        new
0  XYZ   A  [1, 0, 0]
1  XYZ   B  [0, 1, 0]
2  XYZ   C  [0, 0, 1]
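For anyone who wants to run the pandas alternative from scratch, here is a minimal sketch. The DataFrame construction is my own reconstruction of the frame in the question, and the `dummies` variable name is just for illustration. One thing to be aware of: `Series.str.get_dummies()` splits each string on `|` by default, so this variant only behaves like plain one-hot encoding when the labels contain no `|` character.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed contents).
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

# One integer indicator column per distinct label, in sorted label order.
dummies = df["ABC"].str.get_dummies()

# Turn the 2-D indicator array into one Python list per row.
df["new"] = dummies.values.tolist()

print(dummies.columns.tolist())
print(df)
```

The position of the 1 in each list follows the order of `dummies.columns`, so keep those column labels around if you need to map a vector back to its category later.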
Answer by andrew_reece
You can just use tolist():
# dtype=int keeps 0/1 output; recent pandas versions return booleans by default
df['ABC'] = pd.get_dummies(df.ABC, dtype=int).values.tolist()
  Col1        ABC
0  XYZ  [1, 0, 0]
1  XYZ  [0, 1, 0]
2  XYZ  [0, 0, 1]
Answer by juanpa.arrivillaga
If you have a pd.DataFrame like this:
>>> df
  Col1  A  B  C
0  XYZ  1  0  0
1  XYZ  0  1  0
2  XYZ  0  0  1
You can always do something like this:
>>> df.apply(lambda s: list(s[1:]), axis=1)
0    [1, 0, 0]
1    [0, 1, 0]
2    [0, 0, 1]
dtype: object
Note that this is essentially a for-loop over the rows. Also note that columns do not have a list data type; they must be object, which means your data-frame operations cannot take advantage of the speed benefits of numpy.
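As for the question about avoiding hand-typed column names: a possible sketch, assuming the indicator columns are everything after the first column, is to select them by position with `iloc` and convert the whole block at once instead of applying a lambda per row. The frame below is rebuilt from the example above, and `one_hot` and `result` are illustrative names.

```python
import pandas as pd

# Frame with the dummy columns already expanded (assumed contents).
df = pd.DataFrame({
    "Col1": ["XYZ", "XYZ", "XYZ"],
    "A": [1, 0, 0],
    "B": [0, 1, 0],
    "C": [0, 0, 1],
})

# Pick every column after the first by position, so no names are typed.
one_hot = df.iloc[:, 1:].values.tolist()

# Keep Col1 and store one list per row in a single column.
result = df[["Col1"]].copy()
result["ABC"] = one_hot

print(result)
print(result["ABC"].dtype)
```

The resulting column still has object dtype, so the caveat above applies: if you need fast numeric work, it is probably better to keep the 2-D array from `df.iloc[:, 1:].values` around rather than the column of lists.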
Answer by Spandyie
If you have a data-frame df with a categorical column ABC, then you could use the following to create a new column of one-hot vectors:
df['new_column'] = list(pandas.get_dummies(df['ABC']).values)
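A small difference from the `tolist()` answers that may matter: wrapping the 2-D array in `list(...)` gives one numpy array per row, while `.tolist()` gives plain Python lists. A quick sketch to see this, using a frame rebuilt from the question (the names `as_arrays` and `as_lists` are only for illustration, and `dtype=int` is my addition to get 0/1 values on recent pandas versions):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed contents).
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

dummies = pd.get_dummies(df["ABC"], dtype=int)

# list() over a 2-D array yields its rows as 1-D numpy arrays.
as_arrays = list(dummies.values)

# tolist() converts all the way down to nested Python lists.
as_lists = dummies.values.tolist()

print(type(as_arrays[0]), type(as_lists[0]))
```

Either form can be stored in an object column; which one is more convenient likely depends on whether the downstream code expects arrays or lists.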