Converting a Pandas Dataframe column into one-hot labels

Disclaimer: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/47127388/

Time: 2020-09-14 04:44:02  Source: igfitidea

Converting a Pandas Dataframe column into one hot labels

python pandas sklearn-pandas one-hot-encoding

Asked by Nir_J

I have a pandas dataframe similar to this:

  Col1   ABC
0  XYZ    A
1  XYZ    B
2  XYZ    C

By using the pandas get_dummies() function on column ABC, I can get this:

  Col1   A   B   C
0  XYZ   1   0   0
1  XYZ   0   1   0
2  XYZ   0   0   1

But I need something like this, where the ABC column has a list / array datatype:

  Col1    ABC
0  XYZ    [1,0,0]
1  XYZ    [0,1,0]
2  XYZ    [0,0,1]

I tried using the get_dummies function and then combining all the resulting columns into the single column I wanted. I found a lot of answers explaining how to combine multiple columns as strings, like this one: Combine two columns of text in dataframe in pandas/python. But I cannot figure out a way to combine them as a list.

This question introduced the idea of using sklearn's OneHotEncoder, but I couldn't get it to work. How do I one-hot encode one column of a pandas dataframe?

One more thing: all the answers I came across required the column names to be typed manually when combining them. Is there a way to use Dataframe.iloc() or a slicing mechanism to combine columns into a list?

Accepted answer by MaxU

Here is an example of using sklearn.preprocessing.LabelBinarizer:

In [361]: from sklearn.preprocessing import LabelBinarizer

In [362]: lb = LabelBinarizer()

In [363]: df['new'] = lb.fit_transform(df['ABC']).tolist()

In [364]: df
Out[364]:
  Col1 ABC        new
0  XYZ   A  [1, 0, 0]
1  XYZ   B  [0, 1, 0]
2  XYZ   C  [0, 0, 1]

Pandas alternative:

In [370]: df['new'] = df['ABC'].str.get_dummies().values.tolist()

In [371]: df
Out[371]:
  Col1 ABC        new
0  XYZ   A  [1, 0, 0]
1  XYZ   B  [0, 1, 0]
2  XYZ   C  [0, 0, 1]
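
For anyone who wants to run the LabelBinarizer approach outside an IPython session, here is a minimal self-contained sketch. The sample frame is rebuilt from the question and the `labels` variable is only there for illustration; it assumes scikit-learn is installed. Reading `lb.classes_` afterwards shows which label each position in the lists stands for. One caveat: with only two distinct labels, LabelBinarizer returns a single column rather than two.

```python
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Sample data rebuilt from the question (an assumption for illustration).
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

lb = LabelBinarizer()
# fit_transform returns a 2-D integer array; tolist() turns each row into a list.
df["new"] = lb.fit_transform(df["ABC"]).tolist()

# classes_ holds the label behind each position of the one-hot vectors.
labels = list(lb.classes_)
print(df)
print(labels)
```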

Answer by andrew_reece

You can just use tolist():

df['ABC'] = pd.get_dummies(df.ABC).values.tolist()

  Col1        ABC
0  XYZ  [1, 0, 0]
1  XYZ  [0, 1, 0]
2  XYZ  [0, 0, 1]
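
A version note, offered as a sketch rather than a correction: in pandas 2.0 and later, get_dummies() returns boolean columns by default, so the one-liner above would produce lists of True/False instead of 1/0. Passing dtype=int keeps the integer output. The sample frame and the `categories` variable below are assumptions added for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption for illustration).
df = pd.DataFrame({"Col1": ["XYZ", "XYZ", "XYZ"], "ABC": ["A", "B", "C"]})

# dtype=int forces 0/1 output; pandas 2.0+ would otherwise give True/False.
dummies = pd.get_dummies(df["ABC"], dtype=int)

# Keep the column order so each list position can be mapped back to a label.
categories = dummies.columns.tolist()

# Replace the label column with one list per row.
df["ABC"] = dummies.to_numpy().tolist()
print(df)
print(categories)
```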

Answer by juanpa.arrivillaga

If you have a pd.DataFrame like this:

>>> df
  Col1  A  B  C
0  XYZ  1  0  0
1  XYZ  0  1  0
2  XYZ  0  0  1

You can always do something like this:

>>> df.apply(lambda s: list(s[1:]), axis=1)
0    [1, 0, 0]
1    [0, 1, 0]
2    [0, 0, 1]
dtype: object

Note that this is essentially a for-loop over the rows. Also note that columns do not have list data types; they must be object, which means your data-frame operations cannot take advantage of the speed benefits of numpy.
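
If the goal is to avoid typing column names, as the question asks, one possible sketch is to select the indicator columns by position with iloc and convert the whole block at once. This is not taken from the answer above; the frame and the `result` and `one_hot` names are assumptions for illustration. The resulting column is still of object dtype, so the caveat about losing numpy speed still applies.

```python
import pandas as pd

# Frame shaped like the get_dummies() output shown in the question.
df = pd.DataFrame(
    {"Col1": ["XYZ", "XYZ", "XYZ"], "A": [1, 0, 0], "B": [0, 1, 0], "C": [0, 0, 1]}
)

# Select every column from position 1 onward, without naming any of them,
# then convert the 2-D block into one Python list per row.
one_hot = df.iloc[:, 1:].to_numpy().tolist()

# Keep the first column and attach the lists as a single object column.
result = df.iloc[:, :1].copy()
result["ABC"] = one_hot
print(result)
```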

Answer by Spandyie

If you have a data-frame df with a categorical column ABC, then you could use the following to create a new column of one-hot vectors:

# get_values() was removed in pandas 1.0; to_numpy() returns the same array
df['new_column'] = list(pandas.get_dummies(df['ABC']).to_numpy())