Converting a pandas DataFrame column to one-hot labels

Question

Asked by Nir_J

I have a pandas dataframe similar to this:

  Col1   ABC
0  XYZ    A
1  XYZ    B
2  XYZ    C

By using the pandas get_dummies() function on column ABC, I can get this:

  Col1   A   B   C
0  XYZ   1   0   0
1  XYZ   0   1   0
2  XYZ   0   0   1

What I need, however, is something like this, where the ABC column has a list / array datatype:

  Col1    ABC
0  XYZ    [1,0,0]
1  XYZ    [0,1,0]
2  XYZ    [0,0,1]

I tried using the get_dummies function and then combining all the resulting columns into the single column I wanted. I found a lot of answers explaining how to combine multiple columns as strings, such as "Combine two columns of text in dataframe in pandas/python", but I cannot figure out a way to combine them as a list.

The question "How do I one-hot encode one column of a pandas dataframe?" introduced the idea of using sklearn's OneHotEncoder, but I couldn't get it to work.

One more thing: all the answers I came across had solutions where the column names had to be typed manually while combining them. Is there a way to use DataFrame.iloc or some slicing mechanism to combine columns into a list?

Answer 1

Accepted answer by MaxU

Here is an example of using sklearn.preprocessing.LabelBinarizer:

In [361]: from sklearn.preprocessing import LabelBinarizer

In [362]: lb = LabelBinarizer()

In [363]: df['new'] = lb.fit_transform(df['ABC']).tolist()

In [364]: df
Out[364]:
  Col1 ABC        new
0  XYZ   A  [1, 0, 0]
1  XYZ   B  [0, 1, 0]
2  XYZ   C  [0, 0, 1]

Pandas alternative:

In [370]: df['new'] = df['ABC'].str.get_dummies().values.tolist()

In [371]: df
Out[371]:
  Col1 ABC        new
0  XYZ   A  [1, 0, 0]
1  XYZ   B  [0, 1, 0]
2  XYZ   C  [0, 0, 1]
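
One caveat with the pandas alternative, sketched here with made-up labels: Series.str.get_dummies splits each string on its sep argument (default '|') before encoding, whereas pd.get_dummies treats every value as a single label. For labels like A, B, C the two give the same indicators, but they diverge as soon as a label contains the separator.

```python
import pandas as pd

# Hypothetical labels; the second one contains the default separator '|'.
s = pd.Series(['A', 'B|C', 'C'])

# str.get_dummies splits on '|' first, so 'B|C' sets both the B and C indicators.
split_cols = list(s.str.get_dummies().columns)

# pd.get_dummies keeps 'B|C' as one category of its own.
whole_cols = list(pd.get_dummies(s).columns)

print(split_cols)  # ['A', 'B', 'C']
print(whole_cols)  # ['A', 'B|C', 'C']
```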

Answer 2

Answered by andrew_reece

You can just use tolist():

df['ABC'] = pd.get_dummies(df.ABC).values.tolist()

  Col1        ABC
0  XYZ  [1, 0, 0]
1  XYZ  [0, 1, 0]
2  XYZ  [0, 0, 1]
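
On the question's last point, the dummy columns can also be picked out by position instead of by name. The following is only a minimal sketch, assuming that every column after the first one is a dummy column. dtype=int is passed because get_dummies returns booleans by default in pandas 2.0 and later; the variable names wide and combined are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['XYZ', 'XYZ', 'XYZ'], 'ABC': ['A', 'B', 'C']})

# Expand ABC into indicator columns; the other columns are kept in front.
wide = pd.get_dummies(df, columns=['ABC'], dtype=int)

# Assumption: every column after position 0 is a dummy column.
# iloc selects them by position, so no column names are typed by hand.
combined = wide.iloc[:, :1].copy()
combined['ABC'] = wide.iloc[:, 1:].to_numpy().tolist()

print(combined)
```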

Answer 3

Answered by juanpa.arrivillaga

If you have a pd.DataFrame like this:

>>> df
  Col1  A  B  C
0  XYZ  1  0  0
1  XYZ  0  1  0
2  XYZ  0  0  1

You can always do something like this:

>>> df.apply(lambda s: list(s[1:]), axis=1)
0    [1, 0, 0]
1    [0, 1, 0]
2    [0, 0, 1]
dtype: object

Note that this is essentially a for-loop over the rows. Note also that columns do not have list data-types; they must be object, which means your data-frame operations will not be able to take advantage of the speed benefits of numpy.
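
The point about object dtype can be checked directly. A small sketch, assuming the list column is called new as in the accepted answer: the column reports dtype object, and a numeric 2-D array has to be rebuilt from it before vectorized numpy operations apply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['XYZ'] * 3, 'ABC': ['A', 'B', 'C']})
df['new'] = pd.get_dummies(df['ABC'], dtype=int).to_numpy().tolist()

# Each cell holds a Python list, so pandas stores the column as object.
print(df['new'].dtype)  # object

# Rebuild a numeric (rows x classes) array for vectorized work.
onehot = np.array(df['new'].tolist())
print(onehot.shape)  # (3, 3)

# Example of a vectorized operation: recover the class index of each row.
print(onehot.argmax(axis=1))  # [0 1 2]
```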

Answer 4

Answered by Spandyie

If you have a data-frame df with a categorical column ABC, then you could use the following to create a new column of one-hot vectors:

df['new_column'] = list(pandas.get_dummies(df['ABC']).values)
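
One difference from the earlier answers is worth spelling out: wrapping a 2-D array in list() yields one numpy array per row, while .tolist() yields nested Python lists. A short sketch of the difference on the question's sample data (the variable names are only illustrative, and dtype=int is an added assumption to get integer indicators on recent pandas):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['XYZ'] * 3, 'ABC': ['A', 'B', 'C']})
dummies = pd.get_dummies(df['ABC'], dtype=int)

rows_as_arrays = list(dummies.to_numpy())     # list of 1-D numpy arrays
rows_as_lists = dummies.to_numpy().tolist()   # list of plain Python lists

print(type(rows_as_arrays[0]).__name__)  # ndarray
print(type(rows_as_lists[0]).__name__)   # list
```

Either form can be assigned to a new column; which one is preferable depends on whether the cells are later consumed as arrays or as plain lists.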
