pandas: How to make get_dummies emit N-1 variables to avoid collinearity?

Question

Asked by ihadanny

pandas.get_dummies emits a dummy variable per categorical value. Is there an automated, easy way to ask it to create only N-1 dummy variables (that is, to arbitrarily drop one "baseline" variable)?

Needed to avoid co-linearity in our dataset.

Answer 1

Answered by T.C. Proctor

Pandas version 0.18.0 implemented exactly what you're looking for: the drop_first option. Here's an example:

In [1]: import pandas as pd

In [2]: pd.__version__
Out[2]: u'0.18.1'

In [3]: s = pd.Series(list('abcbacb'))

In [4]: pd.get_dummies(s, drop_first=True)
Out[4]: 
     b    c
0  0.0  0.0
1  1.0  0.0
2  0.0  1.0
3  1.0  0.0
4  0.0  0.0
5  0.0  1.0
6  1.0  0.0
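
The same option also works when encoding several categorical columns of a DataFrame at once. The following is a minimal sketch: the DataFrame `df` and its column names are made up for illustration, and `dtype=int` (available in newer pandas releases) is passed only so the result holds 0/1 integers regardless of the default dummy dtype of the installed version.

```python
import pandas as pd

# Hypothetical example data (not from the original answer).
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": ["S", "M", "S", "L"],
})

# drop_first=True drops the first (sorted) level of each encoded column,
# so a column with N levels yields N-1 dummy columns.
dummies = pd.get_dummies(df, columns=["color", "size"], drop_first=True, dtype=int)

print(list(dummies.columns))
# ['color_green', 'color_red', 'size_M', 'size_S']
```

Here "blue" and "L" become the implicit baseline levels, because they sort first within their columns.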

Answer 2

Answered by Ami Tavory

There are a number of ways of doing so.

Possibly the simplest is replacing one of the values with None before calling get_dummies. Say you have:

import pandas as pd
import numpy as np
s = pd.Series(list('babca'))
>> s
0    b
1    a
2    b
3    c
4    a

Then use:

>> pd.get_dummies(np.where(s == s.unique()[0], None, s))
    a   c
0   0   0
1   1   0
2   0   0
3   0   1
4   1   0

to drop b.

(Of course, you need to check whether your category column already contains None, since existing missing values would then be indistinguishable from the dropped baseline.)
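
One way to make that check explicit is sketched below. It is a variation on the approach above, not the answer's exact code: it uses `Series.where` instead of `np.where`, and the guard and the `baseline` name are illustrative assumptions.

```python
import pandas as pd

s = pd.Series(list("babca"))

# Guard: masking only works as intended if nothing is missing beforehand.
if s.isna().any():
    raise ValueError("column already contains missing values")

baseline = s.unique()[0]  # 'b', the first value encountered

# Series.where keeps values where the condition holds and inserts NaN elsewhere;
# get_dummies skips NaN by default (dummy_na=False), so 'b' gets no column.
dummies = pd.get_dummies(s.where(s != baseline), dtype=int)

print(list(dummies.columns))
# ['a', 'c']
```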

Another way is to use the prefix argument to get_dummies:

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False)
prefix: string, list of strings, or dict of strings, default None. String to append DataFrame column names. Pass a list with length equal to the number of columns when calling get_dummies on a DataFrame. Alternatively, prefix can be a dictionary mapping column names to prefixes.

This adds a prefix to all of the resulting columns, and you can then drop one of the columns carrying this prefix (just make sure the prefix is unique).
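
A minimal sketch of this prefix-then-drop idea, assuming the same example Series as above. The prefix "cat" and the choice of "cat_b" as the baseline are arbitrary illustrations, and `dtype=int` is passed only to get 0/1 integers on newer pandas versions.

```python
import pandas as pd

s = pd.Series(list("babca"))

# "cat" is an arbitrary prefix, chosen so the dummy columns are easy to identify.
dummies = pd.get_dummies(s, prefix="cat", dtype=int)  # columns: cat_a, cat_b, cat_c

# Drop one prefixed column; that level becomes the implicit baseline.
reduced = dummies.drop(columns="cat_b")

print(list(reduced.columns))
# ['cat_a', 'cat_c']
```

Unlike drop_first, this lets you choose which level serves as the baseline.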
