pandas: How to use sklearn FeatureHasher?

Question

Asked by KillerSnail

I have a dataframe like this:

import pandas as pd
test = pd.DataFrame({'type': ['a', 'b', 'a', 'c', 'b'], 'model': ['bab', 'ba', 'ba', 'ce', 'bw']})

How do I use the sklearn FeatureHasher on it?

I tried:

from sklearn.feature_extraction import FeatureHasher 
FH = FeatureHasher()
train = FH.transform(test.type)

But it doesn't like that. It seems to want a string or a list, so I tried:

FH.transform(test.to_dict(orient='list'))

But that doesn't work either. I get:

AttributeError: 'str' object has no attribute 'items'

thanks

Answer 1

Answered by Julien Marrec

You need to specify the input type when initializing your instance of FeatureHasher:

In [1]:
from sklearn.feature_extraction import FeatureHasher
h = FeatureHasher(n_features=5, input_type='string')
f = h.transform(test.type)
f.toarray()

Out[1]:
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0., -1.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  0., -1.,  0.,  0.],
       [ 0., -1.,  0.,  0.,  0.]])
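
One caveat about passing a Series of plain strings: with input_type='string' each sample is treated as an iterable of strings, so a bare string such as 'bab' is hashed character by character, and recent scikit-learn releases (to my knowledge) reject bare strings with a ValueError. A minimal sketch, assuming you want each cell hashed as a single token, is to wrap every value in a one-element list. The model column and n_features=5 are chosen here only for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction import FeatureHasher

test = pd.DataFrame({'type': ['a', 'b', 'a', 'c', 'b'],
                     'model': ['bab', 'ba', 'ba', 'ce', 'bw']})

# input_type='string' expects each sample to be an iterable of strings,
# so wrap every value in a one-element list to hash it as one token.
h = FeatureHasher(n_features=5, input_type='string')
samples = [[value] for value in test['model']]

# transform returns a scipy sparse matrix; toarray() makes it dense.
dense = h.transform(samples).toarray()
print(dense.shape)
print(dense)
```

Each row then has a single non-zero entry (+1 or -1), and rows holding the same string get the same encoding. With so few columns, different strings can still collide in the same column.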

Note that with input_type='string' the value of each feature is assumed to be 1, according to the FeatureHasher documentation (emphasis mine):

input_type : string, optional, default “dict”
Either “dict” (the default) to accept dictionaries over (feature_name, value);
“pair” to accept pairs of (feature_name, value);
or “string” to accept single strings. feature_name should be a string, while value should be a number. In the case of “string”, a value of 1 is implied.
The feature_name is hashed to find the appropriate column for the feature. The value's sign might be flipped in the output (but see non_negative, below).
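
This also explains the AttributeError in the question: with the default input_type='dict' every sample must be a mapping, but iterating over a Series (or over the keys of to_dict(orient='list')) yields plain strings, which have no items() method. A rough sketch of the dict route, assuming you want to hash both columns at once, is to pass one dict per row. As far as I know, a string value is hashed together with its key as "name=value" with an implied value of 1. The n_features=8 setting is arbitrary, and alternate_sign=False (available in scikit-learn 0.19 and later) is used only to keep the output non-negative:

```python
import pandas as pd
from sklearn.feature_extraction import FeatureHasher

test = pd.DataFrame({'type': ['a', 'b', 'a', 'c', 'b'],
                     'model': ['bab', 'ba', 'ba', 'ce', 'bw']})

# One dict per row, e.g. {'type': 'a', 'model': 'bab'}.
records = test.to_dict(orient='records')

# Each (column, string value) pair is hashed into one of the 8 columns
# and contributes a value of 1; no sign flipping with alternate_sign=False.
h = FeatureHasher(n_features=8, input_type='dict', alternate_sign=False)
X = h.transform(records).toarray()
print(X.shape)
print(X)
```

Every row sums to 2 here, one count for each of the two columns. If two features happen to land in the same hashed column, that cell simply holds 2.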
