pandas df['X'].unique() and TypeError: unhashable type: 'numpy.ndarray'

Question

Asked by SBad

All,

I have a column in a dataframe that looks like this:

allHoldingsFund['BrokerMixed']
Out[419]: 
78         ML
81       CITI
92         ML
173      CITI
235        ML
262        ML
264        ML
25617      GS
25621    CITI
25644    CITI
25723      GS
25778    CITI
25786    CITI
25793      GS
25797    CITI
Name: BrokerMixed, Length: 2554, dtype: object

Although the column has dtype object, I am not able to group by it or even extract its unique values. For example, when I do:

allHoldingsFund['BrokerMixed'].unique()

I get an error:

     uniques = table.unique(values)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique
TypeError: unhashable type: 'numpy.ndarray'

I also get an error when I group by that column.

Any help is welcome. Thank you.

Answer 1

Accepted answer by Harry_pb

First, I would suggest checking the type of your column. You can try the following:

print(type(allHoldingsFund['BrokerMixed']))

If this is a pandas Series, you can try:

allHoldingsFund['BrokerMixed'].reset_index()['BrokerMixed'].unique()

and check if this works for you.

EDIT 2020: Your original way of getting the unique values and the answers mentioned here return the same results in Python 3.

Answer 2

Answer by jpp

It looks like you have a NumPy array in your series. NumPy arrays can't be hashed, and pd.Series.unique, like set, relies on hashing.
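
To confirm that diagnosis, one option is to count the Python types stored in the column and pull out the rows that hold arrays. This is a minimal sketch, not taken from the original answers: the Series `s` is a made-up stand-in for allHoldingsFund['BrokerMixed'], and the names `type_counts` and `is_array` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for allHoldingsFund['BrokerMixed']:
# mostly strings, with one stray NumPy array hidden among them.
s = pd.Series(["ML", "CITI", np.array(["GS"]), "ML"], index=[78, 81, 92, 173])

# Count the Python type of every element; anything other than str is suspect.
type_counts = s.map(type).value_counts()
print(type_counts)

# Boolean mask marking the rows whose value is a NumPy array.
is_array = s.map(lambda x: isinstance(x, np.ndarray))

# Show the offending rows so they can be inspected.
print(s[is_array])
```

With the real column, the index labels printed at the end would point to the rows that need cleaning.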

If you can't ensure your series data only consists of strings, you can convert NumPy arrays to tuples before calling pd.Series.unique:

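import numpy as np
import pandas as pd

# A Series that mixes a NumPy array with hashable scalars
s = pd.Series([np.array([1, 2, 3]), 1, 'hello', 'test', 1, 'test'])

def tuplizer(x):
    # Convert arrays and lists to hashable tuples; leave other values unchanged
    return tuple(x) if isinstance(x, (np.ndarray, list)) else x

res = s.apply(tuplizer).unique()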

print(res)

array([(1, 2, 3), 1, 'hello', 'test'], dtype=object)

Of course, this means the original data type information is lost in the result, but at least you get to see your "unique" NumPy arrays, provided they are one-dimensional.
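
For arrays with more than one dimension, tuple(x) only converts the outer level, so the inner arrays stay unhashable. A possible extension, sketched here under the assumption that the arrays hold plain numbers or strings, converts every level through ndarray.tolist(). The helper name `to_hashable` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def to_hashable(x):
    """Recursively turn arrays and lists into (nested) tuples; leave other values alone."""
    if isinstance(x, np.ndarray):
        x = x.tolist()  # nested Python lists, whatever the number of dimensions
    if isinstance(x, list):
        return tuple(to_hashable(item) for item in x)
    return x

# Made-up data: two identical 2-D arrays mixed with plain strings.
grid = np.array([[1, 2], [3, 4]])
s = pd.Series(["ML", grid, grid.copy(), "ML", "GS"])

cleaned = s.apply(to_hashable)
uniques = cleaned.unique()
print(uniques)
```

The two 2-D arrays collapse into one nested tuple, so `uniques` holds three values.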

Answer 3

Answer by Sahil Puri

You have an array in your data column. You could try the following:

allHoldingsFund['BrokerMixed'].apply(lambda x: str(x)).unique()
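
One caveat with the string conversion is that values of different types can collapse into the same string. A small sketch with made-up data (the Series `s` and the name `as_text` are hypothetical) shows the integer 1 and the string '1' becoming a single unique value, while an array is told apart from a plain string only by its printed form.

```python
import numpy as np
import pandas as pd

# Made-up data: a string, an array wrapping the same string,
# and an integer next to its string twin.
s = pd.Series(["ML", np.array(["ML"]), 1, "1"])

# Every element becomes its str() representation, which is always hashable.
as_text = s.apply(str)

print(as_text.unique())
```

If the distinction between 1 and '1' matters, the tuple conversion from the previous answer keeps it, whereas this approach does not.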
