pandas df['X'].unique() and TypeError: unhashable type: 'numpy.ndarray'

Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/51675151/

Date: 2020-09-14 05:53:05  Source: igfitidea

Tags: python, pandas, group-by

Asked by SBad

Hi all,

I have a column in a dataframe that looks like this:

allHoldingsFund['BrokerMixed']
Out[419]: 
78         ML
81       CITI
92         ML
173      CITI
235        ML
262        ML
264        ML
25617      GS
25621    CITI
25644    CITI
25723      GS
25778    CITI
25786    CITI
25793      GS
25797    CITI
Name: BrokerMixed, Length: 2554, dtype: object

Although the column's dtype is object, I am not able to group by that column or even extract its unique values. For example, when I do:

allHoldingsFund['BrokerMixed'].unique()

I get the following error:

     uniques = table.unique(values)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique
TypeError: unhashable type: 'numpy.ndarray'

I also get an error when I use groupby on that column.
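
A minimal sketch that reproduces both errors, assuming (as the answers below suggest) that at least one cell of the column holds a NumPy array instead of a string. The data and the `qty` column are hypothetical, not taken from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical data: element-wise assignment into an object array stores the
# inner NumPy array as a single cell instead of broadcasting it.
values = np.empty(3, dtype=object)
values[0] = "ML"
values[1] = np.array(["CITI", "GS"])  # the "bad" cell
values[2] = "GS"
df = pd.DataFrame({"BrokerMixed": values, "qty": [10, 20, 30]})

# The dtype is just "object", so it does not reveal the array-valued cell.
print(df["BrokerMixed"].dtype)

errors = {}
try:
    df["BrokerMixed"].unique()
except TypeError as exc:
    errors["unique"] = str(exc)

try:
    df.groupby("BrokerMixed")["qty"].sum()
except TypeError as exc:
    errors["groupby"] = str(exc)

print(errors)
```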

Any help is welcome. Thank you.

Accepted answer by Harry_pb

First, I would suggest checking the type of your column. You may try the following:

print(type(allHoldingsFund['BrokerMixed']))

If this is a pandas Series, you may try:

allHoldingsFund['BrokerMixed'].reset_index()['BrokerMixed'].unique()

and check if this works for you.
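
Note that `type()` on the column only reports the container (a Series), not what is inside it. A hedged sketch for checking the type of each cell instead, which is one way to locate array-valued cells like the ones described in the other answers; the sample values and index labels are hypothetical, loosely modeled on the question's output:

```python
import numpy as np
import pandas as pd

# Hypothetical column: mostly broker strings, plus one cell holding an array.
values = np.empty(4, dtype=object)
values[0], values[1], values[3] = "ML", "CITI", "GS"
values[2] = np.array(["CITI", "GS"])
s = pd.Series(values, index=[78, 81, 92, 173], name="BrokerMixed")

# type() of the whole column only says it is a Series.
print(type(s))

# Count the Python type of every cell instead.
cell_types = s.map(lambda x: type(x).__name__).value_counts()
print(cell_types)

# Boolean mask selecting the cells that hold NumPy arrays.
is_array = s.map(lambda x: isinstance(x, np.ndarray))
print(s[is_array])
```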

EDIT 2020: Using Python 3, your way of getting the unique values and the approaches mentioned in the answers return the same results.

Answer by jpp

It looks like you have a NumPy array in your series. NumPy arrays cannot be hashed, and pd.Series.unique, like set, relies on hashing.
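
A small sketch of the hashing point, using only NumPy and Python built-ins: an array cannot be hashed, while a tuple with the same values can, and equal tuples collapse to one entry in a set:

```python
import numpy as np

arr = np.array([1, 2, 3])

# ndarray is unhashable, so hash() (and therefore set, dict keys and
# hash-based de-duplication such as unique()) fails on it.
try:
    hash(arr)
    array_is_hashable = True
except TypeError:
    array_is_hashable = False

# A tuple of the same values is hashable; equal tuples collapse in a set.
unique_tuples = {tuple(arr), tuple(np.array([1, 2, 3])), (4, 5)}
print(array_is_hashable, len(unique_tuples))  # False 2
```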

If you can't ensure your series data only consists of strings, you can convert NumPy arrays to tuples before calling pd.Series.unique:

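import numpy as np
import pandas as pd

s = pd.Series([np.array([1, 2, 3]), 1, 'hello', 'test', 1, 'test'])

def tuplizer(x):
    # Convert unhashable arrays/lists to hashable tuples; leave other values as-is
    return tuple(x) if isinstance(x, (np.ndarray, list)) else x

res = s.apply(tuplizer).unique()

print(res)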

array([(1, 2, 3), 1, 'hello', 'test'], dtype=object)

Of course, this means your data type information is lost in the result, but at least you get to see your "unique" NumPy arrays, provided they are 1-dimensional.
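
For arrays with more than one dimension, `tuple(x)` yields a tuple of 1-D arrays, which is still unhashable. One possible workaround, sketched here with a hypothetical `to_hashable` helper (not part of pandas or NumPy), is to convert recursively via `ndarray.tolist()`:

```python
import numpy as np
import pandas as pd

def to_hashable(x):
    """Hypothetical helper: turn arrays/lists of any depth into nested tuples."""
    if isinstance(x, np.ndarray):
        x = x.tolist()  # nested Python lists of plain Python scalars
    if isinstance(x, list):
        return tuple(to_hashable(item) for item in x)
    return x  # strings, numbers, etc. are returned unchanged

values = np.empty(3, dtype=object)
values[0] = np.array([[1, 2], [3, 4]])
values[1] = np.array([[1, 2], [3, 4]])
values[2] = "GS"
s = pd.Series(values)

# tuple(values[0]) would be a tuple of 1-D arrays (still unhashable);
# the recursive conversion yields ((1, 2), (3, 4)) instead.
res = s.map(to_hashable).unique()
print(res)
```

As with `tuplizer`, the result holds tuples rather than arrays, so the original array type and shape metadata are not preserved.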

Answer by Sahil Puri

You have an array in your data column; you could try the following:

allHoldingsFund['BrokerMixed'].apply(lambda x: str(x)).unique()
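
A sketch of this approach on hypothetical data, also using the string key for the groupby mentioned in the question. Note the trade-off: `str()` discards type information, so the integer `1` and the string `"1"` end up as the same key, and the array becomes the literal text `['CITI' 'GS']`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one cell holds an array, one broker code is the
# integer 1 and another is the string "1".
values = np.empty(4, dtype=object)
values[0] = "ML"
values[1] = np.array(["CITI", "GS"])
values[2] = 1
values[3] = "1"
df = pd.DataFrame({"BrokerMixed": values, "qty": [10, 20, 30, 40]})

# str() makes every cell hashable, so unique() succeeds...
key = df["BrokerMixed"].apply(lambda x: str(x))
uniques = key.unique()
print(uniques)

# ...and the same string key can be passed to groupby.
sums = df.groupby(key)["qty"].sum()
print(sums)
```

This is mainly useful for inspecting or grouping messy data quickly; cleaning the array-valued cells at the source avoids the type collisions.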