pandas df['X'].unique() and TypeError: unhashable type: 'numpy.ndarray'
Source: http://stackoverflow.com/questions/51675151/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on Stack Overflow.
Asked by SBad
All,
I have a column in a dataframe that looks like this:
allHoldingsFund['BrokerMixed']
Out[419]:
78 ML
81 CITI
92 ML
173 CITI
235 ML
262 ML
264 ML
25617 GS
25621 CITI
25644 CITI
25723 GS
25778 CITI
25786 CITI
25793 GS
25797 CITI
Name: BrokerMixed, Length: 2554, dtype: object
Although the column's dtype is object, I am unable to group by that column or even extract its unique values. For example, when I run:
allHoldingsFund['BrokerMixed'].unique()
I get an error:
uniques = table.unique(values)
File "pandas/_libs/hashtable_class_helper.pxi", line 1340, in pandas._libs.hashtable.PyObjectHashTable.unique
TypeError: unhashable type: 'numpy.ndarray'
I also get an error when I group by that column.
Any help is welcome. Thank you
Accepted answer by Harry_pb
First, I would suggest checking the type of your column. You can do so as follows:
print (type(allHoldingsFund['BrokerMixed']))
If this is a pandas Series, you may try:
allHoldingsFund['BrokerMixed'].reset_index()['BrokerMixed'].unique()
and check if this works for you.
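One caveat when checking types: type(allHoldingsFund['BrokerMixed']) reports the type of the column itself (a Series), not of the values stored inside it. To see which element types an object column actually holds, something like the following sketch may help. The series s is a hypothetical stand-in for the asker's column, not their real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for allHoldingsFund['BrokerMixed']:
# mostly strings, with one NumPy array hidden among them.
s = pd.Series(["ML", "CITI", np.array(["GS", "ML"]), "ML"], name="BrokerMixed")

# Count the Python type of each element, not of the column as a whole.
type_counts = s.map(type).value_counts()
print(type_counts)

# Boolean mask selecting the rows whose value is an array.
is_array = s.map(lambda x: isinstance(x, np.ndarray))
print(s[is_array])
```

If the count shows anything other than str, those rows are the likely cause of the unhashable-type error.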
EDIT 2020: Your way of getting the unique values and the approaches in the other answers give the same results in Python 3.
Answer by jpp
Looks like you have a NumPy array in your series. But you can't hash NumPy arrays, and pd.Series.unique, like set, relies on hashing.
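The hashing limitation can be reproduced in isolation. The sketch below uses made-up data (not the asker's frame) to show the same failure both for a plain set and for pd.Series.unique on an object column that contains an array.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3])

# A set hashes its elements, so an ndarray cannot be a member.
set_error = None
try:
    {arr}
except TypeError as exc:
    set_error = str(exc)
print(set_error)

# Series.unique runs into the same limitation on an object column
# that contains an array among otherwise hashable values.
s = pd.Series(["ML", "CITI", arr, "ML"])
try:
    s.unique()
    unique_failed = False
except TypeError:
    unique_failed = True
print(unique_failed)
```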
If you can't ensure your series data only consists of strings, you can convert NumPy arrays to tuples before calling pd.Series.unique:
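import numpy as np
import pandas as pd

s = pd.Series([np.array([1, 2, 3]), 1, 'hello', 'test', 1, 'test'])

def tuplizer(x):
    # Convert unhashable arrays and lists to hashable tuples; leave other values unchanged
    return tuple(x) if isinstance(x, (np.ndarray, list)) else x

res = s.apply(tuplizer).unique()
print(res)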
array([(1, 2, 3), 1, 'hello', 'test'], dtype=object)
Of course, this means your data type information is lost in the result, but at least you get to see your "unique" NumPy arrays, provided they are 1-dimensional.
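The 1-dimensional restriction arises because tuple() applied to a 2-D array yields a tuple of arrays, which is still unhashable. If nested arrays can occur, one possible workaround is to convert recursively. The helper to_hashable below is a hypothetical name introduced for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

def to_hashable(x):
    """Hypothetical helper: recursively turn arrays and lists into tuples."""
    if isinstance(x, np.ndarray):
        x = x.tolist()  # nested Python lists (or a scalar for 0-d arrays)
    if isinstance(x, list):
        return tuple(to_hashable(item) for item in x)
    return x

# Made-up data: strings mixed with two equal 2-D arrays.
s = pd.Series(["ML", np.array([[1, 2], [3, 4]]), "ML", np.array([[1, 2], [3, 4]])])

res = s.apply(to_hashable).unique()
print(res)
```

Here the two equal 2-D arrays collapse into a single nested tuple, so the result holds two unique values.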
Answer by Sahil Puri
You have an array in your data column. You could try the following:
allHoldingsFund['BrokerMixed'].apply(lambda x: str(x)).unique()
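As a rough illustration of what the string conversion does, the sketch below applies it to a made-up column (a stand-in for the asker's data) and also uses the converted values as a group key, since grouping was failing as well. Note that array elements become their printed form, such as "['GS']", rather than the plain string 'GS', and very large arrays are abbreviated by NumPy when printed, so this is best treated as a quick diagnostic rather than a clean fix.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for allHoldingsFund['BrokerMixed'].
s = pd.Series(
    ["ML", "CITI", np.array(["GS"]), "ML", np.array(["GS"])],
    name="BrokerMixed",
)

# Every element becomes a plain (hashable) string.
as_text = s.apply(str)
uniques = as_text.unique()
print(uniques)

# The string version can also serve as a group key.
counts = s.groupby(as_text).size()
print(counts)
```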