Sort lists in a Pandas Dataframe column
Original question: http://stackoverflow.com/questions/39900061/
This content comes from StackOverFlow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Hyman Cooper
I have a DataFrame column which is a collection of lists:
a
['a', 'b']
['b', 'a']
['a', 'c']
['c', 'a']
I would like to group by the unique values of this column (['a', 'b'] and ['a', 'c']). However, this raises an error:
TypeError: unhashable type: 'list'
Is there any way around this? Ideally I would like to sort the values in place and create an additional column containing a concatenated string.
Answered by estebanpdl
You can also sort the values by column.
Example:
import pandas

x = [['a', 'b'], ['b', 'a'], ['a', 'c'], ['c', 'a']]
df = pandas.DataFrame({'a': pandas.Series(x)})
df.a.sort_values()
a
0 [a, b]
2 [a, c]
1 [b, a]
3 [c, a]
However, from what I understand, you want to sort [b, a] to [a, b], and [c, a] to [a, c], and then take the set of values in order to get only [a, b] and [a, c].
I'd recommend using a lambda.
Try:
result = df.a.sort_values().apply(lambda x: sorted(x))
result = pandas.DataFrame(result).reset_index(drop=True)
It returns:
0 [a, b]
1 [a, c]
2 [a, b]
3 [a, c]
Then get unique values:
newdf = pandas.DataFrame({'a': pandas.Series(list(set(result['a'].apply(tuple))))})
newdf.sort_values(by='a')
a
0 (a, b)
1 (a, c)
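The same idea (sort each list, then keep only the distinct values) can also be sketched without going through set. This is only an illustration, not part of the original answer; the names sorted_tuples and unique_pairs are made up for the example. Converting each sorted list to a tuple makes it hashable, so drop_duplicates can be used, and it keeps the original row order.

```python
import pandas as pd

df = pd.DataFrame({'a': [['a', 'b'], ['b', 'a'], ['a', 'c'], ['c', 'a']]})

# Sort each list and convert it to a tuple so the values become hashable.
sorted_tuples = df['a'].apply(lambda x: tuple(sorted(x)))

# drop_duplicates keeps the first occurrence of each tuple, in row order.
unique_pairs = sorted_tuples.drop_duplicates().reset_index(drop=True)

print(unique_pairs.tolist())  # [('a', 'b'), ('a', 'c')]
```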
Answered by piRSquared
Lists are unhashable. However, tuples are hashable, so use:
df.groupby([df.a.apply(tuple)])
Setup:
import pandas as pd
df = pd.DataFrame(dict(a=[list('ab'), list('ba'), list('ac'), list('ca')]))
Results:
df.groupby([df.a.apply(tuple)]).size()
a
(a, b) 1
(a, c) 1
(b, a) 1
(c, a) 1
dtype: int64
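In the output above, (a, b) and (b, a) still land in separate groups because the lists were not sorted before being converted. A possible sketch of the extra step the question asks for: sort each list first and join it into a string, then group on that. The column name key is an assumption made for this example, and the plain ''.join is only unambiguous here because every element is a single character; a separator may be safer for longer strings.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[list('ab'), list('ba'), list('ac'), list('ca')]))

# Hypothetical helper column: the sorted elements joined into one string,
# so ['a', 'b'] and ['b', 'a'] both become 'ab'.
df['key'] = df['a'].apply(lambda x: ''.join(sorted(x)))

# Strings are hashable, so they can be used directly as the group key.
sizes = df.groupby('key').size()

print(sizes.to_dict())  # {'ab': 2, 'ac': 2}
```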