Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/39900061/

Time: 2020-09-14 02:09:10  Source: igfitidea

Sort lists in a Pandas Dataframe column

python pandas dataframe

Asked by Hyman Cooper

I have a DataFrame column that holds a collection of lists:

    a
['a', 'b']
['b', 'a']
['a', 'c']
['c', 'a']

I would like to group by the column's unique values (['a', 'b'] and ['a', 'c']). However, this raises an error:

TypeError: unhashable type: 'list'

Is there any way around this? Ideally, I would like to sort the values within each list in place and create an additional column containing the concatenated string.
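One way to reach that goal, sketched under the assumption that every list holds strings: sort each list, join its items into a new string column (named `key` here, a hypothetical name), and group by that column, because strings are hashable. The `-` separator is also an assumption; it keeps ['ab', 'c'] and ['a', 'bc'] from collapsing into the same key.

```python
import pandas as pd

# Same data as in the question: a column whose cells are lists.
df = pd.DataFrame({'a': [['a', 'b'], ['b', 'a'], ['a', 'c'], ['c', 'a']]})

# Sort the items inside each list, so ['b', 'a'] becomes ['a', 'b'].
df['a'] = df['a'].apply(sorted)

# Hypothetical extra column: the sorted items joined into one string.
# Strings are hashable, so this column works as a groupby key.
df['key'] = df['a'].apply('-'.join)

counts = df.groupby('key').size()
print(counts)
```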

Answered by estebanpdl

You can also sort values by column.

Example:

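import pandas
from pandas import Series, DataFrame
x = [['a', 'b'], ['b', 'a'], ['a', 'c'], ['c', 'a']]
df = pandas.DataFrame({'a': Series(x)})
df.a.sort_values()  # reorders the rows; the lists themselves are unchanged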

     a
0   [a, b]
2   [a, c]
1   [b, a]
3   [c, a]

However, as I understand it, you want to sort [b, a] to [a, b] and [c, a] to [a, c], and then take the set of values so that you get only [a, b] and [a, c].

I'd recommend using a lambda.

Try:

result = df.a.sort_values().apply(lambda x: sorted(x))
result = DataFrame(result).reset_index(drop=True)

It returns:

0    [a, b]
1    [a, c]
2    [a, b]
3    [a, c]

Then get unique values:

newdf = pandas.DataFrame({'a': Series(list(set(result['a'].apply(tuple))))})
newdf.sort_values(by='a')

     a
0   (a, b)
1   (a, c)
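Because `set()` iterates in an order that depends on string hashing, the index labels of `newdf` after `sort_values` can differ between runs. A sketch of an order-preserving alternative, assuming `result` holds the already-sorted lists in column `a` as shown above: convert each list to a tuple and call `drop_duplicates`, which keeps the first occurrence of each value.

```python
import pandas as pd

# Stand-in for `result` from the answer: each list is already sorted.
result = pd.DataFrame({'a': [['a', 'b'], ['a', 'c'], ['a', 'b'], ['a', 'c']]})

# Tuples are hashable, so drop_duplicates can compare them. Unlike set(),
# it keeps the first occurrence of each value in the original row order.
unique = result['a'].apply(tuple).drop_duplicates().reset_index(drop=True)
newdf = pd.DataFrame({'a': unique})
print(newdf)
```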

Answered by piRSquared

Lists are unhashable; tuples, however, are hashable.
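A quick plain-Python check of that difference (the helper name `is_hashable` is hypothetical): `hash()` raises `TypeError` for a list, while a tuple can be hashed as long as everything inside it is hashable too.

```python
def is_hashable(value):
    """Return True if `value` can be hashed, i.e. used as a dict or groupby key."""
    try:
        hash(value)
    except TypeError:
        # Raised for mutable containers such as lists, and for tuples
        # that contain one.
        return False
    return True


print(is_hashable(['a', 'b']))    # False
print(is_hashable(('a', 'b')))    # True
print(is_hashable((['a'], 'b')))  # False: the tuple holds a list
```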

Use:

df.groupby([df.a.apply(tuple)])

Setup
import pandas as pd
df = pd.DataFrame(dict(a=[list('ab'), list('ba'), list('ac'), list('ca')]))
Results
df.groupby([df.a.apply(tuple)]).size()

a
(a, b)    1
(a, c)    1
(b, a)    1
(c, a)    1
dtype: int64
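As the output shows, converting to tuples alone keeps (a, b) and (b, a) in separate groups. A minimal sketch, assuming the goal from the question is to merge them: sort each list before turning it into a tuple, so both orderings produce the same hashable key.

```python
import pandas as pd

df = pd.DataFrame(dict(a=[list('ab'), list('ba'), list('ac'), list('ca')]))

# Sort first, then convert to a tuple: ['b', 'a'] and ['a', 'b'] both
# become ('a', 'b'), so they fall into the same group.
key = df['a'].apply(lambda items: tuple(sorted(items)))
counts = df.groupby(key).size()
print(counts)
```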