Get unique values from pandas series of lists

Warning: this is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/51813266/

Date: 2020-09-14 05:55:40  Source: igfitidea

Tags: python, pandas, set, unique, pie-chart

Asked by rohan

I have a DataFrame column containing lists of categories. For example:


0                                                    [Pizza]
1                                 [Mexican, Bars, Nightlife]
2                                  [American, New, Barbeque]
3                                                     [Thai]
4          [Desserts, Asian, Fusion, Mexican, Hawaiian, F...
6                                           [Thai, Barbeque]
7                           [Asian, Fusion, Korean, Mexican]
8          [Barbeque, Bars, Pubs, American, Traditional, ...
9                       [Diners, Burgers, Breakfast, Brunch]
11                                [Pakistani, Halal, Indian]

I am attempting to do two things:


1) Get the unique categories. My approach is to start with an empty set, iterate through the series, and union in each list.


My code:


unique_categories = set()  # start from an empty set
for lst in restaurant_review_df['categories_arr']:
    unique_categories |= set(lst)  # add this row's categories

This gives me the set of unique categories across all the lists in the column.

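As a minimal sketch of step 1 without the explicit loop, assuming a small hypothetical DataFrame in place of `restaurant_review_df`: `set.union` accepts any number of iterables, so unpacking the column unions every row's list in a single call.

```python
import pandas as pd

# Hypothetical sample data: a column of lists, like 'categories_arr' in the question
df = pd.DataFrame({
    'categories_arr': [
        ['Pizza'],
        ['Mexican', 'Bars', 'Nightlife'],
        ['Thai', 'Barbeque'],
        ['Pakistani', 'Halal', 'Indian'],
        ['Thai'],
    ]
})

# set().union(*column) unions every row's list in one call; an empty column gives set()
unique_categories = set().union(*df['categories_arr'])
print(sorted(unique_categories))
```

On pandas 0.25.0+, `df['categories_arr'].explode().unique()` returns the same categories as an array, in order of first appearance.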

2) Generate a pie plot of category counts, where each restaurant can belong to multiple categories. For example, restaurant 11 belongs to the Pakistani, Indian, and Halal categories. My approach is again to iterate through the categories, with a nested iteration through the series to get the counts.

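The nested approach from step 2 might look roughly like this (a sketch on hypothetical sample data, not the asker's actual code). It rescans the whole column once per category, so the work grows with the number of categories times the number of rows; the answers below get the same counts in a single pass.

```python
import pandas as pd

# Hypothetical sample data standing in for restaurant_review_df
restaurant_review_df = pd.DataFrame({
    'categories_arr': [
        ['Pizza'],
        ['Mexican', 'Bars', 'Nightlife'],
        ['Thai', 'Barbeque'],
        ['Pakistani', 'Halal', 'Indian'],
        ['Thai'],
    ]
})

unique_categories = set().union(*restaurant_review_df['categories_arr'])

# For each category, scan every row and count the rows whose list contains it.
# A restaurant listed under several categories is counted once per category.
counts = {}
for category in unique_categories:
    counts[category] = sum(
        category in lst for lst in restaurant_review_df['categories_arr']
    )

print(counts['Thai'])  # Thai appears in two rows
```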

Are there simpler or elegant ways of doing this?


Thanks in advance.


Answer by Scott Boston

Update: with pandas 0.25.0+, use explode:


df['category'].explode().value_counts()

Output:


Barbeque       3
Mexican        3
Fusion         2
Thai           2
American       2
Bars           2
Asian          2
Hawaiian       1
New            1
Brunch         1
Pizza          1
Traditional    1
Pubs           1
Korean         1
Pakistani      1
Burgers        1
Diners         1
Indian         1
Desserts       1
Halal          1
Nightlife      1
Breakfast      1
Name: Places, dtype: int64
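To show what `explode()` does before counting, here is a small sketch on a hypothetical Series: each list element gets its own row and the original index label is repeated, so `value_counts()` then counts every category tag across all rows.

```python
import pandas as pd

# Small hypothetical Series of lists (index labels mimic the question's data)
s = pd.Series([['Pizza'], ['Thai', 'Barbeque'], ['Thai']], index=[0, 6, 3], name='category')

# explode() puts each list element on its own row, repeating the index label
exploded = s.explode()
print(exploded)

# value_counts() then tallies each category across all rows
counts = exploded.value_counts()
print(counts)
```

An empty list becomes a single NaN row after `explode()`, which `value_counts()` skips by default (`dropna=True`). The `Name:` line in the printed result depends on the pandas version; from pandas 2.0 the result Series is named `count`.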

And with plotting:


df['category'].explode().value_counts().plot.pie(figsize=(8,8))

Output: [pie chart of the category counts]
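A sketch of a slightly tidier pie chart, assuming matplotlib is installed and using hypothetical sample data: `autopct` is forwarded to matplotlib's `pie()` to label each wedge with its percentage, and clearing the y-label hides the label pandas draws beside the chart. Because a restaurant with several categories contributes to several wedges, the percentages are shares of category tags, not of restaurants.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so this also runs without a display

import pandas as pd

# Hypothetical sample data: one list of categories per restaurant
s = pd.Series([['Pizza'], ['Thai', 'Barbeque'], ['Thai']], name='category')

counts = s.explode().value_counts()

# One wedge per category; autopct labels each wedge with its percentage
ax = counts.plot.pie(figsize=(8, 8), autopct='%1.0f%%')
ax.set_ylabel('')  # drop the default y-axis label
```

To write the chart to disk, call `ax.figure.savefig('pie.png')` (the file name is only an example).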



For pandas versions older than 0.25.0, try:


df['category'].apply(pd.Series).stack().value_counts()

Output:


Mexican        3
Barbeque       3
Thai           2
Fusion         2
American       2
Bars           2
Asian          2
Pubs           1
Burgers        1
Traditional    1
Brunch         1
Indian         1
Korean         1
Halal          1
Pakistani      1
Hawaiian       1
Diners         1
Pizza          1
Nightlife      1
New            1
Desserts       1
Breakfast      1
dtype: int64
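To see what the pre-0.25.0 version does, here is a sketch on a small hypothetical Series: `apply(pd.Series)` spreads each list across numbered columns, padding shorter lists with NaN, and `stack()` folds those columns back into one long Series for `value_counts()`.

```python
import pandas as pd

# Small hypothetical Series of lists
s = pd.Series([['Pizza'], ['Thai', 'Barbeque'], ['Thai']], name='category')

# apply(pd.Series) spreads each list across columns 0, 1, ...;
# shorter lists are padded with NaN
wide = s.apply(pd.Series)
print(wide.shape)  # (3, 2)

# stack() folds the columns back into a single long Series;
# value_counts() ignores any NaN padding by default
counts = wide.stack().value_counts()
print(counts)
```

`pd.DataFrame(s.tolist(), index=s.index)` builds the same wide frame and is usually faster than `apply(pd.Series)` on large columns.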

With plotting:


df['category'].apply(pd.Series).stack().value_counts().plot.pie()

Output: [pie chart of the category counts]

Per @coldspeed's comments:


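import pandas as pd
from itertools import chain
from collections import Counter

# Flatten every row's list, count each category, then sort by count (descending)
pd.DataFrame.from_dict(Counter(chain.from_iterable(df['category'])), orient='index').sort_values(0, ascending=False)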

Output:


Barbeque     3
Mexican      3
Bars         2
American     2
Thai         2
Asian        2
Fusion       2
Pizza        1
Diners       1
Halal        1
Pakistani    1
Brunch       1
Breakfast    1
Burgers      1
Hawaiian     1
Traditional  1
Pubs         1
Korean       1
Desserts     1
New          1
Nightlife    1
Indian       1
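A sketch of the same Counter approach on a small hypothetical Series: `chain.from_iterable` flattens the lists without unpacking the whole column into call arguments, and wrapping the Counter in `pd.Series` gives a `value_counts()`-style result that can be plotted the same way.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Small hypothetical Series of lists
s = pd.Series([['Pizza'], ['Thai', 'Barbeque'], ['Thai']], name='category')

# chain.from_iterable flattens the lists lazily; Counter tallies each category
counts = Counter(chain.from_iterable(s))
print(counts.most_common(1))  # [('Thai', 2)]

# pd.Series(dict) uses the keys as the index, like value_counts() output
counts_series = pd.Series(counts).sort_values(ascending=False)
print(counts_series)
```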