Original source: http://stackoverflow.com/questions/51813266/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Get unique values from pandas series of lists
Asked by rohan
I have a column in a DataFrame containing lists of categories. For example:
0 [Pizza]
1 [Mexican, Bars, Nightlife]
2 [American, New, Barbeque]
3 [Thai]
4 [Desserts, Asian, Fusion, Mexican, Hawaiian, F...
6 [Thai, Barbeque]
7 [Asian, Fusion, Korean, Mexican]
8 [Barbeque, Bars, Pubs, American, Traditional, ...
9 [Diners, Burgers, Breakfast, Brunch]
11 [Pakistani, Halal, Indian]
I am attempting to do two things:
1) Get the unique categories. My approach is to start with an empty set, iterate through the series, and union in each list.
My code:
unique_categories = set()  # start from an empty set
for lst in restaurant_review_df['categories_arr']:
    unique_categories |= set(lst)  # union in this row's categories
This gives me the set of unique categories contained in all the lists in the column.
2) Generate a pie plot of category counts, where each restaurant can belong to multiple categories. For example, restaurant 11 belongs to the Pakistani, Indian and Halal categories. My approach is again to iterate through the categories, with one more iteration through the series to get the counts.
Are there simpler or more elegant ways of doing this?
Thanks in advance.
Answered by Scott Boston
Update: with pandas 0.25.0+, use explode:
df['category'].explode().value_counts()
Output:
Barbeque 3
Mexican 3
Fusion 2
Thai 2
American 2
Bars 2
Asian 2
Hawaiian 1
New 1
Brunch 1
Pizza 1
Traditional 1
Pubs 1
Korean 1
Pakistani 1
Burgers 1
Diners 1
Indian 1
Desserts 1
Halal 1
Nightlife 1
Breakfast 1
Name: Places, dtype: int64
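For the first part of the question (the unique categories themselves), the same exploded series can be reused. The following is a minimal sketch, assuming pandas 0.25.0 or later and a small made-up DataFrame `df` with a `category` column of lists; the sample rows are illustrative only and are not the asker's full data:

```python
import pandas as pd

# Hypothetical sample data: each row holds a list of categories.
df = pd.DataFrame({
    "category": [
        ["Pizza"],
        ["Mexican", "Bars", "Nightlife"],
        ["Thai"],
        ["Thai", "Barbeque"],
    ]
})

# explode() turns every list element into its own row.
exploded = df["category"].explode()

# unique() returns the distinct values in order of first appearance.
unique_categories = exploded.unique().tolist()

# A plain Python set gives the same members as the loop in the question.
unique_set = set(exploded)

print(unique_categories)
# ['Pizza', 'Mexican', 'Bars', 'Nightlife', 'Thai', 'Barbeque']
```

The list keeps first-appearance order, which can be handy for display; the set is the closer match to what the question's loop builds.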
And with plotting:
df['category'].explode().value_counts().plot.pie(figsize=(8,8))
Output: [pie chart of the category counts; image not included]
For older versions of pandas (before 0.25.0), try:
df['category'].apply(pd.Series).stack().value_counts()
Output:
Mexican 3
Barbeque 3
Thai 2
Fusion 2
American 2
Bars 2
Asian 2
Pubs 1
Burgers 1
Traditional 1
Brunch 1
Indian 1
Korean 1
Halal 1
Pakistani 1
Hawaiian 1
Diners 1
Pizza 1
Nightlife 1
New 1
Desserts 1
Breakfast 1
dtype: int64
With plotting:
df['category'].apply(pd.Series).stack().value_counts().plot.pie()
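To see what the `apply(pd.Series).stack()` chain is doing, it may help to look at the intermediate steps. This is only a sketch on a small hypothetical `df` (the sample rows and the names `wide` and `counts` are assumptions for illustration, not part of the original answer):

```python
import pandas as pd

# Hypothetical sample data, not the asker's full column.
df = pd.DataFrame({
    "category": [
        ["Pizza"],
        ["Mexican", "Bars", "Nightlife"],
        ["Thai"],
        ["Thai", "Barbeque"],
    ]
})

# Step 1: each list becomes one row of a wide frame;
# shorter lists are padded with NaN on the right.
wide = df["category"].apply(pd.Series)
print(wide.shape)  # (4, 3): four rows, as many columns as the longest list

# Step 2: stack() reshapes the wide frame into one long Series of categories.
# Step 3: value_counts() tallies them; it skips NaN by default.
counts = wide.stack().value_counts()
print(counts["Thai"])  # 2
```

Building the wide frame creates one Series per row, so this route tends to be noticeably slower than `explode` on large columns.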
Per @coldspeed's comments
from itertools import chain
from collections import Counter
pd.DataFrame.from_dict(Counter(chain(*df['category'])), orient='index').sort_values(0, ascending=False)
Output:
Barbeque 3
Mexican 3
Bars 2
American 2
Thai 2
Asian 2
Fusion 2
Pizza 1
Diners 1
Halal 1
Pakistani 1
Brunch 1
Breakfast 1
Burgers 1
Hawaiian 1
Traditional 1
Pubs 1
Korean 1
Desserts 1
New 1
Nightlife 1
Indian 1
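A self-contained variant of the `Counter` approach might look like the sketch below. It assumes a small hypothetical `df` (made-up rows), swaps `chain(*...)` for `chain.from_iterable`, which avoids unpacking the whole column into function arguments, and wraps the result in a Series rather than a DataFrame; these are illustrative choices, not part of the original answer:

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical sample data, not the asker's full column.
df = pd.DataFrame({
    "category": [
        ["Pizza"],
        ["Mexican", "Bars"],
        ["Thai"],
        ["Thai", "Mexican", "Barbeque"],
    ]
})

# Flatten the lists lazily and count every category in one pass.
counter = Counter(chain.from_iterable(df["category"]))

# The Counter's keys are exactly the unique categories.
unique_categories = set(counter)

# A Series built from the Counter can be sorted or plotted
# in the same way as value_counts() output.
counts = pd.Series(counter).sort_values(ascending=False)
print(counts)
```

Because the counting happens in plain Python, this route also answers both parts of the question at once: `unique_categories` for part 1 and `counts` for part 2.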