Transform a Counter object into a Pandas DataFrame
Source: http://stackoverflow.com/questions/31111032/
License: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by woshitom
I used Counter on a list to compute this variable:
final = Counter(event_container)
print final gives:
Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'rt_view_listing': 50, 'rt_home_start_app': 46, 'fb_view_wishlist': 39, 'fb_view_product': 37, 'fb_search': 29, 'rt_view_product': 23, 'fb_view_cart': 22, 'rt_search': 12, 'rt_view_cart': 12, 'add_to_cart': 2, 'create_campaign': 1, 'fb_connect': 1, 'sale': 1, 'guest_sale': 1, 'remove_from_cart': 1, 'rt_transaction_confirmation': 1, 'login': 1})
Now I want to convert final into a Pandas DataFrame, but when I run:
final_df = pd.DataFrame(final)
I get an error.
I guess final is not a proper dictionary, so how can I convert final to a dictionary? Or is there another way to convert final to a DataFrame?
Accepted answer by EdChum
You can construct the DataFrame using from_dict and pass the parameter orient='index', then call reset_index so you get a two-column df:
In [40]:
import pandas as pd
from collections import Counter
d = Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'rt_view_listing': 50, 'rt_home_start_app': 46, 'fb_view_wishlist': 39, 'fb_view_product': 37, 'fb_search': 29, 'rt_view_product': 23, 'fb_view_cart': 22, 'rt_search': 12, 'rt_view_cart': 12, 'add_to_cart': 2, 'create_campaign': 1, 'fb_connect': 1, 'sale': 1, 'guest_sale': 1, 'remove_from_cart': 1, 'rt_transaction_confirmation': 1, 'login': 1})
df = pd.DataFrame.from_dict(d, orient='index').reset_index()
df
Out[40]:
index 0
0 login 1
1 rt_transaction_confirmation 1
2 fb_view_cart 22
3 fb_connect 1
4 rt_view_product 23
5 fb_search 29
6 sale 1
7 fb_view_listing 76
8 add_to_cart 2
9 rt_view_cart 12
10 fb_homescreen 63
11 fb_view_product 37
12 rt_home_start_app 46
13 fb_view_wishlist 39
14 create_campaign 1
15 rt_search 12
16 guest_sale 1
17 remove_from_cart 1
18 rt_view_listing 50
You can rename the columns to something more meaningful:
In [43]:
df = df.rename(columns={'index':'event', 0:'count'})
df
Out[43]:
event count
0 login 1
1 rt_transaction_confirmation 1
2 fb_view_cart 22
3 fb_connect 1
4 rt_view_product 23
5 fb_search 29
6 sale 1
7 fb_view_listing 76
8 add_to_cart 2
9 rt_view_cart 12
10 fb_homescreen 63
11 fb_view_product 37
12 rt_home_start_app 46
13 fb_view_wishlist 39
14 create_campaign 1
15 rt_search 12
16 guest_sale 1
17 remove_from_cart 1
18 rt_view_listing 50
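For context, the error in the question most likely arises because the DataFrame constructor treats a Counter as a dict of scalar values, which it rejects unless an index is supplied. The following is a minimal sketch of that failure and of the from_dict route described above, assuming a reasonably recent pandas. The sample data and the names counts and constructor_failed are made up for illustration and do not come from the original post.

```python
from collections import Counter

import pandas as pd

# Small, made-up Counter for illustration only.
counts = Counter({'fb_search': 29, 'login': 1, 'sale': 1})

# Passing the Counter straight to the constructor fails, because every
# value is a scalar and no index was given.
try:
    pd.DataFrame(counts)
    constructor_failed = False
except ValueError:
    constructor_failed = True

# orient='index' turns each key into a row label; reset_index() then moves
# the labels into a regular column, and rename() names both columns.
df = (
    pd.DataFrame.from_dict(counts, orient='index')
    .reset_index()
    .rename(columns={'index': 'event', 0: 'count'})
)
print(df)
```

Chaining the three calls keeps the conversion in one expression; splitting them into separate statements, as in the answer, works just as well.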
Answer by galath
If you want two columns, set the keyword argument orient='index' when creating a DataFrame from a dictionary using from_dict:
final_df = pd.DataFrame.from_dict(final, orient='index')
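Note that the call above keeps the keys as the index and produces a single data column labelled 0. As a hedged sketch, recent pandas versions also accept a columns argument together with orient='index', which names that column directly. The sample Counter and the labels 'count' and 'event' below are illustrative assumptions, not part of the original answer.

```python
from collections import Counter

import pandas as pd

# Made-up sample data standing in for the Counter from the question.
final = Counter({'fb_search': 29, 'login': 1, 'sale': 1})

# Keys become the index; the counts land in a column named 'count'.
final_df = pd.DataFrame.from_dict(final, orient='index', columns=['count'])

# Optionally label the index so it reads well when printed or reset.
final_df.index.name = 'event'
print(final_df)
```

If the keys are needed as an ordinary column, calling final_df.reset_index() afterwards should yield the columns 'event' and 'count'.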
Answer by Suzana
I found it more useful to transform the Counter into a pandas Series that is already ordered by count, with the counted items as the index, so I used zip:
def counter_to_series(counter):
    if not counter:
        return pd.Series()
    counter_as_tuples = counter.most_common(len(counter))
    items, counts = zip(*counter_as_tuples)
    return pd.Series(counts, index=items)
The most_common method of the Counter object returns a list of (item, count) tuples. When the counter has no items, zip yields nothing, so unpacking its result into items and counts raises a ValueError; an empty Counter must therefore be checked beforehand.
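As an alternative to the zip approach, and not the method used in this answer, a Series can be built from the Counter directly and then sorted, which avoids the special case for an empty Counter. This is a minimal sketch; the helper name counter_to_sorted_series is hypothetical, and the explicit int64 dtype is an assumption made so that the empty case has a predictable dtype.

```python
from collections import Counter

import pandas as pd


def counter_to_sorted_series(counter):
    """Hypothetical helper: Counter -> Series sorted by count, descending."""
    # dict(counter) maps item -> count; the keys become the Series index.
    series = pd.Series(dict(counter), dtype='int64')
    return series.sort_values(ascending=False)


s = counter_to_sorted_series(Counter('aaabbc'))
empty = counter_to_sorted_series(Counter())
print(s)
print(len(empty))
```

Unlike most_common, this sketch makes no promise about the relative order of items that share the same count.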
Answer by pvasek
Another option is to use the DataFrame.from_records method:
import pandas as pd
from collections import Counter
c = Counter({'fb_view_listing': 76, 'fb_homescreen': 63, 'rt_view_listing': 50, 'rt_home_start_app': 46, 'fb_view_wishlist': 39, 'fb_view_product': 37, 'fb_search': 29, 'rt_view_product': 23, 'fb_view_cart': 22, 'rt_search': 12, 'rt_view_cart': 12, 'add_to_cart': 2, 'create_campaign': 1, 'fb_connect': 1, 'sale': 1, 'guest_sale': 1, 'remove_from_cart': 1, 'rt_transaction_confirmation': 1, 'login': 1})
df = pd.DataFrame.from_records(list(dict(c).items()), columns=['page','count'])
It's a one-liner, and the speed seems to be the same.
Or use this variant to have them sorted by most used. Again, the performance is about the same.
df = pd.DataFrame.from_records(c.most_common(), columns=['page','count'])
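To see the sorted variant end to end, here is a small self-contained sketch. The page names in the list are made up for illustration, and the counts are chosen to be distinct so the resulting row order is unambiguous.

```python
from collections import Counter

import pandas as pd

# Made-up event list; 'home' occurs 3 times, 'search' twice, 'cart' once.
c = Counter(['home', 'search', 'home', 'cart', 'home', 'search'])

# most_common() returns (item, count) tuples, highest count first, so each
# tuple becomes one row and the rows arrive already sorted.
df = pd.DataFrame.from_records(c.most_common(), columns=['page', 'count'])
print(df)
```

Because from_records receives a plain list of tuples here, the column names must be supplied explicitly through the columns argument.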