pandas: turning a collections Counter into a dictionary
Note: this page is a Chinese/English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/31807945/
Asked by Blue Moon
I have a Counter produced by the following call:
Counter(df.email_address)
It returns each individual email address together with the number of times it occurs.
Counter({nan: 1618, '[email protected]': 265, '[email protected]': 1})
What I want to do is use it as if it were a dictionary and create a pandas DataFrame from it with two columns: one for the email addresses and one for the associated counts.
I tried:
dfr = repeaters.from_dict(repeaters, orient='index')
but I got the following error:
AttributeError: 'Counter' object has no attribute 'from_dict'
This makes me think that Counter is not a dictionary, even though it looks like one. Any idea how to append it to a df?
Answer by doru
from collections import Counter

# Copy every (key, count) pair from the Counter into a plain dict
d = {}
cnt = Counter(df.email_address)
for key, value in cnt.items():
    d[key] = value
EDIT
Or, as @Trif Nefzger suggested:
d = dict(Counter(df.email_address))
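Both versions work because Counter is a subclass of dict, so it already carries the full dictionary interface. The following is only a minimal sketch of that point; the list `emails` and its addresses are hypothetical stand-ins for `df.email_address`, not data from the question.

```python
from collections import Counter

# Hypothetical sample data standing in for df.email_address
emails = ["a@example.com", "b@example.com", "a@example.com"]

cnt = Counter(emails)

# Counter is a subclass of dict, so it already supports the dict interface
is_dict = isinstance(cnt, dict)

# dict() makes a plain-dict copy with the same keys and counts
d = dict(cnt)

print(is_dict)
print(d)
```

In other words, the conversion is optional for most uses: anything that accepts a dict should generally accept the Counter directly.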
Answer by omri_saadon
As ajcr wrote in the comments, from_dict is a method that belongs to DataFrame, so you can write the following to achieve your goal:
from collections import Counter
import pandas as pd
repeaters = Counter({"nan": 1618, '[email protected]': 265, '[email protected]': 1})
dfr = pd.DataFrame.from_dict(repeaters, orient='index')
print(dfr)
Output:
[email protected] 1
nan 1618
[email protected] 265
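The result above keeps the addresses in the index. If two ordinary columns are wanted, as the question describes, one possible approach is to build the frame from the Counter's (key, count) pairs. This is a sketch under assumptions: the addresses are placeholders, and the column names `email_address` and `count` are chosen here for illustration.

```python
from collections import Counter
import pandas as pd

# Hypothetical counts; the addresses are placeholders, not the asker's data
repeaters = Counter({"a@example.com": 265, "b@example.com": 1})

# Build one row per (address, count) pair and name the two columns explicitly
dfr = pd.DataFrame(list(repeaters.items()), columns=["email_address", "count"])

print(dfr)
```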
Answer by ldirer
Alternatively, you could use pd.Series.value_counts, which returns a Series object.
df.email_address.value_counts(dropna=False)
Sample output:
[email protected] 2
[email protected] 1
NaN 1
dtype: int64
This is not exactly what you asked for, but it looks like what you'd like to achieve.
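To go from that Series to the two-column DataFrame described in the question, the index can be moved into a regular column. A minimal sketch, assuming a small made-up `df` and the illustrative column name `count`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the asker's df
df = pd.DataFrame(
    {"email_address": ["a@example.com", "b@example.com", "a@example.com", np.nan]}
)

# Count each address, keeping missing values as their own row
counts = df["email_address"].value_counts(dropna=False)

# Name the index, then move it into a regular column next to the counts
result = counts.rename_axis("email_address").reset_index(name="count")

print(result)
```

Naming the index with rename_axis before calling reset_index keeps the column names explicit, which avoids relying on the default names pandas assigns.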

