pandas: turning a collections Counter into a dictionary

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/31807945/

Date: 2020-09-13 23:43:44  Source: igfitidea

turning a collections counter into dictionary

Tags: python, dictionary, pandas, collections

Asked by Blue Moon

I have a Counter produced by the following call:

Counter(df.email_address)

It returns each distinct email address together with the number of times it occurs.

Counter({nan: 1618, '[email protected]': 265, '[email protected]': 1})

What I want to do is use it as if it were a dictionary and create a pandas DataFrame from it with two columns: one for the email addresses and one for the associated counts.

I tried:

dfr = repeaters.from_dict(repeaters, orient='index')

but I got the following error:

AttributeError: 'Counter' object has no attribute 'from_dict'

This makes me think that Counter is not the dictionary it appears to be. Any idea how to get it into a df?

Answered by doru

from collections import Counter

# Copy each (email, count) pair from the Counter into a plain dict
d = {}
cnt = Counter(df.email_address)
for key, value in cnt.items():
    d[key] = value

EDIT


Or, as @Trif Nefzger suggested:

d = dict(Counter(df.email_address))
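The one-liner works because Counter is a subclass of dict. A minimal sketch of that, assuming a small hypothetical list of addresses in place of df.email_address (the addresses are made up for illustration):

```python
from collections import Counter

# Hypothetical sample data standing in for df.email_address
emails = ["a@example.com", "b@example.com", "a@example.com"]

cnt = Counter(emails)

# Counter subclasses dict, so ordinary dictionary operations already work on it
is_dict = isinstance(cnt, dict)

# dict() builds a plain dictionary holding the same keys and counts
d = dict(cnt)
print(is_dict, d)
```

In other words, the conversion is only needed when something insists on a plain dict; most code that accepts a dictionary will accept the Counter directly.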

Answered by omri_saadon

As ajcr wrote in the comments, from_dict is a method of DataFrame, not of Counter, so you can write the following to achieve your goal:

from collections import Counter
import pandas as pd

repeaters = Counter({"nan": 1618, '[email protected]': 265, '[email protected]': 1})

dfr = pd.DataFrame.from_dict(repeaters, orient='index')
print(dfr)

Output:


[email protected]     1
nan                           1618
[email protected]            265
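With orient='index' the addresses end up in the index and the counts in a single unnamed column. If two named columns are wanted, as the question asks, one possible sketch is to build the frame from the Counter's items. The column names "email_address" and "count" and the sample addresses are assumptions chosen for illustration:

```python
from collections import Counter
import pandas as pd

# Hypothetical counts; the addresses are made up for illustration
repeaters = Counter({"a@example.com": 265, "b@example.com": 1})

# Each (key, count) pair becomes one row; the column names are an assumption
dfr = pd.DataFrame(list(repeaters.items()), columns=["email_address", "count"])
print(dfr)
```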

Answered by ldirer

Alternatively, you could use pd.Series.value_counts, which returns a Series object.

df.email_address.value_counts(dropna=False)

Sample output:


[email protected]    2
[email protected]    1
NaN        1
dtype: int64

This is not exactly what you asked for, but it looks like what you'd like to achieve.
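To go from the Series returned by value_counts to the two-column DataFrame described in the question, one option is rename_axis followed by reset_index. This is a sketch on a small hypothetical frame; the sample addresses and the column name "count" are assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the asker's df
df = pd.DataFrame(
    {"email_address": ["a@example.com", "b@example.com", "a@example.com", np.nan]}
)

# Count occurrences per address, keeping missing values as their own row
counts = df["email_address"].value_counts(dropna=False)

# Name the index, then move it into a regular column next to the counts
result = counts.rename_axis("email_address").reset_index(name="count")
print(result)
```

Naming the index and the count column explicitly sidesteps the different default column names that reset_index produces across pandas versions.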