Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/29681906/

Date: 2020-09-13 23:12:56  Source: igfitidea

Python: Pandas dataframe from Series of dict

Tags: python, pandas, dataframe

Asked by makambi

I have a Pandas dataframe:

type(original)
pandas.core.frame.DataFrame

which includes the series object original['user']:

type(original['user'])
pandas.core.series.Series

original['user'] points to a number of dicts:

type(original['user'].iloc[0])
dict

Each dict has the same keys:

original['user'].iloc[0].keys()

[u'follow_request_sent',
 u'profile_use_background_image',
 u'profile_text_color',
 u'id',
 u'verified',
 u'profile_location',
 # ... keys removed for brevity
]

Above is (part of) one of the dicts from the user field of a tweet returned by the Twitter API. I want to build a data frame from these dicts.

When I try to make a data frame directly, I get only one column for each row and this column contains the whole dict:

pd.DataFrame(original['user'][:2])
    user
0   {u'follow_request_sent': False, u'profile_use_...
1   {u'follow_request_sent': False, u'profile_use_..

When I try to create a data frame using from_dict() I get the same result:

pd.DataFrame.from_dict(original['user'][:2])

    user
0   {u'follow_request_sent': False, u'profile_use_...
1   {u'follow_request_sent': False, u'profile_use_..
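
One way to read this result: the DataFrame constructor treats a Series as a single column and leaves each dict as an opaque cell value, so nothing gets expanded into columns. A minimal sketch with made-up data (the `users` Series and its two keys are hypothetical stand-ins for original['user']):

```python
import pandas as pd

# Hypothetical stand-in for original['user']: a Series whose cells are dicts.
users = pd.Series(
    [{"id": 1, "verified": False}, {"id": 2, "verified": True}],
    name="user",
)

# A Series passed to the constructor becomes one column; the dicts stay intact.
df = pd.DataFrame(users)

print(df.shape)             # one column, one row per dict
print(type(df.iloc[0, 0]))  # each cell is still a dict
```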

Next I tried a list comprehension which returned an error:

item = [[k, v] for (k,v) in users]
ValueError: too many values to unpack
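
The error most likely arises because iterating over a Series of dicts yields whole dicts, and unpacking a dict into `(k, v)` tries to split all of its keys (40 here) into two names. Assuming `users` refers to original['user'] (the question does not show its definition), a sketch that keeps only the wanted keys could look like this; the sample data and the `wanted` list are made up:

```python
import pandas as pd

# Hypothetical stand-in for original['user'].
users = pd.Series([
    {"id": 1, "verified": False, "lang": "en"},
    {"id": 2, "verified": True, "lang": "de"},
])

# Hypothetical subset of keys to keep.
wanted = ["id", "verified"]

# Outer loop walks the dicts; the inner comprehension picks keys from each dict.
rows = [{k: d[k] for k in wanted} for d in users]

# A list of dicts is read as row records, one column per key.
df = pd.DataFrame(rows, index=users.index)
print(df)
```

Filtering before building the frame avoids materializing the unneeded columns at all, which may matter with tens of thousands of rows.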

When I create a data frame from a single row, it nearly works:

df = pd.DataFrame.from_dict(original['user'].iloc[0])
df.reset_index()

    index   contributors_enabled    created_at  default_profile     default_profile_image   description     entities    favourites_count    follow_request_sent     followers_count     following   friends_count   geo_enabled     id  id_str  is_translation_enabled  is_translator   lang    listed_count    location    name    notifications   profile_background_color    profile_background_image_url    profile_background_image_url_https  profile_background_tile     profile_image_url   profile_image_url_https     profile_link_color  profile_location    profile_sidebar_border_color    profile_sidebar_fill_color  profile_text_color  profile_use_background_image    protected   screen_name     statuses_count  time_zone   url     utc_offset  verified
0   description     False   Mon May 26 11:58:40 +0000 2014  True    False       {u'urls': []}   0   False   157

It works almost like I want it to, except it sets the description field as the default index.
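
The stray index likely comes from a nested dict among the values (in the output above, `entities` holds a dict): when the other values are scalars, pandas appears to take the row labels from that inner dict's keys. That reading is an inference from the output, not something the question confirms. A sketch that sidesteps the issue by wrapping the single dict in a list, so it is treated as one row record (the sample dict is made up):

```python
import pandas as pd

# Hypothetical single user dict with one nested value, as in the question.
user = {"id": 1, "verified": False, "entities": {"description": {"urls": []}}}

# A list of dicts is read as row records: one dict, one row, default integer index.
df = pd.DataFrame([user])

print(df.index.tolist())
print(df.columns.tolist())
```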

Each of the dicts has 40 keys, but I only need about 10 of them, and I have 28734 rows in the data frame.

How can I filter out the keys which I do not need?

Accepted answer by Eyad

What I would try is the following:

new_df = pd.DataFrame(list(original['user']))

This converts the Series to a list of dicts and passes it to the DataFrame constructor, which should take care of the rest by turning each dict's keys into columns.
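
A sketch of this approach on made-up data, with two additions that are not part of the answer: passing the original index so the rows stay aligned with `original`, and selecting the needed columns afterwards. The sample frame and the column names in `wanted` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the question's `original` frame.
original = pd.DataFrame(
    {
        "user": [
            {"id": 1, "verified": False, "lang": "en"},
            {"id": 2, "verified": True, "lang": "de"},
        ]
    },
    index=[10, 11],
)

# The answer's one-liner, plus index= to keep the original row labels.
new_df = pd.DataFrame(list(original["user"]), index=original.index)

# Keep only the columns that are actually needed (hypothetical choice).
wanted = ["id", "lang"]
new_df = new_df[wanted]
print(new_df)
```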

Answer by saynah

df = original['user'].apply(pd.Series)

works well
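
A sketch of this variant on made-up data. `apply(pd.Series)` builds one Series per row, so it can be noticeably slower than passing a list of dicts to the constructor when there are tens of thousands of rows; in exchange, it keeps the original index without extra arguments.

```python
import pandas as pd

# Hypothetical stand-in for original['user'], with a non-default index.
users = pd.Series(
    [{"id": 1, "verified": False}, {"id": 2, "verified": True}],
    index=[10, 11],
)

# Each dict becomes a Series; the per-row results are stacked into a DataFrame.
df = users.apply(pd.Series)

# Select only the needed columns afterwards (hypothetical choice).
slim = df[["id"]]
print(slim)
```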

credit
