Python: Pandas dataframe from a Series of dicts

Question

Asked by makambi

I have a Pandas dataframe:

type(original)
pandas.core.frame.DataFrame

which includes the series object original['user']:

type(original['user'])
pandas.core.series.Series

original['user'] points to a number of dicts:

type(original['user'].ix[0])
dict

Each dict has the same keys:

original['user'].ix[0].keys()

[u'follow_request_sent',
 u'profile_use_background_image',
 u'profile_text_color',
 u'id',
 u'verified',
 u'profile_location',
 # ... keys removed for brevity
]

Above is (part of) one of the dicts from the user field of a tweet returned by the Twitter API. I want to build a data frame from these dicts.

When I try to make a data frame directly, I get only one column for each row and this column contains the whole dict:

pd.DataFrame(original['user'][:2])
    user
0   {u'follow_request_sent': False, u'profile_use_...
1   {u'follow_request_sent': False, u'profile_use_..

When I try to create a data frame using from_dict() I get the same result:

pd.DataFrame.from_dict(original['user'][:2])

    user
0   {u'follow_request_sent': False, u'profile_use_...
1   {u'follow_request_sent': False, u'profile_use_..

Next I tried a list comprehension which returned an error:

item = [[k, v] for (k,v) in users]
ValueError: too many values to unpack

When I create a data frame from a single row, it nearly works:

df = pd.DataFrame.from_dict(original['user'].ix[0])
df.reset_index()

    index   contributors_enabled    created_at  default_profile     default_profile_image   description     entities    favourites_count    follow_request_sent     followers_count     following   friends_count   geo_enabled     id  id_str  is_translation_enabled  is_translator   lang    listed_count    location    name    notifications   profile_background_color    profile_background_image_url    profile_background_image_url_https  profile_background_tile     profile_image_url   profile_image_url_https     profile_link_color  profile_location    profile_sidebar_border_color    profile_sidebar_fill_color  profile_text_color  profile_use_background_image    protected   screen_name     statuses_count  time_zone   url     utc_offset  verified
0   description     False   Mon May 26 11:58:40 +0000 2014  True    False       {u'urls': []}   0   False   157

It works almost like I want it to, except it sets the description field as the default index.

Each of the dicts has 40 keys, but I only need about 10 of them, and I have 28734 rows in the data frame.

How can I filter out the keys which I do not need?

Answer 1

Accepted answer by Eyad

What I would try is the following:

new_df = pd.DataFrame(list(original['user']))

This converts the Series to a list and passes it to the pandas DataFrame constructor, which should take care of the rest.
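
The question also asks how to drop the keys that are not needed, which the one-liner above does not cover. A minimal sketch of one way to do that follows. The `users` Series is a made-up stand-in for original['user'], and the key names in `wanted` are illustrative, not taken from the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for original['user']: a Series whose values are dicts.
users = pd.Series([
    {"id": 1, "screen_name": "alice", "verified": False, "lang": "en"},
    {"id": 2, "screen_name": "bob", "verified": True, "lang": "fr"},
])

# Keys to keep (illustrative names; substitute the ~10 keys you need).
wanted = ["id", "screen_name", "verified"]

# Option A: expand every key into a column, then select the wanted columns.
new_df = pd.DataFrame(list(users), index=users.index)
filtered = new_df[wanted]

# Option B: drop unwanted keys before building the frame, so the
# unneeded columns are never created. dict.get returns None for a
# key that is missing from a particular dict.
filtered_early = pd.DataFrame(
    [{k: d.get(k) for k in wanted} for d in users],
    index=users.index,
    columns=wanted,
)

print(filtered)
```

Passing index=users.index keeps the row labels of the original Series, which makes it straightforward to join the result back onto the original frame if needed.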

Answer 2

Answer by saynah

df = original['user'].apply(pd.Series)

works well.
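
As a small illustration of this approach, here is a sketch on made-up data. The `users` Series and its keys are assumptions standing in for original['user']. Because apply(pd.Series) builds one Series object per row, it may be noticeably slower than the list-based constructor on tens of thousands of rows, so it is worth timing on the real data.

```python
import pandas as pd

# Hypothetical stand-in for original['user'].
users = pd.Series(
    [
        {"id": 1, "screen_name": "alice", "verified": False},
        {"id": 2, "screen_name": "bob", "verified": True},
    ],
    name="user",
)

# Each dict is turned into a Series; pandas stacks them into a
# DataFrame with one column per key.
expanded = users.apply(pd.Series)

# Keep only the columns of interest (illustrative names).
subset = expanded[["id", "screen_name"]]

print(subset)
```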

credit
