How do I access embedded JSON objects in a Pandas DataFrame?

Question

Asked by Kyle Kelley

TL;DR: If fields loaded into a Pandas DataFrame themselves contain JSON documents, how can they be worked with in a Pandas-like fashion?

Currently I'm directly dumping json/dictionary results from a Twitter library (twython) into a Mongo collection (called users here).

from twython import Twython
from pymongo import MongoClient

tw = Twython(...<auth>...)

# Using mongo as object storage
client = MongoClient()
db = client.twitter
user_coll = db.users

user_batch = ... # collection of user ids
user_dict_batch = tw.lookup_user(user_id=user_batch)

# Store each user document only if it is not already in the collection
for user_dict in user_dict_batch:
    if user_coll.find_one({"id": user_dict["id"]}) is None:
        # insert_one replaces Collection.insert, which PyMongo 4 removed
        user_coll.insert_one(user_dict)

After populating this database I read the documents into Pandas:

import pandas

# Pull straight from mongo to pandas
cursor = user_coll.find()
df = pandas.DataFrame(list(cursor))

Which works like magic:

[Image: Pandas is magic]

I'd like to be able to manipulate the 'status' field Pandas-style (directly accessing its attributes). Is there a way?

[Image: status field]

EDIT: Something like df['status:text']. Status has fields like 'text' and 'created_at'. One option could be flattening/normalizing this JSON field, like this pull request Wes McKinney was working on.
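
For readers on a recent pandas, here is a minimal sketch of the flattening idea using `pd.json_normalize` (top-level since pandas 1.0; older versions expose it as `pandas.io.json.json_normalize`). The records are hypothetical stand-ins for the Twitter user documents, and `sep=":"` is chosen only to reproduce the `df['status:text']` naming asked about above.

```python
import pandas as pd

# Hypothetical records shaped like the user documents in the question
records = [
    {"id": 1, "status": {"text": "hello", "created_at": "Mon Jan 01"}},
    {"id": 2, "status": {"text": "world", "created_at": "Tue Jan 02"}},
]

# Nested dict keys become columns named "<parent>:<child>"
flat = pd.json_normalize(records, sep=":")

print(flat.columns.tolist())
print(flat["status:text"].tolist())
```

With a Mongo cursor, `list(cursor)` could be passed in place of `records`, assuming every document is a plain dict.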

Answer 1

Answered by Andy Hayden

One solution is just to smash it with the Series constructor:

In [1]: df = pd.DataFrame([[1, {'a': 2}], [2, {'a': 1, 'b': 3}]])

In [2]: df
Out[2]: 
   0                   1
0  1           {u'a': 2}
1  2  {u'a': 1, u'b': 3}

In [3]: df[1].apply(pd.Series)
Out[3]: 
   a   b
0  2 NaN
1  1   3

In some cases you'll want to concat this to the DataFrame in place of the dict column:

In [4]: dict_col = df.pop(1)  # here 1 is the column name

In [5]: pd.concat([df, dict_col.apply(pd.Series)], axis=1)
Out[5]: 
   0  a   b
0  1  2 NaN
1  2  1   3
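
The same pop-and-concat pattern can be written as a standalone script with a named column. This is only a sketch: the `id`/`status` data is made up, and the `status:` prefix is an assumption added so the new columns stay distinguishable from the existing ones.

```python
import pandas as pd

# Hypothetical frame with a column of dicts, as in the question
df = pd.DataFrame({
    "id": [1, 2],
    "status": [
        {"text": "hello"},
        {"text": "world", "created_at": "Tue Jan 02"},
    ],
})

# Remove the dict column, expand it, and prefix the new column names
status = df.pop("status")
expanded = status.apply(pd.Series).add_prefix("status:")

# Put the expanded columns alongside the remaining ones
result = pd.concat([df, expanded], axis=1)
print(result)
```

Keys missing from a given dict show up as NaN in that row, just like column `b` in the output above.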

If the nesting goes deeper, you can do this a few times...
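
One way to "do this a few times" without counting levels by hand is to loop until no dict columns are left. The helper below, `expand_dict_columns`, is a hypothetical name and not part of pandas; it assumes a column is only expanded when every value in it is a dict, and it joins names with a dot.

```python
import pandas as pd

def expand_dict_columns(df):
    """Expand all-dict columns repeatedly until none remain (hypothetical helper)."""
    df = df.copy()  # leave the caller's frame untouched
    while True:
        # Columns in which every value is a dict
        dict_cols = [
            c for c in df.columns
            if len(df) > 0 and df[c].map(lambda v: isinstance(v, dict)).all()
        ]
        if not dict_cols:
            return df
        for col in dict_cols:
            expanded = df.pop(col).apply(pd.Series).add_prefix(f"{col}.")
            df = pd.concat([df, expanded], axis=1)

# Made-up data with two levels of nesting
nested = pd.DataFrame({
    "id": [1, 2],
    "status": [
        {"text": "a", "place": {"name": "X"}},
        {"text": "b", "place": {"name": "Y"}},
    ],
})

flat_nested = expand_dict_columns(nested)
print(flat_nested)
```

Columns that mix dicts with missing values are left alone by this sketch, so they would need separate handling.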
