Original URL: http://stackoverflow.com/questions/18665284/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
How do I access embedded json objects in a Pandas DataFrame?
Question by Kyle Kelley
TL;DR If loaded fields in a Pandas DataFrame contain JSON documents themselves, how can they be worked with in a Pandas-like fashion?
Currently I'm directly dumping json/dictionary results from a Twitter library (twython) into a Mongo collection (called users here).
```python
from twython import Twython
from pymongo import MongoClient

tw = Twython(...)  # <auth> credentials elided in the original

# Using mongo as object storage
client = MongoClient()
db = client.twitter
user_coll = db.users

user_batch = ...  # collection of user ids
user_dict_batch = tw.lookup_user(user_id=user_batch)
for user_dict in user_dict_batch:
    # Store each user only if no document with this Twitter id exists yet
    if user_coll.find_one({"id": user_dict['id']}) is None:
        user_coll.insert_one(user_dict)  # insert() was removed in PyMongo 4
```
After populating this database I read the documents into Pandas:
```python
import pandas

# Pull straight from mongo to pandas
cursor = user_coll.find()
df = pandas.DataFrame(list(cursor))
```
This works like magic.


I'd like to be able to mangle the 'status' field Pandas-style (directly accessing its attributes). Is there a way?


EDIT: Something like df['status:text']. Status has fields like 'text' and 'created_at'. One option could be flattening/normalizing this JSON field, like this pull request Wes McKinney was working on.
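One way to get the `df['status:text']` style of access the edit asks for is `pd.json_normalize`. This sketch assumes pandas 1.0 or later. The function flattens nested dicts into columns whose names join the parent and child keys with `sep` (default `"."`), so passing `sep=":"` reproduces the `status:text` naming from the question. The user records below are hypothetical stand-ins for Twitter user documents. Whether this function grew out of the pull request mentioned above is not established here.

```python
import pandas as pd

# Hypothetical user documents, shaped loosely like Twitter user objects:
# each has a nested 'status' dict (all values are made up for illustration).
users = [
    {"id": 1, "screen_name": "alice",
     "status": {"text": "hello", "created_at": "Mon Sep 09 00:00:00 +0000 2013"}},
    {"id": 2, "screen_name": "bob",
     "status": {"text": "hi there", "created_at": "Tue Sep 10 00:00:00 +0000 2013"}},
]

# json_normalize flattens nested dicts into columns named parent<sep>child
flat = pd.json_normalize(users, sep=":")
print(flat.columns.tolist())
print(flat["status:text"].tolist())
```

With the Mongo collection from the question, the equivalent call would presumably be `pd.json_normalize(list(user_coll.find()), sep=":")`.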
Answer by Andy Hayden
One solution is just to smash it with the Series constructor:
```
In [1]: df = pd.DataFrame([[1, {'a': 2}], [2, {'a': 1, 'b': 3}]])
In [2]: df
Out[2]:
   0                   1
0  1           {u'a': 2}
1  2  {u'a': 1, u'b': 3}
In [3]: df[1].apply(pd.Series)
Out[3]:
   a   b
0  2 NaN
1  1   3
```
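A commonly used alternative to `apply(pd.Series)` is to build the new frame directly from the list of dicts. This avoids constructing one Series per row and is usually faster on large frames. Below is a minimal sketch using the same toy frame as the session above. Passing `index=df.index` keeps the new rows aligned with the original frame.

```python
import pandas as pd

df = pd.DataFrame([[1, {'a': 2}], [2, {'a': 1, 'b': 3}]])

# Keys of the dicts become columns; a key missing from a row becomes NaN.
expanded = pd.DataFrame(df[1].tolist(), index=df.index)
print(expanded)
```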
In some cases you'll want to concat this to the DataFrame in place of the dict column:
```
In [4]: dict_col = df.pop(1)  # here 1 is the column name
In [5]: pd.concat([df, dict_col.apply(pd.Series)], axis=1)
Out[5]:
   0  a   b
0  1  2 NaN
1  2  1   3
```
If the nesting goes deeper, you can repeat this a few times.
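Repeating the pop-and-concat step by hand gets tedious when dicts are nested several levels deep. The sketch below loops until no dict-valued columns remain. `expand_dict_columns` is a hypothetical helper, not part of pandas. By assumption, it names each new column `parent.child`. Cells that are not dicts (e.g. NaN for a user without a status) are treated as empty dicts, so their expanded values come out as NaN.

```python
import pandas as pd

def expand_dict_columns(df, sep="."):
    """Hypothetical helper: repeatedly replace dict-valued columns with one
    column per key (prefixed by the parent column name) until none remain."""
    df = df.copy()
    while True:
        dict_cols = [c for c in df.columns
                     if df[c].map(lambda v: isinstance(v, dict)).any()]
        if not dict_cols:
            return df
        for col in dict_cols:
            dict_col = df.pop(col)
            # Non-dict cells become empty dicts, so all their keys end up NaN
            records = [v if isinstance(v, dict) else {} for v in dict_col]
            expanded = pd.DataFrame(records, index=dict_col.index)
            expanded.columns = [f"{col}{sep}{k}" for k in expanded.columns]
            df = pd.concat([df, expanded], axis=1)

# Example with two levels of nesting (made-up data)
df = pd.DataFrame({
    "id": [1, 2],
    "user": [{"name": "a", "status": {"text": "x"}},
             {"name": "b", "status": {"text": "y", "lang": "en"}}],
})
out = expand_dict_columns(df)
print(out)
```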

