Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/32291437/

Time: 2020-09-13 23:50:00  Source: igfitidea

Pandas json_normalize produces confusing `KeyError` message?

Tags: python, json, dictionary, pandas

Asked by themachinist

I'm trying to convert a nested JSON to a Pandas DataFrame. I had been using `json_normalize` successfully until I came across a certain JSON. I've made a smaller version of it to recreate the problem.

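from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

# Note: naming the variable `json` shadows the standard-library json module.
json = [{"events": [{"schedule": {"date": "2015-08-27",
                                  "location": {"building": "BDC", "floor": 5},
                                  "ID": 815},
                     "group": "A"},
                    {"schedule": {"date": "2015-08-27",
                                  "location": {"building": "BDC", "floor": 5},
                                  "ID": 816},
                     "group": "A"}]}]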

I then run:

json_normalize(json[0],'events',[['schedule','date'],['schedule','location','building'],['schedule','location','floor']])

Expecting to see something like this:

ID      group   schedule.date   schedule.location.building schedule.location.floor  
'815'   'A'     '2015-08-27'            'BDC'                       5
'816'   'A'     '2015-08-27'            'BDC'                       5

But instead I get this error:

In [2]: json_normalize(json[0],'events',[['schedule','date'],['schedule','location','building'],['schedule','location','floor']])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-b588a9e3ef1d> in <module>()
----> 1 json_normalize(json[0],'events',[['schedule','date'],['schedule','location','building'],['schedule','location','floor']])

/Users/logan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/io/json.pyc in json_normalize(data, record_path, meta, meta_prefix, record_prefix)
    739                 records.extend(recs)
    740
--> 741     _recursive_extract(data, record_path, {}, level=0)
    742
    743     result = DataFrame(records)

/Users/logan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/io/json.pyc in _recursive_extract(data, path, seen_meta, level)
    734                         meta_val = seen_meta[key]
    735                     else:
--> 736                         meta_val = _pull_field(obj, val[level:])
    737                     meta_vals[key].append(meta_val)
    738

/Users/logan/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/io/json.pyc in _pull_field(js, spec)
    674         if isinstance(spec, list):
    675             for field in spec:
--> 676                 result = result[field]
    677         else:
    678             result = result[spec]

KeyError: 'schedule'
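A minimal sketch of why the call fails, assuming pandas 1.0 or later (where `json_normalize` is importable from the top-level `pandas` namespace). With `record_path='events'`, the `meta` paths are resolved against the object that contains `events` (here `json[0]`), not against each event, and that object has no `schedule` key. The list is renamed `data` here to avoid shadowing the standard-library `json` module; the exact wording of the `KeyError` message differs between pandas versions.

```python
from pandas import json_normalize

# Same structure as the question's input, renamed to avoid shadowing `json`.
data = [{"events": [{"schedule": {"date": "2015-08-27",
                                  "location": {"building": "BDC", "floor": 5},
                                  "ID": 815},
                     "group": "A"},
                    {"schedule": {"date": "2015-08-27",
                                  "location": {"building": "BDC", "floor": 5},
                                  "ID": 816},
                     "group": "A"}]}]

parent = data[0]
parent_keys = sorted(parent)  # only 'events' exists at this level

caught = None
try:
    # Meta paths are looked up on `parent`, so parent["schedule"] fails.
    json_normalize(parent, "events", [["schedule", "date"]])
except KeyError as exc:
    caught = exc

print(parent_keys)
print(repr(caught))
```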

Accepted answer by chrisb

In this case, I think you'd just use this (here `data` is the list the question calls `json`):

In [57]: json_normalize(data[0]['events'])
Out[57]: 
  group  schedule.ID schedule.date schedule.location.building  \
0     A          815    2015-08-27                        BDC   
1     A          816    2015-08-27                        BDC   

   schedule.location.floor  
0                        5  
1                        5  

The meta paths (`[['schedule','date']...]`) are for specifying data at the same level of nesting as your record path, i.e. at the same level as `'events'`. It doesn't look like `json_normalize` handles dicts with nested lists particularly well, so you may need to do some manual reshaping if your actual data is much more complicated.
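To illustrate the point about levels, here is a sketch with a hypothetical top-level field `venue` (invented for this example, not in the question's data) placed next to `events`. Because `venue` sits at the same level as the record path, it works as a `meta` field, while the nested `schedule` dict inside each event is flattened into dotted columns. This assumes pandas 1.0 or later; column order may differ between versions.

```python
from pandas import json_normalize

# Hypothetical variant of the question's data: "venue" is an invented field
# placed at the same level as "events", which is where meta paths are read.
data = [{"venue": "HQ",
         "events": [{"schedule": {"date": "2015-08-27",
                                  "location": {"building": "BDC", "floor": 5},
                                  "ID": 815},
                     "group": "A"},
                    {"schedule": {"date": "2015-08-27",
                                  "location": {"building": "BDC", "floor": 5},
                                  "ID": 816},
                     "group": "A"}]}]

# One row per event; "venue" is repeated on every row as metadata,
# and each event's nested "schedule" dict becomes dotted columns.
df = json_normalize(data[0], record_path="events", meta=["venue"])
print(df[["group", "schedule.ID", "schedule.location.floor", "venue"]])
```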

Answer by Sandesh

I got the KeyError when the structure of the JSON was not consistent, i.e. when one of the nested structures was missing from some of the records.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html

Using the example from the pandas documentation, if you remove the nested tag (`counties`) from one of the records, you will get a KeyError. To work around this, you may need to either handle the missing tag or keep only the records whose nested column/tag is populated with data.
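A minimal sketch of the second workaround (keeping only complete records), using made-up data loosely shaped like the documentation's states/counties example; all values are invented. The record without a `counties` key triggers the `KeyError`, and filtering on the presence of the record path avoids it. Assumes pandas 1.0 or later.

```python
from pandas import json_normalize

# Made-up records; the second one has no "counties" list.
data = [
    {"state": "Florida",
     "counties": [{"name": "Dade", "population": 100},
                  {"name": "Broward", "population": 200}]},
    {"state": "Ohio"},
]

missing_error = None
try:
    json_normalize(data, record_path="counties", meta=["state"])
except KeyError as exc:
    missing_error = exc  # raised because data[1] lacks "counties"

# Keep only the records that actually contain the nested list.
complete = [rec for rec in data if "counties" in rec]
df = json_normalize(complete, record_path="counties", meta=["state"])
print(df)
```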

Answer by Jim Arnold

I had this same problem! This thread helped, especially parachute py's answer.

I found a solution using:

df = df.dropna(subset=["nested_column"])  # placeholder: list the column(s) that hold the nested data

then saving the resulting `df` as a new JSON file. Load the new JSON and you'll be able to flatten the nested columns.

There's probably a more efficient way to get around this, but my solution works.
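The same idea can be sketched without writing a file: drop the rows whose nested column is missing, convert the frame back to a list of dicts with `DataFrame.to_dict(orient="records")`, and normalize that. This is a swapped-in, in-memory variant of the save-and-reload step, not the answerer's exact code, and the `order`/`items` records are hypothetical. Assumes pandas 1.0 or later.

```python
import pandas as pd

# Hypothetical records: order 2 has no nested "items" list.
records = [
    {"order": 1, "items": [{"sku": "a", "qty": 2}, {"sku": "b", "qty": 1}]},
    {"order": 2},
    {"order": 3, "items": [{"sku": "c", "qty": 5}]},
]

df = pd.DataFrame(records)            # the missing "items" becomes NaN
df = df.dropna(subset=["items"])      # drop rows without nested data
clean = df.to_dict(orient="records")  # in-memory stand-in for save + reload

# One row per item, with the parent's "order" copied onto each row.
flat = pd.json_normalize(clean, record_path="items", meta=["order"])
print(flat)
```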

edit: forgot to mention, I tried using the `errors='ignore'` argument in `json_normalize()` and it didn't help.
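One likely reason `errors='ignore'` did not help: in `json_normalize` that argument only covers keys listed in `meta` (missing ones become NaN instead of raising), while a record that lacks the `record_path` itself still raises `KeyError`. A small sketch with hypothetical data, assuming pandas 1.0 or later:

```python
from pandas import json_normalize

# Hypothetical records: the second lacks the meta field "owner".
data = [
    {"id": 1, "owner": {"name": "x"}, "tags": [{"t": "a"}]},
    {"id": 2, "tags": [{"t": "b"}]},
]

# Missing meta key: errors="ignore" fills it with NaN instead of raising.
df = json_normalize(data, record_path="tags",
                    meta=["id", ["owner", "name"]], errors="ignore")
print(df)

# Missing record path: errors="ignore" does not apply, so KeyError is raised.
record_error = None
try:
    json_normalize(data + [{"id": 3}], record_path="tags",
                   meta=["id"], errors="ignore")
except KeyError as exc:
    record_error = exc
print(repr(record_error))
```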
