'Expected String or Unicode' when reading JSON with Pandas

Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24848416/

Time: 2020-09-13 22:16:30  Source: igfitidea

'Expected String or Unicode' when reading JSON with Pandas

Tags: python, json, pandas, openstreetmap, overpass-api

Asked by Balzer82

I am trying to read the JSON string returned by the OpenStreetMap Overpass API. The JSON is valid.

I am using the following code:

import pandas as pd
import requests

# Bottom left
minLat = 50.9549
minLon = 13.55232

# Top right
maxLat = 51.1390
maxLon = 13.89873

osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)

osmdata = osm.json()

osmdataframe = pd.read_json(osmdata)

which throws the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-66-304b7fbfb645> in <module>()
----> 1 osmdataframe = pd.read_json(osmdata)

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
    196         obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
    197                           keep_default_dates, numpy, precise_float,
--> 198                           date_unit).parse()
    199 
    200     if typ == 'series' or obj is None:

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
    264 
    265         else:
--> 266             self._parse_no_numpy()
    267 
    268         if self.obj is None:

/Users/paul/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
    481         if orient == "columns":
    482             self.obj = DataFrame(
--> 483                 loads(json, precise_float=self.precise_float), dtype=None)
    484         elif orient == "split":
    485             decoded = dict((str(k), v)

TypeError: Expected String or Unicode

How can I modify the request or the Pandas read_json call to avoid this error? And what exactly is the problem?

Answered by unutbu

If you print the json string to a file,

import io

content = osm.text  # raw response body as text; a requests Response has no .read()
with io.open('/tmp/out', 'w', encoding='utf-8') as f:
    f.write(content)

you'll see something like this:

{
  "version": 0.6,
  "generator": "Overpass API",
  "osm3s": {
    "timestamp_osm_base": "2014-07-20T07:52:02Z",
    "copyright": "The data included in this document is from www.openstreetmap.org. The data is made available under ODbL."
  },
  "elements": [

{
  "type": "node",
  "id": 536694,
  "lat": 50.9849256,
  "lon": 13.6821776,
  "tags": {
    "highway": "bus_stop",
    "name": "Niederhäslich Bergmannsweg"
  }
},
...]}

If the JSON string were converted to a Python object, it would be a dict whose 'elements' key holds a list of dicts. The vast majority of the data is inside this list of dicts.
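The shape described above can be checked with the standard json module. The sketch below uses a hand-made, shortened copy of the response rather than a live request. It is also worth noting that the Response.json() method of requests already returns this kind of parsed object rather than a string, which is why read_json, expecting a string, raises the TypeError in the question.

```python
import json

# Hand-made, shortened sample that mimics the Overpass response shown above.
sample_text = '''
{
  "version": 0.6,
  "generator": "Overpass API",
  "osm3s": {"timestamp_osm_base": "2014-07-20T07:52:02Z"},
  "elements": [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}}
  ]
}
'''

# str -> dict: the same kind of object that osm.json() hands back.
data = json.loads(sample_text)

print(type(data).__name__)                 # dict
print(sorted(data.keys()))                 # ['elements', 'generator', 'osm3s', 'version']
print(type(data['elements']).__name__)     # list
print(type(data['elements'][0]).__name__)  # dict
```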

This JSON string is not directly convertible to a Pandas object. What would be the index, and what would be the columns? Surely you don't want [u'elements', u'version', u'osm3s', u'generator'] to be the columns, since almost all the information is in the 'elements' list of dicts.

But if you want the DataFrame to consist only of the data in the 'elements' list of dicts, then you have to specify that, since Pandas can't make that assumption for you.
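As a minimal sketch of what "specifying it" can look like, the snippet below passes only the 'elements' list to the DataFrame constructor. The dict is a hand-made stand-in for the parsed response, and the id of the second element is a placeholder rather than real OSM data.

```python
import pandas as pd

# Hand-made stand-in for the parsed response; the second id is a placeholder.
osmdata = {
    "version": 0.6,
    "generator": "Overpass API",
    "elements": [
        {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
         "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}},
        {"type": "node", "id": 1, "lat": 51.123623, "lon": 13.782789,
         "tags": {"highway": "bus_stop", "name": "Sagarder Weg"}},
    ],
}

# Tell pandas explicitly which part of the response holds the rows.
elements_frame = pd.DataFrame(osmdata["elements"])

print(elements_frame.shape)            # (2, 5)
print(sorted(elements_frame.columns))  # ['id', 'lat', 'lon', 'tags', 'type']
```

Each element becomes one row, and each top-level key of an element becomes one column.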

A further complication is that each dict in 'elements' is itself a nested dict. Consider the first dict in 'elements':

{
  "type": "node",
  "id": 536694,
  "lat": 50.9849256,
  "lon": 13.6821776,
  "tags": {
    "highway": "bus_stop",
    "name": "Niederhäslich Bergmannsweg"
  }
}

Should ['lat', 'lon', 'type', 'id', 'tags'] be the columns? That seems plausible, except that the 'tags' column would end up being a column of dicts, which is usually not very useful. It would perhaps be nicer if the keys inside the 'tags' dict were made into columns. We can do that, but again we have to code it ourselves, since Pandas has no way of knowing that this is what we want.



import pandas as pd
import requests
# Bottom left
minLat = 50.9549
minLon = 13.55232

# Top right
maxLat = 51.1390
maxLon = 13.89873

osmrequest = {'data': '[out:json][timeout:25];(node["highway"="bus_stop"](%s,%s,%s,%s););out body;>;out skel qt;' % (minLat, minLon, maxLat, maxLon)}
osmurl = 'http://overpass-api.de/api/interpreter'
osm = requests.get(osmurl, params=osmrequest)

osmdata = osm.json()
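# Keep only the list of elements; the rest of the response is metadata.
osmdata = osmdata['elements']
for dct in osmdata:
    # Promote every tag to a top-level key (.items() works on Python 2 and 3).
    for key, val in dct['tags'].items():
        dct[key] = val
    # Drop the nested dict now that its contents have been copied up.
    del dct['tags']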

osmdataframe = pd.DataFrame(osmdata)
print(osmdataframe[['lat', 'lon', 'name']].head())

yields

         lat        lon                        name
0  50.984926  13.682178  Niederhäslich Bergmannsweg
1  51.123623  13.782789                Sagarder Weg
2  51.065752  13.895734     Weißig, Einkaufszentrum
3  51.007140  13.698498          Stuttgarter Straße
4  51.010199  13.701411          Heilbronner Straße
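
As a possible alternative to the hand-written loop, and not part of the original answer, pandas.json_normalize can flatten the nested 'tags' dicts. The sketch below assumes pandas 1.0 or later, where the function is available at the top level, and uses hand-made sample elements; the id of the second element is a placeholder.

```python
import pandas as pd

# Hand-made sample elements; the second id is a placeholder, not real OSM data.
elements = [
    {"type": "node", "id": 536694, "lat": 50.9849256, "lon": 13.6821776,
     "tags": {"highway": "bus_stop", "name": "Niederhäslich Bergmannsweg"}},
    {"type": "node", "id": 1, "lat": 51.123623, "lon": 13.782789,
     "tags": {"highway": "bus_stop", "name": "Sagarder Weg"}},
]

# json_normalize expands nested dicts into dotted column names such as 'tags.name'.
flat = pd.json_normalize(elements)

print(sorted(flat.columns))
# ['id', 'lat', 'lon', 'tags.highway', 'tags.name', 'type']
print(flat[['lat', 'lon', 'tags.name']])
```

Unlike the loop, this keeps the 'tags.' prefix on the new columns, so the name column is 'tags.name' rather than 'name'. That avoids a tag silently overwriting a top-level key such as 'id' or 'type'.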