Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/39040250/

How to read a JSON file with pandas?

python json list pandas scrapy

Asked by Tony Wang

I have scraped a website with scrapy and stored the data in a json file.
Link to the json file: https://drive.google.com/file/d/0B6JCr_BzSFMHLURsTGdORmlPX0E/view?usp=sharing

But the file isn't standard JSON, and loading it raises an error:

>>> import json
>>> with open("/root/code/itjuzi/itjuzi/investorinfo.json") as file:
...     data = json.load(file)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/root/anaconda2/lib/python2.7/json/__init__.py", line 291, in load
    **kw)
  File "/root/anaconda2/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/root/anaconda2/lib/python2.7/json/decoder.py", line 367, in decode
    raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 3 column 2 - line 3697 column 2 (char 45 - 3661517)

Then I tried this:

with open('/root/code/itjuzi/itjuzi/investorinfo.json','rb') as f:
     data = f.readlines()
data = map(lambda x: x.decode('unicode_escape'), data)
>>> df = pd.DataFrame(data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'pd' is not defined
>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> print pd
<module 'pandas' from '/root/anaconda2/lib/python2.7/site-packages/pandas/__init__.pyc'>
>>> print df
[3697 rows x 1 columns]

Why does this only return 1 column?

How can I standardize the json file and read it with pandas correctly?

Answered by SerialDev

Try this:

import json

with open('data.json') as data_file:
    # json.load parses the whole file and returns Python objects (dicts/lists)
    data = json.load(data_file)

json.load reads straight from the open file object. Note that it still parses the whole document into memory, so it does not help with JSON files that do not fit in memory.

EDIT: Your data is not valid JSON. Delete the following first 3 lines of the file and it will validate:

[{
    "website": ["\u5341\u65b9\u521b\u6295"]
}]
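
The "Extra data" error in the question usually means the file contains more than one top-level JSON value back to back. As an alternative to deleting lines by hand, here is a minimal sketch that decodes such values one at a time with json.JSONDecoder.raw_decode. It assumes the file is a sequence of complete JSON values; the helper name iter_json_values and the sample string are hypothetical and not taken from the original file.

```python
import json


def iter_json_values(text):
    """Yield each top-level JSON value found in `text`, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the decoded value and the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Hypothetical sample: a complete array followed by a second array,
# which json.loads would reject with "Extra data".
sample = '[{"website": ["a"]}]\n[{"name": "x"}, {"name": "y"}]'
values = list(iter_json_values(sample))
print(len(values))
```

Each yielded value can then be inspected separately, so the stray leading array can be dropped in code rather than by editing the file.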

EDIT 2 (since you need to access nested values from the JSON):

You can now also access single values like this:

data["one"][0]["id"]  # will return 'value'
data["two"]["id"]    # will return 'value'
data["three"]      # will return 'value'
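
On the question of why the DataFrame had only one column: readlines() returns a list of strings, and pandas treats each string as a single cell. A rough sketch of the difference follows, assuming each element can be parsed into a dict; the field names name and city are made up for illustration, since the real keys in investorinfo.json are not shown in the question.

```python
import json

import pandas as pd

# Hypothetical raw lines standing in for the scraped records.
raw_lines = [
    '{"name": "A", "city": "X"}',
    '{"name": "B", "city": "Y"}',
]

# A list of strings becomes one column: each string is a single cell.
df_lines = pd.DataFrame(raw_lines)

# Parsing first gives a list of dicts, so pandas creates one column per key.
records = [json.loads(line) for line in raw_lines]
df_records = pd.DataFrame(records)

print(df_lines.shape)    # (2, 1)
print(df_records.shape)  # (2, 2)
```

If the file really does hold one JSON object per line, pd.read_json(path, lines=True) should give the same result in one call. If it is a single JSON array of objects, passing the parsed list to pd.DataFrame as above is enough.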

Answered by Luffy Cyliu

Try the following code (you are missing one thing):

>>> import json
>>> with open("/root/code/itjuzi/itjuzi/investorinfo.json") as file:
...     data = json.loads(file.read())
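
For reference, json.load expects a file-like object, while json.loads expects a string that is already in memory. The short sketch below uses io.StringIO as a stand-in for an open file to show that both give the same result; the sample text is hypothetical. Neither call on its own gets around an "Extra data" error, since both require exactly one top-level JSON value.

```python
import io
import json

# Hypothetical JSON text used only for illustration.
text = '{"id": 1, "tags": ["a", "b"]}'

# json.loads parses a string that is already in memory.
from_string = json.loads(text)

# json.load takes a file-like object and reads from it internally.
from_file = json.load(io.StringIO(text))

print(from_string == from_file)  # True
```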