Python: How to read a json-dictionary type file with pandas?
Original question: http://stackoverflow.com/questions/28373282/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
How to read a json-dictionary type file with pandas?
Asked by skwoi
I have a long JSON file like this: http://pastebin.com/gzhHEYGy
I would like to load it into a pandas DataFrame in order to play with it, so, following the documentation, I do the following:
import pandas as pd
df = pd.read_json('/user/file.json')
print(df)
I got this traceback:
  File "/Users/user/PycharmProjects/PAN-pruebas/json_2_dataframe.py", line 6, in <module>
    df = pd.read_json('/Users/user/Downloads/54db3923f033e1dd6a82222aa2604ab9.json')
  File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 198, in read_json
    date_unit).parse()
  File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 266, in parse
    self._parse_no_numpy()
  File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 483, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 203, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 327, in _init_dict
    dtype=dtype)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4620, in _arrays_to_mgr
    index = extract_index(arrays)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4668, in extract_index
    raise ValueError('arrays must all be same length')
ValueError: arrays must all be same length
Then, from a previous question, I found that I need to do something like this:
d = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
But I don't understand how I should obtain the contents as numpy arrays. How can I preserve the length of the arrays in a big file like this? Thanks in advance.
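The error can be reproduced on a small scale. The following is a minimal sketch, assuming toy arrays like the ones in the snippet above (the values are made up): building a frame from columns of unequal length fails, while wrapping each array in a `pd.Series` lets pandas pad the shorter column with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two arrays of different lengths.
d = dict(A=np.array([1, 2]), B=np.array([1, 2, 3, 4]))

try:
    pd.DataFrame(d)
    error_message = None
except ValueError as exc:
    # pandas refuses to build a frame from columns of unequal length
    error_message = str(exc)

# Wrapping each array in a Series lets pandas align the columns on the
# index and fill the missing positions of the shorter one with NaN.
padded = pd.DataFrame({key: pd.Series(values) for key, values in d.items()})

print(error_message)
print(padded)
```

Padding keeps every value but does not restore any row/column structure the file may have had, so it is mostly useful for inspecting the data.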
Accepted answer by knightofni
The read_json method doesn't work because the JSON file is not in the format it expects by default. Since we can easily load JSON as a dict, let's try it this way:
import json
import os

import pandas as pd

os.chdir('/Users/nicolas/Downloads')

# Read the JSON file into a dict
with open('json_example.json') as json_data:
    data = json.load(json_data)

# Use the from_dict constructor. Note that the 'orient' parameter is not left
# at its default value (otherwise it raises the same error you got before).
# Transpose the resulting frame and use its 'index' column as the index.
df = pd.DataFrame.from_dict(data, orient='index').T.set_index('index')
print(df)
Output:
                                                                 data columns
index
311210177061863424  [25-34\n, FEMALE, @bikewa absolutely the best....     age
310912785183813632  [25-34\n, FEMALE, Photo: I love the Burke-Gilm...  gender
311290293871849472  [25-34\n, FEMALE, Photo: Inhaled! #fitfoodie h...    text
309386414548717569  [25-34\n, FEMALE, Facebook Is Making The Most ...    None
312327801187495936     [25-34\n, FEMALE, Still upset about this >&...    None
312249421079400449  [25-34\n, FEMALE, @JoeM_PM_UK @JonAntoine I've...    None
308692673194246145  [25-34\n, FEMALE, @Social_Freedom_ actually, t...    None
308995226633129984  [25-34\n, FEMALE, @seattleweekly that's more t...    None
308660851219501056  [25-34\n, FEMALE, @adamholdenbache I noticed 1...    None
308658690528014337  [25-34\n, FEMALE, @CEM_Social I am waiting pat...    None
309719798001070080  [25-34\n, FEMALE, Going to be watching Faceboo...    None
312349448049152002  [25-34\n, FEMALE, @anikamarketer I applied for...    None
312325152698404864  [25-34\n, FEMALE, @_chrisrojas_ wow, that's so...    None
310546490844135425  [25-34\n, FEMALE, Photo: Feeling like a bit of...    None
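Judging from the output above, the file appears to hold three top-level keys (`columns`, `index` and `data`). If that is the case, each part can also be handed to the DataFrame constructor directly, which gives one column per field instead of a list per row. This is a minimal sketch under that assumption; the two-row sample below is made up for illustration and is not the original file.

```python
import json

import pandas as pd

# Hypothetical two-row stand-in for the file, using the same three keys.
raw = """
{
  "columns": ["age", "gender", "text"],
  "index": [1, 2],
  "data": [["25-34", "FEMALE", "first tweet"],
           ["18-24", "MALE", "second tweet"]]
}
"""
data = json.loads(raw)

# Pass rows, row labels and column names to the constructor separately.
df = pd.DataFrame(data["data"], index=data["index"], columns=data["columns"])
print(df)
```

With the real file, `json.load` on the opened file would replace `json.loads(raw)`.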
Answer by Vaid?tas Iv??ka
The answer should be the pandas module, not the json module: pandas itself provides read_json, and the root of the problem must be that the JSON was not read with the correct orientation. You must pass the same orient parameter that was used to create the JSON in the first place.
Example:
df_json = df.to_json(orient='split')
and then:
read_to_json = pd.read_json(df_json, orient='split')
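The round trip can be sketched end to end as follows. The sample frame and its values are hypothetical, and the JSON string is wrapped in `io.StringIO` because recent pandas versions deprecate passing a literal JSON string to `read_json`.

```python
import io

import pandas as pd

# Hypothetical sample frame standing in for the real data.
df = pd.DataFrame(
    {
        "age": ["25-34", "18-24"],
        "gender": ["FEMALE", "MALE"],
        "text": ["first tweet", "second tweet"],
    },
    index=[1, 2],
)

# Serialize with orient='split': the result has the keys
# "columns", "index" and "data".
df_json = df.to_json(orient="split")

# Read it back with the same orient that was used for writing.
restored = pd.read_json(io.StringIO(df_json), orient="split")
print(restored)
```

For a file on disk written in the same layout, the path can be passed in place of the `StringIO` object, again with `orient='split'`.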