pandas read_json not working on MultiIndex
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/22768682/
Asked by Olga Botvinnik
I'm trying to read in a dataframe that was created via df.to_json(), using pd.read_json, but I'm getting a ValueError. I think it may have to do with the fact that the index is a MultiIndex, but I'm not sure how to deal with that.
The original dataframe of 55k rows is called psi, and I created test.json via:
psi.head().to_json('test.json')
Here is the output of print psi.head().to_string() if you want to use that.
When I do it on this small set of data (5 rows), I get a ValueError.
! wget --no-check-certificate https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json
import pandas as pd
pd.read_json('test.json')
Here's the full stack:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-14-1de2f0e65268> in <module>()
1 get_ipython().system(u' wget https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json'>)
2 import pandas as pd
----> 3 pd.read_json('test.json')
/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
196 obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
197 keep_default_dates, numpy, precise_float,
--> 198 date_unit).parse()
199
200 if typ == 'series' or obj is None:
/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
264
265 else:
--> 266 self._parse_no_numpy()
267
268 if self.obj is None:
/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
481 if orient == "columns":
482 self.obj = DataFrame(
--> 483 loads(json, precise_float=self.precise_float), dtype=None)
484 elif orient == "split":
485 decoded = dict((str(k), v)
ValueError: No ':' found when decoding object value
> /home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.py(483)_parse_no_numpy()
482 self.obj = DataFrame(
--> 483 loads(json, precise_float=self.precise_float), dtype=None)
484 elif orient == "split":
But when I do it on the whole dataframe (55k rows), I get an invalid pointer error and the IPython kernel dies. Any ideas?
EDIT: added how the json was generated in the first place.
Answered by Jeff
This is not implemented at the moment; see the issue here: https://github.com/pydata/pandas/issues/4889.
You can simply reset the index first, e.g.:
df.reset_index().to_json(...)
and it will work.
Answered by as - if
Or you can just write the JSON with orient='table':
df.to_json(path_or_buf='test.json', orient='table')
Then read the MultiIndex JSON back:
pd.read_json('test.json', orient='table')
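For anyone who wants to try the orient='table' approach without the original data, here is a minimal sketch of the round trip. The small frame, its level names ("gene", "sample") and the "value" column are made up for illustration, and an in-memory buffer stands in for test.json. It assumes a pandas version recent enough to support orient='table' in both to_json and read_json.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real data: a small frame with a two-level index.
index = pd.MultiIndex.from_tuples(
    [("gene1", "s1"), ("gene1", "s2"), ("gene2", "s1")],
    names=["gene", "sample"],
)
df = pd.DataFrame({"value": [1, 2, 3]}, index=index)

# orient="table" writes a schema next to the data, including the index levels.
json_text = df.to_json(orient="table")

# Reading with the same orient rebuilds the MultiIndex from that schema.
restored = pd.read_json(io.StringIO(json_text), orient="table")

print(restored.index.names)
```

Note that, as far as I know, the table orient rejects index names that start with "level_" or are equal to "index", so unnamed levels may need to be renamed first.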
Answered by Константин Гудков
If you want to get the MultiIndex structure back after loading:
# Save the names of the MultiIndex levels
indexes_names = df.index.names
df.reset_index().to_json('dump.json')
# Restore the MultiIndex structure after loading
loaded_df = pd.read_json('dump.json').set_index(indexes_names)
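A self-contained version of the same idea, as a rough sketch: the example frame and its level names are invented here, and the JSON is kept in a string rather than written to dump.json. It assumes every index level has a name; unnamed levels come back from reset_index() as level_0, level_1, and so on, so the saved names would not match.

```python
import io

import pandas as pd

# Hypothetical example frame with a named two-level index.
index = pd.MultiIndex.from_tuples(
    [("gene1", "s1"), ("gene1", "s2"), ("gene2", "s1")],
    names=["gene", "sample"],
)
df = pd.DataFrame({"value": [1, 2, 3]}, index=index)

# Remember the level names, then flatten the index into ordinary columns.
index_names = list(df.index.names)
json_text = df.reset_index().to_json()

# After loading, the former index levels are plain columns; set_index
# turns them back into a MultiIndex.
loaded = pd.read_json(io.StringIO(json_text)).set_index(index_names)

print(loaded.index.names)
```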

