pandas read_json does not work with a MultiIndex

Question

Asked by Olga Botvinnik

I'm trying to read in a dataframe that was created via df.to_json(), using pd.read_json, but I'm getting a ValueError. I think it may have to do with the fact that the index is a MultiIndex, but I'm not sure how to deal with that.

The original dataframe of 55k rows is called psi, and I created test.json via:

psi.head().to_json('test.json')

Here is the output of print psi.head().to_string() if you want to use that.

When I do it on this small set of data (5 rows), I get a ValueError.

! wget --no-check-certificate https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json
import pandas as pd
pd.read_json('test.json')

Here's the full stack trace:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-1de2f0e65268> in <module>()
      1 get_ipython().system(u' wget https://gist.githubusercontent.com/olgabot/9897953/raw/c270d8cf1b736676783cc1372b4f8106810a14c5/test.json')
      2 import pandas as pd
----> 3 pd.read_json('test.json')

/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit)
    196         obj = FrameParser(json, orient, dtype, convert_axes, convert_dates,
    197                           keep_default_dates, numpy, precise_float,
--> 198                           date_unit).parse()
    199 
    200     if typ == 'series' or obj is None:

/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in parse(self)
    264 
    265         else:
--> 266             self._parse_no_numpy()
    267 
    268         if self.obj is None:

/home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.pyc in _parse_no_numpy(self)
    481         if orient == "columns":
    482             self.obj = DataFrame(
--> 483                 loads(json, precise_float=self.precise_float), dtype=None)
    484         elif orient == "split":
    485             decoded = dict((str(k), v)

ValueError: No ':' found when decoding object value

> /home/obot/virtualenvs/envy/lib/python2.7/site-packages/pandas/io/json.py(483)_parse_no_numpy()
    482             self.obj = DataFrame(
--> 483                 loads(json, precise_float=self.precise_float), dtype=None)
    484         elif orient == "split":

But when I do it on the whole dataframe (55k rows), I get an invalid pointer error and the IPython kernel dies. Any ideas?

EDIT: added how the json was generated in the first place.

Answer 1

Answered by Jeff

This is not implemented at the moment; see the issue here: https://github.com/pydata/pandas/issues/4889.

You can simply reset the index first, e.g.:

df.reset_index().to_json(...)

and it will work.

Answer 2

Answered by as - if

Or you can write the JSON with orient='table':

df.to_json(path_or_buf='test.json', orient='table')

Then read the MultiIndex JSON back with the same orient:

pd.read_json('test.json', orient='table')
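
To see the orient='table' round trip end to end, here is a minimal sketch. The small frame and its level names ("gene", "sample") are made up for illustration and only stand in for the question's psi dataframe; an in-memory buffer replaces test.json so the snippet runs without touching disk. It assumes a pandas version that supports orient='table' (0.23 or later).

```python
import io

import pandas as pd

# Hypothetical stand-in for the question's `psi` frame: a two-level MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("geneA", "s1"), ("geneA", "s2"), ("geneB", "s1")],
    names=["gene", "sample"],
)
df = pd.DataFrame({"psi": [0.1, 0.5, 0.9]}, index=index)

# orient='table' stores a schema next to the data, including which
# fields make up the index, so the reader can rebuild the MultiIndex.
json_text = df.to_json(orient="table")

# Wrap the string in StringIO; read_json also accepts a file path.
restored = pd.read_json(io.StringIO(json_text), orient="table")

print(restored.index.names)
print(restored)
```

Because the schema records the index fields, no manual set_index call is needed on the reading side.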

Answer 3

Answered by Константин Гудков

If you want to restore the MultiIndex structure after loading:

# Save the names of the MultiIndex levels
index_names = list(df.index.names)

# Write the index levels out as ordinary columns
df.reset_index().to_json('dump.json')

# Restore the MultiIndex structure after loading
loaded_df = pd.read_json('dump.json').set_index(index_names)
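
The same idea can be wrapped in a pair of helpers, sketched below. The function names dump_multiindex_json and load_multiindex_json and the sample frame are hypothetical, not part of pandas or of the answer above. The sketch assumes every index level has a name (unnamed levels would have to be named first) and that a trip through JSON is acceptable for the dtypes involved, since read_json re-infers them.

```python
import io

import pandas as pd


def dump_multiindex_json(frame):
    """Hypothetical helper: return (json_text, index_names) for `frame`."""
    index_names = list(frame.index.names)
    # reset_index turns every index level into an ordinary column,
    # which the default orient can serialize without trouble.
    return frame.reset_index().to_json(), index_names


def load_multiindex_json(json_text, index_names):
    """Hypothetical helper: rebuild the MultiIndex from the saved names."""
    flat = pd.read_json(io.StringIO(json_text))
    return flat.set_index(index_names)


# Made-up data standing in for the question's `psi` frame.
index = pd.MultiIndex.from_tuples(
    [("geneA", "s1"), ("geneA", "s2"), ("geneB", "s1")],
    names=["gene", "sample"],
)
df = pd.DataFrame({"psi": [0.1, 0.5, 0.9]}, index=index)

json_text, index_names = dump_multiindex_json(df)
loaded = load_multiindex_json(json_text, index_names)

print(loaded)
```

The level names have to travel with the JSON somehow (here they are simply returned alongside it), because the flat file no longer records which columns used to be index levels.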
