Convert Pandas DataFrame to JSON as element of larger data structure

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/26244323/

Time: 2020-09-13 22:33:10  Source: igfitidea

python json pandas

Question by smontanaro

I've been working with pandas DataFrame objects in my server, converting them to CSV for transmission to the browser, where the tabular values are plotted using d3. While CSV is fine as far as it goes, I really need more than just a 2D table of data. If nothing else, I'd like to return some metadata about the data.

So I started messing around with JSON, thinking I would be able to construct a dictionary with some meta information and my DataFrame. For example, as an absurdly simple case:

>>> import numpy
>>> import pandas
>>> z = numpy.zeros(10)
>>> df = pandas.DataFrame(z)
>>> df
   0
0  0
1  0
2  0
3  0
4  0
5  0
6  0
7  0
8  0
9  0
>>> result = {
...   "name": "Simple Example",
...   "data": df,
... }

Not surprisingly, that can't be directly serialized using the json module. I found the jsonext module and tried it. It "works", but produces incomplete results:

>>> import jsonext
>>> jsonext.dumps(result)
'{"data": ["0"], "name": "Simple Example"}'
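For comparison, the standard json module's behavior on the same structure can be checked with a short sketch (assuming Python 3 and a recent pandas; `error_message` is just an illustrative name). Rather than producing partial output, json.dumps raises TypeError as soon as it reaches the DataFrame value.

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros(10))
result = {"name": "Simple Example", "data": df}

error_message = None
try:
    json.dumps(result)
except TypeError as exc:
    # json natively handles only dict, list, tuple, str, int, float, bool and None.
    error_message = str(exc)

print(error_message)
```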

Looking at the methods DataFrame itself provides for this sort of thing, I found to_dict() and to_json(). The former produces dictionaries of dictionaries:

>>> df.to_dict()
{0: {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0}}

but as you can see, they can't be serialized to JSON, since the keys are not strings.

df.to_json() looked like it might work, though I would then wind up with a JSON string embedded in another JSON string. Something like this:

>>> import json
>>> json.dumps({"name": "Simple Example", "data": df.to_json()})
'{"data": "{\"0\":{\"0\":0.0,\"1\":0.0,\"2\":0.0,\"3\":0.0,\"4\":0.0,\"5\":0.0,\"6\":0.0,\"7\":0.0,\"8\":0.0,\"9\":0.0}}", "name": "Simple Example"}'

In other words, a bit of a mess.
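To make the mess concrete: because to_json() already returns a string, json.dumps treats it as an ordinary string value and escapes it, so whoever consumes the payload has to decode twice. A minimal sketch using only the Python 3 standard library plus pandas (the names `payload`, `outer` and `inner` are illustrative):

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros(10))
payload = json.dumps({"name": "Simple Example", "data": df.to_json()})

# First decode: "data" comes back as a str, not as a nested object.
outer = json.loads(payload)
print(type(outer["data"]))

# A second decode is needed to reach the table itself.
inner = json.loads(outer["data"])
print(inner["0"]["0"])
```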

Any suggestions about how to handle this sort of nested structure where some of the elements can't be directly serialized? I think I might be able to get jsonext to work, but its Dict mixin expects to find a proper (in its mind) to_dict() method. DataFrame.to_dict() doesn't seem to return the right thing. (Though I will continue to horse around with it.)

I figured this must be a cat which has already been skinned. I just haven't found it. I'd be happy for now with nothing more hierarchical than something like my example (though with more key/value pairs), though I won't turn my nose up at a more general solution.

Answer by unutbu

The default function (supplied to json.dumps) gets called for all objects that can't be serialized by default. It can return any object that the default encoder can serialize, such as a dict.

df.to_json() returns a string. json.loads(df.to_json()) returns a dict whose keys are strings. So if we set default=lambda df: json.loads(df.to_json()), then the DataFrame will get serialized as though it were a dict.

import json
import numpy as np
import pandas as pd

z = np.zeros(10)
df = pd.DataFrame(z)
result = {"name": "Simple Example",
          "data": df, }

jstr = json.dumps(result,
                   default=lambda df: json.loads(df.to_json()))
newresult = json.loads(jstr)
print(newresult)
# {u'data': {u'0': {u'0': 0.0,
#    u'1': 0.0,
#    u'2': 0.0,
#    u'3': 0.0,
#    u'4': 0.0,
#    u'5': 0.0,
#    u'6': 0.0,
#    u'7': 0.0,
#    u'8': 0.0,
#    u'9': 0.0}},
#  u'name': u'Simple Example'}


print(pd.DataFrame(newresult['data']))

yields

   0
0  0
1  0
2  0
3  0
4  0
5  0
6  0
7  0
8  0
9  0
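One caveat with the lambda above: default is called for every object the encoder can't handle, so any other unsupported value (a set, say) would be passed to .to_json() and fail with a confusing AttributeError. A slightly more defensive sketch (an addition for illustration, not part of the original answer; `encode_dataframe` is a hypothetical helper name) checks the type first and otherwise raises TypeError, which is how the json documentation says a default function should signal an unsupported object:

```python
import json

import numpy as np
import pandas as pd


def encode_dataframe(obj):
    """Hypothetical fallback for json.dumps: turn DataFrames into plain dicts."""
    if isinstance(obj, pd.DataFrame):
        # to_json() stringifies the labels; json.loads turns the text back into a dict.
        return json.loads(obj.to_json())
    # Anything else is still unsupported; report it the standard way.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


df = pd.DataFrame(np.zeros(10))
result = {"name": "Simple Example", "data": df}

jstr = json.dumps(result, default=encode_dataframe)
restored = json.loads(jstr)

# Index and column labels come back as strings ("0", "1", ...).
round_trip = pd.DataFrame(restored["data"])
print(round_trip)
```

If the original integer labels matter on the way back, they would need converting explicitly (for example with astype(int) on the index and columns).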

Answer by smontanaro

I think a bit more reading on the jsonext docs was warranted. It looks like I can create my own mixin that knows how to properly encode my DataFrame objects, then call jsonext.dumps(result). I was seduced by the existing to_dict() and to_json() methods of DataFrame objects, which don't really solve the problem.
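The jsonext mixin API isn't shown in this thread, so it is not reproduced here. The standard library supports the same idea of teaching the encoder about DataFrame objects: subclass json.JSONEncoder and override its default() method. A minimal sketch, where `DataFrameEncoder` is a hypothetical name:

```python
import json

import numpy as np
import pandas as pd


class DataFrameEncoder(json.JSONEncoder):
    """Hypothetical encoder that knows how to serialize DataFrames."""

    def default(self, obj):
        if isinstance(obj, pd.DataFrame):
            # Reuse pandas' own JSON conversion, then hand back a plain dict.
            return json.loads(obj.to_json())
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


df = pd.DataFrame(np.zeros(10))
result = {"name": "Simple Example", "data": df}
jstr = json.dumps(result, cls=DataFrameEncoder)
print(jstr)
```

Passing cls=DataFrameEncoder keeps the call site free of lambdas, and a DataFrame nested anywhere in the structure is handled, because the encoder calls default() for each unsupported value it encounters.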

Answer by chrisb

One way would be to convert your index/columns to strings, like this:

In [355]: df.index = df.index.astype(str)
In [356]: df.columns = df.columns.astype(str)

Then you could build the dict and pass it to json.dumps:

In [357]: result = {
     ...:     "name": "Simple Example",
     ...:     "data": df.to_dict(),
     ...: }

In [359]: print(json.dumps(result, indent=4))
{
    "data": {
        "0": {
            "1": 0.0, 
            "0": 0.0, 
            "3": 0.0, 
            "2": 0.0, 
            "5": 0.0, 
            "4": 0.0, 
            "7": 0.0, 
            "6": 0.0, 
            "9": 0.0, 
            "8": 0.0
        }
    }, 
    "name": "Simple Example"
}
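Assigning to df.index and df.columns modifies the DataFrame in place. If the original labels should be kept, a variant (a suggestion, not part of chrisb's answer; `string_labelled` is an illustrative name) builds a relabelled copy with DataFrame.rename, passing str as the mapping function for both axes:

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros(10))

# rename() returns a new DataFrame; df keeps its integer labels.
string_labelled = df.rename(index=str, columns=str)

result = {
    "name": "Simple Example",
    "data": string_labelled.to_dict(),
}
print(json.dumps(result, indent=4))
```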