How can I efficiently move from a Pandas dataframe to JSON
Notice: this page is based on a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/19214588/
Asked by djq
I've started using pandas to do some aggregation by date. My goal is to count all of the instances of a measurement that occur on a particular day, and then to represent this in D3. To illustrate my workflow, I have a queryset (from Django) that looks like this:
queryset = [
    {'created': "05-16-13", 'counter': 1, 'id': 13},
    {'created': "05-16-13", 'counter': 1, 'id': 34},
    {'created': "05-17-13", 'counter': 1, 'id': 12},
    {'created': "05-16-13", 'counter': 1, 'id': 7},
    {'created': "05-18-13", 'counter': 1, 'id': 6},
]
I make a dataframe in pandas and aggregate the measure 'counter' by the day created:
import pandas as pd
queryset_df = pd.DataFrame.from_records(queryset).set_index('id')
aggregated_df = queryset_df.groupby('created').sum()
This gives me a dataframe like this:
          counter
created
05-16-13        3
05-17-13        1
05-18-13        1
As I'm using D3, I thought that a JSON object would be the most useful. Using the pandas to_json() function, I convert my dataframe like this:
aggregated_df.to_json()
giving me the following JSON object:
{"counter":{"05-16-13":3,"05-17-13":1,"05-18-13":1}}
This is not exactly what I want, as I would like to be able to access both the date, and the measurement. Is there a way that I can export the data such that I end up with something like this?
data = {"c1":{"date":"05-16-13", "counter":3},"c2":{"date":"05-17-13", "counter":1}, "c3":{"date":"05-18-13", "counter":1}}
I thought that if I could structure this differently on the Python side, it would reduce the amount of data formatting I would need to do on the JS side, as I planned to load the data by doing something like this:
x.domain(d3.extent(data, function(d) { return d.date; }));
y.domain(d3.extent(data, function(d) { return d.counter; }));
I'm very open to suggestions of better workflows overall, as this is something I will need to do frequently, but I am unsure of the best way of handling the connection between D3 and pandas. (I have looked at several packages that combine Python and D3 directly, but that is not what I am looking for, as they seem to focus on static chart generation and not on making an SVG.)
Answered by Boud
Transform your date index back into a simple data column with reset_index, and then generate your JSON object by passing orient='index' to to_json:
In [11]: aggregated_df.reset_index().to_json(orient='index')
Out[11]: '{"0":{"created":"05-16-13","counter":3},"1":{"created":"05-17-13","counter":1},"2":{"created":"05-18-13","counter":1}}'
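As a possible variation on the answer above: d3.extent expects an array, so orient='records' (a list of objects) may be more convenient on the JS side than an object keyed by row number. The sketch below rebuilds the question's data end to end and also shows one way to get the "c1", "c2", ... keyed layout from the question. The names flat_df, records_json, and keyed are made up for illustration, and renaming 'created' to 'date' is an assumption based on the desired output in the question.

```python
import json

import pandas as pd

# Sample data copied from the question.
queryset = [
    {'created': "05-16-13", 'counter': 1, 'id': 13},
    {'created': "05-16-13", 'counter': 1, 'id': 34},
    {'created': "05-17-13", 'counter': 1, 'id': 12},
    {'created': "05-16-13", 'counter': 1, 'id': 7},
    {'created': "05-18-13", 'counter': 1, 'id': 6},
]

# Same aggregation as in the question: sum 'counter' per day.
queryset_df = pd.DataFrame.from_records(queryset).set_index('id')
aggregated_df = queryset_df.groupby('created').sum()

# Turn the date index back into a column and rename it to 'date'
# (the rename is an assumption, matching the question's desired keys).
flat_df = aggregated_df.reset_index().rename(columns={'created': 'date'})

# Option 1: a JSON array of row objects, usable directly as `data` in D3.
records_json = flat_df.to_json(orient='records')
print(records_json)

# Option 2: the keyed layout from the question ("c1", "c2", ...).
# Round-tripping through json.loads yields plain Python ints and strings.
records = json.loads(records_json)
keyed = {f"c{i}": row for i, row in enumerate(records, start=1)}
keyed_json = json.dumps(keyed)
print(keyed_json)
```

With the array form, the accessors from the question (d.date and d.counter) should work unchanged once the JSON is parsed on the JS side; the keyed form would first need to be converted to an array there.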

