Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/41209764/

Time: 2020-08-20 00:36:10  Source: igfitidea

Fast convert JSON column into Pandas dataframe

python json pandas

Asked by jodoox

I'm reading data from a database (50k+ rows) where one column is stored as JSON. I want to extract that column into a pandas DataFrame. The snippet below works but is inefficient and takes a very long time when run against the whole database. Note that not all items have the same attributes and that the JSON has some nested attributes.

How could I make this faster?

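import pandas as pd
import json

# Load the sample data; the 'data' column holds one JSON string per row.
df = pd.read_csv('http://pastebin.com/raw/7L86m9R2',
                 header=None, index_col=0, names=['data'])

# Slow approach: normalize every row separately, then concatenate the pieces.
# pd.json_normalize is the top-level name since pandas 1.0
# (older releases exposed it as pd.io.json.json_normalize).
(df.data.apply(json.loads)
        .apply(pd.json_normalize)
        .pipe(lambda x: pd.concat(x.values)))
# This returns a DataFrame where each JSON key is a column.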

Answered by piRSquared

json_normalize takes an already parsed JSON object (a dict) or a pandas Series of such objects, so you can parse the column once and normalize everything in a single call.

# pd.json_normalize is the top-level name since pandas 1.0 (older releases: pd.io.json.json_normalize)
pd.json_normalize(df.data.apply(json.loads))


Setup

import pandas as pd
import json

df = pd.read_csv('http://pastebin.com/raw/7L86m9R2',
                 header=None, index_col=0, names=['data'])
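
To illustrate the single-call approach without depending on the pastebin data, here is a minimal sketch. The two JSON strings are invented stand-ins for the database column, and it assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize. It shows that nested attributes become dot-separated columns and that attributes missing from a row are filled with NaN.

```python
import json

import pandas as pd

# Hypothetical stand-in for the JSON column; not the data from the question.
raw = pd.Series([
    '{"id": 1, "user": {"name": "ann", "age": 30}}',
    '{"id": 2, "user": {"name": "bob"}, "tag": "x"}',
])

# Parse every string once, then normalize all rows in one call.
parsed = raw.apply(json.loads).tolist()
flat = pd.json_normalize(parsed)

# Nested keys are flattened into columns such as 'user.name'.
print(sorted(flat.columns))
print(flat)
```

Converting to a plain list before the call is a cautious choice here; it avoids relying on how a given pandas version treats a Series argument.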

Answered by jezrael

I think you can first convert the string column data to dicts, then create a list from the numpy array returned by values, and finally call DataFrame.from_records:

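import pandas as pd
import json

df = pd.read_csv('http://pastebin.com/raw/7L86m9R2',
                 header=None, index_col=0, names=['data'])

# Parse each JSON string into a dict and collect the dicts in a plain list.
a = df.data.apply(json.loads).values.tolist()
print(pd.DataFrame.from_records(a))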
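
One thing worth checking with this approach is how nested attributes come out. The sketch below uses two invented JSON strings (not the question's data) and suggests that DataFrame.from_records builds one column per top-level key and leaves a nested object as a dict inside a single column, rather than flattening it.

```python
import json

import pandas as pd

# Hypothetical stand-in for the JSON column; not the data from the question.
raw = pd.Series([
    '{"id": 1, "user": {"name": "ann", "age": 30}}',
    '{"id": 2, "user": {"name": "bob"}, "tag": "x"}',
])

records = raw.apply(json.loads).values.tolist()
wide = pd.DataFrame.from_records(records)

# Only top-level keys become columns; 'user' still holds dict objects.
print(wide.columns.tolist())
print(type(wide.loc[0, "user"]))
```

If the nested attributes are needed as separate columns, the json_normalize call from the previous answer may be the better fit.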

Answered by Madhur Yadav

data = {
    "events": [
        {"timemillis": 1563467463580, "date": "18.7.2019", "time": "18:31:03,580",
         "name": "Player is loading", "data": ""},
        {"timemillis": 1563467463668, "date": "18.7.2019", "time": "18:31:03,668",
         "name": "Player is loaded", "data": "5"},
    ]
}

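# Top-level import since pandas 1.0; older releases: from pandas.io.json import json_normalize
from pandas import json_normalize

# The second argument is record_path: each item of data['events'] becomes one row.
result = json_normalize(data, 'events')
print(result)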
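
As a small extension of the same idea, the sketch below passes record_path by keyword and also uses the meta parameter of json_normalize to copy a top-level field onto every row. The payload and its "session" key are invented for illustration and are not part of the answer's data; pandas 1.0 or later is assumed.

```python
import pandas as pd

# Hypothetical payload; "session" is an invented top-level key.
payload = {
    "session": "abc",
    "events": [
        {"timemillis": 1563467463580, "name": "Player is loading"},
        {"timemillis": 1563467463668, "name": "Player is loaded"},
    ],
}

# One row per item in payload["events"], plus the shared "session" value.
events = pd.json_normalize(payload, record_path="events", meta=["session"])
print(events)
```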