Python: Quickly Convert a JSON Column to a Pandas DataFrame

Question

Asked by jodoox

I'm reading data from a database (50k+ rows) where one column is stored as JSON, and I want to extract that column into a pandas DataFrame. The snippet below works but is inefficient and takes a very long time when run against the whole database. Note that not all items have the same attributes and that the JSON has some nested attributes.

How could I make this faster?

import pandas as pd
import json

df = pd.read_csv('http://pastebin.com/raw/7L86m9R2',
                 header=None, index_col=0, names=['data'])

# Slow: builds one small DataFrame per row, then concatenates them all.
# (pd.json_normalize is the pandas >= 1.0 name for pd.io.json.json_normalize.)
df.data.apply(json.loads) \
       .apply(pd.json_normalize) \
       .pipe(lambda x: pd.concat(x.values))
# This returns a DataFrame where each JSON key is a column

Answer 1

Answered by piRSquared

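json_normalize takes already-parsed JSON (a dict) or a pandas Series of such parsed objects, so the whole column can be normalized in one call once each string has gone through json.loads. Since pandas 1.0 the function is available as pd.json_normalize; pd.io.json.json_normalize is the older, deprecated name.

pd.json_normalize(df.data.apply(json.loads))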

Setup

import pandas as pd
import json

df = pd.read_csv('http://pastebin.com/raw/7L86m9R2', \
                 header=None, index_col=0, names=['data'])
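The single-call approach can be tried without downloading the pastebin file. The following is a minimal sketch, assuming a small hypothetical Series named raw stands in for the data column; its keys (id, name, meta) are invented for illustration and only mimic the "differing attributes, some nested" situation described in the question.

```python
import json
import pandas as pd

# Hypothetical stand-in for the 'data' column: JSON strings with
# differing keys and one nested attribute.
raw = pd.Series([
    '{"id": 1, "name": "a", "meta": {"lang": "en", "score": 3}}',
    '{"id": 2, "meta": {"lang": "fr"}}',
    '{"id": 3, "name": "c"}',
], name="data")

# Parse every string once, then normalize the whole list in one call
# instead of building and concatenating one DataFrame per row.
records = raw.apply(json.loads).tolist()
flat = pd.json_normalize(records)

# Nested keys become dotted column names; missing attributes become NaN.
print(flat)
```

Passing a plain list of dicts (via tolist()) is a conservative choice here, since that input type is accepted across pandas versions.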

Answer 2

Answered by jezrael

I think you can first convert the string column data to dicts, then create a list from the NumPy array returned by values, and finally call DataFrame.from_records:

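import json
import pandas as pd

df = pd.read_csv('http://pastebin.com/raw/7L86m9R2',
                 header=None, index_col=0, names=['data'])

# Parse each JSON string into a dict, then build the frame from the list
a = df.data.apply(json.loads).values.tolist()
print(pd.DataFrame.from_records(a))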
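One caveat worth checking before choosing this approach: DataFrame.from_records creates one column per top-level key and does not flatten nested attributes, which the question says are present. The sketch below illustrates the difference on two hypothetical records (the keys id and meta are invented for the example).

```python
import json
import pandas as pd

# Hypothetical JSON strings with one nested attribute each.
raw = pd.Series([
    '{"id": 1, "meta": {"lang": "en"}}',
    '{"id": 2, "meta": {"lang": "fr"}}',
])
records = raw.apply(json.loads).tolist()

# from_records: one column per top-level key; nested dicts stay as objects.
shallow = pd.DataFrame.from_records(records)

# json_normalize: nested keys are expanded into dotted column names.
flat = pd.json_normalize(records)

print(list(shallow.columns))  # ['id', 'meta']
print(list(flat.columns))     # ['id', 'meta.lang']
```

If the nested values are not needed as separate columns, from_records may be sufficient; otherwise json_normalize is probably the better fit.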

Answer 3

Answered by Madhur Yadav

import pandas as pd

data = {
    "events": [
        {"timemillis": 1563467463580, "date": "18.7.2019",
         "time": "18:31:03,580", "name": "Player is loading", "data": ""},
        {"timemillis": 1563467463668, "date": "18.7.2019",
         "time": "18:31:03,668", "name": "Player is loaded", "data": "5"},
    ]
}

# The second argument is the record path: each item of data["events"] becomes a row
result = pd.json_normalize(data, 'events')
print(result)
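When the records sit under a key such as events, json_normalize can also copy top-level fields onto every row through its meta parameter. The sketch below assumes a hypothetical payload with an extra session key, which is not part of the answer's data and is added only to show the idea.

```python
import pandas as pd

# Hypothetical payload: a list of events plus a top-level "session" key.
payload = {
    "session": "abc",
    "events": [
        {"timemillis": 1563467463580, "name": "Player is loading"},
        {"timemillis": 1563467463668, "name": "Player is loaded"},
    ],
}

# record_path selects the list that becomes the rows;
# meta lists top-level keys to repeat on each row.
result = pd.json_normalize(payload, record_path="events", meta=["session"])
print(result)
```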
