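Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/41209764/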
Fast convert JSON column into Pandas dataframe
Asked by jodoox
I'm reading data from a database (50k+ rows) where one column is stored as JSON. I want to extract that into a pandas dataframe. The snippet below works fine but is fairly inefficient and really takes forever when run against the whole db. Note that not all the items have the same attributes and that the JSON has some nested attributes.
How could I make this faster?
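import pandas as pd
import json

df = pd.read_csv('http://pastebin.com/raw/7L86m9R2',
                 header=None, index_col=0, names=['data'])

# Parse each row, normalize each row separately, then concatenate (slow).
# pd.json_normalize needs pandas >= 1.0; older versions use pd.io.json.json_normalize.
(df.data.apply(json.loads)
        .apply(pd.json_normalize)
        .pipe(lambda x: pd.concat(x.values)))
# This returns a dataframe where each JSON key is a column.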
Answered by piRSquared
json_normalize takes already-parsed JSON (the dicts returned by json.loads) or a pandas Series of such objects, so the whole column can be normalized in a single call:
pd.json_normalize(df.data.apply(json.loads))  # pd.io.json.json_normalize on pandas < 1.0
setup
import pandas as pd
import json
df = pd.read_csv('http://pastebin.com/raw/7L86m9R2', \
header=None, index_col=0, names=['data'])
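To make the single-call approach easier to try without the pastebin file, here is a minimal sketch on a tiny in-memory column. The sample rows and the names raw, parsed and flat are made up for illustration, and pd.json_normalize assumes pandas 1.0 or later. It shows that nested attributes become dotted column names and that keys missing from a row come out as NaN.

```python
import json
import pandas as pd

# Hypothetical stand-in for the JSON column: rows have different keys,
# and the first row contains a nested object.
raw = pd.Series([
    '{"id": 1, "user": {"name": "ann", "age": 30}}',
    '{"id": 2, "tags": "x"}',
], name="data")

parsed = raw.apply(json.loads)              # Series of dicts
flat = pd.json_normalize(parsed.tolist())   # one call for all rows

print(sorted(flat.columns))
print(flat)
```

The point of the design is that normalization and frame construction happen once for the whole column, instead of building one small dataframe per row and concatenating them.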
Answered by jezrael
I think you can first convert the string column data to dict, then create a list from the numpy array returned by values, and finally pass it to DataFrame.from_records:
df = pd.read_csv('http://pastebin.com/raw/7L86m9R2', \
header=None, index_col=0, names=['data'])
a = df.data.apply(json.loads).values.tolist()
print (pd.DataFrame.from_records(a))
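One thing worth checking with this approach is how it treats the nested attributes mentioned in the question. The sketch below uses made-up sample rows (raw, records and df_records are illustrative names, not from the answer) and suggests that from_records fills missing keys with NaN but leaves a nested object as a dict in a single column rather than flattening it.

```python
import json
import pandas as pd

# Hypothetical sample rows; the first one has a nested "user" object.
raw = pd.Series([
    '{"id": 1, "user": {"name": "ann"}}',
    '{"id": 2, "tags": "x"}',
])

records = raw.apply(json.loads).values.tolist()   # plain list of dicts
df_records = pd.DataFrame.from_records(records)

print(df_records.columns.tolist())
print(type(df_records.loc[0, "user"]))  # the nested object is kept as a dict
```

If the nested keys are needed as separate columns, json_normalize is probably the better fit; from_records may still be attractive when the JSON is flat.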
Answered by Madhur Yadav
data = {"events": [
    {"timemillis": 1563467463580, "date": "18.7.2019", "time": "18:31:03,580", "name": "Player is loading", "data": ""},
    {"timemillis": 1563467463668, "date": "18.7.2019", "time": "18:31:03,668", "name": "Player is loaded", "data": "5"},
]}
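from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
result = json_normalize(data, 'events')  # 'events' is the record_path: one row per list item
print(result)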
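The second positional argument above is json_normalize's record_path. As a small sketch of how it can be combined with the meta parameter, the example below adds a top-level "session" field, which is hypothetical and not part of the answer's data, and copies it onto every event row. It assumes pandas 1.0 or later.

```python
import pandas as pd

payload = {
    "session": "abc",  # hypothetical top-level field, added for illustration
    "events": [
        {"timemillis": 1563467463580, "name": "Player is loading", "data": ""},
        {"timemillis": 1563467463668, "name": "Player is loaded", "data": "5"},
    ],
}

# One row per item in payload["events"]; "session" is repeated on each row.
events = pd.json_normalize(payload, record_path="events", meta=["session"])
print(events)
```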