Converting a JSON file to a Pandas DataFrame

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/50221204/

Time: 2020-09-14 05:32:32  Source: igfitidea

Tags: python, pandas

Question by Baktaawar

I have a JSON file which I converted to a dict like the one below:

{'DATA': [{'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}

I wrote the following code to convert the JSON file into the dict format above:

import json

# Parse the whole file into a Python dict
with open('data.json') as data_file:
    data = json.load(data_file)

Now I am trying to store this dict as a pandas DataFrame, with the keys as column headers.

So I wrote the following:

import pandas as pd

df = pd.DataFrame.from_dict(data, orient='columns')

But all the columns end up in a single column:

df.head(3)

    DATA
0   {'COMPANY_SCHEMA': 'ABC.', 'CON...
1   {'COMPANY_SCHEMA': 'ABC', 'CON...
2   {'COMPANY_SCHEMA': 'ABC', 'CON...

Basically, I have a bunch of such JSON files in a folder, and I am trying to read all of them and store them in one pandas DataFrame, appended one below the other.

That is what I was attempting above. So my questions are:

i) Why do I get the above error when converting to a pandas DataFrame?

ii) Is there a better, faster way to read a bunch of such files: combine them into one JSON object and then load that into a pandas DataFrame, or add them to the DataFrame one by one?

Answer by Martin Bobak

Not sure why you are getting the error you show, but I would skip converting the JSON to a dictionary and just use pd.read_json() instead.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html
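A minimal sketch of the read_json route, assuming every file has the layout shown in the question (a top-level "DATA" key holding a list of flat records). With its default orient, pd.read_json() should map that single top-level key to a single DATA column of dicts, much like the question's output, so the sketch expands that column into real columns afterwards. The sample records are made up, and io.StringIO stands in for the path 'data.json' so the example runs without a file on disk.

```python
import io
import json

import pandas as pd

# Hypothetical sample with the same shape as the question's data.json:
# a top-level "DATA" key wrapping a list of flat records.
raw = json.dumps({
    "DATA": [
        {"COMPANY_SCHEMA": "ABC", "CONFIG_TYPE": "rtype", "SYSTEM_NUMBER": "50"},
        {"COMPANY_SCHEMA": "XYZ", "CONFIG_TYPE": "rtype", "SYSTEM_NUMBER": "51"},
    ]
})

# read_json accepts a path or a file-like object; StringIO replaces
# 'data.json' here so the example is self-contained.
wrapped = pd.read_json(io.StringIO(raw))

# The single top-level key becomes a single column whose cells are the
# record dicts, so expand those dicts into one column per key.
df = pd.DataFrame(wrapped["DATA"].tolist())
print(df)
```

In practice you would pass the real path, e.g. pd.read_json('data.json'), instead of the StringIO object.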

Answer by Peque

The data you provide is broken, so it is hard to reproduce. Try to provide a reproducible case when asking! ;-)

Anyway, I guess you just need:

import pandas

# Pass the list of record dicts, not the outer dict that wraps it
df = pandas.DataFrame(data['DATA'])

Where data is the dictionary you created with json.load().

A pandas.DataFrame() can be initialized with a list of dictionaries without any problem, but you need to pass it that list of dictionaries (data['DATA']), not the dictionary that wraps it.
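To illustrate the difference, here is a small sketch with made-up records trimmed to three fields: passing the outer dict reproduces the single DATA column from the question, while passing data['DATA'] gives one column per key. pd.json_normalize() (a top-level pandas function since 1.0) is shown only as an alternative, not as part of the original answer; with record_path="DATA" it builds the same table and would also flatten any nested dicts inside the records.

```python
import pandas as pd

# Same shape as the question's dict: one top-level key wrapping the records.
# The records are illustrative, trimmed to a few fields.
data = {
    "DATA": [
        {"COMPANY_SCHEMA": "ABC", "CONFIG_KEY": "posting_f", "SYS_CONFIG_VALUE": "0"},
        {"COMPANY_SCHEMA": "ABC", "CONFIG_KEY": "other_key", "SYS_CONFIG_VALUE": "1"},
    ]
}

# Outer dict: its only key, "DATA", becomes the only column, and every
# cell holds a whole record dict (the behaviour shown in the question).
wrong = pd.DataFrame.from_dict(data, orient="columns")
print(wrong.columns.tolist())  # ['DATA']

# Inner list: each dict becomes a row and its keys become the columns.
right = pd.DataFrame(data["DATA"])
print(right.columns.tolist())  # ['COMPANY_SCHEMA', 'CONFIG_KEY', 'SYS_CONFIG_VALUE']

# Alternative: let json_normalize walk into "DATA" itself.
normalized = pd.json_normalize(data, record_path="DATA")
print(normalized)
```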

If you are concerned about performance, then yes: append to your list of dictionaries first, and convert the whole list to a DataFrame at once with pandas.DataFrame(list_of_dictionaries).
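A sketch of that approach for a whole folder, assuming each file has the same {"DATA": [...]} layout as the question; the function name load_folder, the *.json glob pattern and the temporary demo files are hypothetical. Records from all files go into one plain list, and pd.DataFrame() is called only once at the end.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def load_folder(folder):
    """Read every *.json file in `folder` (assumed layout per file:
    {"DATA": [record, ...]}) and build one DataFrame at the end."""
    records = []  # plain list of dicts, cheap to extend
    for path in sorted(Path(folder).glob("*.json")):
        with open(path) as data_file:
            data = json.load(data_file)
        records.extend(data["DATA"])  # this file's rows go below the previous ones
    # A single DataFrame construction instead of one per file.
    return pd.DataFrame(records)


# Usage sketch with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        record = {"ID": f"file{i}", "SYSTEM_NUMBER": str(50 + i)}
        Path(tmp, f"part{i}.json").write_text(json.dumps({"DATA": [record]}))
    df = load_folder(tmp)

print(df)
```

Growing a DataFrame file by file (for example with pd.concat inside the loop) builds a new frame on every iteration, which is why collecting the dicts first and converting once is usually the faster option.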
