Python pandas read_json: "If using all scalar values, you must pass an index"

Question

Asked by Marco Fumagalli

I have some difficulty in importing a JSON file with pandas.

import pandas as pd
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')

This is the error that I get:

ValueError: If using all scalar values, you must pass an index

The file structure is simplified like this:

{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}
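A minimal reproduction, assuming a shortened inline copy of the file wrapped in io.StringIO (recent pandas versions deprecate passing a literal JSON string to read_json). With the default typ='frame', pandas tries to build a DataFrame from a flat object whose values are all scalars, and raises the error above:

```python
import io

import pandas as pd

# Shortened inline copy of the file's contents (a flat object of scalar values).
raw = '{"biennials": 522004, "lb915": 116290, "shatzky": 127647}'

error_message = None
try:
    # Default typ='frame': pandas tries to build a DataFrame from the scalars.
    pd.read_json(io.StringIO(raw))
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```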

It is from the University of Washington machine learning course on Coursera. You can find the file here.

Answer 1

Answer by ayhan

Try

ser = pd.read_json('people_wiki_map_index_to_word.json', typ='series')

That file only contains key-value pairs whose values are scalars. You can convert the resulting Series to a DataFrame with ser.to_frame('count').
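A self-contained sketch of the typ='series' approach, again using a shortened inline copy of the file (wrapped in io.StringIO) instead of the real path. The JSON keys become the Series index, and to_frame('count') turns it into a one-column DataFrame:

```python
import io

import pandas as pd

raw = '{"biennials": 522004, "lb915": 116290, "shatzky": 127647}'

# typ='series': the JSON keys become the index, the numbers become the values.
ser = pd.read_json(io.StringIO(raw), typ='series')
df = ser.to_frame('count')
print(df.shape)  # (3, 1)
```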

You can also do something like this:

import json
with open('people_wiki_map_index_to_word.json', 'r') as f:
    data = json.load(f)

Now data is a dictionary. You can pass it to the DataFrame constructor like this:

df = pd.DataFrame({'count': data})
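To see what the {'count': data} construction produces, here is a sketch with a shortened inline dict standing in for the loaded file. The outer key becomes the column name and the words become the row index:

```python
import json

import pandas as pd

# Stand-in for json.load(f): a shortened inline copy of the file.
data = json.loads('{"biennials": 522004, "lb915": 116290, "shatzky": 127647}')

# The outer key 'count' becomes the column; the inner keys become the index.
df = pd.DataFrame({'count': data})
print(df.shape)  # (3, 1)
print(df.loc['lb915', 'count'])  # 116290
```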

Answer 2

Answer by Adonis H.

You can do as @ayhan mentioned, which gives you a column-based format.

Or you can enclose the object in [ ] (source), as shown below, to get a row format. This is convenient if you are loading multiple values and plan to use a matrix for your machine learning models.

df = pd.DataFrame([data])
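A sketch of the row format, assuming hypothetical records shaped like the file's {word: index} object. Each dict in the list becomes one row, so stacking several records gives a matrix-like DataFrame:

```python
import pandas as pd

# Hypothetical records, each shaped like the file's flat {word: index} object.
records = [
    {"biennials": 522004, "lb915": 116290},
    {"biennials": 1, "lb915": 2},
]

single = pd.DataFrame([records[0]])  # one row, one column per key
many = pd.DataFrame(records)         # one row per record
print(single.shape, many.shape)  # (1, 2) (2, 2)

matrix = many.to_numpy()  # 2-D NumPy array, e.g. as a feature matrix
```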

Answer 3

Answer by Anant Gupta

I think what is happening is that the data in

map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')

is being read as a string instead of as JSON.

{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}

is actually

'{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}'

Since a string is a scalar, pandas cannot build a DataFrame from it directly. You have to convert it to a dict first, which is exactly what the other answer does.

The best way is to call json.loads on the string to convert it to a dict, and then load that dict into pandas:

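import json
import pandas as pd
with open('people_wiki_map_index_to_word.json') as f:
    jsonData = json.loads(f.read())
df = pd.DataFrame([jsonData])  # wrap in a list: a bare dict of scalars raises the same ValueError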
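The error message itself points at another option: a dict whose values are all scalars gives the DataFrame constructor no length to infer the number of rows from, so it needs an explicit index. A minimal sketch using a shortened inline dict (the one-element index [0] is an illustrative choice):

```python
import json

import pandas as pd

json_data = json.loads('{"biennials": 522004, "lb915": 116290}')

failed = False
try:
    pd.DataFrame(json_data)  # all values are scalars and no index is given
except ValueError:
    failed = True  # the same ValueError as in the question

# An explicit one-element index: each scalar is broadcast into a single row.
df = pd.DataFrame(json_data, index=[0])
print(failed, df.shape)  # True (1, 2)
```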