Python: read a JSON file into a Pandas DataFrame?

Question

Asked by Alberto Alvarez

I am using Python 3.6 and trying to load a JSON file (350 MB) into a pandas DataFrame using the code below. However, I get the following error:

data_json_str = "[" + ",".join(data) + "]"
TypeError: sequence item 0: expected str instance, bytes found

How can I fix the error?

import pandas as pd

# read the entire file into a python array
with open('C:/Users/Alberto/nutrients.json', 'rb') as f:
   data = f.readlines()

# remove the trailing "\n" from each line
data = map(lambda x: x.rstrip(), data)

# each element of 'data' is an individual JSON object.
# i want to convert it into an *array* of JSON objects
# which, in and of itself, is one large JSON object
# basically... add square brackets to the beginning
# and end, and have all the individual business JSON objects
# separated by a comma
data_json_str = "[" + ",".join(data) + "]"

# now, load it into pandas
data_df = pd.read_json(data_json_str)

Answer 1

Accepted answer by Stephen Rauch

If you open the file as binary ('rb'), you will get bytes. How about:

# text mode returns str; the old 'U' flag is redundant in Python 3 and was removed in 3.11
with open('C:/Users/Alberto/nutrients.json', 'r') as f:
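
To make the difference concrete, here is a minimal sketch that reproduces the error and then avoids it. The sample records and the temporary file are made up for illustration and only stand in for nutrients.json; the sketch assumes the file holds one JSON object per line, as the question's code suggests.

```python
import json
import os
import tempfile

# Hypothetical sample standing in for nutrients.json: one JSON object per line.
sample = '{"id": 1, "name": "apple"}\n{"id": 2, "name": "kale"}\n'

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Binary mode yields bytes, so str.join() rejects the items.
    with open(path, "rb") as f:
        byte_lines = [line.rstrip() for line in f]
    try:
        ",".join(byte_lines)
        join_error = None
    except TypeError as exc:
        join_error = str(exc)

    # Text mode yields str, so the same join succeeds.
    with open(path, "r", encoding="utf-8") as f:
        text_lines = [line.rstrip() for line in f]
    data_json_str = "[" + ",".join(text_lines) + "]"
    records = json.loads(data_json_str)
finally:
    os.remove(path)

print(join_error)
print(records)
```

Passing an explicit encoding is a choice made for this sketch; the real file may need a different one.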

Answer 2

Answer by cs95

From your code, it looks like you're loading a JSON file which has JSON data on each separate line. read_json supports a lines argument for data like this:

data_df = pd.read_json('C:/Users/Alberto/nutrients.json', lines=True)

Note: Remove lines=True if you have a single JSON object instead of individual JSON objects on each line.
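
As a rough illustration of the lines argument, the sketch below reads a small line-delimited file. The records and the temporary file are invented for the example. It also shows the chunksize option of read_json, which might help with a file as large as the 350 MB one in the question, though whether it is needed depends on available memory.

```python
import os
import tempfile

import pandas as pd

# Hypothetical records, one JSON object per line.
rows = [
    '{"id": 1, "name": "apple"}',
    '{"id": 2, "name": "kale"}',
    '{"id": 3, "name": "rice"}',
]

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    tmp.write("\n".join(rows) + "\n")
    path = tmp.name

try:
    # Read the whole file at once.
    df = pd.read_json(path, lines=True)

    # Read it in pieces; with chunksize, read_json returns an iterator of DataFrames.
    reader = pd.read_json(path, lines=True, chunksize=2)
    try:
        chunk_sizes = [len(chunk) for chunk in reader]
    finally:
        reader.close()
finally:
    os.remove(path)

print(df)
print(chunk_sizes)
```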

Answer 3

Answer by James Doepp - pihentagyu

Using the json module, you can parse the JSON into a Python object, then create a DataFrame from that:

import json
import pandas as pd
with open('C:/Users/Alberto/nutrients.json', 'r') as f:
    data = json.load(f)
df = pd.DataFrame(data)
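
One caveat worth sketching: json.load expects the file to contain a single JSON document, so it would likely fail on a file with one object per line. The strings below are made-up stand-ins for the two file layouts, and json.loads is used in place of json.load so the example needs no file.

```python
import json

# Hypothetical contents: a single JSON array versus one object per line.
array_doc = '[{"id": 1, "name": "apple"}, {"id": 2, "name": "kale"}]'
lines_doc = '{"id": 1, "name": "apple"}\n{"id": 2, "name": "kale"}\n'

# A single document parses into a list of dicts.
records = json.loads(array_doc)

# Line-delimited input is not one document, so the parser raises an error.
try:
    json.loads(lines_doc)
    lines_error = None
except json.JSONDecodeError as exc:
    lines_error = exc.msg

print(records)
print(lines_error)
```

The list in records is the kind of object that can be handed to pd.DataFrame as in the answer above.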

Answer 4

Answer by A.Emad

If you want to convert it into an array of JSON objects, I think this will do what you want:

import json
data = []
with open('nutrients.json', errors='ignore') as f:
    for line in f:
        data.append(json.loads(line))
print(data[0])
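
A small variation on the same idea, sketched under the assumption that the file may contain blank lines: the helper name iter_json_lines is hypothetical, and an in-memory io.StringIO stands in for the opened file.

```python
import io
import json


def iter_json_lines(handle):
    """Yield one parsed object per non-blank line (hypothetical helper)."""
    for line in handle:
        line = line.strip()
        if line:  # skip blank lines, which json.loads would reject
            yield json.loads(line)


# Made-up sample with a blank line between two records.
sample = io.StringIO('{"id": 1, "name": "apple"}\n\n{"id": 2, "name": "kale"}\n')
data = list(iter_json_lines(sample))
print(data[0])
```

The resulting list of dicts can be passed to pd.DataFrame(data) if a DataFrame is the goal.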

Answer 5

Answer by Amir Md Amiruzzaman

Please see the code below:

#call the pandas library
import pandas as pd
#set the file location as URL or filepath of the json file
url = 'https://www.something.com/data.json'
#load the json data from the file to a pandas dataframe
df = pd.read_json(url, orient='columns')
#display the top 10 rows from the dataframe (this is to test only)
df.head(10)

Please review the code and modify based on your need. I have added comments to explain each line of code. Hope this helps!
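
The orient argument has to match how the JSON is laid out, so it may help to see two layouts side by side. This is only a sketch with invented data; io.StringIO is used in place of a URL or file path.

```python
import io

import pandas as pd

# Hypothetical data in two layouts that describe the same table.
columns_json = '{"name": {"0": "apple", "1": "kale"}, "kcal": {"0": 52, "1": 49}}'
records_json = '[{"name": "apple", "kcal": 52}, {"name": "kale", "kcal": 49}]'

# orient='columns' expects {column -> {index -> value}}.
df_columns = pd.read_json(io.StringIO(columns_json), orient="columns")

# orient='records' expects a list of {column -> value} rows.
df_records = pd.read_json(io.StringIO(records_json), orient="records")

print(df_columns)
print(df_records)
```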
