How to extract multiple JSON objects from one file in Python?

Question

Asked by user6396

I am very new to JSON files. Suppose I have a file containing multiple JSON objects, such as the following:

{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
 "Code":[{"event1":"A","result":"1"},…]}
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
 "Code":[{"event1":"B","result":"1"},…]}
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
 "Code":[{"event1":"B","result":"0"},…]}
…

I want to extract all "Timestamp" and "Usefulness" values into a data frame:

    Timestamp    Usefulness
 0   20140101      Yes
 1   20140102      No
 2   20140103      No
 …

Does anyone know a general way to deal with such problems?


Answer 1

Accepted answer by danielfranca

Use a JSON array, in this format:

[
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
  "Code":[{"event1":"A","result":"1"},…]},
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
  "Code":[{"event1":"B","result":"1"},…]},
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
  "Code":[{"event1":"B","result":"0"},…]},
...
]

Then load it in your Python code:

import json

with open('file.json') as json_file:
    data = json.load(json_file)

Now data is a list of dictionaries, one for each element of the array.

You can access it easily, e.g.:

data[0]["ID"]
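
As a minimal sketch of how the loaded list could be reduced to the two fields the question asks about, the example below parses an inline string instead of file.json. The sample text and the name rows are assumptions made for illustration, and the "Code" lists are shortened because the original elides them.

```python
import json

# Sample text standing in for the contents of file.json (array format).
# The "Code" lists are shortened here purely for illustration.
text = """
[
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
  "Code":[{"event1":"A","result":"1"}]},
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
  "Code":[{"event1":"B","result":"1"}]},
{"ID":"AA356","Timestamp":"20140103", "Usefulness":"No",
  "Code":[{"event1":"B","result":"0"}]}
]
"""

data = json.loads(text)

# Keep only the two requested fields from each dictionary.
rows = [
    {"Timestamp": d["Timestamp"], "Usefulness": d["Usefulness"]}
    for d in data
]

for row in rows:
    print(row["Timestamp"], row["Usefulness"])
```

If pandas is installed, passing rows to pandas.DataFrame should give the data frame from the question, since DataFrame accepts a list of dictionaries.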

Answer 2

Answered by Dan Temkin

As was mentioned in a couple of comments, keeping the data in an array is simpler, but that solution does not scale well as the data set grows, because the whole array has to be loaded into memory at once. You really should only use a list when you want to access an arbitrary object by index; otherwise, generators are the way to go. Below I have prototyped a reader function that reads each JSON object individually and returns a generator.

The basic idea is to have the reader split on the newline character "\n" (or "\r\n" on Windows). Python does this when you iterate over a file object, which yields one line at a time, just as repeated calls to file.readline() would.

import json

def json_readr(file):
    # Yield one decoded JSON object per line; the file is closed when done.
    with open(file, mode="r") as f:
        for line in f:
            yield json.loads(line)

However, this method only really works when the file is written with each object on its own line, separated by a newline character. Below is an example of a writer that takes a list of JSON objects and saves each one on a new line.

def json_writr(file, json_objects):
    # Write each object as one JSON document per line.
    with open(file, mode="w") as f:
        for jsonobj in json_objects:
            jsonstr = json.dumps(jsonobj)
            f.write(jsonstr + "\n")

You could also do the same operation with file.writelines() and a list comprehension:

...
    jsobjs = [json.dumps(j)+"\n" for j in json_objects]
    f.writelines(jsobjs)
...

And if you want to append the data instead of writing a new file, just change mode="w" to mode="a".

In the end, I find this helps a great deal, not only with readability when I open JSON files in a text editor but also with using memory more efficiently.

On that note, if you change your mind at some point and want a list out of the reader, Python lets you pass the generator to list(), which consumes it and populates the list automatically. In other words, just write

lst = list(json_readr(file))
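
The reader and writer above can be exercised together in a quick round trip. The following is only a sketch: the records list and the temporary file are assumptions chosen for illustration, and the two functions are restated compactly so the example runs on its own.

```python
import json
import os
import tempfile

def json_writr(file, json_objects):
    # One JSON document per line.
    with open(file, mode="w") as f:
        for jsonobj in json_objects:
            f.write(json.dumps(jsonobj) + "\n")

def json_readr(file):
    # Lazily yield one decoded object per line.
    with open(file, mode="r") as f:
        for line in f:
            yield json.loads(line)

# Hypothetical sample records, modelled on the question's data.
records = [
    {"ID": "12345", "Timestamp": "20140101", "Usefulness": "Yes"},
    {"ID": "1A35B", "Timestamp": "20140102", "Usefulness": "No"},
]

# Write to a throwaway temporary file, then read it back.
fd, path = tempfile.mkstemp(suffix=".jsonl")
os.close(fd)
try:
    json_writr(path, records)
    loaded = list(json_readr(path))
finally:
    os.remove(path)

print(loaded == records)
```

Because list() exhausts the generator, the file is closed before it is removed.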

Hope this helps. Sorry if it was a bit verbose.


Answer 3

Answered by Dunes

You can use json.JSONDecoder.raw_decode to decode arbitrarily big strings of "stacked" JSON (so long as they fit in memory). raw_decode stops once it has a valid object and returns that object together with the position where the parsed object ended. It's not documented, but you can pass this position back to raw_decode and it will start parsing again from there. Unfortunately, raw_decode doesn't accept strings that have leading whitespace, so we need to search for the first non-whitespace character of the document.

from json import JSONDecoder, JSONDecodeError
import re

NOT_WHITESPACE = re.compile(r'[^\s]')

def decode_stacked(document, pos=0, decoder=JSONDecoder()):
    while True:
        match = NOT_WHITESPACE.search(document, pos)
        if not match:
            return
        pos = match.start()

        try:
            obj, pos = decoder.raw_decode(document, pos)
        except JSONDecodeError:
            # do something sensible if there's some error
            raise
        yield obj

s = """

{"a": 1}  


   [
1
,   
2
]


"""

for obj in decode_stacked(s):
    print(obj)

prints:


{'a': 1}
[1, 2]
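
Applied to data shaped like the question's, where each object spans two lines and a line-by-line reader would fail, the same approach might look like the sketch below. The decoder is restated compactly so the example runs on its own; the sample document is an assumption, with the "Code" lists shortened because the original elides them.

```python
import re
from json import JSONDecoder

NOT_WHITESPACE = re.compile(r"\S")

def decode_stacked(document, pos=0, decoder=JSONDecoder()):
    # Compact restatement of the generator above, without the error hook.
    while True:
        match = NOT_WHITESPACE.search(document, pos)
        if not match:
            return
        obj, pos = decoder.raw_decode(document, match.start())
        yield obj

# Stacked objects, each split over two lines as in the question.
document = """
{"ID":"12345","Timestamp":"20140101", "Usefulness":"Yes",
 "Code":[{"event1":"A","result":"1"}]}
{"ID":"1A35B","Timestamp":"20140102", "Usefulness":"No",
 "Code":[{"event1":"B","result":"1"}]}
"""

pairs = [(obj["Timestamp"], obj["Usefulness"]) for obj in decode_stacked(document)]
print(pairs)
```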

Answer 4

Answered by Fantix King

Added streaming support, based on @dunes's answer:

import re
from json import JSONDecoder, JSONDecodeError

NOT_WHITESPACE = re.compile(r"[^\s]")


def stream_json(file_obj, buf_size=1024, decoder=JSONDecoder()):
    buf = ""
    ex = None
    while True:
        block = file_obj.read(buf_size)
        if not block:
            break
        buf += block
        pos = 0
        while True:
            match = NOT_WHITESPACE.search(buf, pos)
            if not match:
                break
            pos = match.start()
            try:
                obj, pos = decoder.raw_decode(buf, pos)
            except JSONDecodeError as e:
                ex = e
                break
            else:
                ex = None
                yield obj
        buf = buf[pos:]
    if ex is not None:
        raise ex
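
A usage sketch, assuming an in-memory io.StringIO in place of a real file and a deliberately tiny buf_size so that objects straddle block boundaries. The function is copied from the answer so the example runs on its own; the sample text is made up for illustration.

```python
import io
import re
from json import JSONDecoder, JSONDecodeError

NOT_WHITESPACE = re.compile(r"[^\s]")

def stream_json(file_obj, buf_size=1024, decoder=JSONDecoder()):
    # Copied from the answer above.
    buf = ""
    ex = None
    while True:
        block = file_obj.read(buf_size)
        if not block:
            break
        buf += block
        pos = 0
        while True:
            match = NOT_WHITESPACE.search(buf, pos)
            if not match:
                break
            pos = match.start()
            try:
                obj, pos = decoder.raw_decode(buf, pos)
            except JSONDecodeError as e:
                # Probably an incomplete object: keep it and read more.
                ex = e
                break
            else:
                ex = None
                yield obj
        buf = buf[pos:]
    if ex is not None:
        raise ex

# Hypothetical stacked input; buf_size=4 forces many partial reads.
source = io.StringIO('{"a": 1}\n{"b": [1, 2]}\n\n{"c": "x"}')
objects = list(stream_json(source, buf_size=4))
print(objects)
```

With a real file, the same call would take the object returned by open() in text mode.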

Answer 5

Answered by snr

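import json

with open('yourjsonfile.json') as json_file:
    data = json.loads(json_file.read())

Don't forget to add read(): json.loads() expects a string, not a file object. (Alternatively, json.load(json_file) reads from the file object directly.) Note that this only parses a file holding a single JSON document, such as the array in the accepted answer.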
