Validate and format JSON files in Python
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/23344948/
Validate and format JSON files
Asked by Black
I have around 2000 JSON files which I'm trying to run through a Python program. A problem occurs when a JSON file is not in the correct format (Error: ValueError: No JSON object could be decoded). In turn, I can't read it into my program.
I am currently doing something like the below:
for files in folder:
    with open(files) as f:
        data = json.load(f)  # It causes an error at this part
I know there are offline methods for validating and formatting JSON files, but is there a programmatic way to check and format these files? If not, is there a free/cheap alternative for fixing all of these files offline, i.e. I just run the program on the folder containing all the JSON files and it formats them as required?
SOLVED using @reece's comment:
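import os
import simplejson  # third-party package; the standard json module works the same way here

invalid_json_files = []
read_json_files = []

def parse():
    # Try to load every file in the current working directory.
    for files in os.listdir(os.getcwd()):
        with open(files) as json_file:
            try:
                simplejson.load(json_file)
                read_json_files.append(files)
            except ValueError as e:
                print("JSON object issue: %s" % e)
                invalid_json_files.append(files)
    # Report the invalid files and the number of files that loaded.
    print(invalid_json_files, len(read_json_files))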
It turned out that I was saving a non-JSON file in my working directory, which was also the directory I was reading data from. Thanks for the helpful suggestions.
Accepted answer by reece
The built-in JSON module can be used as a validator:
import json

def parse(text):
    try:
        return json.loads(text)
    except ValueError as e:
        print('invalid json: %s' % e)
        return None  # or: raise
You can make it work with files by using:
with open(filename) as f:
    return json.load(f)
instead of json.loads, and you can include the filename in the error message as well.
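A minimal sketch of the file-based variant described above, assuming Python 3 and UTF-8 encoded files. The names `load_json_file` and `format_json` are illustrative and not part of the original answer; `format_json` shows one possible way to re-serialize valid data in a consistent layout, since the question also asks about formatting.

```python
import json


def load_json_file(filename):
    """Return the parsed content of filename, or None if it is not valid JSON."""
    try:
        with open(filename, encoding="utf-8") as f:  # assumption: files are UTF-8
            return json.load(f)
    except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
        # Include the filename so the offending file is easy to find.
        print("invalid json in %s: %s" % (filename, e))
        return None


def format_json(data):
    """Serialize parsed data with consistent indentation and key order."""
    return json.dumps(data, indent=4, sort_keys=True)
```

Note that a file containing only `null` also yields `None` here; re-raising the exception instead of returning `None` avoids that ambiguity if it matters for your data.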
On Python 3.3.5, for {test: "foo"}, I get:
invalid json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
and on 2.7.6:
invalid json: Expecting property name: line 1 column 2 (char 1)
This is because the correct JSON is {"test": "foo"}.
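To reproduce the difference, both strings can be passed to `json.loads` directly. This is a small sketch assuming Python 3; the helper name `is_valid_json` is made up for illustration.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


print(is_valid_json('{test: "foo"}'))    # False: the key is not quoted
print(is_valid_json('{"test": "foo"}'))  # True: the key is a double-quoted string
```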
When handling the invalid files, it is best to not process them any further. You can build a skipped.txt file listing the files with the error, so they can be checked and fixed by hand.
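One way the skipped.txt idea might look in practice, as a sketch rather than a definitive implementation. It assumes Python 3, UTF-8 files, and that only files ending in `.json` should be checked; the function name `validate_folder` and its parameters are hypothetical.

```python
import json
import os


def validate_folder(folder, skipped_path="skipped.txt"):
    """Check every *.json file in folder; list the invalid ones in skipped_path."""
    valid, skipped = [], []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".json"):
            continue  # assumption: ignore non-JSON files that share the folder
        path = os.path.join(folder, name)
        try:
            with open(path, encoding="utf-8") as f:  # assumption: files are UTF-8
                json.load(f)
            valid.append(name)
        except ValueError as e:
            # Do not process the file any further; just record why it failed.
            skipped.append("%s: %s" % (name, e))
    with open(skipped_path, "w", encoding="utf-8") as out:
        out.write("\n".join(skipped))
    return valid, skipped
```

Calling `validate_folder("data")` would return the names of the valid files plus one line per invalid file, and write the same lines to skipped.txt in the current directory for checking by hand. The extension filter also guards against the situation the asker ran into, where an unrelated file sat in the same directory.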
If possible, you should check the site/program that generated the invalid json files, fix that and then re-generate the json file. Otherwise, you are going to keep having new files that are invalid JSON.
Failing that, you will need to write a custom JSON parser that fixes common errors. If you do, keep the originals under source control (or archived), so you can see and check the changes the automated tool makes (as a sanity check). Ambiguous cases should be fixed by hand.
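As a rough sketch of such a fixer, limited to a single common error (trailing commas), the following scans the text character by character so that commas inside strings are left alone, re-validates the result, and produces a diff for the sanity check. It assumes Python 3; `strip_trailing_commas` and `fix_and_diff` are hypothetical names, and a real tool would need to handle more cases.

```python
import difflib
import json


def strip_trailing_commas(text):
    """Drop commas that directly precede a closing } or ], outside strings."""
    out = []
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False      # this character was escaped; nothing special
            elif ch == "\\":
                escaped = True       # the next character is escaped
            elif ch == '"':
                in_string = False    # end of the string literal
            continue
        if ch == '"':
            in_string = True
        elif ch == ",":
            rest = text[i + 1:].lstrip()
            if rest[:1] in ("}", "]"):
                continue             # trailing comma: skip it
        out.append(ch)
    return "".join(out)


def fix_and_diff(original):
    """Return (fixed_text, diff_lines); raise ValueError if still invalid."""
    fixed = strip_trailing_commas(original)
    json.loads(fixed)  # re-validate: anything else that is wrong is left to a human
    diff = list(difflib.unified_diff(
        original.splitlines(), fixed.splitlines(),
        "original", "fixed", lineterm=""))
    return fixed, diff
```

Printing the returned diff lines next to the archived original makes it easy to review what the tool changed before the fixed file replaces anything.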
Answer by MxLDevs
Yes, there are ways to validate that a JSON file is valid. One way is to use a JSON parsing library that will throw exceptions if the input you provide is not well-formatted.
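import json  # the standard json module, used here as one such library

try:
    with open(filename) as f:
        json.load(f)
except ValueError:  # json.JSONDecodeError on Python 3.5+ is a subclass
    # oops, guess it's not valid
    print('%s is not valid JSON' % filename)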
Of course, if you want to fix it, you naturally cannot use a JSON loader since, well, it's not valid JSON in the first place. Unless the library you're using will automatically fix things for you, in which case you probably wouldn't even have this question.
One way is to load the file manually, tokenize it, and try to detect and fix errors as you go. However, I'm sure there are cases where the error cannot be fixed automatically, and you would be better off throwing an error and asking the user to fix their files.
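For the cases that are left to the user, it may help to point at the exact location of the problem. The sketch below assumes Python 3.5 or later, where `json.JSONDecodeError` exposes `msg`, `lineno` and `colno`; the function name `describe_error` and its `filename` parameter are illustrative only.

```python
import json


def describe_error(text, filename="<input>"):
    """Return None if text is valid JSON, else a message pointing at the error."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as e:
        lines = text.splitlines()
        # The reported line can lie past the last line, e.g. for empty input.
        bad_line = lines[e.lineno - 1] if e.lineno <= len(lines) else ""
        pointer = " " * (e.colno - 1) + "^"  # caret under the reported column
        return "%s:%d:%d: %s\n%s\n%s" % (
            filename, e.lineno, e.colno, e.msg, bad_line, pointer)
```

The returned message has the form `file:line:column: reason`, followed by the offending line and a caret under the reported column (the caret will be misaligned if the line contains tabs).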
I have not written a JSON fixer myself so I can't provide any details on how you might go about actually fixing errors.
However, I am not sure whether it would be a good idea to fix all errors, since you'd then have to assume your fixes are what the user actually wants. If it's a missing comma or an extra trailing comma, that might be OK, but there may be cases where it is ambiguous what the user wants.