How to read a JSON file using Python pandas?
Source: http://stackoverflow.com/questions/43803180/ (Stack Overflow, licensed under CC BY-SA 4.0; you are free to use and share it, but you must attribute it to the original authors)
Asked by kit
I want to read a JSON file using Python pandas. Each line of the file is a complete JSON object.
I'm using the following versions:
python: 2.7.6
pandas: 1.19.1
JSON file:
{"id":"111","p_id":"55","name":"aaa","notes":"","childs":[]}
{"id":"222","p_id":"56","name":"bbb","notes":"","childs":[]}
{"id":"333","p_id":"75","name":"ccc","notes":"","childs":[]}
{"id":"444","p_id":"76","name":"ddd","notes":"","childs":["abc","efg","pqr"
,"rtu"]}
I'm using the code below to read the JSON file:
import pandas as pd

df = pd.read_json("temp.txt", lines=True)
print(df)
The problem is that in the JSON file the "childs" key holds an array of unknown length, and a "\n" can appear in the middle of it. When I run the code above I get ValueError: Expected object or value, but if I remove the "\n" after "pqr" the code works.
I don't want to remove the "\n" from my data; I want to handle it within my code. I want to use Python pandas only, rather than the Python json library, to handle the data cleanly.
How can I handle this type of file using Python pandas only?
Answered by Ravi.Dudi
First check whether it is a valid JSON file using a JSON validator site.
Once the file is valid JSON, you can use the code below to read it as a DataFrame:
import json
import pandas as pd

with open("training.json") as datafile:
    data = json.load(datafile)  # parse the whole file as one JSON document
dataframe = pd.DataFrame(data)
Hope this helps.
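Note that json.load expects the file to hold a single JSON document, while the file in the question holds several objects, one of which spans two lines. The following is a minimal sketch of one way to parse such a file with the standard library, assuming the objects are simply concatenated with whitespace between them. The helper name iter_json_objects and the inline sample are hypothetical, and the approach uses the json module, which the asker wanted to avoid, so treat it as an alternative rather than the answer's method.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated objects.

    iter_json_objects is a hypothetical helper name, not a library function.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        # (this covers the newlines between objects).
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Two records from the question, including the one split across two lines.
sample = (
    '{"id":"333","p_id":"75","name":"ccc","notes":"","childs":[]}\n'
    '{"id":"444","p_id":"76","name":"ddd","notes":"","childs":["abc","efg","pqr"\n'
    ',"rtu"]}\n'
)
records = list(iter_json_objects(sample))
```

The resulting list of dicts can then be passed to pd.DataFrame(records), in the same way the answer passes data.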
Answered by MadScone
read_json() can't work because of the new line after "pqr". You can either try to fix that line or try to format the whole thing into valid JSON. I'm doing the latter here by replacing the newline after each closing brace with a comma and surrounding the whole thing with brackets to form a proper JSON array:
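import pandas as pd

with open('temp.txt') as f:
    content = f.read().strip()  # drop a trailing newline so no comma follows the last object
# Join the objects with commas and wrap them in brackets to form one JSON array.
df = pd.read_json('[' + content.replace('}\n', '},') + ']')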
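The string replacement above depends on every object ending in exactly "}\n". As a hedged variation on the same idea, the sketch below inserts a comma only where a closing brace is followed by a line break and then an opening brace, which also tolerates "\r\n" line endings and a trailing newline. The helper name to_json_array and the inline sample are hypothetical; json.loads is used here only to check that the result is valid JSON.

```python
import json
import re


def to_json_array(text):
    """Turn newline-separated JSON objects into one JSON array string.

    to_json_array is a hypothetical helper name, not a pandas function.
    """
    # A raw newline cannot occur inside a JSON string, so "}" + line break + "{"
    # can only be the boundary between two top-level objects.
    joined = re.sub(r'\}\s*\n\s*\{', '},{', text.strip())
    return '[' + joined + ']'


# Two records from the question, including the one split across two lines.
sample = (
    '{"id":"333","p_id":"75","name":"ccc","notes":"","childs":[]}\n'
    '{"id":"444","p_id":"76","name":"ddd","notes":"","childs":["abc","efg","pqr"\n'
    ',"rtu"]}\n'
)
json_array = to_json_array(sample)
rows = json.loads(json_array)  # sanity check: the result parses as one array
```

The json_array string can be handed to pd.read_json as in the answer; on recent pandas releases, passing a literal JSON string is deprecated, so wrapping it in io.StringIO is the safer choice there.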