Original URL: http://stackoverflow.com/questions/28218173/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Extract part of data from JSON file with python
Question by Torostar
I have been trying to extract only certain data from a JSON file. I managed to decode the JSON and get the wanted data into a Python dict. When I print out the dict, it shows all the wanted data, but when I try to write the dict into a new file, only the last object gets written. One thing I also can't understand is why, when I print the dict, I get multiple dict objects instead of one, as I would expect.
My code:
import json
input_file=open('json.json', 'r')
output_file=open('test.json', 'w')
json_decode=json.load(input_file)
for item in json_decode:
    my_dict={}
    my_dict['title']=item.get('labels').get('en').get('value')
    my_dict['description']=item.get('descriptions').get('en').get('value')
    my_dict['id']=item.get('id')
    print my_dict
back_json=json.dumps(my_dict, output_file)
output_file.write(back_json)
output_file.close()
My json.json file:
[
{"type":"item","labels":{"en":{"language":"en","value":"George Washington"}},"descriptions":{"en":{"language":"en","value":"American politician, 1st president of the United States (in office from 1789 to 1797)"}},"id":"Q23"},
{"type":"item","aliases":{"en":[{"language":"en","value":"Douglas Noël Adams"},{"language":"en","value":"Douglas Noel Adams"}]},"labels":{"en":{"language":"en","value":"Douglas Adams"}},"descriptions":{"en":{"language":"en","value":"English writer and humorist"}},"id":"Q42"},
{"type":"item","aliases":{"en":[{"language":"en","value":"George Bush"},{"language":"en","value":"George Walker Bush"}]},"labels":{"en":{"language":"en","value":"George W. Bush"}},"descriptions":{"en":{"language":"en","value":"American politician, 43rd president of the United States from 2001 to 2009"}},"id":"Q207"},
{"type":"item","aliases":{"en":[{"language":"en","value":"Velázquez"},{"language":"en","value":"Diego Rodríguez de Silva y Velázquez"}]},"labels":{"en":{"language":"en","value":"Diego Velázquez"}},"descriptions":{"en":{"language":"en","value":"Spanish painter who was the leading artist in the court of King Philip IV"}},"id":"Q297"},
{"type":"item","labels":{"en":{"language":"en","value":"Eduardo Frei Ruiz-Tagle"}},"descriptions":{"en":{"language":"en","value":"Chilean politician and former President"}},"id":"Q326"}
]
Output of print my_dict:
{'id': u'Q23', 'description': u'American politician, 1st president of the United States (in office from 1789 to 1797)', 'title': u'George Washington'}
{'id': u'Q42', 'description': u'English writer and humorist', 'title': u'Douglas Adams'}
{'id': u'Q207', 'description': u'American politician, 43rd president of the United States from 2001 to 2009', 'title': u'George W. Bush'}
{'id': u'Q297', 'description': u'Spanish painter who was the leading artist in the court of King Philip IV', 'title': u'Diego Vel\xe1zquez'}
{'id': u'Q326', 'description': u'Chilean politician and former President', 'title': u'Eduardo Frei Ruiz-Tagle'}
Output in the file test.json:
{"id": "Q326", "description": "Chilean politician and former President", "title": "Eduardo Frei Ruiz-Tagle"}
Also, I would like to know why printing the dict shows 'title': u'Diego Vel\xe1zquez', but if I run print my_dict.values()[2], I get the name written normally as Diego Velázquez.
Many thanks
Accepted answer by jms
Your code creates a new dictionary object for each item with:
my_dict={}
Moreover, it overwrites the previous contents of the variable: the old dictionary in my_dict is discarded, so after the loop only the dictionary for the last item is left.
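To see the overwriting in isolation, here is a minimal Python 3 sketch; `last_only` is a hypothetical helper that mimics the question's loop, not part of the original code. Because `my_dict` is rebound on every iteration, only the dictionary built for the final item is still reachable once the loop ends.

```python
def last_only(items):
    """Mimic the question's loop: my_dict is rebound on every iteration."""
    my_dict = {}
    for item in items:
        my_dict = {}  # a brand-new dict; the previous one is no longer referenced
        my_dict["id"] = item["id"]
    return my_dict  # only the dict built for the final item survives


items = [{"id": "Q23"}, {"id": "Q42"}, {"id": "Q326"}]
print(last_only(items))  # {'id': 'Q326'}
```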
Create a list before your for loop and append each dictionary to it:
result = []
for item in json_decode:
    my_dict={}
    my_dict['title']=item.get('labels').get('en').get('value')
    my_dict['description']=item.get('descriptions').get('en').get('value')
    my_dict['id']=item.get('id')
    print my_dict
    result.append(my_dict)
Finally, write the result to the output:
json.dump(result, output_file)  # write the whole list as a single JSON array
output_file.close()
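Putting the steps of this answer together, a minimal Python 3 sketch might look like the following. It assumes the input layout shown in the question (every item has English labels and descriptions); the helper names `extract_fields` and `convert` are hypothetical, and the default file names are taken from the question. Files are opened with `with`, so they are closed automatically.

```python
import json


def extract_fields(items):
    """Build one small dict per decoded item and collect them all in one list."""
    result = []  # created once, before the loop
    for item in items:
        result.append({
            "title": item["labels"]["en"]["value"],
            "description": item["descriptions"]["en"]["value"],
            "id": item["id"],
        })
    return result


def convert(input_path="json.json", output_path="test.json"):
    """Read the full JSON array, keep only the wanted fields, write them back out."""
    with open(input_path, encoding="utf-8") as input_file:
        json_decode = json.load(input_file)
    with open(output_path, "w", encoding="utf-8") as output_file:
        # json.dump writes to a file object; json.dumps only returns a string
        json.dump(extract_fields(json_decode), output_file,
                  ensure_ascii=False, indent=2)
```

Calling `convert()` on the question's json.json would write all five records as one JSON array. `ensure_ascii=False` keeps names such as Diego Velázquez readable in the output file; the default (`ensure_ascii=True`) escapes them as `\u00e1` instead, which is equally valid JSON.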
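When a dictionary is printed, each key and value is shown with repr(), which is meant to help the developer by showing the type of the data. In u'Diego Vel\xe1zquez', the u at the start indicates a Unicode object (string), and the non-ASCII character á is shown as the escape \xe1. When the Unicode object itself is printed, it is encoded according to the current language settings of your OS (the terminal's encoding), so you see Diego Velázquez.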
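The difference between the escaped form shown inside a dict and the plain text shown by print can be reproduced in Python 3 as sketched below. One assumption worth stating: Python 3's `repr()` no longer escapes printable non-ASCII characters, so `ascii()` is used here to get the Python 2 style `\xe1` escape. The same kind of escaping shows up in JSON output, where `json.dumps` escapes non-ASCII characters by default (`ensure_ascii=True`).

```python
import json

name = "Diego Velázquez"
record = {"title": name}

print(record)       # {'title': 'Diego Velázquez'}  (str(dict) uses repr() of each value)
print(ascii(name))  # 'Diego Vel\xe1zquez'  (escaped, like Python 2's u'...' repr)
print(name)         # Diego Velázquez  (the string itself, encoded for the terminal)

print(json.dumps(name))                      # "Diego Vel\u00e1zquez"
print(json.dumps(name, ensure_ascii=False))  # "Diego Velázquez"
```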
Answer by Spencer
When you do this:
for item in json_decode:
You are looping through each object in the decoded JSON array (in your file, there happens to be one per line).
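The loop variable `item` takes each element of the decoded JSON array in turn; how the objects are laid out across lines in json.json does not matter. A small Python 3 check, using a made-up inline JSON string for illustration:

```python
import json

# Two objects on the first physical line, a third on the second line.
text = '[{"id": "Q23"}, {"id": "Q42"},\n {"id": "Q207"}]'

json_decode = json.loads(text)
print(type(json_decode).__name__)            # list
print([item["id"] for item in json_decode])  # ['Q23', 'Q42', 'Q207']
```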
Every time through the loop, you are overwriting the my_dict variable, which is why you get only one object in your output.
Once you have loaded the file, you can simply print out the json_decode variable to do what you want.