Python: Scraping a JSON response with Scrapy
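Notice: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also comply with CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow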
Original URL: http://stackoverflow.com/questions/18171835/
Scraping a JSON response with Scrapy
Asked by Thomas Kingaroy
How do you use Scrapy to scrape web requests that return JSON? For example, the JSON would look like this:
{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}
I'm looking to scrape specific items (e.g. name and fax in the above) and save them to CSV.
Accepted answer by alecxe
It's the same as using Scrapy's HtmlXPathSelector for HTML responses. The only difference is that you should use the json module to parse the response:
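import json

import scrapy


class MySpider(scrapy.Spider):  # named BaseSpider in older Scrapy releases
    ...

    def parse(self, response):
        # Decode the JSON body into a regular Python dict
        jsonresponse = json.loads(response.text)
        item = MyItem()  # MyItem: your own Item class with a "firstName" field
        item["firstName"] = jsonresponse["firstName"]
        return item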
Hope that helps.
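For the name and fax fields the question asks about, the extraction step might look like the sketch below. It uses only the standard library so it can run without a spider; the helper names `extract_row` and `rows_to_csv` and the CSV column names are my own assumptions, not part of Scrapy or of the original answer.

```python
import csv
import io
import json

# Sample payload from the question (age and address omitted for brevity).
SAMPLE = """
{
  "firstName": "John",
  "lastName": "Smith",
  "phoneNumber": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "fax", "number": "646 555-4567"}
  ]
}
"""


def extract_row(data):
    """Pick the full name and the fax number out of the parsed JSON dict."""
    # next(...) yields the first phone entry of type "fax", or None if absent.
    fax = next(
        (p["number"] for p in data.get("phoneNumber", []) if p.get("type") == "fax"),
        None,
    )
    return {
        "name": f'{data["firstName"]} {data["lastName"]}',
        "fax": fax,
    }


def rows_to_csv(rows):
    """Serialize a list of row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "fax"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


row = extract_row(json.loads(SAMPLE))
csv_text = rows_to_csv([row])
print(csv_text)
```

Inside a spider, the dict would come from `json.loads(response.text)` (or `response.json()` in Scrapy 2.2 and later). If the spider yields such rows as items, Scrapy's feed exports can write the CSV for you, e.g. `scrapy crawl myspider -o out.csv`, where `myspider` is a placeholder spider name.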
Answer by Manoj Sahu
One possible reason the JSON is not loading is that its strings are wrapped in single quotes rather than double quotes. Try this:
json.loads(response.text.replace("'", '"'))
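One caveat with a blanket quote replacement: it also rewrites apostrophes inside values, which can turn a parseable payload into invalid JSON. A possible alternative, assuming the server is really sending a Python-style literal, is to fall back to `ast.literal_eval`. The helper name `parse_loose_json` and the sample body are hypothetical, made up for illustration.

```python
import ast
import json


def parse_loose_json(text):
    """Try strict JSON first, then fall back to Python-literal syntax."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Assumption: the payload is a Python-style literal (single-quoted
        # strings). literal_eval only evaluates literals, never arbitrary code.
        return ast.literal_eval(text)


# Hypothetical response body: single-quoted keys, plus an apostrophe in a value.
body = "{'name': \"O'Brien\", 'fax': '646 555-4567'}"
parsed = parse_loose_json(body)
print(parsed["name"])
```

This fallback will not handle JSON-only tokens such as `true`, `false`, or `null` in a single-quoted payload, so treat it as a workaround for one specific kind of malformed response rather than a general fix.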