Python: Scraping a JSON response with Scrapy

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, link to the original, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/18171835/

Date: 2020-08-19 10:02:48  Source: igfitidea

Scraping a JSON response with Scrapy

Tags: python, json, web-scraping, scrapy

Asked by Thomas Kingaroy

How do you use Scrapy to scrape web requests that return JSON? For example, the JSON would look like this:

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}

I want to scrape specific items (e.g. name and fax in the above) and save them to CSV.

Accepted answer by alecxe

It's the same as using Scrapy's HtmlXPathSelector for HTML responses. The only difference is that you should use the json module to parse the response:

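import json

import scrapy


# scrapy.Spider is the current name of the old BaseSpider class
class MySpider(scrapy.Spider):
    ...

    def parse(self, response):
        # Decode the JSON body into a Python dict
        jsonresponse = json.loads(response.text)

        # MyItem is the Item class defined in your own project
        item = MyItem()
        item["firstName"] = jsonresponse["firstName"]

        return item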

Hope that helps.
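
For the name and fax fields the question asks about, the extraction step might look like the sketch below. It is plain Python so it runs without Scrapy. The helper name extract_row, the trimmed SAMPLE payload, and the choice to join firstName and lastName into a single name column are assumptions made for illustration, not part of the original answer.

```python
import csv
import io
import json

# Trimmed copy of the JSON from the question (assumed shape).
SAMPLE = """
{
    "firstName": "John",
    "lastName": "Smith",
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"}
    ]
}
"""


def extract_row(data):
    """Return the name and fax number from one decoded JSON record."""
    # Take the first phone entry whose type is "fax"; None if there is none.
    fax = next(
        (p["number"] for p in data.get("phoneNumber", []) if p.get("type") == "fax"),
        None,
    )
    return {
        "name": f'{data["firstName"]} {data["lastName"]}',
        "fax": fax,
    }


row = extract_row(json.loads(SAMPLE))

# Write the row as CSV into an in-memory buffer; a real file opened
# with newline="" can be used in place of the buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "fax"])
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Inside a spider, parse could yield extract_row(json.loads(response.text)) instead of building the CSV by hand, and Scrapy's feed export (for example, scrapy crawl with -o items.csv) would then write the file.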

Answer by Manoj Sahu

One possible reason the JSON is not loading is that its strings are wrapped in single quotes, which JSON does not allow. Try this:

# response.text replaces the deprecated response.body_as_unicode()
json.loads(response.text.replace("'", '"'))
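
One caveat with the blanket replacement above: it also rewrites apostrophes inside values, which can make the text invalid again. A possible alternative, assuming the single-quoted payload is really a Python dict literal rather than JSON, is to fall back to ast.literal_eval. The helper name loads_lenient and the sample string are hypothetical.

```python
import ast
import json


def loads_lenient(text):
    """Parse strict JSON first; fall back to Python-literal syntax."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted payloads are often Python dict reprs, which
        # ast.literal_eval reads without touching apostrophes in values.
        return ast.literal_eval(text)


# Hypothetical payload: single-quoted keys plus an apostrophe inside a value.
single_quoted = "{'firstName': 'John', 'note': \"it's fine\"}"

try:
    json.loads(single_quoted.replace("'", '"'))
except json.JSONDecodeError:
    print("blanket replace broke the apostrophe inside the value")

print(loads_lenient(single_quoted))
```

This fallback only covers Python-style literals; a payload that mixes single quotes with JSON keywords such as true, false, or null would still fail and needs a different fix.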