Python: Scraping a JSON response with Scrapy

Question

Asked by Thomas Kingaroy

How do you use Scrapy to scrape web requests that return JSON? For example, the JSON would look like this:

{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        {
            "type": "home",
            "number": "212 555-1234"
        },
        {
            "type": "fax",
            "number": "646 555-4567"
        }
    ]
}

I would like to scrape specific items (e.g. name and fax in the above) and save them to CSV.

Answer 1

Accepted answer by alecxe

It's the same as using Scrapy's HtmlXPathSelector for HTML responses. The only difference is that you should use the json module to parse the response:

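import json

import scrapy


class MySpider(scrapy.Spider):  # called BaseSpider in older Scrapy releases
    ...

    def parse(self, response):
        # response.text is the decoded body; parse it as JSON
        jsonresponse = json.loads(response.text)

        # MyItem is your own scrapy.Item subclass with a firstName field
        item = MyItem()
        item["firstName"] = jsonresponse["firstName"]

        return item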
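
To get from the parsed JSON to the name and fax values the question asks about, something like the following might work. This is only a sketch run against the sample JSON from the question: the helper name extract_row is hypothetical, joining firstName and lastName into a single "name" is an assumption about what the asker wants, and the CSV is written with the standard csv module rather than through Scrapy.

```python
import csv
import io
import json

# Sample payload copied from the question
SAMPLE = """
{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"}
    ]
}
"""


def extract_row(data):
    """Hypothetical helper: pick the name and fax number out of the parsed JSON."""
    # Take the first phone entry whose type is "fax", or None if there is none
    fax = next(
        (p["number"] for p in data.get("phoneNumber", []) if p.get("type") == "fax"),
        None,
    )
    # Assumption: "name" means first and last name joined by a space
    return {"name": f'{data["firstName"]} {data["lastName"]}', "fax": fax}


row = extract_row(json.loads(SAMPLE))

# Write a header line plus one data row to an in-memory CSV
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "fax"])
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

Inside a spider, parse could return extract_row(json.loads(response.text)) instead of building the item by hand, and Scrapy's feed export can then write the CSV for you, for example scrapy crawl myspider -o items.csv (the spider name myspider is a placeholder).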

Hope that helps.

Answer 2

Answer by Manoj Sahu

The possible reason JSON is not loading is that it has single-quotes before and after. Try this:

json.loads(response.text.replace("'", '"'))
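
Note that a blanket quote replacement breaks as soon as a value itself contains an apostrophe. If the single-quoted payload is really a Python dict literal rather than JSON, one possible alternative (a different technique from the one above, not part of the original answer) is to fall back to ast.literal_eval. The function name parse_loose_json and the sample payload are made up for illustration.

```python
import ast
import json


def parse_loose_json(text):
    """Hypothetical helper: parse JSON, falling back to Python-literal syntax."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Single-quoted payloads are often a Python dict repr, not JSON.
        # literal_eval only accepts literals, so it does not execute code.
        return ast.literal_eval(text)


# Made-up payload: a naive replace("'", '"') would corrupt the apostrophe in O'Brien
payload = "{'name': \"O'Brien\", 'age': 25}"
data = parse_loose_json(payload)
```

This fallback only helps when the text is valid Python literal syntax; a payload that mixes single quotes with JSON-only tokens such as true, false, or null would still fail.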
