Scraping AJAX pages using Python
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/16390257/
Scraping ajax pages using python
Asked by Lynob
I've already seen this question about scraping AJAX, but Python isn't mentioned there. I considered using Scrapy; I believe they have some docs on that subject, but as you can see, the website is down. So I don't know what to do. I want to do the following:
I only have one URL, example.com. You go from page to page by clicking submit; the URL doesn't change since they're using AJAX to display the content. I want to scrape the content of each page. How do I do it?
Let's say that I want to scrape only the numbers. Is there anything other than Scrapy that would do it? If not, could you give me a snippet on how to do it, since their website is down and I can't reach the docs?
Accepted answer by alecxe
First of all, the Scrapy docs are available at https://scrapy.readthedocs.org/en/latest/.
As for handling AJAX while web scraping, the idea is rather simple (a rough sketch follows the list below):
- open browser developer tools, Network tab
- go to the target site
- click the submit button and see what XHR request is going to the server
- simulate this XHR request in your spider
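A minimal sketch of such a spider, assuming the endpoint returns JSON; the endpoint URL, form field, and key names below are placeholders, so replace them with the values shown for the XHR request captured in the developer tools:

import json

import scrapy


class AjaxSpider(scrapy.Spider):
    # Sketch of a spider that replays the XHR request seen in the Network tab.
    name = "ajax_example"

    def start_requests(self):
        # Placeholder URL and form data; copy the real ones from the captured XHR request.
        yield scrapy.FormRequest(
            "http://example.com/ajax/endpoint",
            formdata={"page": "1"},
            callback=self.parse_page,
        )

    def parse_page(self, response):
        # Many AJAX endpoints return JSON rather than HTML.
        data = json.loads(response.text)
        yield {"numbers": data.get("numbers")}

Once the placeholders are filled in, it can be run with, for example, scrapy runspider spider.py -o items.json.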
Also see:
- Can scrapy be used to scrape dynamic content from websites that are using AJAX?
- Pagination using scrapy
Hope that helps.
Answer by Malak
I found the answer very useful, but I would like to make it simpler.
import requests

response = requests.post(request_url, data=payload, headers=request_headers)
requests.post takes three arguments: url, data, and headers. The values for these three can be found in the XHR request.
Copy the whole request header and form data into the above variables and you are good to go.
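A self-contained sketch of filling in those variables; the URL, form fields, and headers below are invented placeholders, so substitute the values shown for the captured XHR request:

import requests

# Placeholder values; replace them with what the Network tab shows for the XHR request.
request_url = "http://example.com/ajax/endpoint"
payload = {"page": "2"}
request_headers = {
    "User-Agent": "Mozilla/5.0",
    "X-Requested-With": "XMLHttpRequest",  # many AJAX endpoints expect this header
}

response = requests.post(request_url, data=payload, headers=request_headers)
print(response.status_code)
print(response.text)  # or response.json() if the endpoint returns JSON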

