Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/12303134/

Time: 2020-10-26 15:52:03  Source: igfitidea

cURL request on a page requiring JavaScript support

Tags: javascript, cookies, curl, web-scraping, spoofing

Asked by user965748

I need to get the HTML source of pinnaclesports.com. The problem is that it detects whether cookies and JS are enabled and, if not, just returns a page saying:

This site requires JavaScript and Cookies to be enabled. Please change your browser settings or upgrade your browser.

Is there any way to spoof JS support when using cURL?

EDIT: I can use a headless browser, as long as it runs as a Perl/Ruby module or is written in PHP.

Accepted answer by Markandey Singh

I figured out that if you make a cookie-less request, the page that comes back (the one you are getting with curl) uses JavaScript to set a cookie.

Make another curl call like this:

curl https://www.pinnaclesports.com/ --cookie "YPF8827340282Jdskjhfiw_928937459182JAX666=122.167.231.139"

That is, you have to make two calls: 1) make a cookie-less call, read the response, and use a regex to find the cookie name; 2) make a second request with that cookie set. That will solve your problem.
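
A minimal sketch of those two calls, assuming the check page sets its cookie with a plain `document.cookie = "NAME=VALUE"` assignment. The real page may build the name and value differently, in which case the pattern needs adjusting. `extract_cookie` and `fetch_with_js_cookie` are hypothetical helper names, not part of curl.

```shell
#!/bin/sh

# Hypothetical helper: read HTML on stdin and print the first NAME=VALUE pair
# assigned to document.cookie. The 'document.cookie = "..."' shape is an
# assumption about the check page, not something the site documents.
extract_cookie() {
    sed -n "s/.*document\.cookie[[:space:]]*=[[:space:]]*[\"']\([^;\"']*\).*/\1/p" | head -n 1
}

# Hypothetical helper: perform the two calls for the URL given as $1.
fetch_with_js_cookie() {
    url=$1
    # Call 1: no cookies, so the response is the JavaScript/cookie check page.
    cookie=$(curl -s "$url" | extract_cookie)
    if [ -z "$cookie" ]; then
        echo "no cookie assignment found in the first response" >&2
        return 1
    fi
    # Call 2: send the cookie that the page's script would have set.
    curl -s "$url" --cookie "$cookie"
}
```

Usage would look like `fetch_with_js_cookie "https://www.pinnaclesports.com/" > page.html`. If the function reports that no cookie was found, inspect the first response by hand to see how the script actually sets the cookie.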

Or just use YQL:

select * from html where url="https://www.pinnaclesports.com/" 

Point your curl at this query.

Answer by João Paulo Cercal

Another suggestion is to set the user agent. This solution works for me when parsing Google Groups:

curl -L -v "https://groups.google.com/d/forum/<GROUP-NAME>" -A "Mozilla/5.0 (compatible;  MSIE 7.01; Windows NT 5.0)"