PHP cURL: how can I emulate a GET request like a web browser?

Question

Asked by ufk

There are websites where, when I open a specific AJAX request in the browser, I get the resulting page, but when I try to load the same URL with cURL, I receive an error from the server.

How can I properly emulate a GET request to the server so that it looks like it comes from a browser?

This is what I'm doing:

<?php
$url="https://new.aol.com/productsweb/subflows/ScreenNameFlow/AjaxSNAction.do?s=username&f=firstname&l=lastname";
ini_set('user_agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
print $result;

Answer 1

Answered by VolkerK

Are you sure the curl module honors ini_set('user_agent', ...)? There is a CURLOPT_USERAGENT option, described at http://docs.php.net/function.curl-setopt.
Could the server also be checking for a cookie? You can handle that with CURLOPT_COOKIE, CURLOPT_COOKIEFILE and/or CURLOPT_COOKIEJAR.
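As a sketch of both suggestions together (not part of the original answer): CURLOPT_USERAGENT sets the header cURL actually sends, and pointing CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR at the same file lets cookies set by one response be sent back on later requests. The helper name browserOptions is hypothetical, and so is the idea of visiting a normal page first so the server can set its cookies; whether this particular server needs that is an assumption.

```php
<?php
// Hypothetical helper: cURL options that send a browser-like User-Agent
// and keep cookies in a file between requests.
function browserOptions(string $userAgent, string $cookieFile): array
{
    return [
        CURLOPT_USERAGENT      => $userAgent,   // cURL ignores ini_set('user_agent')
        CURLOPT_COOKIEFILE     => $cookieFile,  // read cookies from this file (enables the cookie engine)
        CURLOPT_COOKIEJAR      => $cookieFile,  // write cookies here when the handle is freed
        CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,         // follow redirects, keeping the same options
    ];
}

$cookieFile = tempnam(sys_get_temp_dir(), 'cookies'); // assumption: any writable path works
$opts = browserOptions(
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
    $cookieFile
);

// Usage (network access required, URLs are placeholders):
// $ch = curl_init('https://example.com/');  // hypothetical landing page that sets cookies
// curl_setopt_array($ch, $opts);
// curl_exec($ch);
// curl_setopt($ch, CURLOPT_URL, $ajaxUrl);  // then the AJAX URL, cookies are sent back
// $result = curl_exec($ch);
```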


Edit: since the request uses HTTPS, there might also be an error verifying the certificate; see CURLOPT_SSL_VERIFYPEER.

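<?php
$url="https://new.aol.com/productsweb/subflows/ScreenNameFlow/AjaxSNAction.do?s=username&f=firstname&l=lastname";
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';

$ch = curl_init();
// Disables certificate verification: handy to rule out TLS problems while
// debugging, but insecure, so don't leave it in production code.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);        // print connection details to STDERR
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_USERAGENT, $agent);    // the User-Agent header cURL actually sends
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
var_dump($result);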
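Before changing headers, it can help to find out what kind of error you are actually getting. A transport-level failure (DNS, TLS certificate, connection) is reported by curl_errno() and curl_error(), while a server-side rejection shows up as an HTTP status code from curl_getinfo() with CURLINFO_HTTP_CODE. A sketch, with fetchWithDiagnostics as a hypothetical helper name:

```php
<?php
// Hypothetical helper: performs a GET and reports why it failed, if it did.
function fetchWithDiagnostics(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_CONNECTTIMEOUT => 10, // seconds to wait for the connection
        CURLOPT_TIMEOUT        => 30, // seconds for the whole transfer
    ]);
    $body = curl_exec($ch);
    return [
        'body'   => $body === false ? null : $body,
        'errno'  => curl_errno($ch),                       // 0 if the transfer itself succeeded
        'error'  => curl_error($ch),                       // e.g. an SSL certificate problem
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if no response arrived
    ];
}

// Usage (network access required):
// $r = fetchWithDiagnostics($url, $agent);
// if ($r['errno'] !== 0) {
//     echo "cURL error {$r['errno']}: {$r['error']}\n";
// } elseif ($r['status'] >= 400) {
//     echo "Server answered with HTTP {$r['status']}\n";
// }
```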

Answer 2

Answered by hanshenrik

I'll make an example. First decide which browser you want to emulate; in this case I chose Firefox 60.6.1esr (64-bit). Then check what GET request it issues. This can be obtained with a simple netcat server (macOS bundles netcat, most Linux distributions bundle netcat, and Windows users can get netcat from Cygwin.org, among other places).

Set up the netcat server to listen on port 9999: nc -l 9999 (older "traditional" netcat builds need nc -l -p 9999 instead)
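If netcat isn't available, a few lines of PHP can stand in for it. This is a sketch, not part of the original answer: readRequestHead and captureOneRequest are hypothetical names, it listens on 127.0.0.1:9999 by default, and, like nc, it never sends a response, so the browser will show an error page after the request has been printed.

```php
<?php
// Hypothetical netcat substitute, part 1: reads the request line and headers
// from a stream, stopping at the blank line that ends the header section.
function readRequestHead($stream): string
{
    $head = '';
    while (($line = fgets($stream)) !== false) {
        $head .= $line;
        if (rtrim($line, "\r\n") === '') {
            break; // end of headers
        }
    }
    return $head;
}

// Part 2: accepts a single connection and prints what the client sent,
// much like `nc -l 9999`. Not called here because it blocks.
function captureOneRequest(string $address = 'tcp://127.0.0.1:9999'): void
{
    $server = stream_socket_server($address, $errno, $errstr);
    if ($server === false) {
        throw new RuntimeException("Cannot listen on $address: $errstr ($errno)");
    }
    $conn = stream_socket_accept($server, 300); // wait up to five minutes for the browser
    if ($conn === false) {
        fclose($server);
        throw new RuntimeException('No connection received');
    }
    echo readRequestHead($conn);
    fclose($conn);
    fclose($server);
}

// Usage: call captureOneRequest(); in a script, run it with the PHP CLI,
// then open http://127.0.0.1:9999 in the browser.
```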


Now, opening http://127.0.0.1:9999 in Firefox, I get:

$ nc -l 9999
GET / HTTP/1.1
Host: 127.0.0.1:9999
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
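If you want to reuse such a capture, a small helper can split it into the pieces that get passed to curl_setopt_array() later in this answer: User-Agent for CURLOPT_USERAGENT, Accept-Encoding for CURLOPT_ENCODING, and every other header except Host (which curl derives from the URL) for CURLOPT_HTTPHEADER, in the original order. A sketch only: splitCapturedRequest and the file name in the usage comment are hypothetical, and it assumes the capture starts at the request line (without the $ nc -l 9999 prompt).

```php
<?php
// Hypothetical helper: splits a raw captured request into the values used
// with CURLOPT_USERAGENT, CURLOPT_ENCODING and CURLOPT_HTTPHEADER.
function splitCapturedRequest(string $raw): array
{
    $lines = preg_split('/\r?\n/', trim($raw));
    array_shift($lines); // drop the request line, e.g. "GET / HTTP/1.1"
    $result = ['useragent' => null, 'encoding' => null, 'headers' => []];
    foreach ($lines as $line) {
        if ($line === '') {
            break; // blank line ends the header section
        }
        if (strpos($line, ':') === false) {
            continue; // not a header line
        }
        // Split on the first colon only: values such as "rv:60.0" contain colons too.
        [$name, $value] = array_map('trim', explode(':', $line, 2));
        switch (strtolower($name)) {
            case 'host':
                break; // curl builds Host from the URL
            case 'user-agent':
                $result['useragent'] = $value;
                break;
            case 'accept-encoding':
                $result['encoding'] = $value;
                break;
            default:
                $result['headers'][] = "$name: $value";
        }
    }
    return $result;
}

// Usage:
// $parts = splitCapturedRequest(file_get_contents('firefox-request.txt'));
// curl_setopt_array($ch, [
//     CURLOPT_USERAGENT  => $parts['useragent'],
//     CURLOPT_ENCODING   => $parts['encoding'],
//     CURLOPT_HTTPHEADER => $parts['headers'],
// ]);
```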

Now let's compare Firefox's request with the one sent by this simple script:

<?php
$ch=curl_init("http://127.0.0.1:9999");
curl_exec($ch);

I get:

$ nc -l 9999
GET / HTTP/1.1
Host: 127.0.0.1:9999
Accept: */*
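With longer captures, eyeballing the difference gets tedious. The following sketch (headerNames and compareCapturedHeaders are hypothetical names) compares two raw captures by header name, case-insensitively, and reports which headers only the browser sent, which only curl sent, and whether the shared ones appear in the same order.

```php
<?php
// Hypothetical helper: lowercase header names of a raw captured request, in order.
function headerNames(string $raw): array
{
    $names = [];
    foreach (array_slice(preg_split('/\r?\n/', trim($raw)), 1) as $line) { // skip request line
        if ($line === '') {
            break; // end of headers
        }
        $colon = strpos($line, ':');
        if ($colon !== false) {
            $names[] = strtolower(trim(substr($line, 0, $colon))); // names are case-insensitive
        }
    }
    return $names;
}

// Hypothetical helper: compares the browser's capture with curl's capture.
function compareCapturedHeaders(string $browser, string $curl): array
{
    $b = headerNames($browser);
    $c = headerNames($curl);
    return [
        'missing'    => array_values(array_diff($b, $c)), // sent by the browser only
        'extra'      => array_values(array_diff($c, $b)), // sent by curl only
        'same_order' => array_values(array_intersect($b, $c)) === array_values(array_intersect($c, $b)),
    ];
}
```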

cURL's request is missing several headers. They can all be added with the CURLOPT_HTTPHEADER option of curl_setopt, but two of them are better set through dedicated options.

User-Agent should be set with CURLOPT_USERAGENT instead: it persists across multiple calls to curl_exec(), and if you use CURLOPT_FOLLOWLOCATION it persists across HTTP redirects as well.

Accept-Encoding should be set with CURLOPT_ENCODING instead: when it is set that way, curl automatically decompresses the response if the server chooses to compress it, whereas if you set it via CURLOPT_HTTPHEADER you must detect and decompress the content yourself, which is a pain and, generally speaking, completely unnecessary.

Adding those, we get:

<?php
$ch=curl_init("http://127.0.0.1:9999");
curl_setopt_array($ch,array(
        CURLOPT_USERAGENT=>'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
        CURLOPT_ENCODING=>'gzip, deflate',
        CURLOPT_HTTPHEADER=>array(
                'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                'Accept-Language: en-US,en;q=0.5',
                'Connection: keep-alive',
                'Upgrade-Insecure-Requests: 1',
        ),
));
curl_exec($ch);

Now, running that code, our netcat server gets:

$ nc -l 9999
GET / HTTP/1.1
Host: 127.0.0.1:9999
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0
Accept-Encoding: gzip, deflate
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
Upgrade-Insecure-Requests: 1

And voilà! Our PHP-emulated browser GET request should now be practically indistinguishable from the real Firefox GET request, at least at the HTTP header level :)
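For reference, this is roughly the manual work that CURLOPT_ENCODING saves you when Accept-Encoding is sent only through CURLOPT_HTTPHEADER: look at the response's Content-Encoding header and decompress with PHP's zlib functions. It's a sketch, assuming the zlib extension is loaded; decodeBody is a hypothetical name, only gzip and deflate are handled, and obtaining the Content-Encoding value (for example with CURLOPT_HEADERFUNCTION) is left out.

```php
<?php
// Hypothetical helper: manual decompression, needed only when curl was not
// told about the encoding through CURLOPT_ENCODING.
function decodeBody(string $body, ?string $contentEncoding): string
{
    switch (strtolower(trim((string) $contentEncoding))) {
        case '':
        case 'identity':
            return $body; // not compressed
        case 'gzip':
        case 'x-gzip':
            $out = gzdecode($body);
            break;
        case 'deflate':
            // HTTP "deflate" should be zlib-wrapped, but some servers send raw deflate.
            $out = @gzuncompress($body);
            if ($out === false) {
                $out = @gzinflate($body);
            }
            break;
        default:
            throw new RuntimeException("Unsupported Content-Encoding: $contentEncoding");
    }
    if ($out === false) {
        throw new RuntimeException('Could not decompress the response body');
    }
    return $out;
}
```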


This next part is just nitpicking, but if you look very closely, you'll see that the headers are in a different order: Firefox sent Accept-Encoding on line 6 (after Accept-Language), while our emulated GET request sends it on line 4 (right after User-Agent). To fix this, we can manually put the Accept-Encoding header in the right position:

<?php
$ch=curl_init("http://127.0.0.1:9999");
curl_setopt_array($ch,array(
        CURLOPT_USERAGENT=>'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
        CURLOPT_ENCODING=>'gzip, deflate',
        CURLOPT_HTTPHEADER=>array(
                'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                'Accept-Language: en-US,en;q=0.5',
                'Accept-Encoding: gzip, deflate',
                'Connection: keep-alive',
                'Upgrade-Insecure-Requests: 1',
        ),
));
curl_exec($ch);

Running that, our netcat server gets:

$ nc -l 9999
GET / HTTP/1.1
Host: 127.0.0.1:9999
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1

Problem solved: now the headers are even in the correct order, and the request seems to be COMPLETELY INDISTINGUISHABLE from the real Firefox request :) (I don't actually recommend this last step: it's a maintenance burden to keep CURLOPT_ENCODING in sync with the custom Accept-Encoding header, and I've never experienced a situation where the order of the headers was significant.)
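If you do want to keep the Firefox header order, one way to reduce that maintenance burden is to define the encoding list once and derive both CURLOPT_ENCODING and the custom Accept-Encoding line from it. This is a sketch rather than part of the original answer; firefoxOptions is a hypothetical name, and the header values are the Firefox 60 ones captured above.

```php
<?php
// Hypothetical helper: Firefox 60-style options where the Accept-Encoding value
// is defined once, so CURLOPT_ENCODING and the header cannot drift apart.
function firefoxOptions(string $encoding = 'gzip, deflate'): array
{
    return [
        CURLOPT_USERAGENT  => 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:60.0) Gecko/20100101 Firefox/60.0',
        CURLOPT_ENCODING   => $encoding,     // lets curl decompress the response
        CURLOPT_HTTPHEADER => [              // in the order Firefox sent them
            'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
            'Accept-Encoding: ' . $encoding, // replaces the header curl would generate itself
            'Connection: keep-alive',
            'Upgrade-Insecure-Requests: 1',
        ],
    ];
}

// Usage:
// $ch = curl_init('http://127.0.0.1:9999');
// curl_setopt_array($ch, firefoxOptions());
// curl_exec($ch);
```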

