Original URL: http://stackoverflow.com/questions/770179/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
PHP / Curl: HEAD Request takes a long time on some sites
Question by Ian
I have simple code that does a HEAD request for a URL and then prints the response headers. I've noticed that on some sites, this can take a long time to complete.
For example, requesting http://www.arstechnica.com takes about two minutes. I've tried the same request using another web site that does the same basic task, and it comes back immediately. So there must be something I have set incorrectly that's causing this delay.
Here's the code I have:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
$content = curl_exec ($ch);
curl_close ($ch);
Here's a link to the web site that does the same function: http://www.seoconsultants.com/tools/headers.asp
The code above, at least on my server, takes two minutes to retrieve www.arstechnica.com, but the service at the link above returns it right away.
What am I missing?
Answer by Paolo Bergantino
Try simplifying it a little bit:
print htmlentities(file_get_contents("http://www.arstechnica.com"));
The above outputs instantly on my web server. If it doesn't on yours, there's a good chance your web host has some kind of setting in place to throttle these kinds of requests.
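One way to test the throttling theory is to ask cURL where the time actually goes. The sketch below is an illustration, not part of the original answer: it assumes a single request without redirects and takes the array returned by curl_getinfo($ch) after curl_exec(). libcurl reports these times cumulatively from the start of the request, so subtracting neighbouring values gives the individual phases. A large connect phase would fit the throttling theory; a large phase after the first byte means the response started quickly and cURL then sat waiting for more data. The names timing_phases and slowest_phase are hypothetical.

```php
<?php
// Hypothetical helper: split curl_getinfo() timings (cumulative seconds since
// the request started) into per-phase durations. Assumes no redirects.
function timing_phases(array $info): array
{
    return [
        'dns'              => $info['namelookup_time'],
        'connect'          => $info['connect_time'] - $info['namelookup_time'],
        'tls_and_setup'    => $info['pretransfer_time'] - $info['connect_time'],
        'server_wait'      => $info['starttransfer_time'] - $info['pretransfer_time'],
        'after_first_byte' => $info['total_time'] - $info['starttransfer_time'],
    ];
}

// Name of the phase that took the longest.
function slowest_phase(array $phases): string
{
    return (string) array_search(max($phases), $phases, true);
}

// Usage (network), right after curl_exec($ch):
// print_r(timing_phases(curl_getinfo($ch)));
```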
EDIT:
Since the above happens instantly for you, try adding this cURL option to your original code:
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
Using the tool you posted, I noticed that http://www.arstechnica.com sends a 301 header in response to any request. It is possible that cURL is receiving this and not following the new Location given in it, thus causing your script to hang.
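To see the 301 for yourself, you can inspect the raw header text cURL returns when CURLOPT_HEADER is on. With CURLOPT_FOLLOWLOCATION enabled, that text contains one header block per response, separated by blank lines. The parser below is a hypothetical helper, not from the answer; it only looks at status lines and Location headers.

```php
<?php
// Hypothetical helper: list the status code and Location (if any) of every
// response in the raw header text returned with CURLOPT_HEADER enabled.
function redirect_chain(string $raw): array
{
    $hops = [];
    // Header blocks from successive responses are separated by an empty line.
    foreach (preg_split("/\r?\n\r?\n/", trim($raw)) as $block) {
        $lines = preg_split("/\r?\n/", $block);
        // Skip anything that does not start with an HTTP status line.
        if (!preg_match('#^HTTP/\S+\s+(\d{3})#', $lines[0], $m)) {
            continue;
        }
        $location = null;
        foreach (array_slice($lines, 1) as $line) {
            // Position 0 only, so "Content-Location:" is not matched.
            if (stripos($line, 'Location:') === 0) {
                $location = trim(substr($line, strlen('Location:')));
            }
        }
        $hops[] = ['status' => (int) $m[1], 'location' => $location];
    }
    return $hops;
}

// Usage (network), with CURLOPT_HEADER, CURLOPT_NOBODY and CURLOPT_FOLLOWLOCATION set:
// print_r(redirect_chain(curl_exec($ch)));
```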
SECOND EDIT:
Curiously enough, trying the same code you have above was making my webserver hang too. I replaced this code:
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
With this:
curl_setopt($ch, CURLOPT_NOBODY, true);
This is the way the manual recommends doing a HEAD request, and it made the request return instantly.
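Putting this answer's two suggestions together gives a HEAD helper along the following lines. According to the libcurl documentation, CURLOPT_CUSTOMREQUEST only changes the method name sent on the wire; libcurl still behaves as it would for GET and waits for a response body, which a HEAD response never carries, so the transfer can stall until the server closes the connection. CURLOPT_NOBODY makes libcurl send a real HEAD and stop after the headers. The function names are hypothetical, and the CURLOPT_TIMEOUT and CURLOPT_MAXREDIRS values are assumptions added so a misbehaving server cannot hang the script indefinitely (CURLOPT_CONNECTTIMEOUT alone only limits the connection phase).

```php
<?php
// Options for a HEAD request: CURLOPT_NOBODY instead of
// CURLOPT_CUSTOMREQUEST => 'HEAD', plus redirect following.
function head_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_NOBODY         => true, // real HEAD: do not wait for a body
        CURLOPT_HEADER         => true, // include response headers in the output
        CURLOPT_RETURNTRANSFER => true, // return the output instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow 301/302 redirects
        CURLOPT_MAXREDIRS      => 5,    // assumption: cap the redirect chain
        CURLOPT_CONNECTTIMEOUT => 20,   // seconds allowed to connect
        CURLOPT_TIMEOUT        => 30,   // assumption: limit for the whole transfer
    ];
}

// Perform the request and return the raw header text.
function head_request(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, head_options($url));
    $headers = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);
    if ($headers === false) {
        throw new RuntimeException("HEAD request failed: $error");
    }
    return $headers;
}

// Usage (network):
// echo head_request('http://www.arstechnica.com');
```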
Answer by Trey
You have to remember that HEAD is only a suggestion to the web server. For HEAD to do the right thing, it often takes some explicit effort on the part of the admins. If you HEAD a static file, Apache (or whatever your web server is) will often step in and do the right thing. If you HEAD a dynamic page, the default for most setups is to execute the GET path, collect all the results, and just send back the headers without the content. If that application is in a 3-tier (or larger) setup, that call could be very expensive, and needlessly so in a HEAD context. For instance, in a Java servlet, doHead() by default just calls doGet(). To do something smarter for the application, the developer has to implement doHead() explicitly (and more often than not, they don't).
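The same idea in PHP, as a hypothetical sketch rather than anything from the answer: a handler that answers HEAD from cheap metadata (here, a Last-Modified timestamp supplied by the caller) and only runs the expensive body generator for GET. HTTP lets a server omit headers whose values are only known once the content has been generated, which is why this sketch does not compute a Content-Length for HEAD. The function name and the render_price_list() helper in the usage comment are made up.

```php
<?php
// Hypothetical handler: headers come from cheap metadata; the body is only
// generated (via the callable) when the method actually needs it.
function build_response(string $method, int $lastModified, callable $generateBody): array
{
    $headers = [
        'Last-Modified' => gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
    ];
    if ($method === 'HEAD') {
        // Headers only: the expensive generator is never called.
        return ['headers' => $headers, 'body' => null];
    }
    return ['headers' => $headers, 'body' => $generateBody()];
}

// Usage in a front controller (hypothetical helpers):
// $r = build_response($_SERVER['REQUEST_METHOD'], filemtime($dataFile), fn () => render_price_list());
```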
I encountered an app from a Fortune 100 company that was used for downloading several hundred megabytes of pricing information. We checked for updates to that data by executing HEAD requests fairly regularly until the modified date changed. It turned out that every one of those requests made back-end calls to generate the full list, which involved gigabytes of data on their back end being transferred between several internal servers. They weren't terribly happy with us, but once we explained the use case they quickly came up with an alternate solution. If they had implemented HEAD themselves, rather than relying on their web server to fake it, it would not have been an issue.
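For the polling pattern described above, the client side only needs to read Last-Modified from the HEAD response and compare it with the timestamp it already has. A minimal sketch, assuming the raw header text (for example, what cURL returns with CURLOPT_HEADER and CURLOPT_NOBODY set) and a previously stored Unix timestamp; the function names are hypothetical, and treating a missing header as "changed" is a design choice, not a rule. A conditional GET with If-Modified-Since is the other common way to do the same check.

```php
<?php
// Return Last-Modified as a Unix timestamp, or null if absent/unparseable.
function last_modified(string $rawHeaders): ?int
{
    $found = null;
    foreach (preg_split("/\r?\n/", $rawHeaders) as $line) {
        if (stripos($line, 'Last-Modified:') === 0) {
            // Keep the last occurrence so a followed redirect's final response wins.
            $found = trim(substr($line, strlen('Last-Modified:')));
        }
    }
    if ($found === null) {
        return null;
    }
    $ts = strtotime($found);
    return $ts === false ? null : $ts;
}

// True when the resource looks newer than the copy we already have.
function has_changed(string $rawHeaders, ?int $knownTimestamp): bool
{
    $current = last_modified($rawHeaders);
    if ($current === null) {
        return true; // assumption: no usable date, so re-download to be safe
    }
    return $knownTimestamp === null || $current > $knownTimestamp;
}
```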
Answer by Alix Axel
If memory serves, doing a HEAD request in cURL changes the HTTP protocol version to 1.0 (which is slow and probably the culprit here). Try changing the code to:
$ch = curl_init();
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt ($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
// Only calling the head
curl_setopt($ch, CURLOPT_HEADER, true); // header will be at output
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'HEAD'); // HTTP request is 'HEAD'
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1); // ADD THIS
$content = curl_exec ($ch);
curl_close ($ch);
Answer by San
I used the below function to find out the redirected URL.
$head = get_headers($url, 1);
The second argument makes it return an associative array keyed by header name. For example, the following gives the Location value:
$head["Location"]
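One caveat with this approach: when get_headers() passes through several redirects, a repeated header such as Location comes back as an array with one entry per hop, so $head["Location"] is not always a string, and header names may keep whatever capitalisation the server sent. A small hypothetical helper that handles both cases (note that a Location value may also be a relative URL):

```php
<?php
// Hypothetical helper: pick the final redirect target out of a
// get_headers($url, true) result, or null if there was no redirect.
function final_location(array $head): ?string
{
    foreach ($head as $name => $value) {
        // Numeric keys hold status lines; compare header names case-insensitively.
        if (is_string($name) && strcasecmp($name, 'Location') === 0) {
            // One value per hop when the header repeats; the last hop is the final target.
            return is_array($value) ? (string) end($value) : (string) $value;
        }
    }
    return null;
}

// Usage (network). get_headers() sends GET by default; this switches the
// default stream context to HEAD first:
// stream_context_set_default(['http' => ['method' => 'HEAD']]);
// $head = get_headers($url, true);
// $target = $head === false ? null : final_location($head);
```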
Answer by Brad
This:
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
I wasn't trying to get headers.
I was just trying to keep a page that loads some data from taking 2 minutes, similar to what's described above.
That magical little option has dropped it down to 2 seconds.

