PHP file_get_contents very slow when using full url

Note: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you reuse this content, you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/3629504/

PHP file_get_contents very slow when using full url

php

Asked by ecurbh

I am working with a script (that I did not create originally) that generates a pdf file from an HTML page. The problem is that it is now taking a very long time, like 1-2 minutes, to process. Supposedly this was working fine originally, but has slowed down within the past couple of weeks.

The script calls file_get_contents on a PHP script, which then outputs the result into an HTML file on the server, and runs the PDF generator app on that file.

I seem to have narrowed down the problem to the file_get_contents call on a full URL, rather than a local path.

When I use

$content = file_get_contents('test.txt');

it processes almost instantaneously. However, if I use the full url

$content = file_get_contents('http://example.com/test.txt');

it takes anywhere from 30-90 seconds to process.

It's not limited to our server; it is slow when accessing any external URL, such as http://www.google.com. I believe the script calls the full URL because there are necessary query string variables that don't work if you call the file locally.

I also tried fopen, readfile, and curl, and they were all similarly slow. Any ideas on where to look to fix this?

Answer by KrisWebDev

Note: This has been fixed in PHP 5.6.14. A Connection: close header will now automatically be sent even for HTTP/1.0 requests. See commit 4b1dff6.

I had a hard time figuring out the cause of the slowness of file_get_contents scripts.

By analyzing it with Wireshark, the issue (in my case, and probably yours too) was that the remote web server DIDN'T CLOSE THE TCP CONNECTION FOR 15 SECONDS (i.e. "keep-alive").

Indeed, file_get_contents doesn't send a "Connection" HTTP header, so the remote web server assumes by default that it's a keep-alive connection and doesn't close the TCP stream for 15 seconds (this might not be a standard value; it depends on the server config).

A normal browser would consider the page fully loaded once the HTTP payload length reaches the length specified in the response's Content-Length header. file_get_contents doesn't do this, and that's a shame.

SOLUTION

So, if you want to know the solution, here it is:

// Use double quotes so that \r\n is sent as an actual CRLF
$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n")));
file_get_contents("http://www.something.com/somepage.html", false, $context);

The thing is just to tell the remote web server to close the connection when the download is complete, as file_get_contents isn't intelligent enough to do it by itself using the response Content-Length HTTP header.

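If the Connection: close header alone doesn't help, the same stream context can also carry a read timeout and an explicit HTTP version. A minimal sketch; the timeout value and URL are just examples:

$context = stream_context_create(array('http' => array(
    'header'           => "Connection: close\r\n",
    'protocol_version' => 1.1,  // send an HTTP/1.1 request instead of the default 1.0
    'timeout'          => 10,   // read timeout, in seconds
)));
$content = file_get_contents('http://www.something.com/somepage.html', false, $context);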

Answer by Jim W.

I would use curl() to fetch external content, as this is much quicker than the file_get_contents method. Not sure if this will solve the issue, but it's worth a shot.

Also note that your server's speed will affect the time it takes to retrieve the file.

Here is an example of usage:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://example.com/test.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
curl_close($ch);
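
If the remote host is slow to respond, it can also help to cap how long cURL waits. A small variation on the example above; the timeout values are arbitrary examples:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://example.com/test.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5); // give up if connecting takes more than 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 15);       // give up if the whole transfer takes more than 15 seconds
$output = curl_exec($ch);
curl_close($ch);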

Answer by diyism

Sometimes it's because DNS resolution is too slow on your server. Try this:

replace

echo file_get_contents('http://www.google.com');

with

$context=stream_context_create(array('http' => array('header'=>"Host: www.google.com\r\n")));
echo file_get_contents('http://74.125.71.103', false, $context);
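
A variation on the same idea, as a rough sketch: resolve the hostname yourself once with gethostbyname() and keep sending the original Host header, so you can tell whether the DNS lookup itself is the slow part (the hostname is just an example):

$host = 'www.google.com';    // example hostname
$ip = gethostbyname($host);  // returns the IP address, or the hostname unchanged on failure
$context = stream_context_create(array('http' => array('header' => "Host: $host\r\n")));
echo file_get_contents("http://$ip/", false, $context);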

Answer by Walid Ammar

I had the same issue,

The only thing that worked for me was setting a timeout in the $options array.

$options = array(
    'http' => array(
        'header'  => implode("\r\n", $headers),
        'method'  => 'POST',
        'content' => '',
        'timeout' => .5   // read timeout, in seconds (float)
    ),
);
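
For completeness, a minimal sketch of how such an $options array would be used, assuming $headers is an array of header lines defined before it (the URL is a placeholder):

$context = stream_context_create($options);
$result  = file_get_contents('http://example.com/endpoint', false, $context);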

Answer by Marc B

Can you try fetching that URL, on the server, from the command line? curl or wget come to mind. If those retrieve the URL at a normal speed, then it's not a network problem and it's most likely something in the Apache/PHP setup.

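If the command-line fetch is fast, a quick way to quantify the difference on the PHP side is to time the call directly. A rough sketch (the URL is just an example):

$start = microtime(true);
$content = file_get_contents('http://example.com/test.txt');
$elapsed = microtime(true) - $start;
printf("Fetched %d bytes in %.2f seconds\n", strlen((string) $content), $elapsed);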

Answer by Amito

$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n")));
$string = file_get_contents("http://localhost/testcall/request.php", false, $context);

Time: 50976 ms (average over 5 attempts in total)

$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, "http://localhost/testcall/request.php");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
echo $data = curl_exec($ch);
curl_close($ch);

Time: 46679 ms (average over 5 attempts in total)

Note: request.php is used to fetch some data from a MySQL database.

Answer by Elyor

I have a large amount of data coming back from an API. I was using file_get_contents to read the data, and it took around 60 seconds. However, using KrisWebDev's solution it took around 25 seconds.

// Note: the context options for https:// URLs still go under the 'http' key
$context = stream_context_create(array('http' => array('header' => "Connection: close\r\n")));
file_get_contents($url, false, $context);

Answer by Mike Q

What I would also consider with cURL is that you can "thread" the requests, i.e. run them in parallel with curl_multi. This has helped me immensely, as I do not have access to a version of PHP that allows threading at the moment.

For example, I was getting 7 images from a remote server using file_get_contents, and it was taking 2-5 seconds per request. That alone was adding 30 seconds or so to the process while the user waited for the PDF to be generated.

Running the requests in parallel literally reduced the total time to roughly that of fetching a single image. As another example, I can now verify 36 URLs in the time it previously took to check one. I think you get the point. :-)

    $timeout = 30;
    $retTxfr = 1;
    $user = '';
    $pass = '';

    $master = curl_multi_init();
    $node_count = count($curlList);
    $keys = array("url");

    for ($i = 0; $i < $node_count; $i++) {
        foreach ($keys as $key) {
            if (empty($curlList[$i][$key])) continue;
            $ch[$i][$key] = curl_init($curlList[$i][$key]);
            curl_setopt($ch[$i][$key], CURLOPT_TIMEOUT, $timeout); // -- timeout after X seconds
            curl_setopt($ch[$i][$key], CURLOPT_RETURNTRANSFER, $retTxfr);
            curl_setopt($ch[$i][$key], CURLOPT_HTTPAUTH, CURLAUTH_ANY);
            curl_setopt($ch[$i][$key], CURLOPT_USERPWD, "{$user}:{$pass}");
            curl_multi_add_handle($master, $ch[$i][$key]);
        }
    }

    // -- get all requests at once, finish when done or timeout met --
    do {  curl_multi_exec($master, $running);  }
    while ($running > 0);

Then check over the results:

            // Fetch the response body for this handle (CURLOPT_RETURNTRANSFER was set above)
            $results[$i][$key] = curl_multi_getcontent($ch[$i][$key]);
            if ((int)curl_getinfo($ch[$i][$key], CURLINFO_HTTP_CODE) > 399 || empty($results[$i][$key])) {
                unset($results[$i][$key]);
            } else {
                $results[$i]["options"] = $curlList[$i]["options"];
            }
            curl_multi_remove_handle($master, $ch[$i][$key]);
            curl_close($ch[$i][$key]);

Then close the multi handle:

    curl_multi_close($master);
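
For context, a guess at the shape of the $curlList array this snippet assumes; the keys are inferred from the code above and the URLs are placeholders:

$curlList = array(
    array('url' => 'http://example.com/image1.jpg', 'options' => array(/* per-item metadata */)),
    array('url' => 'http://example.com/image2.jpg', 'options' => array()),
);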

Answer by ElChupacabra

I know this is an old question, but I found it today and the other answers didn't work for me. I didn't see anyone mention that the maximum number of connections per IP may be set to 1. In that case your script makes a request to the API, and because you use the full URL the API makes a second request back to the same server, which is blocked by that limit. That's why loading the file directly from disk works. For me this fixed the problem:

// If the URL points back at this same application, strip the app's own base URL
// so the file is read from a local path instead of going out over HTTP.
if (strpos($file->url, env('APP_URL')) === 0) {
    $url = substr($file->url, strlen(env('APP_URL')));
} else {
    $url = $file->url;
}
return file_get_contents($url);