Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1708768/

Time: 2020-08-25 03:36:17  Source: igfitidea

file_exists() is too slow in PHP. Can anyone suggest a faster alternative?

Tags: php, performance, file-exists

Asked by Rik Heywood

When displaying images on our website, we check if the file exists with a call to file_exists(). We fall back to a dummy image if the file was missing.

However, profiling has shown that this is the slowest part of generating our pages, with file_exists() taking up to 1/2 ms per file. We are only testing 40 or so files, but this still adds 20 ms to the page load time.

Can anyone suggest a way of making this go faster? Is there a better way of testing if the file is present? If I build a cache of some kind, how should I keep it in sync?

Accepted answer by RC.

file_exists() should be a very inexpensive operation. Note too that file_exists() builds its own cache to help with performance.

See: http://php.net/manual/en/function.file-exists.php

Answer by powtac

Use absolute paths! Depending on your include_path setting, PHP checks all(!) these dirs if you check relative file paths! You might unset include_path temporarily before checking the existence.
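
As a rough illustration of the advice above, the check can be wrapped so that it always receives an absolute path. The helper name image_exists() and the idea of passing a base directory are assumptions made for this sketch, not something the answer specifies.

```php
<?php
// Hypothetical helper: the name and the base-directory argument are
// illustrative assumptions, not part of the original answer.
function image_exists(string $baseDir, string $relativePath): bool
{
    // Join base directory and relative name into one absolute path, so the
    // result does not depend on include_path or the working directory.
    $absolute = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . ltrim($relativePath, '/\\');
    return file_exists($absolute);
}

// Usage: __DIR__ is the directory of the current script; the 'images'
// folder and 'photo.jpg' are placeholder names.
$exists = image_exists(__DIR__ . '/images', 'photo.jpg');
```

Whether this is measurably faster than a relative path depends on the setup, so it is worth profiling before and after.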

realpath() does the same, but I don't know if it is faster.

But file access I/O is always slow. A hard disk access IS slower than calculating something in the processor, normally.

Answer by Alexander Yancharuk

The fastest way to check existence of a local file is stream_resolve_include_path():

if (false !== stream_resolve_include_path($s3url)) { 
  //do stuff 
}

Performance results, stream_resolve_include_path() vs file_exists():

Test name       Repeats         Result          Performance     
stream_resolve  10000           0.051710 sec    +0.00%
file_exists     10000           0.067452 sec    -30.44%

The test used absolute paths. Test source is here. PHP version:

PHP 5.4.23-1~dotdeb.1 (cli) (built: Dec 13 2013 21:53:21)
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2013 Zend Technologies

Answer by jensgram

We fall back to a dummy image if the file was missing

If you're just interested in falling back to this dummy image, you might want to consider letting the client negotiate with the server by means of a redirect (to the dummy image) on file-not-found.

That way you'll just have a little redirection overhead and a not-noticeable delay on the client side. At least you'll get rid of the "expensive" (which it isn't, I know) call to file_exists.

Just a thought.
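
One way to picture the redirect idea is a small PHP script that the web server runs for not-found requests. This is only a sketch: the function name, the '/images/' prefix and the placeholder URL are all assumptions, and how the server is told to run the script on a 404 is left to its own configuration.

```php
<?php
// Hypothetical 404 handler sketch. The '/images/' prefix and the
// placeholder URL are assumptions for illustration only.
function placeholder_redirect(
    string $requestPath,
    string $imagePrefix = '/images/',
    string $placeholder = '/images/placeholder.png'
): ?string {
    // Never redirect the placeholder to itself (avoids a redirect loop).
    if ($requestPath === $placeholder) {
        return null;
    }
    // Only missing files under the image prefix get the placeholder.
    if (strncmp($requestPath, $imagePrefix, strlen($imagePrefix)) === 0) {
        return $placeholder;
    }
    return null;
}

// In the script the web server runs for not-found requests:
// $target = placeholder_redirect((string) parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
// if ($target !== null) { header('Location: ' . $target, true, 302); exit; }
```

With this shape the page itself never touches the file system for images; the cost is paid only when a file really is missing.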

Answer by Jonathan Falkner

Benchmarks with PHP 5.6:

Existing File:

0.0012969970 : stream_resolve_include_path + include  
0.0013520717 : file_exists + include  
0.0013728141 : @include  

Invalid File:

0.0000281333 : file_exists + include  
0.0000319480 : stream_resolve_include_path + include  
0.0001471042 : @include  

Invalid Folder:

0.0000281333 : file_exists + include  
0.0000360012 : stream_resolve_include_path + include  
0.0001239776 : @include  

Code:

// microtime(true) is less accurate.
function microtime_as_num($microtime){
  $time = array_sum(explode(' ', $microtime));
  return $time;
}

function test_error_suppression_include ($file) {
  $x = 0;
  $x = @include($file);
  return $x;
}

function test_file_exists_include($file) {
  $x = 0;
  $x = file_exists($file);
  if ($x === true) {
    include $file;
  }
  return $x;
}

function test_stream_resolve_include_path_include($file) {
  $x = 0;
  $x = stream_resolve_include_path($file);
  if ($x !== false) {
    include $file;
  }
  return $x;
}

function run_test($file, $test_name) {
  echo $test_name . ":\n";
  echo str_repeat('=', strlen($test_name) + 1) . "\n";

  // Label => function under test, in the original execution order.
  $tests = array(
    '@include' => 'test_error_suppression_include',
    'stream_resolve_include_path + include' => 'test_stream_resolve_include_path_include',
    'file_exists + include' => 'test_file_exists_include',
  );

  $results = array();
  foreach ($tests as $label => $function) {
    $time_start = microtime();
    $x = $function($file);
    $time_end = microtime();
    // Key by label rather than by time, so two identical timings
    // cannot overwrite each other and no float is used as an array key.
    $results[$label] = microtime_as_num($time_end) - microtime_as_num($time_start);
  }

  // Fastest first, keeping the label => seconds association.
  asort($results, SORT_NUMERIC);

  foreach ($results as $test => $seconds) {
    echo number_format($seconds, 10) . ' : ' . $test . "\n";
  }
  echo "\n\n";
}

run_test($argv[1],$argv[2]);

Command line Execution:

php test.php '/path/to/existing_but_empty_file.php' 'Existing File'  
php test.php '/path/to/non_existing_file.php' 'Invalid File'  
php test.php '/path/invalid/non_existing_file.php' 'Invalid Folder'  

Answer by mculp

file_exists() is automatically cached by PHP. I don't think you'll find a faster function in PHP to check the existence of a file.

See this thread.

Answer by racerror

Create a hashing routine for sharding the files into multiple sub-directories.

创建一个散列例程,用于将文件分片到多个子目录中。

filename.jpg -> 012345 -> /01/23/45.jpg
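
A minimal sketch of such a mapping follows. The answer does not say which hash to use, so md5 and the six-character key are assumptions here, and sharded_path() is a hypothetical name.

```php
<?php
// Sketch only: md5 as the hash and a 6-character key are assumptions;
// the original answer does not say which hash to use.
function sharded_path(string $filename): string
{
    $extension = pathinfo($filename, PATHINFO_EXTENSION);
    // First 6 hex characters of the hash of the file name.
    $key = substr(md5($filename), 0, 6);
    // Split into three 2-character pieces: two directories and a base name.
    list($dir1, $dir2, $name) = str_split($key, 2);
    $path = '/' . $dir1 . '/' . $dir2 . '/' . $name;
    return $extension === '' ? $path : $path . '.' . $extension;
}

// Usage: the same name always maps to the same sharded path.
$path = sharded_path('filename.jpg');
```

With only six characters kept, two different names can land on the same path, so a real scheme would probably keep more of the hash for the base name.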

Also, you could use mod_rewrite to return your placeholder image for requests to your image directory that 404.

Answer by ViperArrow

I don't exactly know what you want to do, but you could just let the client handle it.

Answer by Alex

If you are only checking for existing files, use is_file(). file_exists() checks for an existing file OR directory, so is_file() could be a little faster.
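
The difference in what the two functions accept can be seen with a directory and a freshly created file. The temporary directory and the 'chk' prefix below are arbitrary choices for the sketch.

```php
<?php
// Compare both checks on a directory and on a regular file.
$dir  = sys_get_temp_dir();
$file = tempnam($dir, 'chk');      // creates an empty regular file

$dirExists  = file_exists($dir);   // true: directories count as existing
$dirIsFile  = is_file($dir);       // false: a directory is not a regular file
$fileExists = file_exists($file);  // true
$fileIsFile = is_file($file);      // true

unlink($file);                     // clean up the temporary file
```

So swapping file_exists() for is_file() also changes behaviour for directories, which is usually what an image check wants anyway.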

Answer by Beracah

Old question, but I'm going to add an answer here. For PHP 5.3.8, is_file() (for an existing file) is an order of magnitude faster. For a non-existing file, the times are nearly identical. For PHP 5.1 with eAccelerator, they are a little closer.
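
The harness behind these numbers is not shown, so here is a minimal sketch of how such a comparison could be timed. The helper name time_check() is hypothetical, and hrtime() needs PHP 7.3 or later, which is newer than the versions measured in this answer.

```php
<?php
// Minimal timing sketch. hrtime() needs PHP 7.3+, which is newer than the
// versions measured in this answer; the helper name is hypothetical.
function time_check(callable $check, string $path, int $iterations = 1000): float
{
    $start = hrtime(true);                // nanoseconds, monotonic clock
    for ($i = 0; $i < $iterations; $i++) {
        $check($path);
    }
    return (hrtime(true) - $start) / 1e9; // total time in seconds
}

$path = __FILE__; // any absolute path you want to measure
foreach (['is_file', 'is_link', 'file_exists', 'stream_resolve_include_path'] as $check) {
    printf("%-28s %.9f sec\n", $check, time_check($check, $path));
}
```

Repeated calls on the same path mostly measure PHP's stat cache, so the figures say little about a cold first lookup.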

PHP 5.3.8 w & w/o APC

time ratio (1000 iterations)
Array
(
    [3."is_file('exists')"] => 1.00x    (0.002305269241333)
    [5."is_link('exists')"] => 1.21x    (0.0027914047241211)
    [7."stream_resolve_inclu"(exists)] => 2.79x (0.0064241886138916)
    [1."file_exists('exists')"] => 13.35x   (0.030781030654907)
    [8."stream_resolve_inclu"(nonexists)] => 14.19x (0.032708406448364)
    [4."is_file('nonexists)"] => 14.23x (0.032796382904053)
    [6."is_link('nonexists)"] => 14.33x (0.033039808273315)
    [2."file_exists('nonexists)"] => 14.77x (0.034039735794067)
)

PHP 5.1 w/ eAccelerator

time ratio (1000x)
Array
(
    [3."is_file('exists')"] => 1.00x    (0.000458002090454)
    [5."is_link('exists')"] => 1.22x    (0.000559568405151)
    [6."is_link('nonexists')"] => 3.27x (0.00149989128113)
    [4."is_file('nonexists')"] => 3.36x (0.00153875350952)
    [2."file_exists('nonexists')"] => 3.92x (0.00179600715637)
    [1."file_exists('exists"] => 4.22x  (0.00193166732788)
)

There are a couple of caveats.
1) Not all "files" are files: is_file() tests for regular files, not symlinks. So on a *nix system, you can't get away with just is_file() unless you are sure that you are only dealing with regular files. For uploads, etc., this may be a fair assumption, or if the server is Windows based, which does not actually have symlinks. Otherwise, you'll have to test is_file($file) || is_link($file).

2) If the file is missing, performance definitely degrades for all methods and becomes roughly equal.

3) Biggest caveat. All the methods cache the file statistics to speed up lookup, so if the file changes regularly or quickly (is deleted, reappears, is deleted again), then clearstatcache(); has to be run to ensure that the correct file existence information is in the cache. So I tested those. I left out all the filenames and such. The important thing is that almost all the times converge, except stream_resolve_include_path(), which is 4x as fast. Again, this server has eAccelerator on it, so YMMV.

time ratio (1000x)
Array
(
    [7."stream_resolve_inclu...;clearstatcache();"] => 1.00x    (0.0066831111907959)
    [1."file_exists(...........;clearstatcache();"] => 4.39x    (0.029333114624023)
    [3."is_file(................;clearstatcache();] => 4.55x    (0.030423402786255)
    [5."is_link(................;clearstatcache();] => 4.61x    (0.030798196792603)
    [4."is_file(................;clearstatcache();] => 4.89x    (0.032709360122681)
    [8."stream_resolve_inclu...;clearstatcache();"] => 4.90x    (0.032740354537964)
    [2."file_exists(...........;clearstatcache();"] => 4.92x    (0.032855272293091)
    [6."is_link(...............;clearstatcache();"] => 5.11x    (0.034154653549194)
)
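
For the frequently changing case in caveat 3, clearing the cache and checking can be wrapped together. This is a sketch under the assumption that only one path needs refreshing; fresh_file_exists() is a hypothetical name.

```php
<?php
// Hypothetical wrapper: drop any cached stat data for $path, then check.
function fresh_file_exists(string $path): bool
{
    // The first argument (true) also clears the realpath cache; the second
    // limits the clearing to this one path instead of the whole cache.
    clearstatcache(true, $path);
    return file_exists($path);
}
```

As the timings above suggest, paying for the cache clear on every call removes most of the speed difference between the methods.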

Basically, the idea is: if you're 100% sure that it is a file, not a symlink or a directory, and in all probability it will exist, then use is_file(). You'll see a definite gain. If the file could be either a file or a symlink at any moment, then a failed is_file() (14x) followed by is_link() (14x), as in is_file() || is_link(), will end up being 2x slower overall. If the file's existence changes A LOT, then use stream_resolve_include_path().
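
The combined test mentioned above could be written as a small helper; the name is_file_or_link() is an assumption made for this sketch.

```php
<?php
// Hypothetical helper combining the two checks discussed above.
function is_file_or_link(string $path): bool
{
    // is_file() is tried first; is_link() only runs when it returns false,
    // so the second lookup is paid only on the slower path.
    return is_file($path) || is_link($path);
}
```

Putting is_file() first keeps the common case, an existing regular file, on the cheap path.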

So it depends on your usage scenario.
