PHP 函数的 Big-O 列表
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2473989/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
List of Big-O for PHP functions
提问by Kendall Hopkins
After using PHP for a while now, I've noticed that not all built-in PHP functions are as fast as expected. Consider these two possible implementations of a function that finds if a number is prime using a cached array of primes.
使用 PHP 一段时间后,我注意到并非所有内置的 PHP 函数都像预期的那样快。考虑使用缓存的素数数组查找数字是否为素数的函数的这两种可能实现。
//very slow for large $prime_array
$prime_array = array( 2, 3, 5, 7, 11, 13, .... 104729, ... );
$result_array = array();
foreach( $prime_array => $number ) {
$result_array[$number] = in_array( $number, $large_prime_array );
}
//speed is much less dependent on size of $prime_array, and runs much faster.
$prime_array => array( 2 => NULL, 3 => NULL, 5 => NULL, 7 => NULL,
11 => NULL, 13 => NULL, .... 104729 => NULL, ... );
foreach( $prime_array => $number ) {
$result_array[$number] = array_key_exists( $number, $large_prime_array );
}
This is because in_arrayis implemented with a linear search O(n) which will linearly slow down as $prime_arraygrows. Where the array_key_existsfunction is implemented with a hash lookup O(1) which will not slow down unless the hash table gets extremely populated (in which case it's only O(n)).
这是因为in_array使用线性搜索 O(n) 实现,它会随着$prime_array增长线性减慢。该array_key_exists函数是通过哈希查找 O(1) 实现的,除非哈希表变得非常填充(在这种情况下,它只是 O(n)),否则不会减慢速度。
So far I've had to discover the big-O's via trial and error, and occasionally looking at the source code. Now for the question...
到目前为止,我不得不通过反复试验发现大 O,并偶尔查看源代码。现在的问题...
Is there a list of the theoretical (or practical) big O times for all* the built-in PHP functions?
是否有所有内置 PHP 函数的理论(或实践)大 O 时间列表?
*or at least the interesting ones
*或至少是有趣的
For example, I find it very hard to predict the big O of functions listed because the possible implementation depends on unknown core data structures of PHP: array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace(with array inputs), etc.
例如,我发现很难预测列出的函数的大 O,因为可能的实现取决于 PHP 的未知核心数据结构:array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace(带有数组输入)等。
回答by Kendall Hopkins
Since it doesn't seem like anyone has done this before I thought it'd be good idea to have it for reference somewhere. I've gone though and either via benchmark or code-skimming to characterize the array_*functions. I've tried to put the more interesting Big-O near the top. This list is not complete.
因为在我认为最好将它放在某处以供参考之前似乎没有人这样做过。我已经通过基准测试或代码略读来描述array_*功能。我试图将更有趣的 Big-O 放在靠近顶部的位置。这份清单并不完整。
Note: All the Big-O where calculated assuming a hash lookup is O(1) even though it's really O(n). The coefficient of the n is so low, the ram overhead of storing a large enough array would hurt you before the characteristics of lookup Big-O would start taking effect. For example the difference between a call to array_key_existsat N=1 and N=1,000,000 is ~50% time increase.
注意:假设哈希查找计算的所有 Big-O 都是 O(1),即使它实际上是 O(n)。n 的系数如此之低,在查找 Big-O 的特性开始生效之前,存储足够大数组的 ram 开销会伤害您。例如,array_key_exists在 N=1 和 N=1,000,000 处调用之间的差异是大约 50% 的时间增加。
Interesting Points:
有趣的点:
isset/array_key_existsis much faster thanin_arrayandarray_search+(union) is a bit faster thanarray_merge(and looks nicer). But it does work differently so keep that in mind.shuffleis on the same Big-O tier asarray_randarray_pop/array_pushis faster thanarray_shift/array_unshiftdue to re-index penalty
isset/array_key_exists比in_arrayand快得多array_search+(union) 比array_merge(并且看起来更好)快一点。但它的工作方式有所不同,所以请记住这一点。shuffle与 Big-O 层级相同array_randarray_pop/由于重新索引惩罚而array_push比array_shift/快array_unshift
Lookups:
查找:
array_key_existsO(n) but really close to O(1) - this is because of linear polling in collisions, but because the chance of collisions is very small, the coefficient is also very small. I find you treat hash lookups as O(1) to give a more realistic big-O. For example the different between N=1000 and N=100000 is only about 50% slow down.
array_key_existsO(n) 但真的接近 O(1)——这是因为碰撞中的线性轮询,但是因为碰撞的机会很小,所以系数也很小。我发现您将哈希查找视为 O(1) 以提供更现实的大 O。例如,N=1000 和 N=100000 之间的差异只有大约 50% 的减速。
isset( $array[$index] )O(n) but really close to O(1) - it uses the same lookup as array_key_exists. Since it's language construct, will cache the lookup if the key is hardcoded, resulting in speed up in cases where the same key is used repeatedly.
isset( $array[$index] )O(n) 但非常接近 O(1) - 它使用与 array_key_exists 相同的查找。由于它是语言结构,如果键是硬编码的,将缓存查找,从而在重复使用相同键的情况下加快速度。
in_arrayO(n) - this is because it does a linear search though the array until it finds the value.
in_arrayO(n) - 这是因为它通过数组进行线性搜索,直到找到该值。
array_searchO(n) - it uses the same core function as in_array but returns value.
array_searchO(n) - 它使用与 in_array 相同的核心函数,但返回值。
Queue functions:
队列功能:
array_pushO(∑ var_i, for all i)
array_pushO(∑ var_i, 对于所有 i)
array_popO(1)
array_popO(1)
array_shiftO(n) - it has to reindex all the keys
array_shiftO(n) - 它必须重新索引所有的键
array_unshiftO(n + ∑ var_i, for all i) - it has to reindex all the keys
array_unshiftO(n + ∑ var_i, for all i) - 它必须重新索引所有的键
Array Intersection, Union, Subtraction:
数组交集、联合、减法:
array_intersect_keyif intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)
array_intersect_key如果相交 100% 做 O(Max(param_i_size)*∑param_i_count, for all i), 如果相交 0% 相交 O(∑param_i_size, for all i)
array_intersectif intersection 100% do O(n^2*∑param_i_count, for all i), if intersection 0% intersect O(n^2)
array_intersect如果相交 100% 做 O(n^2*∑param_i_count, for all i),如果相交 0% 相交 O(n^2)
array_intersect_associf intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)
array_intersect_assoc如果相交 100% 做 O(Max(param_i_size)*∑param_i_count, for all i), 如果相交 0% 相交 O(∑param_i_size, for all i)
array_diffO(π param_i_size, for all i) - That's product of all the param_sizes
array_diffO(π param_i_size, for all i) - 这是所有 param_sizes 的乘积
array_diff_keyO(∑ param_i_size, for i != 1) - this is because we don't need to iterate over the first array.
array_diff_keyO(∑ param_i_size, for i != 1) - 这是因为我们不需要迭代第一个数组。
array_mergeO( ∑ array_i, i != 1 ) - doesn't need to iterate over the first array
array_mergeO( ∑ array_i, i != 1 ) - 不需要迭代第一个数组
+(union) O(n), where n is size of the 2nd array (ie array_first + array_second) - less overhead than array_merge since it doesn't have to renumber
+(union) O(n),其中 n 是第二个数组的大小(即 array_first + array_second)——开销比 array_merge 少,因为它不必重新编号
array_replaceO( ∑ array_i, for all i )
array_replaceO( ∑ array_i, 对于所有 i )
Random:
随机:
shuffleO(n)
shuffle在)
array_randO(n) - Requires a linear poll.
array_randO(n) - 需要线性轮询。
Obvious Big-O:
明显的大O:
array_fillO(n)
array_fill在)
array_fill_keysO(n)
array_fill_keys在)
rangeO(n)
range在)
array_spliceO(offset + length)
array_spliceO(偏移量 + 长度)
array_sliceO(offset + length) or O(n) if length = NULL
array_sliceO(offset + length) 或 O(n) 如果长度 = NULL
array_keysO(n)
array_keys在)
array_valuesO(n)
array_values在)
array_reverseO(n)
array_reverse在)
array_padO(pad_size)
array_padO(pad_size)
array_flipO(n)
array_flip在)
array_sumO(n)
array_sum在)
array_productO(n)
array_product在)
array_reduceO(n)
array_reduce在)
array_filterO(n)
array_filter在)
array_mapO(n)
array_map在)
array_chunkO(n)
array_chunk在)
array_combineO(n)
array_combine在)
I'd like to thank Eureqafor making it easy to find the Big-O of the functions. It's an amazing freeprogram that can find the best fitting function for arbitrary data.
我要感谢Eureqa使找到函数的 Big-O 变得容易。这是一个了不起的免费程序,可以为任意数据找到最佳拟合函数。
EDIT:
编辑:
For those who doubt that PHP array lookups are O(N), I've written a benchmark to test that (they are still effectively O(1)for most realistic values).
对于那些怀疑 PHP 数组查找是否为 的人O(N),我编写了一个基准测试来测试(它们O(1)对于大多数实际值仍然有效)。


$tests = 1000000;
$max = 5000001;
for( $i = 1; $i <= $max; $i += 10000 ) {
//create lookup array
$array = array_fill( 0, $i, NULL );
//build test indexes
$test_indexes = array();
for( $j = 0; $j < $tests; $j++ ) {
$test_indexes[] = rand( 0, $i-1 );
}
//benchmark array lookups
$start = microtime( TRUE );
foreach( $test_indexes as $test_index ) {
$value = $array[ $test_index ];
unset( $value );
}
$stop = microtime( TRUE );
unset( $array, $test_indexes, $test_index );
printf( "%d,%1.15f\n", $i, $stop - $start ); //time per 1mil lookups
unset( $stop, $start );
}
回答by Dathan
The explanation for the case you specifically describe is that associative arrays are implemented as hash tables - so lookup by key (and correspondingly, array_key_exists) is O(1). However, arrays aren't indexed by value, so the only way in the general case to discover whether a value exists in the array is a linear search. There's no surprise there.
您具体描述的情况的解释是关联数组是作为哈希表实现的 - 因此按键(以及相应的array_key_exists)查找是 O(1)。但是,数组不是按值索引的,因此在一般情况下发现数组中是否存在值的唯一方法是线性搜索。这并不奇怪。
I don't think there's specific comprehensive documentation of the algorithmic complexity of PHP methods. However, if it's a big enough concern to warrant the effort, you can always look through the source code.
我认为没有关于 PHP 方法算法复杂性的具体综合文档。但是,如果这是一个足够大的问题,值得付出努力,您可以随时查看源代码。
回答by ryeguy
You almost always want to use issetinstead of array_key_exists. I'm not looking at the internals, but I'm pretty sure that array_key_existsis O(N) because it iterates over each and every key of the array, while issettries to access the element using the same hash algorithm that is used when you access an array index. That should be O(1).
您几乎总是想使用isset而不是array_key_exists. 我不是在看内部结构,但我很确定这array_key_exists是 O(N),因为它遍历数组的每个键,同时isset尝试使用访问时使用的相同哈希算法访问元素数组索引。那应该是 O(1)。
One "gotcha" to watch out for is this:
需要注意的一个“问题”是:
$search_array = array('first' => null, 'second' => 4);
// returns false
isset($search_array['first']);
// returns true
array_key_exists('first', $search_array);
I was curious, so I benchmarked the difference:
我很好奇,所以我对差异进行了基准测试:
<?php
$bigArray = range(1,100000);
$iterations = 1000000;
$start = microtime(true);
while ($iterations--)
{
isset($bigArray[50000]);
}
echo 'is_set:', microtime(true) - $start, ' seconds', '<br>';
$iterations = 1000000;
$start = microtime(true);
while ($iterations--)
{
array_key_exists(50000, $bigArray);
}
echo 'array_key_exists:', microtime(true) - $start, ' seconds';
?>
is_set:0.132308959961 secondsarray_key_exists:2.33202195168 seconds
is_set:0.132308959961 秒array_key_exists:2.33202195168 秒
Of course, this doesn't show time complexity, but it does show how the 2 functions compare to each other.
当然,这并没有显示时间复杂度,但它确实显示了这两个函数如何相互比较。
To test for time complexity, compare the amount of time it takes to run one of these functions on the first key and the last key.
要测试时间复杂度,请比较在第一个键和最后一个键上运行其中一个函数所需的时间。
回答by Josh Stern
If people were running into trouble in practice with key collisions, they would implement containers with a secondary hash lookup or balanced tree. The balanced tree would give O(log n) worst case behavior and O(1) avg. case (the hash itself). The overhead is not worth it in most practical in memory applications, but perhaps there are databases that implement this form of mixed strategy as their default case.
如果人们在实践中遇到密钥冲突的问题,他们会使用二级哈希查找或平衡树来实现容器。平衡树将给出 O(log n) 最坏情况行为和 O(1) 平均。案例(哈希本身)。在大多数内存应用程序中,开销并不值得,但也许有些数据库将这种形式的混合策略作为它们的默认情况。

