php - Is strpos the fastest way to search for a string in a large body of text?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/3874063/

Date: 2020-08-25 11:21:48  Source: igfitidea

Tags: php, string, hash, string-search

Asked by Bob Cavezza

if (strpos(htmlentities($storage->getMessage($i)),'chocolate')) 

Hi, I'm using gmail oauth access to find specific text strings in email addresses. Is there a way to find text instances quicker and more efficiently than using strpos in the above code? Should I be using a hash technique?

Answered by stevendesu

According to the PHP manual, yes: strpos() is the quickest way to determine whether one string contains another.

Note:

If you only want to determine if a particular needle occurs within haystack, use the faster and less memory intensive function strpos() instead.

This note appears time and again in the php.net pages for other string search functions (I pulled this one from the strstr() page).

However, there are two changes that should be made to your statement.

if (strpos($storage->getMessage($i),'chocolate') !== FALSE)

This is because if(0) evaluates to false (and therefore doesn't run), yet strpos() can return 0 if the needle is at the very beginning (position 0) of the haystack. Also, removing htmlentities() will make your code run faster. All that htmlentities() does is replace certain characters with their HTML entity equivalents. For instance, it replaces every & with &amp;.
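
The position-0 pitfall described above can be sketched as follows. This is a minimal illustration, not part of the original answer: the helper name containsText and the sample message are assumptions made up for the example.

```php
<?php
// Hypothetical helper (not from the original answer): true when $needle
// occurs anywhere in $haystack.
function containsText(string $haystack, string $needle): bool
{
    // strpos() returns an int offset on success and false on failure,
    // so the comparison must be strict to tell offset 0 apart from "not found".
    return strpos($haystack, $needle) !== false;
}

// Assumed sample text with the needle at the very beginning.
$message = 'chocolate cake recipe';

// Loose check: offset 0 is falsy, so the match at the start is missed.
$loose = (bool) strpos($message, 'chocolate');

// Strict check: offset 0 still counts as a match.
$strict = containsText($message, 'chocolate');

var_dump($loose);  // bool(false)
var_dump($strict); // bool(true)
```

Wrapping the strict comparison in one small function keeps the !== false detail in a single place instead of repeating it at every call site.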

As you can imagine, checking every character in a string individually and replacing many of them takes extra memory and processor power. Not only that, but it's unnecessary if you plan on just doing a text comparison. For instance, compare the following statements:

strpos('Billy & Sally', '&'); // 6
strpos('Billy &amp; Sally', '&'); // 6
strpos('Billy & Sally', 'S'); // 8
strpos('Billy &amp; Sally', 'S'); // 12

Or, in the worst case, you may even cause something true to evaluate to false.

strpos('<img src...', '<'); // 0
strpos('&lt;img src...','<'); // FALSE

In order to circumvent this you'd end up using even more HTML entities.

strpos('&lt;img src...', '&lt;'); // 0

But this, as you can imagine, is not only annoying to code but also redundant. You're better off excluding HTML entities entirely. HTML entities are usually needed only when you're outputting text, not when comparing it.
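
One way to read this advice is to search the raw text and call htmlentities() only at the point of output. The sketch below assumes a made-up sample string in place of $storage->getMessage($i); the variable names are illustrative only.

```php
<?php
// Assumed sample input; in the question this would come from
// $storage->getMessage($i).
$raw = '<img src="cocoa.png"> chocolate & cream';

// Compare against the raw text: no entity encoding is needed for a search.
$found = strpos($raw, '<') !== false;

// Escape only when the text is about to be written into an HTML page.
$safeForHtml = htmlentities($raw);

var_dump($found);
echo $safeForHtml, PHP_EOL;
```

Searching the escaped copy for '<' would fail, because that character no longer appears in it, which is the false-negative case shown earlier.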

Answered by neopickaze

strpos is likely to be faster than preg_match and the alternatives in this case. The best idea would be to run some benchmarks of your own with real example data and see what is best for your needs, although that may be overdoing it. Don't worry too much about performance until it starts to become a problem.
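
A rough benchmark along these lines might look like the sketch below. It assumes PHP 7.4 or later (for arrow functions), and the timeIt helper, the synthetic haystack, and the iteration count are all made up for illustration; real message bodies would give more meaningful numbers.

```php
<?php
// Hypothetical benchmark helper: runs $fn $iterations times and
// returns the elapsed wall-clock time in seconds.
function timeIt(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

// Synthetic haystack with the needle at the very end (an assumption;
// swap in real email bodies for a representative test).
$haystack = str_repeat('lorem ipsum dolor sit amet ', 10000) . 'chocolate';
$needle = 'chocolate';
$pattern = '/' . preg_quote($needle, '/') . '/';

// Sanity check: both approaches should agree that the needle is present.
$viaStrpos = strpos($haystack, $needle) !== false;
$viaPreg = preg_match($pattern, $haystack) === 1;

$strposSeconds = timeIt(fn () => strpos($haystack, $needle), 1000);
$pregSeconds = timeIt(fn () => preg_match($pattern, $haystack), 1000);

printf("strpos:     %.4f s\n", $strposSeconds);
printf("preg_match: %.4f s\n", $pregSeconds);
```

The timings depend on the PHP version, the input, and the machine, so the numbers are only useful for comparing the two calls on your own data.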

Answered by kingunits

strpos returns the starting position of the first occurrence of the string. If there is no match it returns false, not null, so an is_null() test would pass even when nothing was found; compare strictly against false instead.

if (strpos($storage->getMessage($i), 'chocolate') !== false)