Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/683702/

Date: 2020-08-24 23:29:25  Source: igfitidea

How do you perform a preg_match where the pattern is an array, in php?

Tags: php, arrays, preg-match

Asked by TravisO

I have an array full of patterns that I need matched. Is there any way to do that other than a for() loop? I'm trying to do it in the least CPU-intensive way, since I will be doing dozens of these every minute.

The real-world example is that I'm building a link status checker, which will check links to various online video sites to ensure that the videos are still live. Each domain has several "dead keywords"; if any of these are found in the HTML of a page, that means the file was deleted. These are stored in the array. I need to match the contents of the array against the HTML output of the page.
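
A minimal sketch of the setup described above, assuming the dead keywords are plain strings keyed by host name. The function name `link_looks_dead()`, the hosts and the phrases are all hypothetical placeholders, not strings any real video site is known to return.

```php
<?php
// Returns true if $html contains any dead keyword listed for the URL's host.
// $deadKeywords maps host name => list of literal strings (hypothetical layout).
function link_looks_dead(string $url, string $html, array $deadKeywords): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host component
    }
    foreach ($deadKeywords[$host] ?? [] as $keyword) {
        if (strpos($html, $keyword) !== false) {
            return true; // the first hit is enough
        }
    }
    return false; // unknown host, or no keyword found
}

// Hypothetical data: placeholder hosts and phrases, not real site strings.
$deadKeywords = [
    'video.example.com' => ['This video has been removed', 'File not found'],
    'clips.example.net' => ['deleted by the uploader'],
];
```

Keeping the keywords per host means each page is only checked against the phrases that site actually uses.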

Answer by danieltalsky

First of all, if you literally are only doing dozens every minute, then I wouldn't worry terribly about the performance in this case. These matches are pretty quick, and I don't think you're going to have a performance problem by iterating through your patterns array and calling preg_match separately like this:

$matches = false;
foreach ($pattern_array as $pattern)
{
  if (preg_match($pattern, $page))
  {
    $matches = true;
    break; // one match is enough, skip the remaining patterns
  }
}

You can indeed combine all the patterns into one using the or operator, as some people are suggesting, but don't just slap them together with a |. This can break badly if any of your patterns contain the or operator.

I would recommend at least grouping your patterns using parentheses, like:

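$grouped_patterns = array();
foreach ($patterns as $pattern)
{
  $grouped_patterns[] = "(" . $pattern . ")";
}
// implode() takes the separator first; the array-first order was removed in PHP 8.
// Delimiters still have to be added before passing this to preg_match().
$master_pattern = implode("|", $grouped_patterns);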
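
Grouping also makes it possible to tell which pattern matched. The sketch below is one way to do that, not the answer's own code: it wraps each pattern in a named group and uses the `PREG_UNMATCHED_AS_NULL` flag (available since PHP 7.2). The function name `matched_pattern_index()` is hypothetical, and it assumes the patterns carry no delimiters, contain no unescaped `~`, and do not define group names such as `p0` themselves.

```php
<?php
// Joins delimiter-less patterns into one regex, each in its own named group,
// and returns the index of the pattern that produced the match (or null).
function matched_pattern_index(array $patterns, string $subject): ?int
{
    $groups = [];
    foreach (array_values($patterns) as $i => $pattern) {
        // Grouping keeps each pattern self-contained: a | or an inline
        // modifier such as (?i) inside one pattern stays scoped to its group.
        $groups[] = '(?<p' . $i . '>' . $pattern . ')';
    }
    $master = '~' . implode('|', $groups) . '~';

    // PREG_UNMATCHED_AS_NULL marks groups that took no part in the match as
    // null, so isset() below skips them.
    if (preg_match($master, $subject, $m, PREG_UNMATCHED_AS_NULL) !== 1) {
        return null; // no match (preg_match returns false on a regex error)
    }
    // The engine reports the leftmost match in $subject; at that position the
    // alternatives are tried in array order.
    foreach (array_keys($groups) as $i) {
        if (isset($m['p' . $i])) {
            return $i;
        }
    }
    return null;
}
```

For example, with `['(?i)DEAD', 'gone']` the `(?i)` only makes the first pattern case-insensitive; `gone` stays case-sensitive.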

But... I'm not really sure whether this ends up being faster. Something has to loop through them, whether it's preg_match or PHP. If I had to guess, I'd say individual matches would be close to as fast, and easier to read and maintain.

Lastly, if performance is what you're looking for here, I think the most important thing to do is pull out the non-regex matches into a simple "string contains" check. I would imagine that some of your checks must be simple string checks, like looking to see whether "This Site is Closed" is on the page.

So doing this:

foreach ($strings_to_match as $string_to_match)
{
  if (strpos($page, $string_to_match) !== false)
  {
    // etc.
    break;
  }
}
foreach ($pattern_array as $pattern)
{
  if (preg_match($pattern, $page))
  {
    // etc.
    break;
  } 
}

and avoiding as many preg_match() calls as possible is probably going to be your best gain. strpos() is a lot faster than preg_match().
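
The two loops above can be wrapped in one function that returns as soon as something is found, so the regexes only run when no literal string matched. This is a sketch; the name `page_is_dead()` and its parameters are hypothetical, and it assumes `$strings` holds literal text while `$patterns` holds complete regexes including delimiters.

```php
<?php
// Cheap checks first: literal strings via strpos(), regexes only if needed.
function page_is_dead(string $page, array $strings, array $patterns): bool
{
    foreach ($strings as $string) {
        if (strpos($page, $string) !== false) {
            return true; // literal hit, no regex needed
        }
    }
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $page) === 1) {
            return true; // first matching regex is enough
        }
    }
    return false;
}
```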

Answer by TravisO

// assuming you have something like this
$patterns = array('a','b','\w');

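// joins the array into a regex alternation: a|b|\w
// (none of the patterns may contain an unescaped '/', the delimiter used below)
$patterns_flattened = implode('|', $patterns);

if ( preg_match('/'. $patterns_flattened .'/', $string, $matches) )
{
  // at least one pattern matched; $matches[0] holds the matched text
}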

// PS: that's off the top of my head, I didn't check it in a code editor
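
The implode() approach above treats every array entry as regex syntax. If the entries are literal phrases (like the question's dead keywords), a variation that may help is to escape each one with `preg_quote()`, passing the delimiter so it is escaped too. The helper name `literal_alternation()` is made up for this sketch.

```php
<?php
// Builds /phrase1|phrase2|.../ from literal phrases, escaping regex
// metacharacters such as . ? ( ) and the delimiter itself.
function literal_alternation(array $phrases, string $delimiter = '/'): string
{
    $escaped = array_map(
        function (string $phrase) use ($delimiter): string {
            return preg_quote($phrase, $delimiter);
        },
        $phrases
    );
    return $delimiter . implode('|', $escaped) . $delimiter;
}

$regex = literal_alternation(['file removed (by user)', 'is it gone?', 'a/b']);
// $regex is now /file removed \(by user\)|is it gone\?|a\/b/
```

Without the escaping, `?` would make the preceding `e` optional and the parentheses would become a capture group instead of literal text.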

Answer by chiborg

If your patterns don't contain much whitespace, another option would be to eschew the arrays and use the /x modifier. Your list of regular expressions would then look like this:

$regex = "/
pattern1|   # search for occurrences of 'pattern1'
pa..ern2|   # wildcard search for occurrences of 'pa..ern2'
pat[ ]tern| # search for 'pat tern', whitespace is escaped
mypat       # Note that the last pattern does NOT have a pipe char
/x";

With the /x modifier, whitespace is completely ignored, except when in a character class or preceded by a backslash. Comments like the ones above are also allowed.

This avoids looping through the array.
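
A runnable version of the /x pattern above, as a sketch. It uses a nowdoc (`<<<'EOT'`, PHP 5.3+) so that PHP does not try to interpolate `$` inside the string. One caveat: PHP locates the closing delimiter before PCRE strips comments, so a `/` inside a `#` comment would end the pattern early; the comments here avoid it.

```php
<?php
// Unescaped whitespace and # comments are ignored under the x modifier,
// so the alternation can be laid out one pattern per line.
$regex = <<<'EOT'
/
  pattern1 |   # literal 'pattern1'
  pa..ern2 |   # '.' matches any character, e.g. 'pattern2' or 'pa--ern2'
  pat[ ]tern | # a space inside a character class is kept
  mypat        # no trailing pipe on the last alternative
/x
EOT;

$found = preg_match($regex, 'a pat tern here'); // 1: matched 'pat tern'
```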

Answer by David Snabel-Caunt

If you're merely searching for the presence of a string in another string, use strpos as it is faster.

Otherwise, you could just iterate over the array of patterns, calling preg_match each time.

Answer by Seb

If you have a bunch of patterns, what you can do is concatenate them into a single regular expression and match that. No need for a loop.

Answer by MiSHuTka

You can combine all the patterns from the list into a single regular expression using PHP's implode() function, then test your string in one call with preg_match().

$patterns = array(
  'abc',
  '\d+h',
  '[abc]{6,8}\-\s*[xyz]{6,8}',
);

$master_pattern = '/(' . implode(')|(', $patterns) . ')/';

if(preg_match($master_pattern, $string_to_check))
{
  //do something
}

Of course, you could write even less code by calling implode() inline in the if() condition instead of using the $master_pattern variable.

Answer by Darryl Hein

What about running str_replace() with your array over the HTML you get, and then checking whether the result is equal to the original HTML? This would be very fast:

$sites = array(
  'you_tube' => array('dead', 'moved'),
  // ...
);
foreach ($sites as $site => $deadArray) {
  // get $html
  // if no dead keyword was removed, the HTML is unchanged
  if ($html === str_replace($deadArray, '', $html)) {
    // video is live
  }
}
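
Darryl's idea as a self-contained function, sketched under the assumption that the dead keywords are literal strings. The name `video_is_live()` and the `$ignoreCase` switch are additions for illustration; `str_ireplace()` is the case-insensitive counterpart of `str_replace()`.

```php
<?php
// If removing every dead keyword leaves the HTML unchanged, none of them
// occurred in it, so the video is presumed live.
function video_is_live(string $html, array $deadKeywords, bool $ignoreCase = false): bool
{
    $stripped = $ignoreCase
        ? str_ireplace($deadKeywords, '', $html)
        : str_replace($deadKeywords, '', $html);
    return $stripped === $html;
}
```

One trade-off: unlike a strpos() loop with break, str_replace() cannot stop at the first hit; it runs through every keyword, so the speed claim is worth measuring on real pages.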