Note: this page is based on a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/4366730/

How do I check if a string contains a specific word?

Tags: php, string, substring, contains, string-matching

Asked by Charles Yeung

Consider:

$a = 'How are you?';

if ($a contains 'are')
    echo 'true';

Suppose I have the code above, what is the correct way to write the statement if ($a contains 'are')?

Answer by codaddict

You can use the strpos() function, which finds the first occurrence of one string inside another:

$a = 'How are you?';

if (strpos($a, 'are') !== false) {
    echo 'true';
}

Note that the use of !== false is deliberate (neither != false nor === true will return the desired result); strpos() returns either the offset at which the needle string begins in the haystack string, or the boolean false if the needle isn't found. Since 0 is a valid offset and 0 is "falsey", we can't use simpler constructs like !strpos($a, 'are').
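
To make the offset-0 pitfall concrete, here is a minimal sketch that puts the needle at the very start of the haystack. The helper names containsLoose and containsStrict are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helpers, for illustration only.

// Buggy: strpos() returns 0 for a match at the start,
// and 0 is loosely equal to false.
function containsLoose(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) != false;
}

// Correct: only the boolean false means "not found".
function containsStrict(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

var_dump(containsLoose('are you there?', 'are'));  // bool(false), a wrong answer
var_dump(containsStrict('are you there?', 'are')); // bool(true)
```

The two helpers agree whenever the match is not at offset 0, which is why the loose version can go unnoticed in testing.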

Answer by Breezer

You could use regular expressions. They are better for word matching than strpos(), which, as other users mentioned, also returns true for strings such as fare, care, stare, etc. This can easily be avoided in a regular expression by using word boundaries.

A simple match for are could look something like this:

$a = 'How are you?';

if (preg_match('/\bare\b/', $a)) {
    echo 'true';
}
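
As a rough side-by-side of the two approaches, the sketch below runs a substring check and a word-boundary check over a few sample strings. The helper names hasSubstring and hasWholeWord and the sample strings are assumptions made for this example.

```php
<?php
// Hypothetical helpers, for illustration only.

function hasSubstring(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

function hasWholeWord(string $haystack, string $word): bool
{
    // preg_quote() keeps any regex metacharacters in $word literal.
    return preg_match('/\b' . preg_quote($word, '/') . '\b/', $haystack) === 1;
}

foreach (['How are you?', 'The fare is high', 'Take care'] as $s) {
    printf(
        "%-18s substring: %s, whole word: %s\n",
        $s,
        var_export(hasSubstring($s, 'are'), true),
        var_export(hasWholeWord($s, 'are'), true)
    );
}
```

Only the first string should report a whole-word match, while all three contain the substring.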

On the performance side, strpos() is about three times faster. Keep in mind that when I ran one million comparisons at once, preg_match() took 1.5 seconds to finish and strpos() took 0.5 seconds.
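
If you want to repeat a comparison like this yourself, a rough timing harness could look like the following. It is only a sketch: time_calls is a hypothetical helper, the closure call adds overhead to both measurements, and the absolute numbers will depend on your PHP version and hardware.

```php
<?php
// Hypothetical helper: runs $check $iterations times and
// returns the elapsed wall-clock time in seconds.
function time_calls(callable $check, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $check();
    }
    return microtime(true) - $start;
}

$a = 'How are you?';
$iterations = 1000000;

$strposTime = time_calls(function () use ($a) {
    return strpos($a, 'are') !== false;
}, $iterations);

$pregTime = time_calls(function () use ($a) {
    return preg_match('/\bare\b/', $a) === 1;
}, $iterations);

printf("strpos: %.3f s, preg_match: %.3f s\n", $strposTime, $pregTime);
```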

Edit: In order to search any part of the string, not just word by word, I would recommend using a regular expression like

$a = 'How are you?';
$search = 'are y';
if(preg_match("/{$search}/i", $a)) {
    echo 'true';
}

The i at the end of the regular expression makes it case-insensitive; if you do not want that, you can leave it out.

Now, this can be quite problematic in some cases because the $search string isn't sanitized in any way. If $search is user input, it may contain characters that are interpreted as regular expression syntax, so the check might fail or match something different from what was intended.
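
One common way to address the unsanitized $search problem is to escape the input with preg_quote() before building the pattern. The sketch below assumes that approach; containsLiteral is a hypothetical name, not part of the original answer.

```php
<?php
// Hypothetical helper: case-insensitive search that treats $search
// as literal text, even if it contains regex metacharacters.
function containsLiteral(string $haystack, string $search): bool
{
    // The second argument escapes the delimiter used in the pattern.
    $pattern = '/' . preg_quote($search, '/') . '/i';
    return preg_match($pattern, $haystack) === 1;
}

var_dump(containsLiteral('How are you?', 'ARE Y')); // bool(true)
var_dump(containsLiteral('How are you?', 'you?'));  // bool(true): the "?" is matched literally
var_dump(containsLiteral('How are you', 'you?'));   // bool(false): there is no literal "?"
```

Without the escaping, the last call would match, because "u?" in a pattern means "an optional u".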

Also, here's a great tool for testing regular expressions and seeing explanations of them: Regex101

To combine both sets of functionality into a single multi-purpose function (including with selectable case sensitivity), you could use something like this:

function FindString($needle,$haystack,$i,$word)
{   // $i should be "" or "i" for case insensitive
    if (strtoupper($word)=="W")
    {   // if $word is "W" then word search instead of string in string search.
        if (preg_match("/\b{$needle}\b/{$i}", $haystack)) 
        {
            return true;
        }
    }
    else
    {
        if(preg_match("/{$needle}/{$i}", $haystack)) 
        {
            return true;
        }
    }
    return false;
    // Put quotes around true and false above to return them as strings instead of as bools/ints.
}

Answer by ejunker

Here is a little utility function that is useful in situations like this:

// returns true if $needle is a substring of $haystack
function contains($needle, $haystack)
{
    return strpos($haystack, $needle) !== false;
}
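
On PHP 8.0 and later there is also a built-in str_contains(). Note that it takes the haystack first, the opposite of the utility function above. A possible wrapper that keeps the (needle, haystack) order and falls back to strpos() on older versions might look like this; contains_compat is a hypothetical name.

```php
<?php
// Hypothetical wrapper keeping the ($needle, $haystack) order used above.
function contains_compat(string $needle, string $haystack): bool
{
    if (function_exists('str_contains')) {
        // Built in from PHP 8.0; argument order is (haystack, needle).
        return str_contains($haystack, $needle);
    }
    // Fallback for older PHP. str_contains() treats an empty needle
    // as always found, so mirror that here.
    return $needle === '' || strpos($haystack, $needle) !== false;
}

var_dump(contains_compat('are', 'How are you?')); // bool(true)
var_dump(contains_compat('xyz', 'How are you?')); // bool(false)
```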

Answer by FtDRbwLXw6

While most of these answers will tell you if a substring appears in your string, that's usually not what you want if you're looking for a particular word, and not a substring.

What's the difference? Substrings can appear within other words:

  • The "are" at the beginning of "area"
  • The "are" at the end of "hare"
  • The "are" in the middle of "fares"

One way to mitigate this would be to use a regular expression coupled with word boundaries (\b):

function containsWord($str, $word)
{
    return !!preg_match('#\b' . preg_quote($word, '#') . '\b#i', $str);
}

This method doesn't have the same false positives noted above, but it does have some edge cases of its own. Word boundaries match on non-word characters (\W), which are going to be anything that isn't a-z, A-Z, 0-9, or _. That means digits and underscores are going to be counted as word characters and scenarios like this will fail:

  • The "are" in "What _are_ you thinking?"
  • The "are" in "lol u dunno wut those are4?"

If you want anything more accurate than this, you'll have to start doing English language syntax parsing, and that's a pretty big can of worms (and assumes proper use of syntax, anyway, which isn't always a given).
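
For the digit and underscore edge cases described above, one possible variation is to replace \b with lookarounds that treat only ASCII letters as word characters. This is a sketch under that assumption, not part of the original answer, and containsWordLettersOnly is a hypothetical name.

```php
<?php
// Hypothetical variant of containsWord(): a match counts as a word
// when it is not directly preceded or followed by an ASCII letter.
// Digits and underscores therefore act as separators.
function containsWordLettersOnly(string $str, string $word): bool
{
    $quoted = preg_quote($word, '#');
    return preg_match('#(?<![a-z])' . $quoted . '(?![a-z])#i', $str) === 1;
}

var_dump(containsWordLettersOnly('What _are_ you thinking?', 'are'));    // bool(true)
var_dump(containsWordLettersOnly('lol u dunno wut those are4?', 'are')); // bool(true)
var_dump(containsWordLettersOnly('The fares went up', 'are'));           // bool(false)
```

The trade-off is that identifiers such as "are_you" now count as containing the word "are", which may or may not be what you want.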

Answer by Jose Vega

To determine whether a string contains another string you can use the PHP function strpos().

int strpos ( string $haystack , mixed $needle [, int $offset = 0 ] )

<?php

$haystack = 'how are you';
$needle = 'are';

if (strpos($haystack,$needle) !== false) {
    echo "$haystack contains $needle";
}

?>

CAUTION:

If the needle you are searching for is at the beginning of the haystack, strpos() returns position 0. A == comparison with false will not work in that case; you need to use ===.

The == operator is a comparison that tests whether the variable / expression / constant on the left has the same value as the variable / expression / constant on the right.

The === operator is a comparison that tests whether two variables / expressions / constants are equal AND have the same type, i.e. both are strings or both are integers.

Answer by Haim Evgi

Look at strpos():

<?php
    $mystring = 'abc';
    $findme   = 'a';
    $pos = strpos($mystring, $findme);

    // Note our use of ===. Simply, == would not work as expected
    // because the position of 'a' was the 0th (first) character.
    if ($pos === false) {
        echo "The string '$findme' was not found in the string '$mystring'.";
    }
    else {
        echo "The string '$findme' was found in the string '$mystring',";
        echo " and exists at position $pos.";
    }
?>

Answer by glutorange

Using strstr(), or stristr() if your search should be case-insensitive, would be another option.
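
A minimal sketch of that option follows. Both functions return the rest of the haystack starting at the first match, or false when nothing is found, so the result should be compared with !== false rather than used directly as a condition. The helper names are hypothetical.

```php
<?php
// Hypothetical helpers, for illustration only.

function containsViaStrstr(string $haystack, string $needle): bool
{
    return strstr($haystack, $needle) !== false;
}

function containsViaStristr(string $haystack, string $needle): bool
{
    // stristr() is the case-insensitive counterpart of strstr().
    return stristr($haystack, $needle) !== false;
}

var_dump(containsViaStrstr('How are you?', 'are'));  // bool(true)
var_dump(containsViaStrstr('How ARE you?', 'are'));  // bool(false)
var_dump(containsViaStristr('How ARE you?', 'are')); // bool(true)

// Why the strict comparison matters: the returned tail can be the
// string "0", which PHP treats as falsey.
var_dump((bool) strstr('10', '0'));       // bool(false), although '0' was found
var_dump(containsViaStrstr('10', '0'));   // bool(true)
```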

Answer by RafaSashi

In response to the comments by SamGoody and Lego Stormtroopr:

If you are looking for a PHP algorithm to rank search results based on the proximity/relevance of multiple words, here is a quick and easy way of generating search results with PHP only:

Issues with the other boolean search methods such as strpos(), preg_match(), strstr() or stristr():

  1. can't search for multiple words
  2. results are unranked

PHP method based on the Vector Space Model and tf-idf (term frequency–inverse document frequency):

It sounds difficult but is surprisingly easy.

If we want to search for multiple words in a string, the core problem is how to assign a weight to each one of them.

If we could weight the terms in a string based on how representative they are of the string as a whole, we could order our results by the ones that best match the query.

This is the idea of the vector space model, not far from how SQL full-text search works:

function get_corpus_index($corpus = array(), $separator=' ') {

    $dictionary = array();

    $doc_count = array();

    foreach($corpus as $doc_id => $doc) {

        $terms = explode($separator, $doc);

        $doc_count[$doc_id] = count($terms);

        // tf–idf, short for term frequency–inverse document frequency, 
        // according to wikipedia is a numerical statistic that is intended to reflect 
        // how important a word is to a document in a corpus

        foreach($terms as $term) {

            if(!isset($dictionary[$term])) {

                $dictionary[$term] = array('document_frequency' => 0, 'postings' => array());
            }
            if(!isset($dictionary[$term]['postings'][$doc_id])) {

                $dictionary[$term]['document_frequency']++;

                $dictionary[$term]['postings'][$doc_id] = array('term_frequency' => 0);
            }

            $dictionary[$term]['postings'][$doc_id]['term_frequency']++;
        }

        //from http://phpir.com/simple-search-the-vector-space-model/

    }

    return array('doc_count' => $doc_count, 'dictionary' => $dictionary);
}

function get_similar_documents($query='', $corpus=array(), $separator=' '){

    $similar_documents=array();

    if($query!=''&&!empty($corpus)){

        $words=explode($separator,$query);

        $corpus=get_corpus_index($corpus, $separator);

        $doc_count=count($corpus['doc_count']);

        foreach($words as $word) {

            if(isset($corpus['dictionary'][$word])){

                $entry = $corpus['dictionary'][$word];


                foreach($entry['postings'] as $doc_id => $posting) {

                    //get term frequency–inverse document frequency
                    $score=$posting['term_frequency'] * log($doc_count + 1 / $entry['document_frequency'] + 1, 2);

                    if(isset($similar_documents[$doc_id])){

                        $similar_documents[$doc_id]+=$score;

                    }
                    else{

                        $similar_documents[$doc_id]=$score;

                    }
                }
            }
        }

        // length normalise
        foreach($similar_documents as $doc_id => $score) {

            $similar_documents[$doc_id] = $score/$corpus['doc_count'][$doc_id];

        }

        // sort from  high to low

        arsort($similar_documents);

    }   

    return $similar_documents;
}
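
A note on the weight used in get_similar_documents(): because division binds tighter than addition in PHP, the argument of log() evaluates as $doc_count + (1 / document_frequency) + 1. The sketch below isolates that expression next to a fully parenthesised ratio, which is one common smoothed idf form and is shown here only as an assumption about a possible alternative. Both function names are hypothetical.

```php
<?php
// Hypothetical helpers, for illustration only.

// The expression exactly as it appears in get_similar_documents().
function idf_as_written(int $doc_count, int $document_frequency): float
{
    return log($doc_count + 1 / $document_frequency + 1, 2);
}

// A parenthesised ratio, one possible alternative (an assumption,
// not something the original answer states).
function idf_parenthesised(int $doc_count, int $document_frequency): float
{
    return log(($doc_count + 1) / ($document_frequency + 1), 2);
}

// Three documents, term present in two of them:
printf("as written:    %.5f\n", idf_as_written(3, 2));    // log2(4.5)
printf("parenthesised: %.5f\n", idf_parenthesised(3, 2)); // log2(4/3)
```

The scores listed in the cases that follow correspond to the expression as written; switching to the parenthesised form would change them, and would give a weight of zero for a term that appears in every document.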

CASE 1

$query = 'are';

$corpus = array(
    1 => 'How are you?',
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULT

Array
(
    [1] => 0.52832083357372
)

CASE 2

$query = 'are';

$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?'
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULTS

Array
(
    [1] => 0.54248125036058
    [3] => 0.21699250014423
)

CASE 3

$query = 'we are done';

$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?'
);

$match_results=get_similar_documents($query,$corpus);
echo '<pre>';
    print_r($match_results);
echo '</pre>';

RESULTS

Array
(
    [3] => 0.6813781191217
    [1] => 0.54248125036058
)

There are plenty of improvements to be made, but the model provides a way of getting good results from natural queries, something the boolean methods such as strpos(), preg_match(), strstr() or stristr() do not offer.

NOTA BENE

Optionally eliminate redundancy before searching the words:

  • thereby reducing index size and resulting in less storage requirement

  • less disk I/O

  • faster indexing and a consequently faster search.

1. Normalisation

  • Convert all text to lower case

2. Stopword elimination

  • Eliminate words from the text which carry no real meaning (like 'and', 'or', 'the', 'for', etc.)

3. Dictionary substitution

  • Replace words with others which have an identical or similar meaning (e.g. replace instances of 'hungrily' and 'hungry' with 'hunger').

  • Further algorithmic measures (snowball) may be performed to further reduce words to their essential meaning.

  • The replacement of colour names with their hexadecimal equivalents and the reduction of numeric values by reducing precision are other ways of normalising the text.
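
The first two steps above (lower-casing and stopword elimination) can be sketched in a few lines of PHP. This is a minimal illustration, not part of the original answer: normalise_terms is a hypothetical name, the default stopword list is just the four examples mentioned above, and splitting on non-alphanumeric characters (which also strips punctuation) is an extra assumption.

```php
<?php
// Hypothetical helper: turns raw text into a list of normalised terms.
function normalise_terms(
    string $text,
    array $stopwords = ['and', 'or', 'the', 'for']
): array {
    // 1. Normalisation: convert everything to lower case.
    $text = strtolower($text);

    // Assumption: split on anything that is not an ASCII letter or
    // digit, so "are!" and "are" become the same term.
    $terms = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // 2. Stopword elimination, then reindex from 0.
    return array_values(array_diff($terms, $stopwords));
}

print_r(normalise_terms('The cat and the dog'));
print_r(normalise_terms('Here you are! How are you? Are we done yet?'));
```

The resulting terms could be joined with spaces again before being passed to get_corpus_index(), so that "are!", "are" and "Are" are counted as one term.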

Answer by Shankar Damodaran

Make use of case-insensitive matching using stripos():

if (stripos($string,$stringToSearch) !== false) {
    echo 'true';
}

Answer by Alan Piralla

If you want to avoid the "falsey" and "truthy" problem, you can use substr_count:

if (substr_count($a, 'are') > 0) {
    echo "at least one 'are' is present!";
}
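
For reference, substr_count() returns the number of non-overlapping, case-sensitive occurrences as an integer, so a plain > 0 check is enough. A small sketch with made-up sample strings:

```php
<?php
$text = 'here you are, how are you';

var_dump(substr_count($text, 'are'));  // int(2)
var_dump(substr_count($text, 'were')); // int(0)
var_dump(substr_count($text, 'ARE'));  // int(0), the match is case-sensitive

// Occurrences are counted without overlap.
var_dump(substr_count('aaa', 'aa'));   // int(1)
```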

It's a bit slower than strpos but it avoids the comparison problems.
