Notice: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5956610/

Date: 2020-08-25 23:01:35  Source: igfitidea

How to select first 10 words of a sentence?

Tags: php, string, substring, trim

Asked by AAA

How do I, from an output, only select the first 10 words?

Answer by Kelly

implode(' ', array_slice(explode(' ', $sentence), 0, 10));

To add support for other word breaks, like commas and dashes, preg_match gives a quick way that doesn't require splitting the string:

function get_words($sentence, $count = 10) {
  preg_match("/(?:\w+(?:\W+|$)){0,$count}/", $sentence, $matches);
  return $matches[0];
}

As Pebbl mentions, PHP doesn't handle UTF-8 or Unicode all that well here, so if that is a concern, you can replace \w with [^\s,\.;\?\!] and \W with [\s,\.;\?\!].
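A minimal sketch of Kelly's preg_match() approach with that substitution applied, so non-ASCII words (for example Cyrillic) are matched as runs of non-separator bytes instead of relying on \w. The name get_words_any_script() is hypothetical, and the ltrim()/rtrim() calls are additions not in the original answer: ltrim() skips leading separators (otherwise the unanchored pattern would return an empty match at position 0), and rtrim() drops the trailing whitespace the pattern also consumes.

```php
<?php
// Hypothetical variant of get_words(): a "word" is any run of characters that is
// not whitespace or one of , . ; ? ! (instead of \w), so UTF-8 letters are kept.
function get_words_any_script(string $sentence, int $count = 10): string
{
    // Skip leading separators so the first match starts at a word.
    $sentence = ltrim($sentence, " \t\n\r\0\x0B,.;?!");
    // Each repetition consumes one word plus the separators after it (or end of string).
    $pattern = '/(?:[^\s,.;?!]+(?:[\s,.;?!]+|$)){0,' . $count . '}/';
    preg_match($pattern, $sentence, $matches);
    // The match keeps the whitespace after the last word; strip it.
    return rtrim($matches[0]);
}

echo get_words_any_script('Это не те дроиды, которые вы ищете', 5);
```

Punctuation directly after the last kept word (such as a comma) is preserved; only trailing whitespace is trimmed.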

Answer by Pebbl

Simply splitting on spaces will work incorrectly if an unexpected character takes the place of a space in the sentence, or if the sentence contains several consecutive spaces.
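A quick illustration of the consecutive-spaces problem, plus one common workaround: splitting on runs of whitespace with preg_split() and PREG_SPLIT_NO_EMPTY. This is a sketch with a made-up sample sentence, not part of Pebbl's answer.

```php
<?php
$sentence = "one  two three";   // note the double space after "one"

// explode() turns the double space into an empty "word".
$naive = explode(' ', $sentence);
// $naive is ['one', '', 'two', 'three'], so the "first 2 words" would be "one "

// Splitting on runs of whitespace avoids the empty element.
$words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
// $words is ['one', 'two', 'three']

echo implode(' ', array_slice($words, 0, 2)); // prints "one two"
```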

The following version works no matter what kind of "space" you use between words, and it can easily be extended to handle other characters. It currently treats any whitespace character and the characters , . ; ? ! as word separators.

function get_snippet( $str, $wordCount = 10 ) {
  return implode( 
    '', 
    array_slice( 
      preg_split(
        '/([\s,\.;\?\!]+)/', 
        $str, 
        $wordCount*2+1, 
        PREG_SPLIT_DELIM_CAPTURE
      ),
      0,
      $wordCount*2-1
    )
  );
}

Regular expressions are well suited to this problem, because you can make the code as flexible or as strict as you like. You do have to be careful, however. I deliberately targeted the gaps between words, rather than the words themselves, because it is difficult to state unequivocally what defines a word.
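To see why get_snippet() keeps $wordCount*2-1 elements: with a capturing group in the pattern and PREG_SPLIT_DELIM_CAPTURE, preg_split() returns words and the separator runs between them alternately, so n words occupy the first 2n-1 elements. A small check with a made-up input (the limit is left at -1 here for simplicity):

```php
<?php
// Same separator pattern as get_snippet(); the capturing group (...) plus
// PREG_SPLIT_DELIM_CAPTURE keeps each run of separators as its own element.
$parts = preg_split('/([\s,\.;\?\!]+)/', 'one two, three', -1, PREG_SPLIT_DELIM_CAPTURE);
// Words and separators alternate: ['one', ' ', 'two', ', ', 'three']

// n words occupy the first 2n - 1 elements, so the first 2 words are:
$firstTwo = implode('', array_slice($parts, 0, 2 * 2 - 1));
echo $firstTwo; // prints "one two"
```

Because the separators are kept and re-joined with an empty string, the original spacing and punctuation between the kept words survive unchanged.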

Take the \w word-character class, or its inverse \W. I rarely rely on these, mainly because, depending on the software you are using (such as certain versions of PHP), they don't always include UTF-8 or Unicode characters.
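A small check of that caveat, assuming PCRE's default character tables and no u modifier: \w then matches only ASCII letters, digits and underscore, while the negated separator class matches any byte that is not a listed separator, including the bytes that make up Cyrillic letters. Results with the u modifier or a different locale setup may differ.

```php
<?php
$word = 'дроиды';

// Without the u modifier, \w only covers ASCII word characters,
// so this Cyrillic word does not match.
var_dump(preg_match('/^\w+$/', $word));              // int(0)

// The negated separator class accepts anything that is not a separator.
var_dump(preg_match('/^[^\s,\.;\?\!]+$/', $word));   // int(1)
```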

In regular expressions it is better to be specific at all times, so that your expressions can handle input like the following no matter where they run:

echo get_snippet('Это не те дроиды, которые вы ищете', 5);

/// outputs: Это не те дроиды, которые

Avoiding the split could be worthwhile for performance, however. In that case you could use Kelly's updated approach but swap \w+ for [^\s,\.;\?\!]+ and \W+ for [\s,\.;\?\!]+. Personally, though, I like the simplicity of the splitting expression used above; it is easier to read and therefore to modify. The stack of PHP functions, however, is a bit ugly :)

Answer by Spyros

http://snipplr.com/view/8480/a-php-function-to-return-the-first-n-words-from-a-string/

function shorten_string($string, $wordsreturned)
{
    $retval = $string;  //  Just in case of a problem
    $array = explode(" ", $string);
    /*  Already short enough, return the whole thing*/
    if (count($array)<=$wordsreturned)
    {
        $retval = $string;
    }
    /*  Need to chop off some words */
    else
    {
        array_splice($array, $wordsreturned);
        $retval = implode(" ", $array)." ...";
    }
    return $retval;
}

Answer by jawira

I suggest using str_word_count:

<?php
$str = "Lorem ipsum       dolor sit    amet, 
        consectetur        adipiscing elit";
print_r(str_word_count($str, 1));
?>

The above example will output:

Array
(
    [0] => Lorem
    [1] => ipsum
    [2] => dolor
    [3] => sit
    [4] => amet
    [5] => consectetur
    [6] => adipiscing
    [7] => elit
)

Then use a loop to get the words you want.
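A minimal sketch of that loop, assuming you want an array of at most $count words; the helper name first_words_loop() is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: collect at most $count words from str_word_count()'s list.
function first_words_loop(string $str, int $count = 10): array
{
    $result = [];
    foreach (str_word_count($str, 1) as $word) {
        if (count($result) >= $count) {
            break; // stop once we have enough words
        }
        $result[] = $word;
    }
    return $result;
}

$str = "Lorem ipsum       dolor sit    amet,
        consectetur        adipiscing elit";
print_r(first_words_loop($str, 3)); // Lorem, ipsum, dolor
```

Joining the result with implode(' ', ...) gives a string again, at the cost of the original punctuation.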

Source: http://php.net/str_word_count

Answer by Milad Rahimi

To select the first 10 words of a given text, you can implement the following function:

function first_words($text, $count=10)
{
    $words = explode(' ', $text);

    $result = '';
    for ($i = 0; $i < $count && isset($words[$i]); $i++) {
        // Put back the space that explode() removed, except before the first word
        $result .= ($i > 0 ? ' ' : '') . $words[$i];
    }

    return $result;
}

Answer by Rowlingso

This can easily be done using str_word_count()

$first10words = implode(' ', array_slice(str_word_count($sentence,1), 0, 10));
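One caveat, based on the str_word_count() documentation: a "word" is a run of letters that may also contain (but not start with) ' and - characters, so punctuation is dropped and digits are not counted as words. A quick check with a made-up sentence:

```php
<?php
$sentence = "Hello, world! It's 2011.";
// The comma, "!", "." and the number 2011 are not part of any word.
$first10words = implode(' ', array_slice(str_word_count($sentence, 1), 0, 10));
echo $first10words; // prints "Hello world It's"
```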

Answer by saleem ahmed

Try this

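$str = 'Lorem ipsum dolor sit amet,consectetur adipiscing elit. Mauris ornare luctus diam sit amet mollis.';
// Add a space after each comma so "amet,consectetur" becomes two words
$arr = explode(" ", str_replace(",", ", ", $str));
// Stop early if the text has fewer than 10 words
for ($index = 0; $index < 10 && isset($arr[$index]); $index++) {
    echo $arr[$index] . " ";
}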

I know this answer comes late, but newcomers can still choose the answer that suits them.

Answer by Ankur Rastogi

This might help you: a function that returns the first N words of a string.

// Returns the first $numberOfWords words of $text, plus "..." if anything was cut off
function getNWordsFromString($text, $numberOfWords = 6)
{
    if ($text != null)
    {
        $textArray = explode(" ", $text);
        if (count($textArray) > $numberOfWords)
        {
            return implode(" ", array_slice($textArray, 0, $numberOfWords)) . "...";
        }
        return $text;
    }
    return "";
}

Answer by Rizwan Gill

This is exactly what we were looking for. Just copy and paste it into your program and run it.

/*  Returns the first $wordsreturned words of $string. If $string
    contains fewer words than $wordsreturned, the entire string
    is returned. */
function shorten_string($string, $wordsreturned)
{
    $retval = $string;      //  Just in case of a problem

    $array = explode(" ", $string);
    if (count($array) <= $wordsreturned)
    {
        /*  Already short enough, return the whole thing */
        $retval = $string;
    }
    else
    {
        /*  Need to chop off some words */
        array_splice($array, $wordsreturned);
        $retval = implode(" ", $array) . " ...";
    }
    return $retval;
}

Then just call the function in your block of code, like this:

$data_itr = shorten_string($Itinerary,25);

Answer by Vaci

I do it this way:

function trim_by_words($string, $word_count = 10) {
    $string = explode(' ', $string);
    if (empty($string) == false) {
        $string = array_chunk($string, $word_count);
        $string = $string[0];
    }
    $string = implode(' ', $string);
    return $string;
}

It's UTF-8 compatible...
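The UTF-8 claim holds because the ASCII space byte (0x20) never occurs inside a multibyte UTF-8 sequence, so explode(' ', ...) cannot cut a character in half. A minimal check, reusing Vaci's function unchanged with a made-up Cyrillic sentence:

```php
<?php
function trim_by_words($string, $word_count = 10) {
    $string = explode(' ', $string);
    if (empty($string) == false) {
        $string = array_chunk($string, $word_count);
        $string = $string[0];
    }
    $string = implode(' ', $string);
    return $string;
}

// Cyrillic letters take two bytes each in UTF-8, but none of those bytes is 0x20,
// so splitting on ' ' keeps every character intact.
echo trim_by_words('Это не те дроиды, которые вы ищете', 4); // prints "Это не те дроиды,"
```

Like the other explode()-based answers, it still treats consecutive spaces as empty words.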
