How to Truncate a string in PHP to the word closest to a certain number of characters?

Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/79960/

Date: 2020-08-24 21:23:50  Source: igfitidea

Tags: php, string, function

Asked by Brian

I have a code snippet written in PHP that pulls a block of text from a database and sends it out to a widget on a webpage. The original block of text can be a lengthy article or a short sentence or two, but for this widget I can't display more than, say, 200 characters. I could use substr() to chop off the text at 200 chars, but the result would cut off in the middle of words. What I really want is to chop the text at the end of the last word before 200 chars.

Answered by Grey Panther

By using the wordwrap function. It splits the text into multiple lines so that the maximum width is the one you specified, breaking at word boundaries. After splitting, you simply take the first line:

substr($string, 0, strpos(wordwrap($string, $your_desired_width), "\n"));

One thing this one-liner doesn't handle is the case where the text itself is shorter than the desired width. To handle this edge case, do something like:

if (strlen($string) > $your_desired_width) 
{
    $string = wordwrap($string, $your_desired_width);
    $string = substr($string, 0, strpos($string, "\n"));
}
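To see the edge case just described, here is a minimal sketch comparing the one-liner with the guarded version on a short input. It assumes PHP 8, where strpos() returns false when wordwrap() inserted no newline and substr() coerces that false to a length of 0; the variable names are illustrative.

```php
<?php
$short = "A short sentence.";
$width = 200;

// One-liner: wordwrap() adds no "\n" to a short string, so strpos()
// returns false, which substr() coerces to a length of 0.
$oneLiner = substr($short, 0, strpos(wordwrap($short, $width), "\n"));

// Guarded version: only wrap and cut when the string is too long.
$guarded = $short;
if (strlen($guarded) > $width) {
    $guarded = wordwrap($guarded, $width);
    $guarded = substr($guarded, 0, strpos($guarded, "\n"));
}

var_dump($oneLiner); // string(0) ""
var_dump($guarded);  // string(17) "A short sentence."
```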


The above solution has the problem of cutting the text prematurely if it contains a newline before the actual cut point. Here is a version that solves this problem:

function tokenTruncate($string, $your_desired_width) {
  // Split on runs of whitespace, keeping each run as its own part
  // (PREG_SPLIT_DELIM_CAPTURE); -1 means "no limit".
  $parts = preg_split('/([\s\n\r]+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
  $parts_count = count($parts);

  $length = 0;
  $last_part = 0;
  // Count parts until adding the next one would exceed the desired width.
  for (; $last_part < $parts_count; ++$last_part) {
    $length += strlen($parts[$last_part]);
    if ($length > $your_desired_width) { break; }
  }

  return implode(array_slice($parts, 0, $last_part));
}
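The premature cut mentioned above is easy to reproduce with the input from the testContainingNewline case below. This minimal sketch wraps the earlier guarded wordwrap() code in a function (wordwrapTruncate is a hypothetical name); for the same input and width, that test expects tokenTruncate() to return "1 3\n5 7 9 ".

```php
<?php
// Hypothetical name: the guarded wordwrap() version from earlier in
// this answer, wrapped in a function for comparison.
function wordwrapTruncate(string $string, int $width): string {
    if (strlen($string) > $width) {
        $string = wordwrap($string, $width);
        $string = substr($string, 0, strpos($string, "\n"));
    }
    return $string;
}

$input = "1 3\n5 7 9 11 14";

// wordwrap() keeps the existing "\n" and adds new ones where needed...
$wrapped = wordwrap($input, 10);     // "1 3\n5 7 9 11\n14"

// ...so strpos() finds the original newline first and cuts after "1 3",
// even though "1 3\n5 7 9 " would still fit in 10 characters.
$cut = wordwrapTruncate($input, 10); // "1 3"
```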

Also, here is the PHPUnit test class used to test the implementation:

use PHPUnit\Framework\TestCase;

class TokenTruncateTest extends TestCase {
  public function testBasic() {
    $this->assertEquals("1 3 5 7 9 ",
      tokenTruncate("1 3 5 7 9 11 14", 10));
  }

  public function testEmptyString() {
    $this->assertEquals("",
      tokenTruncate("", 10));
  }

  public function testShortString() {
    $this->assertEquals("1 3",
      tokenTruncate("1 3", 10));
  }

  public function testStringTooLong() {
    $this->assertEquals("",
      tokenTruncate("toooooooooooolooooong", 10));
  }

  public function testContainingNewline() {
    $this->assertEquals("1 3\n5 7 9 ",
      tokenTruncate("1 3\n5 7 9 11 14", 10));
  }
}

EDIT:

Special UTF-8 characters like 'à' are not handled. Add 'u' at the end of the regex to handle them:

$parts = preg_split('/([\s\n\r]+)/u', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
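The u flag makes preg_split() treat the input as UTF-8, but tokenTruncate() still measures each part with strlen(), which counts bytes. Here is a minimal sketch of a variant that counts characters instead, assuming the mbstring extension (tokenTruncateMb is a hypothetical name). For "àà àà àà" and a width of 5 it keeps "àà àà", whereas the byte-counting version with the u flag returns "àà " because each "à" is two bytes in UTF-8.

```php
<?php
// Hypothetical variant of tokenTruncate(): the "u" flag splits the input
// as UTF-8, and mb_strlen() counts characters instead of bytes.
// Requires the mbstring extension.
function tokenTruncateMb(string $string, int $width): string {
    $parts = preg_split('/(\s+)/u', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        // preg_split() fails on invalid UTF-8; return the input untouched.
        return $string;
    }

    $length = 0;
    $kept = [];
    foreach ($parts as $part) {
        $length += mb_strlen($part, 'UTF-8');
        if ($length > $width) {
            break; // this part would push the text past $width
        }
        $kept[] = $part;
    }
    return implode('', $kept);
}

$text = "\u{e0}\u{e0} \u{e0}\u{e0} \u{e0}\u{e0}"; // "àà àà àà"
$result = tokenTruncateMb($text, 5);            // "àà àà"
```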

Answered by mattmac

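This returns the text cut back to the last whole word within the first 200 characters (strings of 200 characters or fewer are left unchanged):

if (strlen($string) > 200) {
    $string = preg_replace('/\s+?(\S+)?$/', '', substr($string, 0, 201));
}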
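To see why the answer takes 201 characters rather than 200, here is a minimal sketch with the limit as a parameter (regexTruncate is a hypothetical name) and a length check so that short strings come back unchanged.

```php
<?php
// Hypothetical wrapper: the 200-character limit from this answer becomes
// the $limit parameter.
function regexTruncate(string $string, int $limit): string {
    if (strlen($string) <= $limit) {
        return $string; // short enough, nothing to cut
    }
    // Keep one character more than the limit: if that extra character is
    // whitespace, the word before it is complete and survives. The regex
    // then removes the trailing whitespace plus any partial word after it.
    // Caveat: if the first $limit + 1 characters contain no whitespace,
    // nothing is removed and the result is $limit + 1 characters long.
    return preg_replace('/\s+?(\S+)?$/', '', substr($string, 0, $limit + 1));
}

$a = regexTruncate("The quick brown fox", 10); // "The quick" (the partial "b" is dropped)
$b = regexTruncate("The quick brown fox", 9);  // "The quick" (the 10th character is a space)
```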

Answered by Dave

$WidgetText = strlen($string) > 200
    ? substr($string, 0, strrpos(substr($string, 0, 200), ' '))
    : $string;

And there you have it: a reliable method of truncating any string to the nearest whole word while staying under the maximum string length.

I've tried the other examples above and they did not produce the desired results.
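Two cases are still worth handling: when there is no space in the first 200 characters, strrpos() returns false and substr() yields an empty string; and when the 201st character happens to be a space, the complete last word is dropped anyway. A minimal sketch that covers both, under a hypothetical name and counting bytes like the original:

```php
<?php
// Hypothetical helper built on this answer's strrpos() idea, with extra
// handling: short strings are returned as-is, and text with no space to
// cut at falls back to a hard cut at $max bytes.
function cutAtLastSpace(string $string, int $max): string {
    if (strlen($string) <= $max) {
        return $string;
    }
    // Looking at $max + 1 bytes keeps the last word when it ends exactly
    // at the limit (i.e. the next character is a space).
    $pos = strrpos(substr($string, 0, $max + 1), ' ');
    if ($pos === false) {
        return substr($string, 0, $max); // no space: hard cut
    }
    return substr($string, 0, $pos);
}
```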

Answered by Sergiy Sokolenko

The following solution was born when I noticed the $break parameter of the wordwrap function:

string wordwrap ( string $str [, int $width = 75 [, string $break = "\n" [, bool $cut = false ]]] )

Here is the solution:

/**
 * Truncates the given string at the specified length.
 *
 * @param string $str The input string.
 * @param int $width The number of chars at which the string will be truncated.
 * @return string
 */
function truncate($str, $width) {
    return strtok(wordwrap($str, $width, "...\n"), "\n");
}

Example #1.

print truncate("This is very long string with many chars.", 25);

The above example will output:

This is very long string...

Example #2.

print truncate("This is short string.", 25);

The above example will output:

This is short string.
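In Example #1 the ellipsis is added after wrapping, so the output ("This is very long string...", 27 characters) is longer than the requested 25, and for an empty input strtok() returns false rather than a string. Here is a minimal sketch of a variant (truncateWithin is a hypothetical name) that reserves room for the marker and returns short strings unchanged.

```php
<?php
// Hypothetical variant of truncate(): wordwrap() gets $width minus the
// marker length, so the marker fits within $width as long as the first
// word itself does (wordwrap() does not split words here).
function truncateWithin(string $str, int $width, string $marker = '...'): string {
    if (strlen($str) <= $width) {
        return $str; // short strings, including "", are returned unchanged
    }
    $wrapped = wordwrap($str, $width - strlen($marker), $marker . "\n");
    $first = strtok($wrapped, "\n");
    return $first === false ? '' : $first;
}

print truncateWithin("This is very long string with many chars.", 25);
// This is very long...
```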

Answered by Garrett Albright

Whenever you split by "word", keep in mind that some languages, such as Chinese and Japanese, do not use a space character to separate words. Also, a malicious user could simply enter text without any spaces, or use a Unicode look-alike of the standard space character, in which case whatever solution you use may end up displaying the entire text anyway. One way around this is to check the string length after splitting on spaces as normal and then, if the string is still above an abnormal limit (maybe 225 characters in this case), go ahead and split it dumbly at that limit.
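Here is a minimal sketch of that fallback, assuming the mbstring extension and UTF-8 input. The function name and the regex-based "word" step (cut at the first whitespace at or after $width characters, which can overshoot) are illustrative assumptions, not taken from any answer on this page.

```php
<?php
// Hypothetical sketch: word-based truncation first, then a dumb cut at
// $hardLimit characters if the result is still abnormally long.
function truncateWithFallback(string $text, int $width, int $hardLimit): string {
    // Word step: cut at the first whitespace at or after $width characters.
    // If there is none, the whole text survives this step.
    $result = $text;
    if (preg_match('/^(.{' . $width . '}\S*)\s/su', $text, $m)) {
        $result = $m[1];
    }
    // Safety net for text without usable spaces (e.g. Chinese or Japanese,
    // or one huge "word"): split dumbly at the hard limit.
    if (mb_strlen($result, 'UTF-8') > $hardLimit) {
        $result = mb_substr($result, 0, $hardLimit, 'UTF-8');
    }
    return $result;
}

$cjk = str_repeat("\u{5b57}", 30);             // 30 ideographs, no spaces
$cut = truncateWithFallback($cjk, 10, 12);     // first 12 ideographs
```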

One more caveat with things like this when it comes to non-ASCII characters: PHP's standard strlen() may report strings containing them as longer than they really are, because a single character may take two or more bytes instead of just one. If you just use the strlen()/substr() functions to split strings, you may split a string in the middle of a character! When in doubt, mb_strlen()/mb_substr() are a little more foolproof.
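A small illustration of the byte/character difference, assuming the mbstring extension and UTF-8 strings; the sample text is arbitrary.

```php
<?php
$s = "caf\u{e9} au lait"; // "café au lait", with é encoded in UTF-8

$bytes = strlen($s);             // 13: "é" takes two bytes
$chars = mb_strlen($s, 'UTF-8'); // 12

// substr() counts bytes, so cutting at 4 keeps only the first byte of "é"
// and leaves an invalid UTF-8 sequence behind.
$byteCut = substr($s, 0, 4);
$byteCutValid = mb_check_encoding($byteCut, 'UTF-8'); // false

// mb_substr() counts characters, so "é" stays whole.
$charCut = mb_substr($s, 0, 4, 'UTF-8');              // "café"
$charCutValid = mb_check_encoding($charCut, 'UTF-8'); // true
```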

Answered by Lucas Oman

Use strpos and substr:

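<?php

$longString = "I have a code snippet written in PHP that pulls a block of text.";
// In PHP 8, strpos() throws a ValueError if the offset is beyond the end of
// the string, and it returns false when there is no space after the offset.
$cut = strlen($longString) > 30 ? strpos($longString, ' ', 30) : false;
$truncated = $cut === false ? $longString : substr($longString, 0, $cut);

echo $truncated;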

This gives you a string truncated at the first space after the first 30 characters, so a truncated result is at least 30 characters long rather than at most 30.

Answered by UnkwnTech

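Here you go (note that this cuts at the first word boundary at or after $n characters, so the text before $delim can run slightly past $n):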

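function neat_trim($str, $n, $delim = '…') {
    $len = strlen($str);
    if ($len > $n) {
        // At least $n characters, then lazily up to the next word boundary;
        // the "s" modifier lets "." match newlines too.
        if (preg_match('/(.{' . $n . '}.*?)\b/s', $str, $matches)) {
            return rtrim($matches[1]) . $delim;
        }
    }
    // Short enough, or no word boundary after $n characters.
    return $str;
}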

Answered by Camsoft

Here is my function based on @Cd-MaN's approach.

function shorten($string, $width) {
  if(strlen($string) > $width) {
    $string = wordwrap($string, $width);
    $string = substr($string, 0, strpos($string, "\n"));
  }

  return $string;
}

Answered by hlcs

$shorttext = preg_replace('/^([\s\S]{1,200})[\s]+?[\s\S]+/', '$1', $fulltext);

Description:

  • ^ - start from the beginning of the string
  • ([\s\S]{1,200}) - capture from 1 to 200 of any character
  • [\s]+? - do not include the spaces at the end of the short text, so we get "word..." instead of "word ..."
  • [\s\S]+ - match all the remaining content
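A minimal sketch of the pattern in use, with the limit as a parameter and "$1" (the captured prefix) as the replacement; the wrapper name is hypothetical. Because the group can backtrack to an earlier space, a string that is already short enough would still lose its last word ("Short text" becomes "Short"), so the sketch only applies the pattern to strings longer than the limit. Like the original pattern, it counts bytes.

```php
<?php
// Hypothetical wrapper: the answer's pattern with the 200-character limit
// as a parameter and "$1" (the captured prefix) as the replacement.
function regexShorten(string $fulltext, int $limit): string {
    // Without this check, a short string would still lose its last word,
    // because the group backtracks to an earlier space.
    if (strlen($fulltext) <= $limit) {
        return $fulltext;
    }
    // If there is no whitespace to cut at, preg_replace() finds no match
    // and returns the text unchanged.
    return preg_replace('/^([\s\S]{1,' . $limit . '})[\s]+?[\s\S]+/', '$1', $fulltext);
}

$short = regexShorten("The quick brown fox", 10); // "The quick"
```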

Tests:

  1. regex101.com: let's add a few other "r" characters to "or"
  2. regex101.com: "orrrr" is exactly 200 characters.
  3. regex101.com: after the fifth "r", "orrrrr" is excluded.

Enjoy.

Answered by orrd

It's surprising how tricky it is to find the perfect solution to this problem. I haven't yet found an answer on this page that doesn't fail in at least some situations (especially if the string contains newlines or tabs, or if the word break is anything other than a space, or if the string has UTF-8 multibyte characters).

Here is a simple solution that works in all cases. There were similar answers here, but the "s" modifier is important if you want it to work with multi-line input, and the "u" modifier makes it correctly evaluate UTF-8 multibyte characters.

function wholeWordTruncate($s, $characterCount) 
{
    if (preg_match("/^.{1,$characterCount}\b/su", $s, $match)) return $match[0];
    return $s;
}

One possible edge case with this: if there is no word boundary (such as whitespace) within the first $characterCount characters, it will return the entire string. If you prefer to force a break at $characterCount even when it isn't a word boundary, you can use this:

function wholeWordTruncate($s, $characterCount) 
{
    if (preg_match("/^.{1,$characterCount}\b/su", $s, $match)) return $match[0];
    return mb_substr($s, 0, $characterCount);
}

One last option, if you want it to add an ellipsis when it truncates the string:

function wholeWordTruncate($s, $characterCount, $addEllipsis = ' …') 
{
    $return = $s;
    if (preg_match("/^.{1,$characterCount}\b/su", $s, $match)) 
        $return = $match[0];
    else
        $return = mb_substr($return, 0, $characterCount);
    if (strlen($s) > strlen($return)) $return .= $addEllipsis;
    return $return;
}
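One detail worth knowing when using these functions: \b also matches right before the next word, so the greedy match can end in a space ("The quick " for a limit of 10), which leaves a double space before " …". A minimal sketch of a wrapper that trims it and only adds the ellipsis when something was cut; the name is hypothetical and it assumes the mbstring extension.

```php
<?php
// Hypothetical wrapper around the last wholeWordTruncate() above: trailing
// whitespace from the match is trimmed before the ellipsis is appended.
function wholeWordTruncateTrimmed(string $s, int $characterCount, string $addEllipsis = " \u{2026}"): string {
    if (mb_strlen($s, 'UTF-8') <= $characterCount) {
        return $s; // nothing to cut, so no ellipsis
    }
    if (preg_match("/^.{1,$characterCount}\b/su", $s, $match)) {
        $return = rtrim($match[0]);
    } else {
        // No word boundary in range: force a break at $characterCount.
        $return = mb_substr($s, 0, $characterCount, 'UTF-8');
    }
    return $return . $addEllipsis;
}

$t = wholeWordTruncateTrimmed("The quick brown fox", 10); // "The quick …"
```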