How to extract only text from an HTML string with PHP?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/26343641/

Date: 2020-08-25 18:17:31  Source: igfitidea

php

Asked by New Co

I want to extract only the text from a PHP string.

The string contains HTML markup such as tags, so I need just the plain text from it.

This is the actual string:

<div class="devblog-index-content battlelog-wordpress">
<p><strong>The celebration of the Recon class in our second </strong><a href="http://blogs.battlefield.com/2014/10/bf4-class-week-recon/" target="_blank">BF4 Class Week</a><strong> continues with a sneaky stroll down memory lane. Learn more about how the Recon has changed in appearance, name and weaponry over the years&hellip;</strong></p>

<p>&nbsp;</p>

<p style="text-align:center"><a href="http://eaassets-a.akamaihd.net/battlelog/prod/954660ddbe53df808c23a0ba948e7971/en_US/blog/wp-content/uploads/2014/10/bf4-history-of-recon-1.jpg?v=1412871863.37"><img alt="bf4-history-of-recon-1" class="aligncenter" src="http://eaassets-a.akamaihd.net/battlelog/prod/954660ddbe53df808c23a0ba948e7971/en_US/blog/wp-content/uploads/2014/10/bf4-history-of-recon-1.jpg?v=1412871863.37" style="width:619px" /></a></p>

I want to display this text from the string:

The celebration of the Recon class in our second BF4 Class Week continues with a sneaky stroll down memory lane. Learn more about how the Recon has changed in appearance, name and weaponry over the years…

This text will be placed in a meta description tag, so it must not contain any HTML. How can I do this? Any ideas or thoughts on this technique?

Answer by MillaresRoo

You may try:

echo(strip_tags($your_string));

More info here: http://php.net/manual/en/function.strip-tags.php
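
Note that strip_tags leaves character entities such as &hellip; and &nbsp; in place, while the text the question asks for contains a real "…". The following is a minimal sketch of one way to handle that, not part of the original answer: the helper name html_to_plain_text and the shortened sample string are assumptions made for illustration, and the final htmlspecialchars call assumes the result is written into an HTML attribute.

```php
<?php
// Hypothetical helper, not part of the original answer: turn an HTML
// fragment into plain text for a meta description.
function html_to_plain_text(string $html): string
{
    // Remove the tags; entities such as &hellip; and &nbsp; are left untouched.
    $text = strip_tags($html);
    // Decode entities into UTF-8 characters (&nbsp; becomes U+00A0).
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse whitespace runs, including non-breaking spaces, to one space.
    // preg_replace() returns null on failure, so fall back to the input.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text;
    return trim($text);
}

// Shortened sample input in the style of the question's string (an assumption).
$your_string = '<p><strong>Learn more</strong> <a href="#">here</a>&hellip;</p><p>&nbsp;</p>';
$description = html_to_plain_text($your_string);

// Escape again when writing the attribute so quotes cannot break the tag.
echo '<meta name="description" content="'
    . htmlspecialchars($description, ENT_QUOTES, 'UTF-8') . '">', PHP_EOL;
```

Decoding first and escaping again at output time keeps the stored description as plain text, so the same value could also be reused outside HTML.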

Answer by Paulius Jacionis

Another option is to use Html2Text. It handles complicated HTML much better than strip_tags.

Extracting text from HTML is tricky, so your best bet is to use a library built for this purpose.

https://github.com/mtibben/html2text

Install using Composer:

composer require html2text/html2text

Basic usage:

require __DIR__ . '/vendor/autoload.php';  // Composer autoloader

$html = new \Html2Text\Html2Text('Hello, &quot;<b>world</b>&quot;');

echo $html->getText();  // Hello, "WORLD"
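
To illustrate why strip_tags can fall short on more complicated HTML, here is a small sketch using only built-in functions. It is not taken from the answer and says nothing about how Html2Text works internally; the helper strip_tags_spaced is a hypothetical name introduced for this example.

```php
<?php
// strip_tags() deletes only the tags, which has two side effects.

// 1. Text from neighbouring elements is joined without a separator.
echo strip_tags('<p>one</p><p>two</p>'), PHP_EOL;                   // onetwo

// 2. The contents of <style> and <script> elements survive as text.
echo strip_tags('<style>p{color:red}</style><p>Hi</p>'), PHP_EOL;   // p{color:red}Hi

// Hypothetical workaround for the first effect: put a space before every
// tag, strip the tags, then collapse the extra whitespace.
function strip_tags_spaced(string $html): string
{
    $text = strip_tags(str_replace('<', ' <', $html));
    // preg_replace() returns null on failure, so fall back to the input.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

echo strip_tags_spaced('<p>one</p><p>two</p>'), PHP_EOL;            // one two
```

The workaround is crude: it also splits a word that has inline markup in the middle (for example wor<b>ld</b> becomes "wor ld"), and it does nothing about style or script contents. Cases like these are where a dedicated library is the safer choice.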

Answer by jasonlam604

Another option for anyone who may need this: the Stringizer library; see its Strip Tags documentation.

Full disclosure: I'm the owner of the project.