How to extract only text from HTML string with PHP?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/26343641/
Asked by New Co
I want to extract only the text from a PHP string.
This string contains HTML markup such as tags.
So I only need the plain text from this string.
This is the actual string:
<div class="devblog-index-content battlelog-wordpress">
<p><strong>The celebration of the Recon class in our second </strong><a href="http://blogs.battlefield.com/2014/10/bf4-class-week-recon/" target="_blank">BF4 Class Week</a><strong> continues with a sneaky stroll down memory lane. Learn more about how the Recon has changed in appearance, name and weaponry over the years…</strong></p>
<p> </p>
<p style="text-align:center"><a href="http://eaassets-a.akamaihd.net/battlelog/prod/954660ddbe53df808c23a0ba948e7971/en_US/blog/wp-content/uploads/2014/10/bf4-history-of-recon-1.jpg?v=1412871863.37"><img alt="bf4-history-of-recon-1" class="aligncenter" src="http://eaassets-a.akamaihd.net/battlelog/prod/954660ddbe53df808c23a0ba948e7971/en_US/blog/wp-content/uploads/2014/10/bf4-history-of-recon-1.jpg?v=1412871863.37" style="width:619px" /></a></p>
I want to show this from the string:
The celebration of the Recon class in our second BF4 Class Week continues with a sneaky stroll down memory lane. Learn more about how the Recon has changed in appearance, name and weaponry over the years…
This text will be placed in a meta description tag, so I don't want any HTML in it. How can I do this? Any ideas or thoughts on this technique?
Answered by MillaresRoo
You may try:
echo(strip_tags($your_string));
More info here: http://php.net/manual/en/function.strip-tags.php
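Since the question mentions a meta description, here is a minimal sketch of how strip_tags might be combined with a few other built-in functions for that purpose. The helper name html_to_meta_description and the shortened sample string are assumptions made for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): turns an HTML
// fragment into plain text suitable for a meta description attribute.
function html_to_meta_description(string $html): string
{
    // Remove all HTML tags, keeping only the text between them.
    $text = strip_tags($html);
    // Decode entities such as &amp; into the characters they stand for.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace (including newlines) into single spaces.
    // preg_replace returns null on failure, so fall back to the input.
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
    // Escape the result so it is safe inside an HTML attribute value.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Shortened stand-in for the string in the question.
$your_string = '<div><p><strong>The celebration of the Recon class in our second </strong>'
    . '<a href="#">BF4 Class Week</a><strong> continues.</strong></p><p> </p></div>';

echo '<meta name="description" content="' . html_to_meta_description($your_string) . '">';
```

The final htmlspecialchars call matters because the extracted text may contain quotes or ampersands, which would otherwise break the content attribute.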
Answered by Paulius Jacionis
Another option is to use Html2Text. It will do a much better job than strip_tags, especially if you want to parse complicated HTML code.
Extracting text from HTML is tricky, so your best bet is to use a library built for this purpose.
https://github.com/mtibben/html2text
Install using composer:
composer require html2text/html2text
Basic usage:
$html = new \Html2Text\Html2Text('Hello, "<b>world</b>"');
echo $html->getText(); // Hello, "WORLD"
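To illustrate why plain tag stripping can be tricky, the short sketch below shows two cases where strip_tags alone gives surprising results. The input strings and variable names are made up for illustration.

```php
<?php
// Adjacent block elements are joined with no separator between them.
$joined = strip_tags('<p>One</p><p>Two</p>');
echo $joined, "\n"; // OneTwo

// Only the tags are removed; the CSS inside <style> stays as text.
$withCss = strip_tags('<style>p { color: red; }</style><p>Hello</p>');
echo $withCss, "\n"; // p { color: red; }Hello
```

If the input is known to be simple markup, as in the question, strip_tags may well be enough; cases like these are where a dedicated library is more likely to pay off.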
Answered by jasonlam604
Adding another option for anyone else who may need this: the Stringizer library might be an option, see Strip Tags.
Full disclosure: I'm the owner of the project.