PHP - Extracting text from HTML
Source: http://stackoverflow.com/questions/2279965/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by Dan
I have a long string of HTML that contains <p>, <img>, <span>, and a bunch of other tags.
Is there any way of extracting ONLY the text within the tags from this string?
Answered by Pekka
If you want to extract all text within any tags, the simple way is to strip the tags: strip_tags()
If you want to remove only specific tags, maybe this SO question helps.
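As a rough sketch of the strip_tags() approach: the sample string and the variable names below are made up for illustration, not taken from the question. The second call uses the optional allow-list argument of strip_tags(), which keeps the listed tags and strips everything else.

```php
<?php
// Hypothetical sample input; any HTML string is handled the same way.
$html = '<p>Hello <span>world</span><img src="a.png"></p>';

// Remove every tag and keep only the text between them.
$text = strip_tags($html);

// Keep <span> tags and strip all others (the second argument is an allow-list).
$keepSpans = strip_tags($html, '<span>');

echo $text, "\n";
echo $keepSpans, "\n";
```

Note that strip_tags() does not decode entities such as &amp;amp;, so html_entity_decode() may be needed afterwards if plain text is the goal.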
Answered by Tatu Ulmanen
I know I'll be getting a lot of bashing for this, but for a simple task like this I'd use regular expressions.
preg_match_all('~<span>(.*?)</span>~', $html, $matches);
$matches[0] will contain all the span tags and their contents; $matches[1] contains only the contents.
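A slightly more tolerant variant of the regular expression, as a sketch only: the sample HTML is invented, and the extra pattern pieces are my additions rather than part of the answer. Here \b[^>]* lets the opening tag carry attributes, the s flag lets . match newlines, and the i flag ignores tag case. It still breaks on nested spans, which is where a real parser becomes the safer choice.

```php
<?php
// Hypothetical sample input with one attribute-bearing span.
$html = '<p>Intro</p><span class="a">first</span> <SPAN>second</SPAN>';

// Group 1 captures the text between each opening and closing span tag.
preg_match_all('~<span\b[^>]*>(.*?)</span>~is', $html, $matches);

$fullTags = $matches[0]; // whole <span>...</span> matches
$contents = $matches[1]; // only the inner text

print_r($contents);
```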
For more complicated stuff you might want to take a look at PHP Simple HTML DOM Parser or similar:
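// Load the library (assumes simple_html_dom.php is on the include path)
require_once 'simple_html_dom.php';
// Create DOM from a string
$dom = str_get_html($html);
// Find all images and print their src attributes
foreach ($dom->find('img') as $element) {
    echo $element->src . '<br>';
}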
Etc.
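If installing a third-party library is not an option, PHP's built-in DOM extension is one "similar" alternative. The sketch below assumes the DOM extension is enabled and uses an invented sample string; it is not the method from the answer above, just the same idea with DOMDocument.

```php
<?php
// Hypothetical sample input.
$html = '<p>Hello <span>world</span></p><img src="a.png"><img src="b.png">';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting them for imperfect markup.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// textContent on the root element concatenates every text node below it.
$text = trim($doc->documentElement->textContent);

// Collect the src attribute of every <img> element.
$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $sources[] = $img->getAttribute('src');
}

echo $text, "\n";
print_r($sources);
```

One caveat: loadHTML() guesses the encoding when the markup does not declare one, so non-ASCII input may need an explicit charset declaration before parsing.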

