PHP: How to remove text between tags in PHP?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1361878/

Time: 2020-08-25 02:11:48  Source: igfitidea

How to remove text between tags in php?

Tags: php, regex, string

Asked by MrFidge

Despite using PHP for years, I've never really learnt how to use expressions to truncate strings properly... which is now biting me in the backside!

Can anyone provide me with some help truncating this? I need to chop out the text portion from the url, turning

<a href="link.html">text</a>

into

<a href="link.html"></a>

Answered by Amber

$str = preg_replace('#(<a.*?>).*?(</a>)#', '$1$2', $str); // $1$2 puts the captured opening and closing tags back

Answered by karim79

Using SimpleHTMLDom:

<?php
// example of how to modify anchor innerText
include('simple_html_dom.php');

// get DOM from URL or file
$html = file_get_html('http://www.example.com/');

// set innertext (the property name is lowercase in simple_html_dom) to an empty string for each anchor
foreach ($html->find('a') as $e) {
    $e->innertext = '';
}

// dump contents
echo $html;
?>

Answered by Pascal MARTIN

What about something like this, considering you might want to reuse it with other hrefs:

$str = '<a href="link.html">text</a>';
$result = preg_replace('#(<a[^>]*>).*?(</a>)#', '$1$2', $str);
var_dump($result);

Which will get you:

string '<a href="link.html"></a>' (length=24)
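
Since the point of the `[^>]*` version is reuse with other hrefs, here is a minimal sketch of it running over a string with two links. The input string is made up for illustration, and the sketch assumes the replacement is `'$1$2'` so that both captured tags are kept.

```php
<?php
// Illustrative input (an assumption, not from the question): two anchors with different hrefs.
$str = '<p><a href="a.html">first</a> and <a href="b.html">second</a></p>';

// $1 is the opening tag, $2 the closing tag; only the text between them is dropped.
$result = preg_replace('#(<a[^>]*>).*?(</a>)#', '$1$2', $str);

echo $result; // <p><a href="a.html"></a> and <a href="b.html"></a></p>
```

Note that `<a[^>]*>` would also match other tags whose names start with "a", such as `<abbr>`, so this is only safe on markup you control.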

(I'm assuming you made a typo in the OP?)


If you don't need to match any other href, you could use something like:

$str = '<a href="link.html">text</a>';
$result = preg_replace('#(<a href="link.html">).*?(</a>)#', '$1$2', $str);
var_dump($result);

Which will also get you:

string '<a href="link.html"></a>' (length=24)


As a side note: for more complex HTML, don't try to use regular expressions. They work fine for this kind of simple situation, but for a real-life chunk of HTML they generally don't help much: HTML is not "regular" enough to be parsed by regexes.

Answered by KB22

You could use substr() in combination with strpos(), even though this is not a very nice approach.

Check: PHP Manual - String functions
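
The answer gives no code, so here is a minimal sketch of what the string-function approach might look like. The helper name `emptyFirstAnchor` is hypothetical, and the sketch assumes a single, well-formed anchor whose opening tag contains no `>` inside an attribute value.

```php
<?php
// Hypothetical helper (the name is illustrative): empties the first <a>...</a> element in $html.
function emptyFirstAnchor(string $html): string
{
    $open = strpos($html, '<a');                  // start of the opening tag
    if ($open === false) {
        return $html;
    }
    $openEnd = strpos($html, '>', $open);         // end of the opening tag
    if ($openEnd === false) {
        return $html;
    }
    $close = strpos($html, '</a>', $openEnd);     // start of the closing tag
    if ($close === false) {
        return $html;
    }
    // Keep everything up to and including ">" and everything from "</a>" onwards.
    return substr($html, 0, $openEnd + 1) . substr($html, $close);
}

echo emptyFirstAnchor('<a href="link.html">text</a>'); // <a href="link.html"></a>
```

Three position lookups and two slices do the job, which is why the answer calls the approach workable but not very nice compared with a one-line regex.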

Another way would be to write a regular expression to match your criteria. But in order to get your problem solved quickly the string functions will do...

EDIT: I underestimated the audience. ;) Go ahead with the regexes... ^^

Answered by mickmackusa

You don't need to capture the tags themselves. Just target the text between the tags and replace it with an empty string. Super simple.

Demo of both techniques

Code:

$string = '<a href="link.html">text</a>';
echo preg_replace('/<a[^>]*>\K[^<]*/', '', $string);
// the opening tag--^^^^^^^^  ^^^^^-match everything before the end tag
//                          ^^-restart fullstring match

Output:

<a href="link.html"></a>

Or in fringe cases when the link text contains a <, use this: ~<a[^>]*>\K.*?(?=</a>)~

This avoids the expense of capture groups by using a lazy quantifier, the fullstring-restarting \K, and a lookahead.
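
To make the fringe case concrete, here is a small sketch comparing the two patterns on link text that contains a `<`. The input string is an assumption chosen for illustration.

```php
<?php
// Illustrative input (an assumption): the link text itself contains a "<".
$string = '<a href="link.html">1 < 2</a>';

// [^<]* stops at the first "<", so part of the link text survives.
$short = preg_replace('/<a[^>]*>\K[^<]*/', '', $string);
echo $short, "\n"; // <a href="link.html">< 2</a>

// .*?(?=</a>) keeps consuming lazily until the closing tag comes next.
$full = preg_replace('~<a[^>]*>\K.*?(?=</a>)~', '', $string);
echo $full, "\n";  // <a href="link.html"></a>
```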



Older & wiser:

If you are parsing valid HTML, you should use a DOM parser for stability and accuracy. Regex is DOM-ignorant, so if there is a tag attribute value containing a >, my snippet will fail.

As a narrowly tailored DOMDocument solution, to offer some context:

$dom = new DOMDocument;
$dom->loadHTML($string, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD); // flags: don't add implied <html>/<body> wrappers or a default DOCTYPE
$dom->getElementsByTagName('a')[0]->nodeValue = '';
echo $dom->saveHTML();
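
The failure mode described above can be reproduced with a short sketch. The input, with a `>` inside the title attribute, is made up for illustration; the serialized output of `saveHTML()` is left out because its escaping can differ between setups.

```php
<?php
// Illustrative input (an assumption, not from the question): the title attribute contains a ">".
$string = '<a title="a>b" href="link.html">text</a>';

// The regex takes the first ">" as the end of the opening tag and mangles the attribute.
$viaRegex = preg_replace('/<a[^>]*>\K[^<]*/', '', $string);
echo $viaRegex, "\n"; // <a title="a></a>

// A DOM parser reads the quoted attribute value correctly.
$dom = new DOMDocument;
$dom->loadHTML($string, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$anchor = $dom->getElementsByTagName('a')->item(0);
$anchor->nodeValue = '';

var_dump($anchor->getAttribute('title')); // string(3) "a>b"
var_dump($anchor->textContent);           // string(0) ""
```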

Answered by oswald

Just use strip_tags(); that would get rid of the tags and leave only the text between them.
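
For reference, a minimal sketch of what `strip_tags()` returns for the string from the question. It keeps the text and drops the tags, which is the reverse of the result the question asks for.

```php
<?php
$str = '<a href="link.html">text</a>';

// strip_tags() removes the markup and returns only the text content.
$textOnly = strip_tags($str);

echo $textOnly; // text
```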
