Grabbing the href attribute of an A element (PHP)
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/3820666/
Asked by bergin
I'm trying to find the links on a page.
My regex is:
/<a\s[^>]*href=(\"\'??)([^\"\' >]*?)[^>]*>(.*)<\/a>/
but it seems to fail on
<a title="this" href="that">what?</a>
How would I change my regex to handle an href that is not the first attribute in the a tag?
Answered by Gordon
Reliable regexes for HTML are difficult. Here is how to do it with DOM:
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node) {
    echo $dom->saveHTML($node), PHP_EOL;
}
The above would find and output the "outerHTML" of all A elements in the $html string.
To get all the text values of the node, you do
echo $node->nodeValue;
To check if the href attribute exists you can do
echo $node->hasAttribute( 'href' );
To get the href attribute you'd do
echo $node->getAttribute( 'href' );
To change the href attribute you'd do
$node->setAttribute('href', 'something else');
To remove the href attribute you'd do
$node->removeAttribute('href');
You can also query for the href attribute directly with XPath
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a/@href');
foreach ($nodes as $href) {
    echo $href->nodeValue;                       // echo current attribute value
    $href->nodeValue = 'new value';              // set new attribute value
    $href->parentNode->removeAttribute('href');  // remove attribute
}
On a sidenote: I am sure this is a duplicate and you can find the answer somewhere in here
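The DOM calls above can be combined into one small helper. This is only a sketch, not part of the original answer: the function name extractHrefs is made up for illustration, and the libxml_use_internal_errors() call is an added assumption that the input may be invalid HTML whose parser warnings you'd rather not print.

```php
<?php
// Hypothetical helper (not from the answer): collect every href in an HTML string.
function extractHrefs(string $html): array
{
    $dom = new DOMDocument;
    // Real-world HTML is often invalid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = array();
    foreach ($dom->getElementsByTagName('a') as $node) {
        // Skip anchors without an href, e.g. <a name="top">.
        if ($node->hasAttribute('href')) {
            $hrefs[] = $node->getAttribute('href');
        }
    }
    return $hrefs;
}

$html = '<p><a title="this" href="that">what?</a> <a name="top">no link</a></p>';
print_r(extractHrefs($html));
```

Because the parser handles attribute order and quoting, the position of href inside the tag no longer matters.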
Answered by Toto
I agree with Gordon, you MUST use an HTML parser to parse HTML. But if you really want a regex you can try this one:
/^<a.*?href=(["\'])(.*?)\1.*$/
This matches <a at the beginning of the string, followed by any number of any char (non-greedy) .*?, then href= followed by the link surrounded by either " or ' (the backreference \1 requires the closing quote to match the opening one).
$str = '<a title="this" href="that">what?</a>';
preg_match('/^<a.*?href=(["\'])(.*?)\1.*$/', $str, $m);
var_dump($m);
Output:
array(3) {
[0]=>
string(37) "<a title="this" href="that">what?</a>"
[1]=>
string(1) """
[2]=>
string(4) "that"
}
Answered by Alex Pliutau
The pattern you want to look for would be the link anchor pattern, like (something):
$regex_pattern = "/<a href=\"(.*)\">(.*)<\/a>/";
Answered by Aif
Why don't you just match
"<a.*?href\s*=\s*['"](.*?)['"]"
<?php
$str = '<a title="this" href="that">what?</a>';
$res = array();
preg_match_all("/<a.*?href\s*=\s*['\"](.*?)['\"]/", $str, $res);
var_dump($res);
?>
then
$ php test.php
array(2) {
[0]=>
array(1) {
[0]=>
string(27) "<a title="this" href="that""
}
[1]=>
array(1) {
[0]=>
string(4) "that"
}
}
which works. I've just removed the first capturing parentheses.
Answered by Milan Malani
For anyone who still hasn't found a solution, here is a very easy and fast one using SimpleXML
$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
echo $a['href']; // will echo www.something.com
It's working for me
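If there is more than one link, the same SimpleXML idea can be extended with xpath(). A rough sketch, assuming the markup is well-formed XML (SimpleXMLElement throws an exception on typical sloppy HTML, so this won't work for arbitrary pages); the wrapping div and the second link are made up for the example.

```php
<?php
// Assumption: the input is well-formed XML, otherwise the constructor throws.
$xml = new SimpleXMLElement(
    '<div><a href="www.something.com">Click here</a><a href="that">what?</a></div>'
);

$hrefs = array();
// Select every a element that actually carries an href attribute.
foreach ($xml->xpath('//a[@href]') as $a) {
    // Attributes are SimpleXMLElement objects; cast to get a plain string.
    $hrefs[] = (string) $a['href'];
}

print_r($hrefs);
```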
Answered by Adam
I'm not sure what you're trying to do here, but if you're trying to validate the link then look at PHP's filter_var()
If you really need to use a regular expression then check out this tool, it may help: http://regex.larsolavtorvik.com/
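For the validation case, a minimal sketch with filter_var() and the FILTER_VALIDATE_URL filter could look like this. The wrapper name isValidUrl is hypothetical, and whether URL validation is what the asker actually needs is an assumption.

```php
<?php
// Hypothetical wrapper: true if $url passes PHP's built-in URL validation.
function isValidUrl(string $url): bool
{
    // filter_var() returns the filtered value on success and false on failure.
    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(isValidUrl('http://stackoverflow.com/questions/3820666/'));
var_dump(isValidUrl('that'));
```

Note that a relative href such as that from the question is rejected, because the filter expects at least a scheme, so this only helps once the links have been extracted and resolved to absolute URLs.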
Answered by Ruel
Using your regex, I modified it a bit to suit your need.
<a.*?href=("|')(.*?)("|').*?>(.*)<\/a>
I personally suggest you use an HTML parser.
EDIT: Tested
Answered by CharlesLeaf
Quick test: <a\s+[^>]*href=(\"\'??)([^\1]+)(?:\1)>(.*)<\/a> seems to do the trick, with the 1st match being " or ', the second the 'href' value 'that', and the third the 'what?'.
The reason I left the first match of "/' in there is that you can use it to backreference it later for the closing "/' so it's the same.
See live example on: http://www.rubular.com/r/jsKyK2b6do
Answered by Ravi Prakash
preg_match_all("/(<a[^>]*>)(.*?)(<\/a)/", $contents, $impmatches, PREG_SET_ORDER);
It is tested, and it fetches all a tags from any HTML code.
Answered by Meloman
The following is working for me and returns both the href and the value of the anchor tag.
preg_match_all("'\<a.*?href=\"(.*?)\".*?\>(.*?)\<\/a\>'si", $html, $match);
if($match) {
    foreach($match[0] as $k => $e) {
        $urls[] = array(
            'anchor' => $e,
            'href' => $match[1][$k],
            'value' => $match[2][$k]
        );
    }
}
The multidimensional array called $urls now contains associative sub-arrays that are easy to use.
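A possible variation, sketched here rather than taken from the answer: with PREG_SET_ORDER each match arrives as its own row, so the manual re-indexing by $k is not needed. The pattern below is a simplified stand-in (it only handles double-quoted href values), and the sample $html is made up for the example.

```php
<?php
$html = '<a title="this" href="that">what?</a> <a href="other">more</a>';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all(
    '~<a\b[^>]*?href="([^"]*)"[^>]*>(.*?)</a>~si',
    $html,
    $matches,
    PREG_SET_ORDER
);

$urls = array();
foreach ($matches as $m) {
    // $m[0] is the whole anchor, $m[1] the href, $m[2] the link text.
    $urls[] = array('anchor' => $m[0], 'href' => $m[1], 'value' => $m[2]);
}

foreach ($urls as $url) {
    echo $url['href'], ' => ', $url['value'], PHP_EOL;
}
```

As the other answers point out, a regex like this breaks easily on real-world markup, so a parser is still the safer choice.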