PHP: getting the href attribute of an A element

Question

Asked by bergin

I'm trying to find the links on a page.

My regex is:

/<a\s[^>]*href=(\"\'??)([^\"\' >]*?)[^>]*>(.*)<\/a>/

but it seems to fail on:

<a title="this" href="that">what?</a>

How would I change my regex to handle an href that is not the first attribute in the a tag?

Answer 1

Answered by Gordon

Reliable regexes for HTML are difficult. Here is how to do it with DOM:

$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node) {
    echo $dom->saveHtml($node), PHP_EOL;
}

The above would find and output the "outerHTML" of all A elements in the $html string.

To get all the text values of the node, you do

echo $node->nodeValue;

To check if the href attribute exists you can do

echo $node->hasAttribute( 'href' );

To get the href attribute you'd do

echo $node->getAttribute( 'href' );

To change the href attribute you'd do

$node->setAttribute('href', 'something else');

To remove the href attribute you'd do

$node->removeAttribute('href');

You can also query for the href attribute directly with XPath

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a/@href');
foreach($nodes as $href) {
    echo $href->nodeValue;                       // echo current attribute value
    $href->nodeValue = 'new value';              // set new attribute value
    $href->parentNode->removeAttribute('href');  // remove attribute
}
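Putting the pieces above together, a minimal sketch that collects every href from a document might look like the following. The function name extractHrefs and the sample markup are assumptions for illustration, not part of the original answer. libxml_use_internal_errors() is used here because loadHTML() tends to emit warnings on real-world markup that is not perfectly valid.

```php
<?php
// Hypothetical helper (not from the original answer):
// returns the href value of every <a> element found in $html.
function extractHrefs(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    $xpath = new DOMXPath($dom);
    // '//a/@href' selects the href attribute nodes of all <a> elements.
    foreach ($xpath->query('//a/@href') as $attr) {
        $hrefs[] = $attr->nodeValue;
    }
    return $hrefs;
}

// Sample markup, assumed for illustration; the last anchor has no href.
$html = '<p><a title="this" href="that">what?</a> '
      . '<a href="/other">more</a> <a name="x">no link</a></p>';
print_r(extractHrefs($html));
```

Anchors without an href are simply skipped, because the XPath expression selects attribute nodes rather than elements.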

On a side note: I am sure this is a duplicate and you can find the answer somewhere in here.

Answer 2

Answered by Toto

I agree with Gordon, you MUST use an HTML parser to parse HTML. But if you really want a regex you can try this one:

/^<a.*?href=(["\'])(.*?)\1.*$/

This matches <a at the beginning of the string, followed by any number of any char (non-greedy) .*?, then href= followed by the link surrounded by either " or '. The backreference \1 requires the closing quote to be the same character as the opening one.

$str = '<a title="this" href="that">what?</a>';
preg_match('/^<a.*?href=(["\'])(.*?)\1.*$/', $str, $m);
var_dump($m);

Output:

array(3) {
  [0]=>
  string(37) "<a title="this" href="that">what?</a>"
  [1]=>
  string(1) """
  [2]=>
  string(4) "that"
}
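
Because the pattern above is anchored with ^, it only matches when the string starts with the tag. A hedged sketch of the same quote-backreference idea applied to every link in a longer string follows. The pattern and the sample string are assumptions for illustration, and a regex like this still cannot cope with every HTML edge case.

```php
<?php
// Sketch only, not the original answer's pattern.
// (["\']) captures the opening quote; \1 requires the same closing quote.
// (?:[^>]*?\s)? skips any attributes that come before href.
$pattern = '/<a\s(?:[^>]*?\s)?href\s*=\s*(["\'])(.*?)\1/is';

// Sample input, assumed for illustration: one double-quoted, one single-quoted href.
$str = '<a title="this" href="that">what?</a> <a class="x" href=\'other\'>more</a>';

preg_match_all($pattern, $str, $m);

// $m[2] holds the captured href values, one entry per matched link.
print_r($m[2]);
```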

Answer 3

Answered by Alex Pliutau

The pattern you want to look for would be the link anchor pattern, like (something):

$regex_pattern = "/<a href=\"(.*)\">(.*)<\/a>/";

Answer 4

Answered by Aif

Why don't you just match

"<a.*?href\s*=\s*['"](.*?)['"]"

<?php

$str = '<a title="this" href="that">what?</a>';

$res = array();

preg_match_all("/<a.*?href\s*=\s*['\"](.*?)['\"]/", $str, $res);

var_dump($res);

?>

then

$ php test.php
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(27) "<a title="this" href="that""
  }
  [1]=>
  array(1) {
    [0]=>
    string(4) "that"
  }
}

which works. I've just removed the first capture group.

Answer 5

Answered by Milan Malani

For anyone who still hasn't found a solution, this is very easy and fast with SimpleXML:

$a = new SimpleXMLElement('<a href="www.something.com">Click here</a>');
echo $a['href']; // will echo www.something.com

It's working for me.
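
SimpleXML only accepts well-formed XML, so the approach above fits a single clean anchor fragment rather than arbitrary HTML. A minimal sketch that guards against parse failures is shown below. The helper name hrefFromAnchor and the second sample input are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns the href of a single well-formed <a> fragment,
// or null when the fragment is not valid XML or carries no href attribute.
function hrefFromAnchor(string $fragment): ?string
{
    // Keep XML parse errors out of the output.
    $previous = libxml_use_internal_errors(true);
    $a = simplexml_load_string($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($a === false || !isset($a['href'])) {
        return null;
    }
    // Attribute access yields a SimpleXMLElement; cast it to a plain string.
    return (string) $a['href'];
}

var_dump(hrefFromAnchor('<a href="www.something.com">Click here</a>'));
// An unquoted attribute value is legal in HTML but not in XML, so this yields null.
var_dump(hrefFromAnchor('<a href=that>unquoted attribute</a>'));
```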

Answer 6

Answered by Adam

I'm not sure what you're trying to do here, but if you're trying to validate the link then look at PHP's filter_var()
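
As a rough illustration of that suggestion, the sketch below checks a few candidate href values with filter_var() and FILTER_VALIDATE_URL. The sample values are assumptions. Note that this filter expects a full URL with a scheme, so relative links, which are common in href attributes, are reported as invalid.

```php
<?php
// Sample values are assumptions chosen for illustration.
$candidates = ['http://example.com/page?x=1', 'that', 'www.something.com'];

$results = [];
foreach ($candidates as $href) {
    // filter_var() returns the URL string when it validates, false otherwise.
    $results[$href] = filter_var($href, FILTER_VALIDATE_URL) !== false;
    echo $href, ' => ', $results[$href] ? 'valid' : 'invalid', PHP_EOL;
}
```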

If you really need to use a regular expression then check out this tool, it may help: http://regex.larsolavtorvik.com/

Answer 7

Answered by Ruel

Using your regex, I modified it a bit to suit your need.

<a.*?href=("|')(.*?)("|').*?>(.*)<\/a>

I personally suggest you use an HTML parser.

EDIT: Tested

Answer 8

Answered by CharlesLeaf

Quick test: <a\s+[^>]*href=(\"\'??)([^\1]+)(?:\1)>(.*)<\/a> seems to do the trick, with the first match being " or ', the second the 'href' value 'that', and the third the 'what?'.

The reason I left the first match of "/' in there is that you can use it to backreference it later for the closing "/' so it's the same.

See live example on: http://www.rubular.com/r/jsKyK2b6do

Answer 9

Answered by Ravi Prakash

preg_match_all("/(<a[^>]*>)(.*?)(<\/a)/", $contents, $impmatches, PREG_SET_ORDER);

It is tested and it fetches all a tags from any HTML code.

Answer 10

Answered by Meloman

The following is working for me and returns both the href and the value of the anchor tag.

$urls = array(); // initialise so $urls is defined even when nothing matches
preg_match_all("'\<a.*?href=\"(.*?)\".*?\>(.*?)\<\/a\>'si", $html, $match);
if($match) {
    foreach($match[0] as $k => $e) {
        $urls[] = array(
            'anchor'    =>  $e,
            'href'      =>  $match[1][$k],
            'value'     =>  $match[2][$k]
        );
    }
}

The multidimensional array called $urls now contains associative sub-arrays that are easy to use.
