PHP regex to get the string inside an href tag

Question

Asked by David

I need a regex that gives me the string inside an href tag, i.e. the value between the quotes.

For example, I need to extract theurltoget.com from the following:

<a href="theurltoget.com">URL</a>

Additionally, I only want the base URL part, i.e. from http://www.mydomain.com/page.html I only want http://www.mydomain.com/

Answer 1

Answered by Drew Hunter

Don't use regex for this. You can use XPath and built-in PHP functions to get what you want:

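    // simplexml_load_string() needs well-formed XML, so $myHtml must be valid XHTML.
    $xml = simplexml_load_string($myHtml);
    $list = $xml->xpath("//@href");

    $preparedUrls = array();
    foreach($list as $item) {
        // Cast the SimpleXMLElement attribute to a string before parsing it.
        $item = parse_url((string) $item);
        $preparedUrls[] = $item['scheme'] . '://' .  $item['host'] . '/';
    }
    print_r($preparedUrls);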
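
Real-world HTML is often not well-formed XML, in which case simplexml_load_string() fails. A minimal sketch of the same idea using DOMDocument and DOMXPath, which tolerate sloppy markup, might look like this. The sample markup in $myHtml and the decision to skip relative links are assumptions made for the illustration, not part of the original answer.

```php
<?php
// Sketch: collect the base URL (scheme + host) of every href in an HTML string.
// $myHtml is sample input invented for this example.
$myHtml = '<p><a href="http://www.mydomain.com/page.html">URL</a> '
        . '<a href="https://example.org/a/b?c=d">Other</a></p>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($myHtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$preparedUrls = array();
foreach ($xpath->query('//a/@href') as $attr) {
    $parts = parse_url($attr->nodeValue);
    // Skip relative links, which have no scheme or host.
    if (isset($parts['scheme'], $parts['host'])) {
        $preparedUrls[] = $parts['scheme'] . '://' . $parts['host'] . '/';
    }
}
print_r($preparedUrls);
```

For the sample input this should collect http://www.mydomain.com/ and https://example.org/.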

Answer 2

Answered by Alec

$html = '<a href="http://www.mydomain.com/page.html">URL</a>';

$url = preg_match('/<a href="(.+)">/', $html, $match);

$info = parse_url($match[1]);

echo $info['scheme'].'://'.$info['host']; // http://www.mydomain.com

Answer 3

Answered by ishubin

This expression handles three cases:

no quotes
double quotes
single quotes

'/href=["\']?([^"\'>]+)["\']?/'
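
As a rough illustration of how the pattern above could be applied, the sketch below runs it with preg_match_all() over one link per quoting style. The sample markup and the example hostnames are invented for the demonstration.

```php
<?php
// Sample markup invented for this example: one link per quoting style.
$html = '<a href="http://a.example/one">1</a> '
      . "<a href='http://b.example/two'>2</a> "
      . '<a href=http://c.example/three>3</a>';

preg_match_all('/href=["\']?([^"\'>]+)["\']?/', $html, $matches);

// $matches[1] holds the first capture group of every match.
print_r($matches[1]);
```

Note that for an unquoted value followed by further attributes, the character class would also swallow the whitespace and the attributes after it, since it only stops at a quote or at ">".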

Answer 4

Answered by Linkmichiel

Use the answer by @Alec if you're only looking for the base URL part (the second part of @David's question)!

$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
$url = preg_match('/<a href="(.+)">/', $html, $match);
$info = parse_url($match[1]);

This will give you:

$info
Array
(
    [scheme] => http
    [host] => www.mydomain.com
    [path] => /page.html" class="myclass" rel="myrel
)

So you can use $href = $info["scheme"] . "://" . $info["host"]; which gives you:

// http://www.mydomain.com

When you are looking for the entire URL inside the href, you should use another regex, because the greedy (.+) in @Alec's pattern also captures the attributes that follow (see the path value above). For instance, use the regex provided by @user2520237:

$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
$url = preg_match('/href=["\']?([^"\'>]+)["\']?/', $html, $match);
$info = parse_url($match[1]);

This will give you:

$info
Array
(
    [scheme] => http
    [host] => www.mydomain.com
    [path] => /page.html
)

Now you can use $href = $info["scheme"] . "://" . $info["host"] . $info["path"]; which gives you:

// http://www.mydomain.com/page.html

Answer 5

Answered by drudge

http://www.the-art-of-web.com/php/parse-links/

Let's start with the simplest case, a well-formatted link with no extra attributes:

/<a href=\"([^\"]*)\">(.*)<\/a>/iU
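
A short sketch of how this pattern might be used to collect both the href value and the link text follows. The sample markup and the $links array layout are assumptions for the example; the U modifier makes the quantifiers lazy, so each match stops at the first closing tag.

```php
<?php
// Sample markup invented for this example.
$html = '<a href="http://www.mydomain.com/page.html">URL</a> '
      . '<a href="theurltoget.com">Another <b>link</b></a>';

// PREG_SET_ORDER groups the results per match: $m[1] is the href, $m[2] the link text.
preg_match_all('/<a href="([^"]*)">(.*)<\/a>/iU', $html, $matches, PREG_SET_ORDER);

$links = array();
foreach ($matches as $m) {
    $links[] = array('href' => $m[1], 'text' => $m[2]);
}
print_r($links);
```

As the original says, this only covers links without extra attributes; anything like class="..." before or after the href prevents a match.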

Answer 6

Answered by Basani

To replace all href values:

function replaceHref($html, $replaceStr)
{
    $match = array();
    // [^"]+ stops at the closing quote; a greedy .+ would run on to the last quote on the line.
    preg_match_all('/<a [^>]*href="([^"]+)"/', $html, $match);

    // $match[1] holds the captured href values; array_unique() avoids rewriting a repeated URL twice.
    foreach (array_unique($match[1]) as $href)
    {
        $html = str_replace($href, $replaceStr.urlencode($href), $html);
    }
    return $html;
}
$replaceStr  = "http://affilate.domain.com?cam=1&url=";
$replaceHtml = replaceHref($html, $replaceStr); // $html holds the markup to process

echo $replaceHtml;
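
One possible variation, sketched here under the assumption that only the attribute value should change, is to do the rewrite in a single pass with preg_replace_callback(). The function name replaceHrefCallback and the sample markup are hypothetical; the prefix string is the one from the answer. Unlike str_replace(), this does not touch the same URL when it appears elsewhere, such as in the link text.

```php
<?php
// Hypothetical helper: prefix every double-quoted href value with $replaceStr.
function replaceHrefCallback($html, $replaceStr)
{
    return preg_replace_callback(
        '/(<a\s[^>]*href=")([^"]+)(")/i',
        function ($m) use ($replaceStr) {
            // $m[1] = text up to the opening quote, $m[2] = URL, $m[3] = closing quote.
            return $m[1] . $replaceStr . urlencode($m[2]) . $m[3];
        },
        $html
    );
}

// Sample markup invented for this example.
$html = '<a class="x" href="http://www.mydomain.com/page.html">URL</a>';
$out  = replaceHrefCallback($html, 'http://affilate.domain.com?cam=1&url=');
echo $out, "\n";
```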

Answer 7

Answered by kijin

This will handle the case where there are no quotes around the URL.

/<a [^>]*href="?([^">]+)"?>/

But seriously, do not parse HTML with regex. Use DOM or a proper parsing library.

Answer 8

Answered by Adam Byrtek

/href="(https?://[^/]*)/

I think you should be able to handle the rest.
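
For completeness, here is one way "the rest" could look in PHP. This is a sketch, not the answerer's code: it switches the delimiter to "#" because the pattern itself contains unescaped slashes, and the sample markup is taken from the question.

```php
<?php
$html = '<a href="http://www.mydomain.com/page.html">URL</a>';

// "#" is used as the delimiter so the slashes in "https?://" need no escaping.
if (preg_match('#href="(https?://[^/]*)#', $html, $match)) {
    // The capture stops before the first slash of the path, so append it again.
    $base = $match[1] . '/';
    echo $base, "\n";
}
```

One caveat: if the URL has no path at all (for example href="http://www.mydomain.com"), [^/]* runs past the closing quote; using [^/"]* instead would guard against that.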

Answer 9

Answered by Pablo S G Pacheco

Because lookbehind and lookahead assertions are cool:

/(?<=href=\").+(?=\")/

It will match only what you want, without the quotation marks:

Array ( [0] => theurltoget.com )
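
A small sketch of the lookaround pattern in use, run against the markup from the question, is shown below. The second pattern is an assumed variation, not part of the original answer: it swaps the greedy .+ for [^"]+ so the match still ends at the first closing quote when the tag carries more attributes.

```php
<?php
$html = '<a href="theurltoget.com">URL</a>';

// The lookbehind and lookahead assert the surrounding quotes without consuming them.
preg_match('/(?<=href=").+(?=")/', $html, $match);
print_r($match);

// With extra attributes, a greedy ".+" would run on to the last quote in the string.
// A negated character class stops at the first closing quote instead.
$html2 = '<a href="theurltoget.com" class="myclass">URL</a>';
preg_match('/(?<=href=")[^"]+(?=")/', $html2, $match2);
print_r($match2);
```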
