Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/910912/

Date: 2020-08-25 00:18:11  Source: igfitidea

Extract URLs from text in PHP

php html regex

Asked by ahmed

I have this text:

$string = "this is my friend's website http://example.com I think it is coll";

How can I extract the link into another variable?

I know it should be done with a regular expression, specifically preg_match(), but I don't know how.

Answer by Nobu

Probably the safest way is to reuse code from WordPress. Download the latest version (currently 3.1.1) and see wp-includes/formatting.php. It contains a function named make_clickable that takes plain text as its parameter and returns a formatted string. You can borrow its code for extracting URLs, although it is pretty complex.

This one-line regex might be helpful.

preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $string, $match);

But this regex still can't filter out some malformed URLs (e.g. http://google:ha.ckers.org).
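
One way to deal with such leftovers is to run each match through a second check. The sketch below is only an illustration: is_plausible_http_url() is a hypothetical helper, not part of the answer or of WordPress, and its notion of a "plausible" host (letters, digits, dots and hyphens) is an assumption that will reject some legitimate URLs, such as those with IPv6 literals or non-ASCII hosts.

```php
<?php
// Hypothetical helper: accept only http(s) URLs that parse_url() can split
// into a scheme and a host, and whose host looks like a plain hostname.
function is_plausible_http_url(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // could not be parsed, or has no scheme/host
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    // Assumption: a usable host contains only letters, digits, dots and hyphens.
    return preg_match('/^[a-z0-9.-]+$/i', $parts['host']) === 1;
}

$string = "good http://example.com/page and bad http://google:ha.ckers.org";
preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $string, $match);

// Keep only the matches that pass the second check.
$urls = array_values(array_filter($match[0], 'is_plausible_http_url'));
```

The regex still does the finding; the helper only decides which matches to keep, so it can be tightened or loosened without touching the pattern.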

See also: How to mimic StackOverflow Auto-Link Behavior

Answer by Mikael Roos

I tried to do as Nobu said and use WordPress, but it has too many dependencies on other WordPress functions. Instead, I took Nobu's regular expression for preg_match_all() and turned it into a function using preg_replace_callback(); the function replaces all links in a text with clickable links. It uses an anonymous function, so you'll need PHP 5.3 or later, or you may rewrite the code to use an ordinary function instead.

<?php 

/**
 * Make clickable links from URLs in text.
 */

function make_clickable($text) {
    $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';
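    return preg_replace_callback($regex, function ($matches) {
        // Escape the URL so quotes or ampersands in it cannot break the markup.
        $url = htmlspecialchars($matches[0], ENT_QUOTES, 'UTF-8');
        return "<a href='{$url}'>{$url}</a>";
    }, $text);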
}

Answer by soulmerge

URLs have quite a complex definition, so you must first decide what you want to capture. A simple example that captures anything starting with http:// or https:// could be:

preg_match_all('!https?://\S+!', $string, $matches);
$all_urls = $matches[0];

Note that this is very basic and could capture invalid URLs. I would recommend catching up on POSIX and PHP regular expressions for more complex things.
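
To make the "very basic" caveat concrete: because \S+ stops only at whitespace, punctuation that ends a sentence gets captured as part of the URL. The following sketch shows the effect and one possible cleanup; the sample text and the set of characters passed to rtrim() are assumptions for illustration, and trimming would be wrong for a URL that genuinely ends in one of those characters.

```php
<?php
$string = "See http://example.com/page, or https://example.org.";

preg_match_all('!https?://\S+!', $string, $matches);
// $matches[0] now holds 'http://example.com/page,' and 'https://example.org.'
// (note the trailing comma and full stop).

// Strip punctuation that most likely belongs to the sentence, not the URL.
$all_urls = array_map(function ($url) {
    return rtrim($url, '.,;:!?');
}, $matches[0]);
```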

Answer by Michael Borgwardt

If the text you extract the URLs from is user-submitted and you're going to display the result as links anywhere, you have to be very, VERY careful to avoid XSS vulnerabilities, most prominently "javascript:" protocol URLs, but also malformed URLs that might trick your regexp and/or the displaying browser into executing them as JavaScript URLs. At the very least, you should accept only URLs that start with "http", "https" or "ftp".

There's also a blog entry by Jeff where he describes some other problems with extracting URLs.
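
The scheme allowlist advice above could be sketched roughly as follows. safe_link() is a hypothetical name introduced for this example, and the sketch covers only two points: rejecting URLs whose scheme is not http, https or ftp, and HTML-escaping the URL before output. It should not be read as a complete XSS defence.

```php
<?php
// Hypothetical helper: return an HTML link for an allowed URL, or null.
function safe_link(string $url): ?string
{
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // Allow only the schemes named in the answer; anything else,
    // including "javascript:", is rejected.
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https', 'ftp'], true)) {
        return null;
    }

    // Escape quotes, ampersands and angle brackets before building markup.
    $escaped = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $escaped . '">' . $escaped . '</a>';
}

$ok  = safe_link('http://example.com/?a=1&b=2');
$bad = safe_link('javascript:alert(1)');
```

Returning null for rejected input leaves it to the caller to decide whether to show the raw text (escaped) or drop it.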

Answer by Kai Noack

The code that worked for me (especially if you have several links in your $string) is:

$string = "this is my friend's website https://www.example.com I think it is cool, but this one is cooler https://www.stackoverflow.com :)";
$regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
preg_match_all($regex, $string, $matches);
$urls = $matches[0];
// go over all links
foreach($urls as $url) 
{
    echo $url.'<br />';
}

Hope that helps others as well.

Answer by runfalk

preg_match_all('/[a-z]+:\/\/\S+/', $string, $matches);

This is an easy way that works for a lot of cases, though not all. The full matches end up in $matches[0]. Note that this does not cover links in anchor elements (<a href=""...), but that wasn't in your example either.
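
If links inside anchor elements do matter, one option is to parse the HTML instead of matching it with a regex. The sketch below swaps in PHP's DOMDocument class for that purpose; it assumes the dom extension is available, and extract_hrefs() is a hypothetical name used only for this example.

```php
<?php
// Hypothetical helper: collect the href value of every <a> element.
function extract_hrefs(string $html): array
{
    if ($html === '') {
        return []; // nothing to parse
    }

    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $hrefs[] = $anchor->getAttribute('href');
        }
    }
    return $hrefs;
}

$html  = '<p>See <a href="http://example.com">my site</a> and <a href="https://example.org/page">this</a></p>';
$hrefs = extract_hrefs($html);
```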

Answer by Shankar Damodaran

You could do it like this:

<?php
$string = "this is my friend's website http://example.com I think it is coll";
echo explode(' ',strstr($string,'http://'))[0]; //"prints" http://example.com
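
The one-liner above finds only the first http:// link, and strstr() returns false when the text contains none. A slightly more defensive variant might look like this; first_http_url() is a hypothetical name, and like the original it treats a single space as the end of the URL.

```php
<?php
// Hypothetical helper: first "http://" URL in the text, or null if none.
function first_http_url(string $text): ?string
{
    // strstr() returns the text from the first match onwards, or false.
    $tail = strstr($text, 'http://');
    if ($tail === false) {
        return null;
    }

    // Everything up to the next space is taken to be the URL.
    return explode(' ', $tail)[0];
}

$url  = first_http_url("this is my friend's website http://example.com I think it is coll");
$none = first_http_url('no links here');
```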

Answer by Shankar Damodaran

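// $var holds the HTML to scan for <a href="..."> links.
preg_match_all("/a[\s]+[^>]*?href[\s]?=[\s\"\']+".
               "(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/",
               $var, $matches); // no "&": call-time pass-by-reference is a fatal error since PHP 5.4

// Group 1 holds the href values.
$hrefs = $matches[1];

foreach ($hrefs as $href)
{
    print($href."<br>");
}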

Answer by HTML5 developer

You could try this to find the links and turn them into anchor tags (adding the href attribute).

$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";

// The Text you want to filter for urls
$text = "The text you want to filter goes here. http://example.com";

if(preg_match($reg_exUrl, $text, $url)) {

       // $0 is the full match, so each URL is wrapped in its own anchor
       echo preg_replace($reg_exUrl, '<a href="$0">$0</a> ', $text);

} else {

       echo "No url in the text";

}

Reference: http://php.net/manual/en/function.preg-match.php

Answer by vstelmakh

There are a lot of edge cases with URLs. For example, a URL could contain brackets or omit the protocol. That's why a regex alone is not enough.

I created a PHP library that could deal with lots of edge cases: Url highlight.

Example:

<?php

use VStelmakh\UrlHighlight\UrlHighlight;

$urlHighlight = new UrlHighlight();
$urlHighlight->getUrls("this is my friend's website http://example.com I think it is coll");
// return: ['http://example.com']

For more details, see the readme. For the URL cases covered, see the tests.
