Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/3904482/

Time: 2020-08-25 11:26:39 · Source: igfitidea

Match URL pattern in PHP using regular expression

Tags: php, regex, url

Asked by Seema

I want to match URLs in a wall post and replace each one with an anchor tag. For this I use the regular expression below.

I would like to match 4 types of URL:

  1. http://example.com
  2. https://example.com
  3. www.example.com
  4. example.com

preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@',
             '<a href="$1">$1</a>', $subject);

This expression matches only the first two types of URL.

If I use the expression '@(www?([-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)@' instead, it matches only the third type.

How can I match all four types of URL with a single regular expression?

Answered by Mārtiņš Briedis

A complete working example using the link given by Nev Stokes:

// Wrap every URL found in $html in an anchor tag.
function clickableUrls($html){
    return preg_replace(
        '%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s',
        // $1 is the whole matched URL
        '<a href="$1">$1</a>',
        $html
    );
}

Answered by Nev Stokes

I'd use a different regex, to be honest. Like this one that Gruber posted in 2009:

\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))

or this updated version that Gruber posted in 2010 (thanks, @IMSoP):

(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
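
As a rough check of how Gruber's 2009 pattern behaves on the four URL types from the question, the sketch below wraps it in `%` delimiters and collects the matches. The helper name `extractUrls` and the sample text are made up for illustration.

```php
<?php
// Hypothetical helper: returns every substring matched by Gruber's 2009 pattern.
function extractUrls(string $text): array
{
    // '%' is used as the delimiter because the pattern itself contains '/'.
    $gruber2009 = '%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%';
    preg_match_all($gruber2009, $text, $matches);
    return $matches[0]; // full matches only
}

$sample = 'See http://example.com, https://example.com/a_(b) and www.example.com. Not example.com';
print_r(extractUrls($sample));
```

With this sample, the first three URL types are picked up (without the trailing comma or period), while the bare `example.com` is not, since the pattern requires either a scheme or a leading `www.`.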

Answered by uxtx

I looked around and didn't see any that were exactly what I needed. I found this one that was close, so I modified it as follows:

^((([hH][tT][tT][pP][sS]?)\:\/\/)?([\w\-]+(\[\w\.\&%$\-]+)*)?((([^\s\(\)\<\>\\"\.\[\]\,;:]+)(\.[^\s\(\)\<\>\\"\.\[\]\,;:]+)*(\.[a-zA-Z]{2,4}))|((([01]?\d{1,2}|2[0-4]\d|25[0-5])\.){3}([01]?\d{1,2}|2[0-4]\d|25[0-5])))(\b\:(6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[1-5]\d{4}|[1-9]\d{0,3}|0)\b)?((\/[^\/][\w\.\,\?\'\\/\+&%$#\=~_\-]*)*[^\.\,\?\"\'\(\)\[\]!;<>{}\s\x7F-\xFF])?)$

Check it out on debuggex.

Answered by Adnan

I just came across this post (two years later). You may already have your answer, but for beginners: you can use a regular expression to strip every type of URL or query string.

(https|http|ftp)\:\/\/|([a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4})|([a-z0-9A-Z]+\.[a-zA-Z]{2,4})|\?([a-zA-Z0-9]+[\&\=\#a-z]+)

It strips all of the URL types in the following list. I used different kinds of domains for those who want to ask whether it also strips domains such as .us, .in or .pk.

  1. ftp://www.web.com
  2. web.net
  3. www.website.info
  4. website.us
  5. web.ws?query=true
  6. www.web.biz?query=true
  7. ftp://web.in?query=true
  8. media.google.com
  9. ns.google.pk
  10. ww1.smart.au
  11. www3.smart.br
  12. w1.smart.so
  13. ?ques==two&t=p
  14. http://website.info?ques==two&t=p
  15. https://www.weborwebsite.com

Working example (tested in PHP5+, Apache2+):

$str = "ftp://www.web.com, web.net, www.website.info, website.us, web.ws?query=true, www.web.biz?query=true, ftp://web.in?query=true, media.google.com hello world, working more with ns ns.google.pk or ww1.smart.au and www3.smart.br w1.smart.so ?ques==two&t=p http://website.info?ques==two&t=p https://www.weborwebsite.com and ftp://www.hotmail.br";
echo preg_replace("/(https|http|ftp)\:\/\/|([a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4})|([a-z0-9A-Z]+\.[a-zA-Z]{2,4})|\?([a-zA-Z0-9]+[\&\=\#a-z]+)/i", "", $str);

It returns:

, , , , , , , hello world, working more with ns or and and

Answered by dutt

If you want to make that one work, you need to make the "https?://" part optional. Since you seem to have a fairly good grasp of regexps, I won't show you how; it's an exercise for the reader :)

But I generally agree with Nev: it's overly complicated for what it does.
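
One way to read that hint, sketched in PHP: make the scheme an optional non-capturing group, and require a dot followed by a TLD-like suffix so that ordinary words are not linked. The pattern below is a simplified variant, not the asker's exact regex, and the function name `linkify` is made up for illustration.

```php
<?php
// Hypothetical helper: wraps http://, https://, www. and bare-domain URLs in anchors.
function linkify(string $text): string
{
    // (?:https?://)?   optional scheme
    // (?:[-\w]+\.)+    one or more dot-terminated host labels
    // [a-z]{2,}        TLD-like suffix (an assumption; real TLD rules are looser)
    // then an optional port, path and query string, as in the question's pattern
    $pattern = '@\b((?:https?://)?(?:[-\w]+\.)+[a-z]{2,}(?::\d+)?(?:/[\w/_.]*(?:\?\S+)?)?)@i';

    return preg_replace_callback($pattern, function (array $m): string {
        $url = $m[1];
        // Links without a scheme get http:// so the href is not treated as relative.
        $href = preg_match('@^https?://@i', $url) ? $url : 'http://' . $url;
        return '<a href="' . htmlspecialchars($href) . '">' . htmlspecialchars($url) . '</a>';
    }, $text);
}

echo linkify('see www.example.com now'), "\n";
```

Bare hosts get `http://` prepended in the `href`, since `<a href="www.example.com">` would otherwise be resolved as a relative link.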

Answered by M Rostami

Use this pattern:

$regex = "(https?\:\/\/|ftp\:\/\/|www\.|[a-z0-9-]+)+([a-z0-9-]+)\.+([a-z]{2,4})((\/|\.)+([a-z0-9-_.\/]*)$|$)";
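
The pattern above is given without delimiters, which PHP's `preg_*` functions require. A minimal sketch of one way to try it, assuming `~` as the delimiter and the `i` flag for case-insensitive matching; the wrapper name `looksLikeUrl` is made up for illustration.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function looksLikeUrl(string $candidate): bool
{
    $regex = '(https?\:\/\/|ftp\:\/\/|www\.|[a-z0-9-]+)+([a-z0-9-]+)\.+([a-z]{2,4})((\/|\.)+([a-z0-9-_.\/]*)$|$)';
    // '~' does not occur in the pattern, so it is safe to use as the delimiter.
    return preg_match('~' . $regex . '~i', $candidate) === 1;
}

// The four URL types from the question.
foreach (['http://example.com', 'https://example.com', 'www.example.com', 'example.com'] as $url) {
    var_dump(looksLikeUrl($url));
}
```

Note that the pattern is anchored only at the end, so it is better suited to checking a single candidate string than to finding URLs inside running text.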

Answered by Aldo Bassanini

My two cents (five years later!):

preg_match("/^((https|http|ftp)\:\/\/)?([a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4}|[a-z0-9A-Z]+\.[a-z0-9A-Z]+\.[a-zA-Z]{2,4}|[a-z0-9A-Z]+\.[a-zA-Z]{2,4})$/i", $url)

Answered by Andreas Mennel

This works great for me, including a mailto check:

function LinkIt($text)
{
    // Link URLs: $2 is the optional "s" of https, $3 is "www." when present, $4 is the rest of the URL.
    $t = preg_replace("/(\b(?:(?:http(s)?|ftp):\/\/|(www\.)))([-a-züöäß0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|])/im", '<a target="_blank" href="http$2://$3$4" class="external-link" title="External Link">$1$4</a>', $text);
    // Link e-mail addresses: $1 is the whole address.
    return preg_replace("/([\w+\.\-]+@[\w+\-]+\.[a-zA-Z]{2,4})/im", strtolower('<a href="mailto:$1" class="mail" title="E-Mail">$1</a>'), $t);
}