JavaScript regular expression to find a URL in text

Question

Asked by rodi

I need to find the first URL in a text using a regular expression.

For example:

I love this website:http://www.youtube.com/music it's fantastic

or

[ es. http://www.youtube.com/music] text

Answer 1

Accepted answer by asthasr

You can't do this perfectly with a regular expression. You may be interested in this blog post. There is a bit more information on Regex Guru, but even those look very fragile. You will need to have additional checks outside of your regular expression to catch the edge cases.
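The sketch below shows what "additional checks outside of your regular expression" can look like. It assumes a modern browser or Node.js, where the WHATWG URL class is available. A deliberately loose pattern finds candidates, sentence punctuation is trimmed from the end, and new URL() rejects strings that do not parse. The function name and the dotted-hostname rule are illustrative choices, not part of this answer.

```javascript
// Loose candidate pattern: http(s):// followed by anything except whitespace,
// angle brackets or double quotes. The real filtering happens below.
const CANDIDATE = /\bhttps?:\/\/[^\s<>"]+/gi;

// Hypothetical helper: returns the first candidate that survives the
// extra checks, or null if none does.
function findFirstValidUrl(text) {
  for (const match of text.matchAll(CANDIDATE)) {
    // Strip trailing punctuation that usually belongs to the sentence.
    // Note: this naive trim also breaks URLs that legitimately end in ")".
    const candidate = match[0].replace(/[.,;:!?'"\])}]+$/, "");
    try {
      const url = new URL(candidate); // throws TypeError if it cannot be parsed
      // Example policy check: require a dotted host name (rejects http://localhost).
      if (url.hostname.includes(".")) return candidate;
    } catch {
      // Not a parsable URL; keep scanning.
    }
  }
  return null;
}
```

Both of the question's examples return "http://www.youtube.com/music" with this sketch. In the second example, the trailing "]" is removed by the trim step.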

Answer 2

Answer by ridgerunner

I looked into this issue last year and developed a solution that you may want to look at; see URL Linkification (HTTP/FTP). This link is a test page for the JavaScript solution, with many examples of URLs that are difficult to linkify.

My regex solution, written for both PHP and JavaScript, is not simple (but, as it turns out, neither is the problem). For more information, I also recommend reading:

The Problem With URLs by Jeff Atwood, and
An Improved Liberal, Accurate Regex Pattern for Matching URLs by John Gruber

The comments following Jeff's blog post are a must-read if you want to do this right.
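Parentheses are a recurring problem in Atwood's post. A Wikipedia-style URL such as http://en.wikipedia.org/wiki/Foo_(bar) legitimately ends in ")", but a URL written inside parentheses in prose does not. One common heuristic is to drop a trailing ")" only while closing parentheses outnumber opening ones. The sketch below implements that heuristic as an assumption; it is not ridgerunner's actual solution, and both function names are hypothetical.

```javascript
// Count occurrences of a single character in a string.
function countChar(s, ch) {
  return s.split(ch).length - 1;
}

// Trim trailing ")" only while the candidate has more ")" than "(",
// so "(see http://x.com/a)" loses its ")" but ".../Foo_(bar)" keeps it.
function trimUnbalancedParens(candidate) {
  let url = candidate;
  while (url.endsWith(")") && countChar(url, ")") > countChar(url, "(")) {
    url = url.slice(0, -1);
  }
  return url;
}

// Loose candidate matcher: from http(s):// up to the next whitespace.
function findUrls(text) {
  const candidates = text.match(/\bhttps?:\/\/\S+/gi) || [];
  return candidates.map(trimUnbalancedParens);
}

console.log(findUrls("(see http://example.com/a)"));
// [ 'http://example.com/a' ]
```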

Note that this question gets asked a lot. Maybe do a search next time :)

Answer 3

Answer by Vebjorn Ljosa

Identifying URLs is tricky because they are often surrounded by punctuation marks and because users frequently do not use the full form of the URL. Many JavaScript functions exist for replacing URLs with hyperlinks, but I was unable to find one that works as well as the urlize filter in the Python-based web framework Django. I therefore ported Django's urlize function to JavaScript: https://github.com/ljosa/urlize.js
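The sketch below shows the general word-splitting idea in a few lines. It is a hypothetical, much-simplified illustration in the spirit of urlize, not code from Django or urlize.js. It splits the text on whitespace and peels wrapping punctuation off each word. It then accepts words that start with "http://", "https://" or "www.", prefixing the last kind with "http://" because users often omit the scheme.

```javascript
// Punctuation that commonly wraps a URL in prose (an assumed set).
const LEADING_PUNCT = /^[(<\[{"']+/;
const TRAILING_PUNCT = /[.,:;!?)>\]}"']+$/;

function firstUrlByWords(text) {
  for (const word of text.split(/\s+/)) {
    // Remove wrapping punctuation from both ends of the word.
    const core = word.replace(LEADING_PUNCT, "").replace(TRAILING_PUNCT, "");
    if (/^https?:\/\//i.test(core)) return core;
    // Short form without a scheme: assume http.
    if (/^www\./i.test(core)) return "http://" + core;
  }
  return null;
}

console.log(firstUrlByWords("[ es. http://www.youtube.com/music] text"));
// "http://www.youtube.com/music"
```

This sketch also shows why a colon directly before a URL defeats a word-based matcher. In "website:http://www.youtube.com/music", the whole string is a single word that starts with neither a scheme nor "www.", so this function returns null for the question's first example.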

It would not actually pick up the URL in your first example, because there is a colon directly before the URL. But if we modify the example a little:

urlize("I love this website: http://www.youtube.com/music it's fantastic", true, true)
=> 'I love this website: <a href="http://www.youtube.com/music" rel="nofollow">http://www.youtube.com/music</a> it&#39;s fantastic'

Note the second argument, which, if true, inserts rel="nofollow", and the third argument, which, if true, escapes characters that have special meaning in HTML.
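The hypothetical linkify(text, nofollow, autoescape) below makes the two flags concrete. It mimics the behaviour just described, but only for http(s):// URLs. It is not urlize.js's implementation, and its URL pattern is an assumption: a URL ends at whitespace, "<", ">" or a quote.

```javascript
// Escape the five characters that are special in HTML text and attributes.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function linkify(text, nofollow, autoescape) {
  const esc = autoescape ? escapeHtml : (s) => s;
  const rel = nofollow ? ' rel="nofollow"' : "";
  // The capturing group keeps the URLs in split()'s output, so plain text
  // sits at even indices and URLs at odd indices.
  return text
    .split(/(https?:\/\/[^\s<>"']+)/)
    .map((part, i) =>
      i % 2 === 1
        ? `<a href="${escapeHtml(part)}"${rel}>${esc(part)}</a>`
        : esc(part))
    .join("");
}

console.log(linkify("I love this website: http://www.youtube.com/music it's fantastic", true, true));
// I love this website: <a href="http://www.youtube.com/music" rel="nofollow">http://www.youtube.com/music</a> it&#39;s fantastic
```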

Answer 4

Answer by Shashank Agarwal

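This might work. It is written here as a JavaScript regex literal; JavaScript has no POSIX classes, so [:punct:] is spelled out as the ASCII punctuation ranges:

/\b(([\w-]+:\/\/?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^!-\/:-@\[-`{-~\s]|\/)))/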

Found it somewhere (it is John Gruber's pattern from "An Improved Liberal, Accurate Regex Pattern for Matching URLs").

It will find links such as:

http://foo.com/blah_blah/

(Something like http://foo.com/blah_blah)

http://foo.com/blah_blah_(wikipedia)
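JavaScript regular expressions do not support POSIX bracket classes such as [:punct:], so the pattern needs an explicit replacement before it runs in a browser. The sketch below swaps in the ASCII punctuation ranges !-/ :-@ [-` {-~. That is an assumption about what [:punct:] should mean here, and it ignores non-ASCII punctuation. The function then returns the first match; its name is illustrative.

```javascript
// Gruber-style pattern as a JavaScript literal; [:punct:] replaced by ASCII ranges.
// No "g" flag, so String.prototype.match returns the first match plus groups.
const GRUBER_JS_RE = /\b(([\w-]+:\/\/?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^!-\/:-@\[-`{-~\s]|\/)))/;

function firstUrl(text) {
  const m = text.match(GRUBER_JS_RE);
  return m ? m[0] : null;
}

console.log(firstUrl("(Something like http://foo.com/blah_blah)"));
// "http://foo.com/blah_blah"
```

The final group is what keeps trailing punctuation out of the match. The last character must be a balanced "(word)" group, a "/", or a character that is neither punctuation nor whitespace.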

Hope this works.

Answer 5

Answer by Pavel Perna

I am using this regex :) (it is a translation of the URI ABNF grammar):

[a-zA-Z]([a-zA-Z]|[0-9]|\+|\-|\.)*:\/\/((([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:)*@)?(\[((([0-9A-Fa-f]{1,4}:){6}([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|::([0-9A-Fa-f]{1,4}:){5}([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|([0-9A-Fa-f]{1,4})?::([0-9A-Fa-f]{1,4}:){4}([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|(([0-9A-Fa-f]{1,4}:){0,1}[0-9A-Fa-f]{1,4})?::([0-9A-Fa-f]{1,4}:){3}([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|(([0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})?::([0-9A-Fa-f]{1,4}:){2}([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|(([0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}:([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|(([0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})?::([0-9A-Fa-f]{1,4}:[0-9A-Fa-f]{1,4}|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]))|(([0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})?::[0-9A-Fa-f]{1,4}|(([0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})?::)|v[0-9A-Fa-f]\.(([a-zA-Z]|[0-9]|-|\.|_|~)|[!$&'\(\)\*\+,;=]|:))\]|(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])|(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=])*)(:[0-9]*)?(((\/(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|@)*)*|\/((([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|@){1}(\/(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|@)*)*)?|(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|@){1}(\/(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|@)*)*|(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|@){1}(\/(([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|@)*)*))?\/?(\?((([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?(\#((([a-zA-Z]|[0-9]|-|\.|_|~)|%[0-9A-Fa-f][0-9A-Fa-f]|[!$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?
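The pattern repeats the group (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9]) throughout. This group is RFC 3986's dec-octet rule: it accepts exactly the integers 0 to 255, with no leading zeros. The sketch below reuses that group on its own to match a whole IPv4 address; the constant names are illustrative.

```javascript
// dec-octet: 250-255 | 200-249 | 100-199 | 10-99 | 0-9
const DEC_OCTET = "(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])";

// Four octets joined by literal dots, anchored so the whole string must match.
const IPV4_RE = new RegExp("^" + Array(4).fill(DEC_OCTET).join("\\.") + "$");

console.log(IPV4_RE.test("192.168.0.1")); // true
console.log(IPV4_RE.test("256.1.1.1"));   // false: 256 is out of range
console.log(IPV4_RE.test("01.2.3.4"));    // false: leading zero not allowed
```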

Answer 6

Answer by The90sArtist

You can use the following regular expression to extract URLs from a message:

const regex = /(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)/;
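The sketch below uses that pattern to return the first URL in a string. It assumes the pattern is written as a JavaScript regex literal; the constant and function names are illustrative. Because the scheme and "www." are both optional, anything shaped like name.tld also matches, so expect false positives.

```javascript
// Answer 6's pattern as a JavaScript literal. Without the "g" flag,
// String.prototype.match returns only the first match (plus groups).
const ANSWER6_RE = /(http(s)?:\/\/.)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/=]*)/;

function findFirstUrl(text) {
  const m = text.match(ANSWER6_RE);
  return m ? m[0] : null;
}

console.log(findFirstUrl("I love this website:http://www.youtube.com/music it's fantastic"));
// "http://www.youtube.com/music"
```

Because of that looseness, findFirstUrl("see file.txt") returns "file.txt". Also, [a-z]{2,6} only accepts lowercase top-level domains of two to six letters.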