Disclaimer: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute the content to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/8188645/

JavaScript Regex to match a URL in a field of text

Tags: javascript, jquery, regex

Asked by BillPull

How can I set up my regex to test whether a URL is contained in a block of text in JavaScript? I can't quite figure out the pattern to use to accomplish this.

 var urlpattern = new RegExp( "(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?" );

 var txtfield = $('#msg').val(); /* this is a textarea */

 if ( urlpattern.test(txtfield) ){
        //do something about it
 }

EDIT:

The pattern I have now does what I need in regex testers, but Chrome throws an error:

  "Invalid regular expression: /(http|ftp|https)://[w-_]+(.[w-_]+)+([w-.,@?^=%&:/~+#]*[w-@?^=%&/~+#])?/: Range out of order in character class"

for the following code:

var urlexp = new RegExp( '(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?' );

Answered by Code Jockey

Though escaping the dash characters (which can have a special meaning as character range specifiers when inside a character class) should work, another way to take away their special meaning is to put them at the beginning or the end of the class definition.

In addition, \+ and \@ in a character class are indeed interpreted as + and @ respectively by the JavaScript engine; however, the escapes are not necessary and may confuse someone trying to interpret the regex visually.
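As a small illustration of the two points above, the sketch below compares an escaped dash with a dash placed first or last in a class, and an escaped + and @ with unescaped ones. The variable names and sample characters are made up for this example.

```javascript
// Three ways to include a literal dash in a character class.
var escaped = /[\w\-.]/;   // dash escaped
var leading = /[-\w.]/;    // dash placed first
var trailing = /[\w.-]/;   // dash placed last

// Hypothetical sample characters; "/" is in none of the classes.
var samples = ['-', '.', 'a', '_', '/'];
samples.forEach(function (ch) {
    console.log(JSON.stringify(ch), escaped.test(ch), leading.test(ch), trailing.test(ch));
});

// \+ and \@ inside a class behave the same as a bare + and @.
var withEscapes = /^[\+\@]+$/;
var withoutEscapes = /^[+@]+$/;
console.log(withEscapes.test('+@'), withoutEscapes.test('+@'));
```

Each row should show the same result in all three columns, which is why the unescaped forms are usually easier to read.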

I would recommend the following regex for your purposes:

(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?

This can be specified in JavaScript either by passing it into the RegExp constructor (like you did in your example), doubling each backslash because the pattern is written as a string literal:

var urlPattern = new RegExp("(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?");

or by directly specifying a regex literal, using the // quoting method:

var urlPattern = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/

The RegExp constructor is necessary if you accept a regex as a string (from user input or an AJAX call, for instance), and might be more readable (as it is in this case). I am fairly certain that the // quoting method is more efficient, and it is at times more readable. Both work.
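To make the comparison concrete, here is a minimal sketch that builds the recommended pattern both ways and runs each against the same text. The sample sentence and the variable names are assumptions for illustration only.

```javascript
// Literal form: forward slashes are escaped, backslashes are written once.
var fromLiteral = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;

// Constructor form: the pattern is a string, so every backslash is doubled.
var fromString = new RegExp('(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w.,@?^=%&:/~+#-]*[\\w@?^=%&/~+#-])?');

// Hypothetical input text.
var text = 'Docs live at https://example.com/guide?page=2#intro, see you there.';

// exec() returns the match array; index 0 holds the full matched URL.
console.log(fromLiteral.exec(text)[0]);
console.log(fromString.exec(text)[0]);
```

Both calls should report the same URL, without the trailing comma, because the final character class does not include a comma.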

I tested your original and this modification in Chrome on both JSFiddle and RegexLib.com, using the client-side regex engine (browser) and specifically selecting JavaScript. While the first one fails with the error you stated, my suggested modification succeeds. If I remove the h from the http in the source, it fails to match, as it should!

Edit

As noted by @noa in the comments, the expression above will not match local network (non-internet) servers or any other servers accessed with a single word (e.g. http://localhost/... or https://sharepoint-test-server/...). If matching this type of url is desired (which it may or may not be), the following might be more appropriate:

(http|ftp|https)://[\w-]+(\.[\w-]+)*([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?

#------changed----here-------------^
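The effect of changing that one quantifier can be checked with a short sketch like the one below. The names strict and relaxed and the sample URLs are assumptions chosen for this example.

```javascript
// Original form: at least one ".label" is required after the first host label.
var strict = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;

// Modified form: the ".label" part may repeat zero or more times.
var relaxed = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)*([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])?/;

// Hypothetical sample URLs: two single-word hosts and one dotted host.
var urls = [
    'http://localhost/admin',
    'https://sharepoint-test-server/sites',
    'https://example.com/page'
];

urls.forEach(function (url) {
    console.log(url, 'strict:', strict.test(url), 'relaxed:', relaxed.test(url));
});
```

Only the dotted host should pass the strict pattern, while all three should pass the relaxed one. The trade-off is that the relaxed pattern also accepts very short hosts such as http://a.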

<End Edit>

Finally, an excellent resource that taught me 90% of what I know about regex is Regular-Expressions.info. I highly recommend it if you want to learn regex (both what it can do and what it can't)!

Answered by Toto

You have to escape the backslash when you are using new RegExp.
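A quick way to see why, as a sketch with a deliberately simplified pattern (not the URL regex from the question): in a string literal, an unrecognized escape such as \w or \. silently loses its backslash before RegExp ever sees it. String.raw is shown as one possible alternative in ES2015 and later.

```javascript
// The backslash in '\w' is dropped by the string parser.
console.log('\w' === 'w');   // the two strings are identical
console.log('\\w'.length);   // a backslash plus "w": two characters

// Simplified, hypothetical pattern for "word.word".
var broken = new RegExp('^\w+\.\w+$');         // actually compiles /^w+.w+$/
var fixed = new RegExp('^\\w+\\.\\w+$');       // compiles /^\w+\.\w+$/
var raw = new RegExp(String.raw`^\w+\.\w+$`);  // raw template keeps backslashes

console.log(broken.test('example.com'));
console.log(fixed.test('example.com'));
console.log(raw.test('example.com'));
```

The broken version does not throw here; it just matches the wrong thing, which can be harder to notice than the "Range out of order" error in the question.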

You can also put the dash - at the end of the character class to avoid escaping it.

&amp; inside a character class means & or a or m or p or ;. You just need to put & and ;, because a, m and p are already matched by \w.
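This is easy to confirm with a tiny sketch. The class below is anchored to a single character purely for demonstration, and the variable names are invented for this example.

```javascript
// A class containing the text "&amp;" is just the set of its characters.
var entityClass = /^[&amp;]$/;

['&', 'a', 'm', 'p', ';'].forEach(function (ch) {
    console.log(JSON.stringify(ch), entityClass.test(ch));
});

// It never matches the five-character entity as a unit.
console.log(entityClass.test('&amp;'));

// With \w present, only & and ; need to be listed explicitly.
var withWord = /^[\w&;]$/;
console.log(['&', 'a', 'm', 'p', ';'].every(function (ch) {
    return withWord.test(ch);
}));
```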

So, your regex becomes:

var urlexp = new RegExp( '(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w-.,@?^=%&:/~+#-]*[\\w@?^=%&;/~+#-])?' );

Answered by Dan Levy

Here's the most complete single URL parsing pattern.

It works with ANY URI/URL in ANY substring!

https://regex101.com/r/jO8bC4/5

Example JS code with output. Every URL is turned into an array holding the full match followed by its 5 'parts':

var re = /([a-z]+\:\/+)([^\/\s]*)([a-z0-9\-@\^=%&;\/~\+]*)[\?]?([^ \#]*)#?([^ \#]*)/ig; 
var str = 'Bob: Hey there, have you checked https://www.facebook.com ?\n(ignore) https://github.com/justsml?tab=activity#top (ignore this too)';
var m;

while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    console.log(m);
}

Will give you the following:

["https://www.facebook.com",
  "https://",
  "www.facebook.com",
  "",
  "",
  ""
]

["https://github.com/justsml?tab=activity#top",
  "https://",
  "github.com",
  "/justsml",
  "tab=activity",
  "top"
]

BAM! RegEx FTW!
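If named fields are more convenient than array positions, the same pattern can be wrapped in a small helper. This is only a sketch: the parseUrls function and the property names are hypothetical, and it assumes an engine that supports String.prototype.matchAll (ES2020).

```javascript
// Same pattern as above; the g flag is required by matchAll.
var re = /([a-z]+\:\/+)([^\/\s]*)([a-z0-9\-@\^=%&;\/~\+]*)[\?]?([^ \#]*)#?([^ \#]*)/ig;

// Hypothetical helper: maps each match's capture groups to named fields.
function parseUrls(text) {
    return Array.from(text.matchAll(re), function (m) {
        return {
            scheme: m[1],   // includes the "://" part, e.g. "https://"
            host: m[2],
            path: m[3],
            query: m[4],
            fragment: m[5]
        };
    });
}

var parsed = parseUrls('see https://github.com/justsml?tab=activity#top now');
console.log(parsed);
```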

Answered by matthiasmullie

I've cleaned up your regex:

var urlexp = new RegExp('(http|ftp|https)://[a-z0-9\\-_]+(\\.[a-z0-9\\-_]+)+([a-z0-9\\-\\.,@\\?^=%&;:/~\\+#]*[a-z0-9\\-@\\?^=%&;/~\\+#])?', 'i');

Tested and works just fine ;)

Answered by Khadijah J Shtayat

Try this general regex, which handles many URL formats:

/(([A-Za-z]{3,9}):\/\/)?([-;:&=\+$,\w]+@{1})?(([-A-Za-z0-9]+\.)+[A-Za-z]{2,3})(:\d+)?((\/[-\+~%\/\.\w]+)?\/?([&?][-\+=&;%@\.\w]+)?(#[\w]+)?)?/g

Answered by Vinit

Try: (http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&amp;:/~\+#]*[\w\-\@?^=%&amp;/~\+#])?

Answered by PotatoEngineer

The trouble is that the "-" in the character class (the brackets) is being parsed as a range: [a-z] means "any character between a and z." As Vini-T suggested, you need to escape the "-" characters in the character classes, using a backslash.
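The range problem can be reproduced in isolation. In the sketch below, tryCompile is a hypothetical helper that reports whether a pattern string compiles; the first pattern is what the engine receives once the string literal has dropped the backslashes from [\w\-_].

```javascript
// Hypothetical helper: returns 'ok' or the name of the error thrown.
function tryCompile(source) {
    try {
        new RegExp(source);
        return 'ok';
    } catch (e) {
        return e.name;
    }
}

// "w-_" is read as a range from "w" down to "_", which is out of order.
console.log(tryCompile('[w-_]'));

// With the backslashes preserved, the dash is a literal character.
console.log(tryCompile('[\\w\\-_]'));

// Moving the dash to the end also avoids the range interpretation.
console.log(tryCompile('[\\w_-]'));

// A well-ordered range is fine.
console.log(tryCompile('[a-z]'));
```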

Answered by Tolga İskender

Try this; it worked for me:

/^((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?$/

It is simple and easy to understand.
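As a rough check of what this pattern accepts, the sketch below runs it against a few made-up inputs. Because it is anchored with ^ and $ and requires www., it appears suited to validating a whole string rather than finding a URL inside a block of text.

```javascript
var wholeString = /^((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?$/;

// Hypothetical inputs chosen to exercise the anchors and the www. requirement.
var inputs = [
    'http://www.example.com',
    'www.example.co.uk',
    'http://example.com',
    'Visit http://www.example.com today'
];

inputs.forEach(function (input) {
    console.log(JSON.stringify(input), wholeString.test(input));
});
```

The first two inputs should pass; the third lacks www. and the fourth has surrounding text, so both should fail.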
