Javascript: extract URLs from string (inc. querystring) and return array

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/11209016/

Date: 2020-10-26 12:24:30  Source: igfitidea

Javascript: extract URLs from string (inc. querystring) and return array

Tags: javascript, jquery, parsing, url, extract

Asked by SW4

I know this has been asked a thousand times before (apologies), but after searching SO, Google etc. I have yet to find a conclusive answer.

Basically, I need a JS function which, when passed a string, identifies and extracts all URLs based on a regex and returns an array of everything it found, e.g.:

function findUrls(searchText){
    var regex=???
    result= searchText.match(regex);
    if(result){return result;}else{return false;}
}

The function should be able to detect and return any potential URLs. I am aware of the inherent difficulties/issues with this (closing parentheses etc.), so I have a feeling the process needs to be:

1. Split the string (searchText) into distinct content chunks, each bounded on either side by nothing (the start or end of the string), a space, or a carriage return, e.g. do a split.

2. For each content chunk that results from the split, see whether it fits the logic for a URL of any construction, namely: does it contain a period immediately followed by text (the one constant rule for qualifying a potential URL)?

3. The regex should check whether the period is immediately followed by other text of the type allowable for a TLD, directory structure and query string, and preceded by text of the allowable type for a URL.
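The process described above could be sketched roughly as follows. This is only an illustration of the split-then-test idea, not a robust URL parser: the function name findUrlCandidates, the set of punctuation that gets trimmed, and the "period with a letter or digit on both sides" test are all assumptions made for the example.

```javascript
// Hypothetical helper illustrating the split-then-test idea from the question.
function findUrlCandidates(searchText) {
    // Step 1: split on any run of whitespace (spaces, tabs, line breaks).
    var chunks = searchText.split(/\s+/);
    var candidates = [];
    for (var i = 0; i < chunks.length; i++) {
        // Trim punctuation that commonly wraps a URL in prose (assumed set).
        var chunk = chunks[i].replace(/^[("'<\[]+|[)"'>\].,;:!?]+$/g, "");
        // Steps 2-3: keep chunks that contain a period with a letter or
        // digit directly before and after it.
        if (/[a-z0-9][.][a-z0-9]/i.test(chunk)) {
            candidates.push(chunk);
        }
    }
    return candidates;
}

var text = "Hello www.example.com, see (http://google.com/search?q=a&b=c) or will.i.am today.";
console.log(findUrlCandidates(text));
// → ["www.example.com", "http://google.com/search?q=a&b=c", "will.i.am"]
```

Unlike the function in the question, this sketch returns an empty array rather than false when nothing is found. It is deliberately permissive, so a number such as 3.14 would also come back as a candidate, which fits the stated plan of verifying each result afterwards.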

I am aware false positives may result; however, any returned values will then be checked with a call to the URL itself, so this can be ignored. The other functions I have found often don't return the URL's query string, if present.

From a block of text, the function should thus be able to return any type of URL, even if it means identifying will.i.am as a valid one!

E.g. http://www.google.com, google.com, www.google.com, http://google.com, ftp.google.com, https:// etc., and any variation of these with a query string, should be returned.

Many thanks, and apologies again if this exists elsewhere on SO, but my searches haven't returned it.

Answered by chovy

I just use URI.js; it makes this easy.

var source = "Hello www.example.com,\n"
    + "http://google.com is a search engine, like http://www.bing.com\n"
    + "http://exämple.org/foo.html?baz=la#bumm is an IDN URL,\n"
    + "http://123.123.123.123/foo.html is IPv4 and "
    + "http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html is IPv6.\n"
    + "links can also be in parens (http://example.org) "
    + "or quotes „http://example.org“.";

var result = URI.withinString(source, function(url) {
    return "<a>" + url + "</a>";
});

/* result is:
Hello <a>www.example.com</a>,
<a>http://google.com</a> is a search engine, like <a>http://www.bing.com</a>
<a>http://exämple.org/foo.html?baz=la#bumm</a> is an IDN URL,
<a>http://123.123.123.123/foo.html</a> is IPv4 and <a>http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html</a> is IPv6.
links can also be in parens (<a>http://example.org</a>) or quotes „<a>http://example.org</a>“.
*/

Answered by rodneyrehm

You could use the regex from URI.js:

// gruber revised expression - http://rodneyrehm.de/t/url-regex.html
var uri_pattern = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;

String#match and/or String#replace may help…
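As a rough illustration of how String#match and String#replace might be combined with that pattern, here is a small sketch. The names uriPattern, extractUrls and linkifyUrls are made up for this example, and the typographic quote characters at the end of the pattern are written as \u escapes on the assumption that they were «, », “, ”, ‘ and ’ in the original expression.

```javascript
// The pattern from the answer; the trailing quote characters are written
// as \u escapes (assumed to be the original typographic quotes).
var uriPattern = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?\u00AB\u00BB\u201C\u201D\u2018\u2019]))/ig;

// Hypothetical helper: with the g flag, match() returns every full match,
// or null when there is none, so fall back to an empty array.
function extractUrls(text) {
    return text.match(uriPattern) || [];
}

// Hypothetical helper: replace() with a callback wraps each match in place.
function linkifyUrls(text) {
    return text.replace(uriPattern, function (url) {
        return "<a>" + url + "</a>";
    });
}

console.log(extractUrls("Visit http://google.com/search?q=a, or www.example.com."));
// → ["http://google.com/search?q=a", "www.example.com"]
console.log(linkifyUrls("see www.example.com now"));
// → "see <a>www.example.com</a> now"
```

Note that, as written, the pattern needs a scheme, a www prefix, or a domain followed by a slash, so a bare google.com is not picked up by it.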

Answered by Naigel

Try this:

var expression = /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi;

You could use this website to test regular expressions: http://gskinner.com/RegExr/
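For illustration, the expression above could be dropped into the findUrls function from the question along these lines. This is a sketch only: the slashes inside the character classes are escaped here for readability, which should not change what the expression matches, and the sample text is invented.

```javascript
function findUrls(searchText) {
    // Expression from the answer, with "/" escaped inside the character classes.
    var regex = /[-a-zA-Z0-9@:%_+.~#?&\/=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_+.~#?&\/=]*)?/gi;
    var result = searchText.match(regex);
    // Same contract as in the question: an array of matches, or false.
    return result ? result : false;
}

var sample = "Try google.com, www.google.com or http://google.com/search?q=a&b=c today";
console.log(findUrls(sample));
// → ["google.com", "www.google.com", "http://google.com/search?q=a&b=c"]
console.log(findUrls("even will.i.am counts"));
// → ["will.i.am"]
```

Because the final label is limited to 2-4 letters, hosts ending in a longer TLD would presumably need the {2,4} quantifier widened.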

Answered by Manoj Selvin

The following regular expression extracts URLs from a string (including the query string) and returns an array:

var url = "asdasdla hakjsdh aaskjdh https://www.google.com/search?q=add+a+element+to+dom+tree&oq=add+a+element+to+dom+tree&aqs=chrome..69i57.7462j1j1&sourceid=chrome&ie=UTF-8 askndajk nakjsdn aksjdnakjsdnkjsn";

var matches = url.match(/\bhttps?::\/\/\S+/gi) || url.match(/\bhttps?:\/\/\S+/gi);

Output:

["https://www.google.com/search?q=format+to+6+digir&…s=chrome..69i57.5983j1j1&sourceid=chrome&ie=UTF-8"]

Note: this handles both http:// (single colon) and http::// (double colon) in the string, and likewise for https, so it's safe for you to use. :)
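If a text may contain both the single-colon and the double-colon form at once, one possible variation is to make the second colon optional so that a single pass collects both. The sketch below assumes that is the desired behaviour; the function name findHttpUrls and the sample string are hypothetical.

```javascript
// Hypothetical helper: one pass that accepts "http://" and "https://"
// as well as the mistyped "http:://" and "https:://" forms.
function findHttpUrls(text) {
    // "::?" means one colon, optionally followed by a second one.
    var matches = text.match(/\bhttps?::?\/\/\S+/gi);
    // match() returns null when nothing is found; return an empty array instead.
    return matches || [];
}

console.log(findHttpUrls("a http://x.com/a?b=1 b https:://y.org c"));
// → ["http://x.com/a?b=1", "https:://y.org"]
```

Because \S+ runs up to the next whitespace, any punctuation directly after a URL (such as a full stop ending the sentence) is included in the match and may need trimming afterwards.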