JavaScript: extract URLs from a string (including query strings) and return an array

Question

Asked by SW4

I know this has been asked a thousand times before (apologies), but after searching SO, Google, etc. I have yet to find a conclusive answer.

Basically, I need a JS function which, when passed a string, identifies and extracts all URLs based on a regex and returns an array of all those found, e.g.:

function findUrls(searchText){
    var regex=???
    result= searchText.match(regex);
    if(result){return result;}else{return false;}
}

The function should be able to detect and return any potential URLs. I am aware of the inherent difficulties/issues with this (closing parentheses etc.), so I have a feeling the process needs to be:

Split the string (searchText) into distinct sections, each starting/ending with either nothing, a space or a carriage return on either side, resulting in distinct content chunks, e.g. do a split.

For each content chunk that results from the split, see whether it fits the logic for a URL of any construction, namely, does it contain a period immediately followed by text (the one constant rule for qualifying a potential URL).

The regex should see whether the period is immediately followed by other text, of the type allowable for a tld, directory structure & query string, and preceded by text of the allowable type for a URL.
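
The three steps above can be sketched roughly as follows. This is only an illustration of the split-then-test idea: the function name `findUrlsBySplitting` and the deliberately loose `candidate` pattern are assumptions, not a full URL grammar, and false positives such as will.i.am are expected.

```javascript
// Sketch of the split-then-test approach described above.
// The pattern is an assumption: intentionally loose, not a complete URL grammar.
function findUrlsBySplitting(searchText) {
    // Optional scheme, then "label.label[.label...]",
    // then an optional path, query string or fragment.
    var candidate = /^(?:[a-z][a-z0-9+.-]*:\/\/)?[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:[\/?#]\S*)?$/i;

    return searchText
        .split(/\s+/) // chunks separated by spaces or line breaks
        .map(function (chunk) {
            // Strip wrapping punctuation such as "(", ")", "," or a trailing "."
            return chunk.replace(/^[("'<]+|[)"'>.,;:!?]+$/g, "");
        })
        .filter(function (chunk) {
            return candidate.test(chunk);
        });
}

var found = findUrlsBySplitting(
    "Visit http://www.google.com/search?q=js, google.com or (will.i.am) today."
);
console.log(found);
// → ["http://www.google.com/search?q=js", "google.com", "will.i.am"]
```

Stripping trailing punctuation before testing means a URL that genuinely ends in "?" or "." loses that character, which seems an acceptable trade-off when every result is verified by requesting it anyway.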

I am aware false positives may result; however, any returned values will then be checked with a call to the URL itself, so this can be ignored. The other functions I have found often don't return the URL's query string, if present.

From a block of text, the function should thus be able to return any type of URL, even if it means identifying will.i.am as a valid one!

E.g. http://www.google.com, google.com, www.google.com, http://google.com, ftp.google.com, https:// etc., and any variation thereof with a query string, should be returned.

Many thanks, and apologies again if this exists elsewhere on SO, but my searches haven't returned it.

Answer 1

Answered by chovy

I just use URI.js -- makes it easy.

var source = "Hello www.example.com,\n"
    + "http://google.com is a search engine, like http://www.bing.com\n"
    + "http://exämple.org/foo.html?baz=la#bumm is an IDN URL,\n"
    + "http://123.123.123.123/foo.html is IPv4 and "
    + "http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html is IPv6.\n"
    + "links can also be in parens (http://example.org) "
    + "or quotes »http://example.org«.";

var result = URI.withinString(source, function(url) {
    return "<a>" + url + "</a>";
});

/* result is:
Hello <a>www.example.com</a>,
<a>http://google.com</a> is a search engine, like <a>http://www.bing.com</a>
<a>http://exämple.org/foo.html?baz=la#bumm</a> is an IDN URL,
<a>http://123.123.123.123/foo.html</a> is IPv4 and <a>http://fe80:0000:0000:0000:0204:61ff:fe9d:f156/foobar.html</a> is IPv6.
links can also be in parens (<a>http://example.org</a>) or quotes »<a>http://example.org</a>«.
*/

Answer 2

Answered by rodneyrehm

You could use the regex from URI.js:

// gruber revised expression - http://rodneyrehm.de/t/url-regex.html
var uri_pattern = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;

String#match and/or String#replace may help…
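
As a minimal sketch of the String#match route: with the `g` flag, `match` returns only the full matches (capture groups are ignored), so the result is already the array the question asks for. The sample text and the `|| []` fallback are assumptions added for illustration; the pattern is the one quoted above.

```javascript
// gruber revised expression, as quoted in this answer
var uri_pattern = /\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))/ig;

// Hypothetical sample text, not taken from the original answer
var text = "See http://example.org/a?b=c&d=e, or www.bing.com (also http://example.org).";

// match() returns null when nothing is found, so fall back to an empty array
var urls = text.match(uri_pattern) || [];

console.log(urls);
// → ["http://example.org/a?b=c&d=e", "www.bing.com", "http://example.org"]
```

Note that the trailing comma, closing parenthesis and full stop are left out of the matches, and that the query string is kept. A bare domain such as google.com with no scheme, no www prefix and no trailing slash is not matched by this pattern.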

Answer 3

Answered by Naigel

Try this:

var expression = /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi;

You can use this website to test regular expressions: http://gskinner.com/RegExr/
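
For illustration, the expression can be dropped into the `findUrls` skeleton from the question. This is a sketch, not part of the original answer: returning an empty array instead of `false` is an assumption made so callers can always iterate over the result, and the sample text is made up.

```javascript
function findUrls(searchText) {
    // Expression from this answer; [a-z]{2,4} limits the TLD to 2-4 letters
    var expression = /[-a-zA-Z0-9@:%_\+.~#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/gi;
    // Assumption: return [] rather than false when nothing matches
    return searchText.match(expression) || [];
}

var sample = "Try google.com, www.google.com or http://google.com/search?q=url+regex&hl=en now";
var results = findUrls(sample);

console.log(results);
// → ["google.com", "www.google.com", "http://google.com/search?q=url+regex&hl=en"]
```

Because the regex literal is created inside the function, each call starts with a fresh `lastIndex`, which avoids the usual surprises with reused global regexes.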

Answer 4

Answered by Manoj Selvin

The following regular expression extracts URLs from a string (including the query string) and returns an array:

var strings = "asdasdla hakjsdh aaskjdh https://www.google.com/search?q=add+a+element+to+dom+tree&oq=add+a+element+to+dom+tree&aqs=chrome..69i57.7462j1j1&sourceid=chrome&ie=UTF-8 askndajk nakjsdn aksjdnakjsdnkjsn";

// "::?" accepts both "http://" and the malformed "http:://" (likewise for https)
var matches = strings.match(/\bhttps?::?\/\/\S+/gi);

Output:

["https://www.google.com/search?q=add+a+element+to+dom+tree&oq=add+a+element+to+dom+tree&aqs=chrome..69i57.7462j1j1&sourceid=chrome&ie=UTF-8"]

Note: This handles both http:// (single colon) and http::// (double colon) in the string, and likewise for https, so it is safe to use with either form. :)
