JavaScript: extracting the hostname from a string

Question

Asked by Chamilyan

I would like to match just the root of a URL and not the whole URL from a text string. Given:

http://www.youtube.com/watch?v=ClkQA2Lb_iE
http://youtu.be/ClkQA2Lb_iE
http://www.example.com/12xy45
http://example.com/random

I want to match only the last two instances, which resolve to the www.example.com or example.com domain.

I heard regex is slow, and this would be my second regular expression on the page, so if there is any way to do it without regex, let me know.

I'm seeking a JS/jQuery version of this solution.

Answer 1

Answered by Filip Roséen - refp

A neat trick without using regular expressions:

var tmp  = document.createElement('a');
tmp.href = "http://www.example.com/12xy45";

// tmp.hostname now contains 'www.example.com'
// tmp.host contains the hostname plus the port when the URL gives a non-default one
// (e.g. 'www.example.com:8080'); for the URL above it is also 'www.example.com'

Wrap the above in a function such as the one below and you have a superb way of snatching the domain part out of a URI.

function url_domain(data) {
  var a = document.createElement('a');
  a.href = data;
  return a.hostname;
}

Answer 2

Answered by lewdev

I recommend using the npm package psl (Public Suffix List). The Public Suffix List is a list of all valid domain suffixes and rules: not just country-code top-level domains, but also Unicode suffixes that would be considered the root domain (e.g. www.食狮.公司.cn, b.c.kobe.jp). Read more about it here.

Try:

npm install --save psl

Then with my "extractHostname" implementation run:

let psl = require('psl');
let url = 'http://www.youtube.com/watch?v=ClkQA2Lb_iE';
psl.get(extractHostname(url)); // returns youtube.com

I can't use an npm package in this snippet, so the code below only tests extractHostname.

function extractHostname(url) {
    var hostname;
    //find & remove protocol (http, ftp, etc.) and get hostname

    if (url.indexOf("//") > -1) {
        hostname = url.split('/')[2];
    }
    else {
        hostname = url.split('/')[0];
    }

    //find & remove port number
    hostname = hostname.split(':')[0];
    //find & remove "?"
    hostname = hostname.split('?')[0];

    return hostname;
}

//test the code
console.log("== Testing extractHostname: ==");
console.log(extractHostname("http://www.blog.classroom.me.uk/index.php"));
console.log(extractHostname("http://www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("https://www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("ftps://ftp.websitename.com/dir/file.txt"));
console.log(extractHostname("websitename.com:1234/dir/file.txt"));
console.log(extractHostname("ftps://websitename.com:1234/dir/file.txt"));
console.log(extractHostname("example.com?param=value"));
console.log(extractHostname("https://facebook.github.io/jest/"));
console.log(extractHostname("//youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("http://localhost:4200/watch?v=ClkQA2Lb_iE"));

Regardless of whether the URL has a protocol or even a port number, you can extract the domain. This is a very simplified, non-regex solution, so I think this will do.
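One case the split-based approach does not appear to cover is a URL with embedded credentials (user:password@host). The sketch below repeats a condensed extractHostname so that it runs on its own and compares it with the built-in URL constructor. The sample URL is made up for illustration.

```javascript
// Condensed copy of extractHostname from above, repeated so the snippet is self-contained.
function extractHostname(url) {
    var hostname = url.indexOf("//") > -1 ? url.split('/')[2] : url.split('/')[0];
    hostname = hostname.split(':')[0]; // drop the port number
    hostname = hostname.split('?')[0]; // drop the query string
    return hostname;
}

// Hypothetical URL with credentials in front of the host.
var withCredentials = "http://user:secret@example.com:8080/path";

console.log(extractHostname(withCredentials));  // "user"
console.log(new URL(withCredentials).hostname); // "example.com"
```

If inputs like this can occur, a full URL parser is probably the safer choice.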

*Thank you @Timmerz, @rentheitroadb, @rineez, @BigDong, @ra00l, @ILikeBeansTacos, @CharlesRobertson for your suggestions! @ross-allen, thank you for reporting the bug!

Answer 3

Answered by Pavlo

There is no need to parse the string yourself; just pass your URL as an argument to the URL constructor:

var url = 'http://www.youtube.com/watch?v=ClkQA2Lb_iE';
var hostname = (new URL(url)).hostname;

console.assert(hostname === 'www.youtube.com');
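Note that the URL constructor throws a TypeError when the string is not an absolute URL, for example when the protocol is missing. A minimal sketch of a wrapper that returns null instead is shown here; safeHostname is a hypothetical helper name, not part of any API.

```javascript
// Hypothetical helper: returns the hostname, or null if the input cannot be parsed.
function safeHostname(input) {
    try {
        return new URL(input).hostname;
    } catch (e) {
        // new URL() throws a TypeError for strings that are not absolute URLs
        return null;
    }
}

console.log(safeHostname('http://www.youtube.com/watch?v=ClkQA2Lb_iE')); // "www.youtube.com"
console.log(safeHostname('www.youtube.com/watch?v=ClkQA2Lb_iE'));        // null
```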

Answer 4

Answered by gilly3

Try this:

var matches = url.match(/^https?\:\/\/([^\/?#]+)(?:[\/?#]|$)/i);
var domain = matches && matches[1];  // domain will be null if no match is found

If you want to exclude the port from your result, use this expression instead:

/^https?\:\/\/([^\/:?#]+)(?:[\/:?#]|$)/i

Edit: To prevent specific domains from matching, use a negative lookahead such as (?!youtube.com):

/^https?\:\/\/(?!(?:www\.)?(?:youtube\.com|youtu\.be))([^\/:?#]+)(?:[\/:?#]|$)/i
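As a quick check, the expression with the negative lookahead can be run against the four URLs from the question. This is only a sketch; the variable names are chosen for illustration.

```javascript
// The lookahead expression from above, copied verbatim.
var pattern = /^https?\:\/\/(?!(?:www\.)?(?:youtube\.com|youtu\.be))([^\/:?#]+)(?:[\/:?#]|$)/i;

var urls = [
    "http://www.youtube.com/watch?v=ClkQA2Lb_iE",
    "http://youtu.be/ClkQA2Lb_iE",
    "http://www.example.com/12xy45",
    "http://example.com/random"
];

// Map each URL to its captured host, or null when the pattern rejects it.
var hosts = urls.map(function (u) {
    var m = u.match(pattern);
    return m && m[1];
});

console.log(hosts); // [ null, null, "www.example.com", "example.com" ]
```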

Answer 5

Answered by Andrew White

Parsing a URL can be tricky because it can contain port numbers and special characters. As such, I recommend using something like parseUri to do this for you. I doubt performance is going to be an issue unless you are parsing hundreds of URLs.

Answer 6

Answered by Robin Métral

Use `URL.hostname` for readability

In the Babel era, the cleanest and easiest solution is to use URL.hostname.

const getHostname = (url) => {
  // use URL constructor and return hostname
  return new URL(url).hostname;
}

// tests
console.log(getHostname("https://stackoverflow.com/questions/8498592/extract-hostname-name-from-string/"));
console.log(getHostname("https://developer.mozilla.org/en-US/docs/Web/API/URL/hostname"));

URL.hostname is part of the URL API, supported by all major browsers except IE (caniuse). Use a URL polyfill if you need to support legacy browsers.

Using this solution will also give you access to other URL properties and methods. This will be useful if you also want to extract the URL's pathname or query string params, for example.
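As a rough illustration of those other properties, the sketch below parses one of the test URLs with a query string and fragment appended. The `?tab=votes#answers` suffix is an assumption added only to have something to read back.

```javascript
// Test URL from above, with a made-up query string and fragment appended.
const parsed = new URL("https://stackoverflow.com/questions/8498592/extract-hostname-name-from-string/?tab=votes#answers");

console.log(parsed.hostname);                // "stackoverflow.com"
console.log(parsed.pathname);                // "/questions/8498592/extract-hostname-name-from-string/"
console.log(parsed.search);                  // "?tab=votes"
console.log(parsed.searchParams.get("tab")); // "votes"
console.log(parsed.hash);                    // "#answers"
```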

Use RegEx for performance

URL.hostname is faster than using the anchor solution or parseUri. However, it is still much slower than gilly3's regex:

const getHostnameFromRegex = (url) => {
  // run against regex
  const matches = url.match(/^https?\:\/\/([^\/?#]+)(?:[\/?#]|$)/i);
  // extract hostname (will be null if no match is found)
  return matches && matches[1];
}

// tests
console.log(getHostnameFromRegex("https://stackoverflow.com/questions/8498592/extract-hostname-name-from-string/"));
console.log(getHostnameFromRegex("https://developer.mozilla.org/en-US/docs/Web/API/URL/hostname"));

Test it yourself on this jsPerf

If you need to process a very large number of URLs (where performance would be a factor), I recommend using this solution instead. Otherwise, choose URL.hostname for readability.

Answer 7

Answered by BlackDivine

I tried the given solutions: the accepted one was overkill for my purpose, and the "create an a element" one caused problems for me.

It does not handle a port in the URL yet. I hope someone finds it useful.

function parseURL(url){
    var parsed_url = {};

    if ( url == null || url.length == 0 )
        return parsed_url;

    // everything before '://' is the protocol
    var protocol_i = url.indexOf('://');
    parsed_url.protocol = url.substr(0, protocol_i);

    var remaining_url = url.substr(protocol_i + 3);
    var domain_i = remaining_url.indexOf('/');
    // no slash after the domain: the domain runs to the end of the string
    if ( domain_i == -1 )
        domain_i = remaining_url.length;
    parsed_url.domain = remaining_url.substr(0, domain_i);
    parsed_url.path = domain_i + 1 >= remaining_url.length ? null : remaining_url.substr(domain_i + 1);

    var domain_parts = parsed_url.domain.split('.');
    switch ( domain_parts.length ){
        case 2:
          parsed_url.subdomain = null;
          parsed_url.host = domain_parts[0];
          parsed_url.tld = domain_parts[1];
          break;
        case 3:
          parsed_url.subdomain = domain_parts[0];
          parsed_url.host = domain_parts[1];
          parsed_url.tld = domain_parts[2];
          break;
        case 4:
          parsed_url.subdomain = domain_parts[0];
          parsed_url.host = domain_parts[1];
          parsed_url.tld = domain_parts[2] + '.' + domain_parts[3];
          break;
    }

    parsed_url.parent_domain = parsed_url.host + '.' + parsed_url.tld;

    return parsed_url;
}

Running this:

parseURL('https://www.facebook.com/100003379429021_356001651189146');

Result:

Object {
    domain : "www.facebook.com",
    host : "facebook",
    parent_domain : "facebook.com",
    path : "100003379429021_356001651189146",
    protocol : "https",
    subdomain : "www",
    tld : "com"
}

Answer 8

Answered by Luis Lopes

If you end up on this page looking for the best regex for URLs, try this one:

^(?:https?:)?(?:\/\/)?([^\/\?]+)

https://regex101.com/r/pX5dL9/1

It works for URLs without http://, with http, with https, and with just //, and it does not capture the path or the query string.
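A small sketch of how this expression behaves on a few inputs follows; hostOf is a hypothetical wrapper name. Note that, as written, the expression keeps a port number if one is present.

```javascript
// The expression from above, as a JavaScript regex literal.
var hostPattern = /^(?:https?:)?(?:\/\/)?([^\/\?]+)/;

// Hypothetical wrapper: returns the captured host, or null if nothing matches.
function hostOf(url) {
    var m = url.match(hostPattern);
    return m ? m[1] : null;
}

console.log(hostOf("http://www.youtube.com/watch?v=ClkQA2Lb_iE")); // "www.youtube.com"
console.log(hostOf("//youtube.com/watch?v=ClkQA2Lb_iE"));          // "youtube.com"
console.log(hostOf("www.example.com/12xy45"));                     // "www.example.com"
console.log(hostOf("example.com?param=value"));                    // "example.com"
console.log(hostOf("http://localhost:4200/watch"));                // "localhost:4200"
```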

Good Luck

Answer 9

Answered by whitneyland

All URL properties, no dependencies, no jQuery, easy to understand

This solution gives you the answer plus additional properties. No jQuery or other dependencies are required; just paste and go.

Usage

getUrlParts("https://news.google.com/news/headlines/technology.html?ned=us&hl=en")

Output

{
  "origin": "https://news.google.com",
  "domain": "news.google.com",
  "subdomain": "news",
  "domainroot": "google.com",
  "domainpath": "news.google.com/news/headlines",
  "tld": ".com",
  "path": "news/headlines/technology.html",
  "query": "ned=us&hl=en",
  "protocol": "https",
  "port": 443,
  "parts": [
    "news",
    "google",
    "com"
  ],
  "segments": [
    "news",
    "headlines",
    "technology.html"
  ],
  "params": [
    {
      "key": "ned",
      "val": "us"
    },
    {
      "key": "hl",
      "val": "en"
    }
  ]
}

Code
The code is designed to be easy to understand rather than super fast. It can easily be called 100 times per second, so it is fine for front-end use or light server use, but not for high-volume throughput.

function getUrlParts(fullyQualifiedUrl) {
    var url = {},
        tempProtocol
    var a = document.createElement('a')
    // if doesn't start with something like https:// it's not a url, but try to work around that
    if (fullyQualifiedUrl.indexOf('://') == -1) {
        tempProtocol = 'https://'
        a.href = tempProtocol + fullyQualifiedUrl
    } else
        a.href = fullyQualifiedUrl
    var parts = a.hostname.split('.')
    url.origin = tempProtocol ? "" : a.origin
    url.domain = a.hostname
    url.subdomain = parts[0]
    url.domainroot = ''
    url.domainpath = ''
    url.tld = '.' + parts[parts.length - 1]
    url.path = a.pathname.substring(1)
    url.query = a.search.substr(1)
    url.protocol = tempProtocol ? "" : a.protocol.substr(0, a.protocol.length - 1)
    url.port = tempProtocol ? "" : a.port ? a.port : a.protocol === 'http:' ? 80 : a.protocol === 'https:' ? 443 : a.port
    url.parts = parts
    url.segments = a.pathname === '/' ? [] : a.pathname.split('/').slice(1)
    url.params = url.query === '' ? [] : url.query.split('&')
    for (var j = 0; j < url.params.length; j++) {
        var param = url.params[j];
        var keyval = param.split('=')
        url.params[j] = {
            'key': keyval[0],
            'val': keyval[1]
        }
    }
    // domainroot
    if (parts.length > 2) {
        url.domainroot = parts[parts.length - 2] + '.' + parts[parts.length - 1];
        // heuristic check for a country-code TLD preceded by a two-letter second level (e.g. co.uk)
        if (parts[parts.length - 1].length == 2 && parts[parts.length - 2].length == 2)
            url.domainroot = parts[parts.length - 3] + '.' + url.domainroot;
    }
    // domainpath (domain+path without filenames) 
    if (url.segments.length > 0) {
        var lastSegment = url.segments[url.segments.length - 1]
        var endsWithFile = lastSegment.indexOf('.') != -1
        if (endsWithFile) {
            var fileSegment = url.path.indexOf(lastSegment)
            var pathNoFile = url.path.substr(0, fileSegment - 1)
            url.domainpath = url.domain
            if (pathNoFile)
                url.domainpath = url.domainpath + '/' + pathNoFile
        } else
            url.domainpath = url.domain + '/' + url.path
    } else
        url.domainpath = url.domain
    return url
}

Answer 10

Answered by portik

Just use the URL() constructor:

new URL(url).host
