JavaScript: extract the hostname from a string
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/8498592/
Extract hostname name from string
Asked by Chamilyan
I would like to match just the root of a URL and not the whole URL from a text string. Given:
http://www.youtube.com/watch?v=ClkQA2Lb_iE
http://youtu.be/ClkQA2Lb_iE
http://www.example.com/12xy45
http://example.com/random
I want to match only the last two instances, which resolve to the www.example.com or example.com domain.
I have heard that regex is slow, and this would be my second regular expression on the page, so if there is any way to do it without regex, let me know.
I'm seeking a JS/jQuery version of this solution.
Answered by Filip Roséen - refp
A neat trick without using regular expressions:
var tmp = document.createElement('a');
tmp.href = "http://www.example.com/12xy45";
// tmp.hostname will now contain 'www.example.com'
// tmp.host contains the hostname plus the port when the URL specifies a non-default one,
// e.g. 'www.example.com:8080' for "http://www.example.com:8080/12xy45"
Wrap the above in a function such as the one below and you have a superb way of snatching the domain part out of a URI.
function url_domain(data) {
    // let the browser parse the URL through an anchor element
    var a = document.createElement('a');
    a.href = data;
    return a.hostname;
}
Answered by lewdev
I recommend using the npm package psl (Public Suffix List). The Public Suffix List is a list of all valid domain suffixes and rules: not just country-code top-level domains, but also Unicode suffixes under which a name counts as the root domain (e.g. www.食狮.公司.cn, b.c.kobe.jp).
Try:
npm install --save psl
Then with my "extractHostname" implementation run:
let psl = require('psl');
let url = 'http://www.youtube.com/watch?v=ClkQA2Lb_iE';
psl.get(extractHostname(url)); // returns youtube.com
I can't use an npm package in this snippet, so the code below only tests extractHostname.
function extractHostname(url) {
    var hostname;
    //find & remove protocol (http, ftp, etc.) and get hostname
    if (url.indexOf("//") > -1) {
        hostname = url.split('/')[2];
    }
    else {
        hostname = url.split('/')[0];
    }
    //find & remove port number
    hostname = hostname.split(':')[0];
    //find & remove "?"
    hostname = hostname.split('?')[0];
    return hostname;
}
//test the code
console.log("== Testing extractHostname: ==");
console.log(extractHostname("http://www.blog.classroom.me.uk/index.php"));
console.log(extractHostname("http://www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("https://www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("www.youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("ftps://ftp.websitename.com/dir/file.txt"));
console.log(extractHostname("websitename.com:1234/dir/file.txt"));
console.log(extractHostname("ftps://websitename.com:1234/dir/file.txt"));
console.log(extractHostname("example.com?param=value"));
console.log(extractHostname("https://facebook.github.io/jest/"));
console.log(extractHostname("//youtube.com/watch?v=ClkQA2Lb_iE"));
console.log(extractHostname("http://localhost:4200/watch?v=ClkQA2Lb_iE"));
Regardless of whether the URL has a protocol or even a port number, you can extract the domain. This is a very simplified, non-regex solution, so I think this will do.
*Thank you @Timmerz, @rentheitroadb, @rineez, @BigDong, @ra00l, @ILikeBeansTacos, @CharlesRobertson for your suggestions! @ross-allen, thank you for reporting the bug!
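To illustrate what a Public Suffix List lookup adds on top of a plain hostname, here is a minimal sketch. The three-entry suffix set and the helper name registrableDomain are assumptions made up for this example; the real list has thousands of entries plus wildcard and exception rules, so in practice a maintained package such as psl is the safer choice.

```javascript
// Hypothetical, tiny stand-in for the Public Suffix List (illustration only).
const SAMPLE_SUFFIXES = new Set(['com', 'uk', 'me.uk']);

// Returns the longest matching suffix plus one more label,
// or null when no suffix in the sample set matches.
function registrableDomain(hostname) {
    const labels = hostname.toLowerCase().split('.');
    // Start at index 1 so at least one label remains in front of the suffix,
    // and try the longest candidate first so 'me.uk' wins over 'uk'.
    for (let i = 1; i < labels.length; i++) {
        const suffix = labels.slice(i).join('.');
        if (SAMPLE_SUFFIXES.has(suffix)) {
            return labels.slice(i - 1).join('.');
        }
    }
    return null;
}

console.log(registrableDomain('www.youtube.com'));          // youtube.com
console.log(registrableDomain('www.blog.classroom.me.uk')); // classroom.me.uk
console.log(registrableDomain('localhost'));                // null
```

The point of the sketch is that "the last two labels" is not a reliable rule: for classroom.me.uk the registrable domain has three labels, and only a suffix list can tell you that.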
Answered by Pavlo
There is no need to parse the string yourself; just pass your URL as an argument to the URL constructor:
var url = 'http://www.youtube.com/watch?v=ClkQA2Lb_iE';
var hostname = (new URL(url)).hostname;
console.assert(hostname === 'www.youtube.com');
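One caveat worth sketching: the URL constructor throws a TypeError when the string is not an absolute URL (for example, when the scheme is missing). The wrapper below is a minimal sketch of one way to handle that; the name safeHostname is hypothetical and not part of any API.

```javascript
// Hypothetical helper: returns the hostname, or null when the input
// cannot be parsed as an absolute URL.
function safeHostname(input) {
    try {
        return new URL(input).hostname;
    } catch (err) {
        // new URL() throws a TypeError for input it cannot parse
        return null;
    }
}

console.log(safeHostname('http://www.youtube.com/watch?v=ClkQA2Lb_iE')); // www.youtube.com
console.log(safeHostname('http://localhost:4200/watch'));                // localhost
console.log(safeHostname('www.youtube.com/watch?v=ClkQA2Lb_iE'));        // null
```

Returning null keeps the failure explicit; whether to prepend a default scheme instead is a design choice that depends on where the strings come from.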
Answered by gilly3
Try this:
var matches = url.match(/^https?\:\/\/([^\/?#]+)(?:[\/?#]|$)/i);
var domain = matches && matches[1]; // domain will be null if no match is found
If you want to exclude the port from your result, use this expression instead:
/^https?\:\/\/([^\/:?#]+)(?:[\/:?#]|$)/i
Edit: To prevent specific domains from matching, use a negative lookahead, (?!youtube.com):
/^https?\:\/\/(?!(?:www\.)?(?:youtube\.com|youtu\.be))([^\/:?#]+)(?:[\/:?#]|$)/i
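As a rough illustration of how the three expressions differ, the sketch below runs them against the URLs from the question plus one URL with a port. The variable names and the firstGroup helper are assumptions introduced for this example.

```javascript
const withPort = /^https?\:\/\/([^\/?#]+)(?:[\/?#]|$)/i;
const withoutPort = /^https?\:\/\/([^\/:?#]+)(?:[\/:?#]|$)/i;
const notYouTube = /^https?\:\/\/(?!(?:www\.)?(?:youtube\.com|youtu\.be))([^\/:?#]+)(?:[\/:?#]|$)/i;

// Hypothetical helper: first capture group, or null if the pattern does not match.
function firstGroup(pattern, url) {
    const matches = url.match(pattern);
    return matches && matches[1];
}

console.log(firstGroup(withPort, 'http://localhost:4200/watch'));    // localhost:4200
console.log(firstGroup(withoutPort, 'http://localhost:4200/watch')); // localhost

console.log(firstGroup(notYouTube, 'http://www.youtube.com/watch?v=ClkQA2Lb_iE')); // null
console.log(firstGroup(notYouTube, 'http://youtu.be/ClkQA2Lb_iE'));                // null
console.log(firstGroup(notYouTube, 'http://www.example.com/12xy45'));              // www.example.com
console.log(firstGroup(notYouTube, 'http://example.com/random'));                  // example.com
```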
Answered by Andrew White
Parsing a URL can be tricky because it can contain port numbers and special characters. As such, I recommend using something like parseUri to do this for you. I doubt performance is going to be an issue unless you are parsing hundreds of URLs.
Answered by Robin Métral
Use URL.hostname for readability
In the Babel era, the cleanest and easiest solution is to use URL.hostname.
const getHostname = (url) => {
    // use URL constructor and return hostname
    return new URL(url).hostname;
};
// tests
console.log(getHostname("https://stackoverflow.com/questions/8498592/extract-hostname-name-from-string/"));
console.log(getHostname("https://developer.mozilla.org/en-US/docs/Web/API/URL/hostname"));
URL.hostname is part of the URL API, supported by all major browsers except IE (caniuse). Use a URL polyfill if you need to support legacy browsers.
Using this solution will also give you access to other URL properties and methods. This is useful if you also want to extract the URL's pathname or query string params, for example.
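A short sketch of those other properties, using a sample URL chosen for illustration (the variable name parsed is arbitrary):

```javascript
const parsed = new URL('https://news.google.com/news/headlines/technology.html?ned=us&hl=en');

console.log(parsed.hostname);               // news.google.com
console.log(parsed.protocol);               // https:
console.log(parsed.pathname);               // /news/headlines/technology.html
console.log(parsed.search);                 // ?ned=us&hl=en
console.log(parsed.searchParams.get('hl')); // en
```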
Use RegEx for performance
URL.hostname is faster than using the anchor solution or parseUri. However, it's still much slower than gilly3's regex:
const getHostnameFromRegex = (url) => {
    // run against regex
    const matches = url.match(/^https?\:\/\/([^\/?#]+)(?:[\/?#]|$)/i);
    // extract hostname (will be null if no match is found)
    return matches && matches[1];
};
// tests
console.log(getHostnameFromRegex("https://stackoverflow.com/questions/8498592/extract-hostname-name-from-string/"));
console.log(getHostnameFromRegex("https://developer.mozilla.org/en-US/docs/Web/API/URL/hostname"));
Test it yourself on this jsPerf
If you need to process a very large number of URLs (where performance would be a factor), I recommend using the regex solution instead. Otherwise, choose URL.hostname for readability.
Answered by BlackDivine
I tried the given solutions: the chosen one was overkill for my purpose, and the "creating an a element" one messes things up for me.
It doesn't handle a port in the URL yet. I hope someone finds it useful.
function parseURL(url) {
    var parsed_url = {};
    if (url == null || url.length == 0)
        return parsed_url;
    // note: assumes the URL contains '://' and no port
    var protocol_i = url.indexOf('://');
    parsed_url.protocol = url.substr(0, protocol_i);
    var remaining_url = url.substr(protocol_i + 3);
    var slash_i = remaining_url.indexOf('/');
    // without a slash after the host, the whole remainder is the domain
    var domain_i = slash_i == -1 ? remaining_url.length : slash_i;
    parsed_url.domain = remaining_url.substr(0, domain_i);
    parsed_url.path = slash_i == -1 || slash_i + 1 == remaining_url.length ? null : remaining_url.substr(slash_i + 1);
    var domain_parts = parsed_url.domain.split('.');
    switch (domain_parts.length) {
        case 2:
            parsed_url.subdomain = null;
            parsed_url.host = domain_parts[0];
            parsed_url.tld = domain_parts[1];
            break;
        case 3:
            parsed_url.subdomain = domain_parts[0];
            parsed_url.host = domain_parts[1];
            parsed_url.tld = domain_parts[2];
            break;
        case 4:
            parsed_url.subdomain = domain_parts[0];
            parsed_url.host = domain_parts[1];
            parsed_url.tld = domain_parts[2] + '.' + domain_parts[3];
            break;
    }
    parsed_url.parent_domain = parsed_url.host + '.' + parsed_url.tld;
    return parsed_url;
}
Running this:
parseURL('https://www.facebook.com/100003379429021_356001651189146');
Result:
Object {
    domain : "www.facebook.com",
    host : "facebook",
    parent_domain : "facebook.com",
    path : "100003379429021_356001651189146",
    protocol : "https",
    subdomain : "www",
    tld : "com"
}
Answered by Luis Lopes
If you end up on this page looking for the best regex for URLs, try this one:
^(?:https?:)?(?:\/\/)?([^\/\?]+)
https://regex101.com/r/pX5dL9/1
It works for URLs without http://, with http, with https, or with just //, and it doesn't grab the path or query string.
Good Luck
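A quick way to see what this expression captures is to run it over a few inputs. The sketch below is only an illustration; the names hostPattern and hostFrom are assumptions. Note that the port is kept in the capture, and that schemes other than http/https are not stripped.

```javascript
const hostPattern = /^(?:https?:)?(?:\/\/)?([^\/\?]+)/;

// Hypothetical helper: first capture group, or null when nothing matches.
function hostFrom(text) {
    const m = text.match(hostPattern);
    return m ? m[1] : null;
}

console.log(hostFrom('www.youtube.com/watch?v=ClkQA2Lb_iE'));  // www.youtube.com
console.log(hostFrom('//youtube.com/watch?v=ClkQA2Lb_iE'));    // youtube.com
console.log(hostFrom('https://example.com?param=value'));      // example.com
console.log(hostFrom('http://localhost:4200/watch'));          // localhost:4200 (port is kept)
console.log(hostFrom('ftp://ftp.websitename.com/dir/file.txt')); // ftp: (non-http scheme is not stripped)
```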
Answered by whitneyland
All URL properties, no dependencies, no jQuery, easy to understand
This solution gives you the answer plus additional properties. No jQuery or other dependencies are required; paste and go.
Usage
getUrlParts("https://news.google.com/news/headlines/technology.html?ned=us&hl=en")
Output
{
    "origin": "https://news.google.com",
    "domain": "news.google.com",
    "subdomain": "news",
    "domainroot": "google.com",
    "domainpath": "news.google.com/news/headlines",
    "tld": ".com",
    "path": "news/headlines/technology.html",
    "query": "ned=us&hl=en",
    "protocol": "https",
    "port": 443,
    "parts": [
        "news",
        "google",
        "com"
    ],
    "segments": [
        "news",
        "headlines",
        "technology.html"
    ],
    "params": [
        {
            "key": "ned",
            "val": "us"
        },
        {
            "key": "hl",
            "val": "en"
        }
    ]
}
Code
The code is designed to be easy to understand rather than super fast. It can easily be called 100 times per second, so it is fine for front-end use or light server use, but not for high-volume throughput.
function getUrlParts(fullyQualifiedUrl) {
    var url = {},
        tempProtocol
    var a = document.createElement('a')
    // if doesn't start with something like https:// it's not a url, but try to work around that
    if (fullyQualifiedUrl.indexOf('://') == -1) {
        tempProtocol = 'https://'
        a.href = tempProtocol + fullyQualifiedUrl
    } else
        a.href = fullyQualifiedUrl
    var parts = a.hostname.split('.')
    url.origin = tempProtocol ? "" : a.origin
    url.domain = a.hostname
    url.subdomain = parts[0]
    url.domainroot = ''
    url.domainpath = ''
    url.tld = '.' + parts[parts.length - 1]
    url.path = a.pathname.substring(1)
    url.query = a.search.substr(1)
    url.protocol = tempProtocol ? "" : a.protocol.substr(0, a.protocol.length - 1)
    url.port = tempProtocol ? "" : a.port ? a.port : a.protocol === 'http:' ? 80 : a.protocol === 'https:' ? 443 : a.port
    url.parts = parts
    url.segments = a.pathname === '/' ? [] : a.pathname.split('/').slice(1)
    url.params = url.query === '' ? [] : url.query.split('&')
    for (var j = 0; j < url.params.length; j++) {
        var param = url.params[j];
        var keyval = param.split('=')
        url.params[j] = {
            'key': keyval[0],
            'val': keyval[1]
        }
    }
    // domainroot
    if (parts.length > 2) {
        url.domainroot = parts[parts.length - 2] + '.' + parts[parts.length - 1];
        // check for country code top level domain such as co.uk (last two labels both 2 letters)
        if (parts[parts.length - 2].length == 2 && parts[parts.length - 1].length == 2)
            url.domainroot = parts[parts.length - 3] + '.' + url.domainroot;
    }
    // domainpath (domain+path without filenames)
    if (url.segments.length > 0) {
        var lastSegment = url.segments[url.segments.length - 1]
        var endsWithFile = lastSegment.indexOf('.') != -1
        if (endsWithFile) {
            var fileSegment = url.path.indexOf(lastSegment)
            var pathNoFile = url.path.substr(0, fileSegment - 1)
            url.domainpath = url.domain
            if (pathNoFile)
                url.domainpath = url.domainpath + '/' + pathNoFile
        } else
            url.domainpath = url.domain + '/' + url.path
    } else
        url.domainpath = url.domain
    return url
}