Javascript/Regex for finding just the root domain name without subdomains
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/3439863/
Asked by jamesmhaley
I had a search and found lots of similar regex examples, but not quite what I need.
I want to be able to pass in the following urls and return the results:
www.google.com returns google.com
sub.domains.are.cool.google.com returns google.com
doesntmatterhowlongasubdomainis.idont.wantit.google.com returns google.com
sub.domain.google.com/no/thanks returns google.com
Hope that makes sense :) Thanks in advance! -James
Accepted answer by Tatham Oddie
You can't do this with a regular expression because you don't know how many blocks are in the suffix.
For example, google.com has a suffix of com. To get from subdomain.google.com to google.com you'd have to take the last two blocks - one for the suffix and one for google.
If you apply this logic to subdomain.google.co.uk, though, you would end up with co.uk.
You will actually need to look up the suffix from a list like http://publicsuffix.org/
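A minimal sketch of the lookup idea, assuming a tiny hand-picked set of suffixes in place of the real list. The names SAMPLE_SUFFIXES and rootDomain are hypothetical, and the wildcard and exception rules of the actual Public Suffix List are ignored here:

```javascript
// Hypothetical sample of public suffixes; a real implementation would load
// the full list from publicsuffix.org (which also has wildcard and
// exception rules that this sketch ignores).
const SAMPLE_SUFFIXES = new Set(['com', 'org', 'uk', 'co.uk', 'com.au']);

// Return the longest known suffix plus one more label, or null.
function rootDomain(host, suffixes = SAMPLE_SUFFIXES) {
  const labels = host.toLowerCase().split('.');
  // Try the longest candidate suffix first, so co.uk wins over uk.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (suffixes.has(candidate)) {
      // i === 0 means the host is itself a bare suffix.
      return i > 0 ? labels.slice(i - 1).join('.') : null;
    }
  }
  return null; // no known suffix found
}

console.log(rootDomain('subdomain.google.com'));   // google.com
console.log(rootDomain('subdomain.google.co.uk')); // google.co.uk
```

Trying the longest candidate first is what keeps co.uk from being mistaken for a registrable name under uk.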
Answer by stormsweeper
Don't use regex, use the .split() method and work from there.
var s = domain.split('.');
If your use case is fairly narrow you could then check the TLDs as needed, and then return the last 2 or 3 segments as appropriate:
return s.slice(-2).join('.');
It'll make your eyes bleed less than any regex solution.
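Put together, the split approach might look like the sketch below. It assumes inputs without a scheme (as in the question), and the list TWO_PART_TLDS and the function name stripSubdomains are made up for illustration; a narrow use case would list only the TLDs it actually sees:

```javascript
// Hypothetical list of two-part TLDs relevant to a narrow use case.
const TWO_PART_TLDS = ['co.uk', 'com.au', 'co.jp'];

function stripSubdomains(input) {
  // Drop any path, e.g. "sub.domain.google.com/no/thanks".
  const host = input.split('/')[0];
  const s = host.split('.');
  // Keep 3 segments when the last two form a known two-part TLD, else 2.
  const keep = TWO_PART_TLDS.includes(s.slice(-2).join('.')) ? 3 : 2;
  return s.slice(-keep).join('.');
}

console.log(stripSubdomains('www.google.com'));                  // google.com
console.log(stripSubdomains('sub.domain.google.com/no/thanks')); // google.com
console.log(stripSubdomains('www.google.co.uk'));                // google.co.uk
```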
Answer by theraccoonbear
I've not done a lot of testing on this, but if I understand what you're asking for, this should be a decent starting point...
([A-Za-z0-9-]+\.([A-Za-z]{3,}|[A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2}))\b
EDIT:
To clarify, it's looking for:
one or more alphanumeric characters or dashes, followed by a literal dot
and then one of three things...
- three or more alpha characters (e.g. com/net/mil/coop, etc.)
- two alpha characters, followed by a literal dot, followed by two more alphas (e.g. co.uk)
- two alpha characters (e.g. us/uk/to, etc.)
and at the end of that, a word boundary (\b), meaning the end of the string, a space, or another non-word character (in regex, word characters are typically alphanumerics and underscore).
As I say, I didn't do much testing, but it seemed a reasonable jumping-off point. You'd likely need to try it and tune it some, and even then, it's unlikely that you'll get 100% for all test cases. There are considerations like Unicode domain names and all sorts of technically-valid-but-you'll-likely-not-encounter-in-the-wild things that'll trip up a simple regex like this, but this'll probably get you 90%+ of the way there.
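One piece of tuning worth trying: \b also matches between a label and the dot that follows it, so an unanchored search on www.google.com stops at www.google. The sketch below is an adaptation rather than the answer's exact pattern. It assumes the host is isolated first (no scheme, path dropped) and swaps \b for $; the names ROOT_RE and rootByRegex are hypothetical:

```javascript
// Pattern from the answer, with \b swapped for $ (an assumption of this sketch).
const ROOT_RE = /([A-Za-z0-9-]+\.([A-Za-z]{3,}|[A-Za-z]{2}\.[A-Za-z]{2}|[A-Za-z]{2}))$/;

function rootByRegex(input) {
  const host = input.split('/')[0]; // isolate the host; assumes no scheme
  const m = host.match(ROOT_RE);
  return m ? m[1] : null;
}

console.log(rootByRegex('www.google.com'));                  // google.com
console.log(rootByRegex('sub.domain.google.com/no/thanks')); // google.com
console.log(rootByRegex('www.google.co.uk'));                // google.co.uk
console.log(rootByRegex('www.example.com.au'));              // com.au (a miss)
```

The last call shows the kind of case that still slips through: a three-letter second-level label such as com.au does not fit any of the three alternatives.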
Answer by Gajus
If you have a limited subset of data, I suggest keeping the regex simple, e.g.
(([a-z\-]+)(?:\.com|\.fr|\.co\.uk))
This will match:
www.google.com --> google.com
www.google.co.uk --> google.co.uk
www.foo-bar.com --> foo-bar.com
In my case, I know that all relevant URLs will be matched using this regex.
Collect a sample dataset and test it against your regex. While prototyping, you can do that using a tool such as https://regex101.com/r/aG9uT0/1. In development, automate it using a test script.
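A test script along those lines could be as small as the sketch below. The sample rows and the names SIMPLE_RE, SAMPLES and runSamples are hypothetical, and the pattern is written with the dot in co.uk escaped:

```javascript
// The simple pattern from the answer; only .com, .fr and .co.uk are covered.
const SIMPLE_RE = /(([a-z\-]+)(?:\.com|\.fr|\.co\.uk))/;

// Hypothetical sample dataset: [input, expected root domain or null].
const SAMPLES = [
  ['www.google.com', 'google.com'],
  ['www.google.co.uk', 'google.co.uk'],
  ['www.foo-bar.com', 'foo-bar.com'],
  ['www.example.org', null], // outside the supported subset
];

// Run every sample through the regex and collect the mismatches.
function runSamples(re, samples) {
  const failures = [];
  for (const [input, expected] of samples) {
    const m = input.match(re);
    const actual = m ? m[1] : null;
    if (actual !== expected) failures.push({ input, expected, actual });
  }
  return failures;
}

const failures = runSamples(SIMPLE_RE, SAMPLES);
console.log(failures.length === 0 ? 'all samples pass' : failures);
```

Keeping the samples as plain data makes it easy to add a row whenever a new URL shape turns up.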
Answer by Emeka
Without testing the validity of the top-level domain, I'm using an adaptation of stormsweeper's solution:
domain = 'sub.domains.are.cool.google.com'
s = domain.split('.')
tld = s.slice(-2..-1).join('.')

