Bash based regex domain name validation
Original question: http://stackoverflow.com/questions/15268987/
Warning: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Peter
I want to create a script that will add new domains to our DNS servers. I found this fully qualified domain name validation regex. However, when I use it with sed, it does not work as I would expect:
echo test | sed '/(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(:[a-zA-Z]{2,})$)/p'
--------
Output is:
test
echo test.com | sed '/(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(:[a-zA-Z]{2,})$)/p'
--------
Output is:
test.com
I expected the output of the first command to be a blank line. What am I doing wrong?
Answered by Doktor J
I find this to be a more comprehensive regex:
(?=^.{4,253}$)(^(?:[a-zA-Z0-9](?:(?:[a-zA-Z0-9\-]){0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$)
- RFC 1034§3: Allows for a length of 4-253, so the shortest operational domain I'm aware of, "t.co", still matches where the other answers reject it. The maximum length is 255 bytes; subtracting the length octet for each label (TLD and "primary" subdomain) gives us 253:
(?=^.{4,253}$)
- RFC 3696§2: Single-letter TLDs are technically permitted, meaning the minimum length would be 3, but as there are currently no single-letter TLDs, a minimum length of 4 is practical.
- RFC 1034§3: Allows numbers in subdomains, which Conor Clafferty's apparently doesn't (it distinguishes "primary" subdomains, i.e. the domain you register, from other subdomains, which the DNS spec doesn't).
- RFC 1034§3: Restricts individual labels to 63 characters, permitting hyphens in the middle while restricting the beginning and end to alphanumerics:
(?:[a-zA-Z0-9](?:(?:[a-zA-Z0-9\-]){0,61}[a-zA-Z0-9])?\.)
- Requires a two-letter or larger TLD:
[a-zA-Z]{2,}
- RFC 3696§2: The DNS spec technically permits numerics in the TLD, as well as single-letter TLDs; however, there are currently no single-letter TLDs or TLDs with numbers, and all-numeric TLDs are not permitted, so this part of the regex has been simplified.
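The rules listed above can also be checked without Perl-style lookaheads. The following is a minimal sketch, assuming Bash 3 or later for the [[ =~ ]] operator; the function name is_valid_fqdn is hypothetical and not part of the original answer. It tests the overall length separately and then applies an ERE form of the label and TLD rules.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer): returns 0 when "$1"
# satisfies the length, label, and TLD rules described above.
is_valid_fqdn() {
    local name=$1
    # POSIX ERE has no lookahead, so the 4-253 length rule is tested on its own.
    local len=${#name}
    (( len >= 4 && len <= 253 )) || return 1
    # Labels: 1-63 characters, alphanumeric at both ends, hyphens allowed inside.
    # TLD: letters only, at least two of them.
    local re='^([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$'
    [[ $name =~ $re ]]
}

for d in t.co test.com test -bad.example.com; do
    if is_valid_fqdn "$d"; then
        printf '%s: valid\n' "$d"
    else
        printf '%s: invalid\n' "$d"
    fi
done
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side of =~ makes Bash treat it as a regex rather than a literal string.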
Answered by Pilou
You are missing a question mark in your regex:
(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)
You can test your regex here
You can do what you want with grep:
$ echo test.com | grep -P '(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)'
test.com
$ echo test | grep -P '(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)'
$
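In a script, the exit status of grep is usually more convenient than its output. The following is a minimal sketch, assuming GNU grep built with PCRE support (the -P option); the function name check_domain is hypothetical, and the pattern is the one shown above.

```shell
#!/bin/bash
# Hypothetical wrapper: succeeds when "$1" matches the PCRE shown above.
# Assumes GNU grep with -P support.
check_domain() {
    local re='(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+(?:[a-zA-Z]{2,})$)'
    # -q suppresses output; only the exit status is used.
    printf '%s\n' "$1" | grep -qP -- "$re"
}

if check_domain "test.com"; then
    echo "test.com accepted"
else
    echo "test.com rejected"
fi
```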
Answered by tripleee
No sed implementation I am aware of supports the various Perl extensions you are using in that regex. Try with Perl, grep -P, or pcregrep, or simplify the regex to something sed can cope with. Here is a quick and dirty adaptation which splits the regex into a script of three different regexes, and rejects the input when one of them fails to match (or, for the middle one, when it matches).
echo 'test' | sed -r '/^.{5,254}$/!d
/^([^.]*\.)*[0-9]+\./d # Seems incorrect; 112.com is valid
/^([a-zA-Z0-9_\-]{1,63}\.?)+([a-zA-Z]{2,})$/!d' # should disallow underscore
# also, what's with the question mark after the literal dot?
This also completely fails to accept IDNA domains (which can contain dashes and numbers in the TLD, among other things), so I would definitely not recommend this, but hopefully it shows you how to adapt something like this to sed if you wish to.
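For comparison, here is a minimal sketch of the same split-into-simple-checks idea with the objections from the script comments addressed: purely numeric labels such as 112 are accepted and underscores are rejected. It assumes a sed that supports -E (GNU and BSD sed both do); the function name filter_domains is hypothetical. It uses -n so that a line is printed only when the final p command runs, and like the script above it does not handle IDNA domains.

```shell
#!/bin/bash
# Hypothetical filter: prints each input line that looks like a valid
# domain name and drops the rest. Assumes a sed with -E (GNU or BSD).
filter_domains() {
    sed -n -E '
        # Overall length must be 4-253 characters.
        /^.{4,253}$/!d
        # Each label: alphanumeric at both ends, hyphens allowed inside,
        # 63 characters at most. The TLD is two or more letters.
        /^([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$/p
    '
}

printf '%s\n' test test.com 112.com my_host.example.com | filter_domains
```

With this input, only test.com and 112.com are printed: test has no dot, and my_host.example.com contains an underscore.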
Answered by Bob van Luijt
I use grep -P to do this.
echo test | grep -P "^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$"
--------
Output is:
echo www.test.com | grep -P "^[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9](?:\.[a-zA-Z]{2,})+$"
--------
Output is:
www.test.com
Answered by Conor Clafferty
Pierre-Louis' answer didn't quite work for me; for example, "kittens" is considered a domain name. I added one slight adjustment to ensure that the domain has at least one dot in it.
(?=^.{5,254}$)(^(?:(?!\d+\.)[a-zA-Z0-9_\-]{1,63}\.?)+\.(?:[a-z]{2,})$)
There's an extra \. just before it reads the last portion of the domain.
Answered by Dirk Hoffmann
If the domain has to exist, you can try:
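$ cat test.sh
#!/bin/bash
# Report whether each name resolves in DNS.
for h in "bert" "ernie" "www.google.com"
do
    # Discard both stdout and stderr; only the exit status of host matters.
    if host "$h" > /dev/null 2>&1
    then
        echo "$h is a FQDN"
    else
        echo "$h is not a FQDN"
    fi
done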
jalderman@mba:/tmp$ ./test.sh
bert is not a FQDN
ernie is not a FQDN
www.google.com is a FQDN

