Python: extract the domain name from a host name

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/825694/

Date: 2020-11-03 20:55:06  Source: igfitidea

Extract domain name from a host name

Tags: python, dns, hostname

Question

Is there a programmatic way to find the domain name from a given hostname?

given -> www.yahoo.co.jp
return -> yahoo.co.jp

The approach that works but is very slow is:

Split on "." and remove one group from the left, join the remaining groups, and query for an SOA record using dnspython. When a valid SOA record is returned, consider that name a domain.
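
The steps above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: the names `find_zone` and `has_soa_dns` are hypothetical, the SOA check is passed in as a callable so the loop can be tried without network access, and the dnspython call assumes version 2.x (`dns.resolver.resolve`; older releases used `dns.resolver.query`).

```python
def has_soa_dns(name):
    """Return True if a DNS query finds an SOA record at `name`.

    Assumes dnspython 2.x is installed. Timeouts and other resolver
    errors are not handled here and will propagate to the caller.
    """
    import dns.resolver  # third-party package: dnspython

    try:
        dns.resolver.resolve(name, "SOA")
        return True
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return False


def find_zone(hostname, has_soa=has_soa_dns):
    """Strip labels from the left until `has_soa` reports an SOA record.

    Mirrors the description in the question: the full host name itself
    is never tested, only its parents.
    """
    labels = hostname.rstrip(".").split(".")
    while len(labels) > 1:
        labels = labels[1:]              # remove one group from the left
        candidate = ".".join(labels)     # join what remains
        if has_soa(candidate):
            return candidate
    return None
```

With a stub such as `find_zone("www.yahoo.co.jp", lambda n: n == "yahoo.co.jp")` the loop stops at `yahoo.co.jp`. With the default checker every step costs a DNS round trip, which is presumably where the slowness comes from.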

Is there a cleaner/faster way to do this without using regexps?

Answer by Alnitak

There's no trivial definition of which "domain name" is the parent of any particular "host name".

Your current method of traversing up the tree until you see an SOA record is actually the most correct.

Technically, what you're doing there is finding a "zone cut", and in the vast majority of cases that will correspond to the point at which the domain was delegated from its TLD.

Any method that relies on mere text parsing of the host name without reference to the DNS is doomed to failure.

Alternatively, make use of the centrally maintained lists of delegation-centric domains from http://publicsuffix.org/, but beware that these lists can be incomplete and/or out of date.
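
A minimal sketch of the list-based lookup, assuming a tiny hand-written set of suffixes rather than the real list from publicsuffix.org (which also contains wildcard and exception rules that are ignored here). The names `SAMPLE_SUFFIXES` and `registrable_domain` are hypothetical.

```python
# Illustrative excerpt only (an assumption for this sketch); the real list
# is far longer and changes over time.
SAMPLE_SUFFIXES = {"com", "jp", "co.jp", "uk", "co.uk"}


def registrable_domain(hostname, suffixes=SAMPLE_SUFFIXES):
    """Return the longest listed suffix plus one more label, or None."""
    labels = hostname.lower().rstrip(".").split(".")
    # Scan from the longest candidate suffix to the shortest so that
    # "co.jp" wins over "jp".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            if i == 0:
                return None  # the name is itself a suffix, e.g. "co.jp"
            return ".".join(labels[i - 1:])
    return None  # no listed suffix matched
```

With the sample set, `registrable_domain("www.yahoo.co.jp")` returns `"yahoo.co.jp"`. Any suffix missing from the set yields `None`, which is the incompleteness problem mentioned above in miniature.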

See also this question, where all of this has been gone over before...

Answer by Dave Webb

You can use partition instead of split:

>>> 'www.yahoo.co.jp'.partition('.')[2]
'yahoo.co.jp'

This will help with the parsing but obviously won't check if the returned string is a valid domain.
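
To make that caveat concrete, here is a small sketch (the helper name `strip_first_label` is hypothetical) showing that the same expression returns something that is not a registered domain when the input has no leading host label:

```python
def strip_first_label(hostname):
    """Drop everything up to and including the first dot."""
    return hostname.partition(".")[2]


print(strip_first_label("www.yahoo.co.jp"))  # yahoo.co.jp
print(strip_first_label("yahoo.co.jp"))      # co.jp, a suffix rather than a registered domain
print(repr(strip_first_label("localhost")))  # '' because there is no dot at all
```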

Answer by bortzmeyer

Your algorithm is the right one. Since zone cuts are not reflected in the domain name (you see domain cuts, the dots, but not zone cuts), it is the only correct one.

An approximate algorithm is to use a list of zones, like the one mentioned by Alnitak. Remember that these static lists are not authoritative: they lack many registries, they go stale, and so on.