How would you parse a URL in Ruby to get the main domain?

Question

Asked by Justin Meltzer

I want to be able to parse any URL with Ruby to get the main part of the domain without the www (just the XXXX.com).

Answer 1

Answered by Simone Carletti

Please note that there is no algorithmic method of finding the highest level at which a domain may be registered for a particular top-level domain, because the policies differ with each registry. The only method is to create a list of all top-level domains and the level at which domains can be registered under each.

This is the reason why the Public Suffix List exists.
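To make the list-based idea concrete, here is a minimal sketch that uses only the standard library. The `TOY_SUFFIXES` list and the `registrable_domain` helper are hypothetical names invented for this illustration. The real Public Suffix List is far larger and has wildcard and exception rules that this sketch ignores.

```ruby
require 'uri'

# Toy suffix list for illustration only (an assumption, not the real list).
TOY_SUFFIXES = %w[com org co.uk com.fr].freeze

# Hypothetical helper: returns the registrable domain for +host+,
# or nil when no suffix in the toy list matches.
def registrable_domain(host, suffixes = TOY_SUFFIXES)
  labels = host.downcase.split('.')
  # Walk from the longest candidate suffix to the shortest,
  # so "co.uk" is checked before "uk".
  (1...labels.length).each do |i|
    suffix = labels[i..-1].join('.')
    return labels[(i - 1)..-1].join('.') if suffixes.include?(suffix)
  end
  nil
end

registrable_domain(URI.parse('http://toolbar.google.com').host) # => "google.com"
registrable_domain(URI.parse('http://www.google.co.uk').host)   # => "google.co.uk"
```

Any host whose suffix is missing from the list returns nil, which is why a complete, maintained list matters in practice.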

I'm the author of PublicSuffix, a Ruby library that decomposes a domain into its different parts.

Here's an example:

require 'uri'
require 'public_suffix' # provided by the public_suffix gem

uri = URI.parse("http://toolbar.google.com")
domain = PublicSuffix.parse(uri.host)
# => "toolbar.google.com"
domain.domain
# => "google.com"

uri = URI.parse("http://www.google.co.uk")
domain = PublicSuffix.parse(uri.host)
# => "www.google.co.uk"
domain.domain
# => "google.co.uk"

Answer 2

Answered by Mischa

This should work with pretty much any URL:

# URL always gets parsed twice
def get_host_without_www(url)
  url = "http://#{url}" if URI.parse(url).scheme.nil?
  host = URI.parse(url).host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

Or:

# Only parses twice if url doesn't start with a scheme
def get_host_without_www(url)
  uri = URI.parse(url)
  uri = URI.parse("http://#{url}") if uri.scheme.nil?
  host = uri.host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

You may have to require 'uri'.

Answer 3

Answered by nlsrchtr

Just a short note: to avoid the second parsing of the URL in Mischa's second example, you could use a string comparison instead of URI.parse.

# Only parses once
def get_host_without_www(url)
  url = "http://#{url}" unless url.start_with?('http')
  uri = URI.parse(url)
  host = uri.host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

The downside of this approach is that it limits the input to http(s) URLs, which is by far the most common case. If you want to use it more generally (for example, with ftp links), you have to adjust it accordingly.
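One possible adjustment, sketched here as an assumption rather than as part of the original answer, is to test for any leading scheme with a regular expression instead of the literal prefix 'http'. The `SCHEME_PREFIX` constant is a hypothetical name. This still parses only once, and it also handles scheme-less hosts that happen to begin with "http", such as httpbin.org.

```ruby
require 'uri'

# Matches a leading URI scheme such as "http://", "https://" or "ftp://".
SCHEME_PREFIX = %r{\A[a-z][a-z0-9+.\-]*://}i

# Only parses once; prepends "http://" when no scheme is present.
def get_host_without_www(url)
  url = "http://#{url}" unless url.match?(SCHEME_PREFIX)
  host = URI.parse(url).host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

get_host_without_www('ftp://www.example.com/file.txt') # => "example.com"
get_host_without_www('www.google.com/search')          # => "google.com"
get_host_without_www('httpbin.org')                    # => "httpbin.org"
```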

Answer 4

Answered by Sam

Addressable is probably the right answer in 2018, especially since it uses the PublicSuffix gem to parse domains.

However, I need to do this kind of parsing in multiple places, from various data sources, and found it a bit verbose to use repeatedly. So I created a wrapper around it, Adomain:

require 'adomain'

Adomain["https://toolbar.google.com"]
# => "toolbar.google.com"

Adomain["https://www.google.com"]
# => "google.com"

Adomain["stackoverflow.com"]
# => "stackoverflow.com"

I hope this helps others.

Answer 5

Answered by pguardiario

Here's one that works better with .co.uk- and .com.fr-type domains:

domain = uri.host[/[^.\s\/]+\.([a-z]{3,}|([a-z]{2}|com)\.[a-z]{2})$/]
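As a quick check of what this pattern does and does not match, here is a small sketch. The `DOMAIN_PATTERN` constant and the `main_domain` wrapper are hypothetical names added for illustration. The pattern itself is unchanged.

```ruby
require 'uri'

# The pattern from the answer above, unchanged.
DOMAIN_PATTERN = /[^.\s\/]+\.([a-z]{3,}|([a-z]{2}|com)\.[a-z]{2})$/

# Hypothetical wrapper: parse the URL, then slice the host with the pattern.
def main_domain(url)
  URI.parse(url).host[DOMAIN_PATTERN]
end

main_domain('http://www.google.co.uk')   # => "google.co.uk"
main_domain('http://www.example.com.fr') # => "example.com.fr"
main_domain('http://toolbar.google.com') # => "google.com"
main_domain('http://www.google.de')      # => nil
```

Note the last case: a plain two-letter country-code domain such as google.de does not match, because the pattern expects either a TLD of three or more letters or a two-part suffix.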

Answer 6

Answered by Daniel Antonio Nuñez Carhuayo

Well, you can write this method:

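require 'uri'

# Returns the second-level label ("google") by default, or the last two
# labels ("google.com") when :with_dot_principal is true.
def domain_name(url, arg = { with_dot_principal: false })
  labels = URI(url).hostname.split('.').last(2)
  arg[:with_dot_principal] ? labels.join('.') : labels.first
end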

And use it like this:

domain_name("https://www.google.com/?gws_rd=ssl&safe=active&ssui=on")
# => "google"
domain_name("http://google.com", with_dot_principal: true)
# => "google.com"
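Keep in mind that taking the last two labels assumes a single-label suffix. The short sketch below isolates that step in a hypothetical helper, `last_two_labels`, to show how it behaves on a multi-part suffix such as .co.uk.

```ruby
require 'uri'

# Hypothetical helper isolating the "last two labels" step.
def last_two_labels(url)
  URI(url).hostname.split('.').last(2).join('.')
end

last_two_labels('https://www.google.com')   # => "google.com"
last_two_labels('https://www.google.co.uk') # => "co.uk"
```

For hosts like these, a suffix-list approach is the safer choice.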

Answer 7

Answered by Tudor Constantin

If the URL is in the format http://www.google.com, then you could do something like:

a = 'http://www.google.com'
puts a.split(/\./)[1] + '.' + a.split(/\./)[2]

Or:

a =~ /http:\/\/www\.(.*?)$/
puts $1 # first capture group, e.g. "google.com"
