Ruby: how to parse a URL and extract the needed substring

Question

Asked by marcamillion

Say I have a string like this: "http://something.example.com/directory/"

What I want to do is parse this string and extract "something" from it.

The first step is obviously to check that the string contains "http://"; otherwise, it should ignore the string.

But how do I then extract just the "something" from that string? Assume that all the strings this will evaluate have a similar structure (i.e., I am trying to extract the subdomain of the URL, provided the string being examined is indeed a valid URL, where "valid" means it starts with "http://").

Thanks.

P.S. I know how to check the first part, i.e. I can simply split the string at "http://", but that doesn't solve the full problem, because that only produces "something.example.com/directory/". All I want is "something", nothing else.
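
For comparison, the plain-string approach described in the P.S. can be finished with two more splits. This is a minimal sketch, assuming every input is shaped like the example (scheme, then host, then an optional path); the method name subdomain_by_split is hypothetical and not from the question.

```ruby
# Hypothetical helper: extracts the first host label using only String methods.
# Assumes input shaped like "http://<sub>.<domain>.<tld>/<path>".
def subdomain_by_split(str)
  prefix = "http://"
  # Ignore strings that do not start with the scheme, as the question requires.
  return nil unless str.start_with?(prefix)

  rest = str.delete_prefix(prefix)  # "something.example.com/directory/"
  host = rest.split("/").first      # "something.example.com"
  return nil if host.nil?           # e.g. the bare string "http://"

  host.split(".").first             # "something"
end

p subdomain_by_split("http://something.example.com/directory/") # prints "something"
p subdomain_by_split("https://something.example.com/")          # prints nil
```

This works for well-formed inputs but does no real URL validation; the answers below use a proper parser instead.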

Answer 1

Answered by the Tin Man

I'd do it this way:

require 'uri'

uri = URI.parse('http://something.example.com/directory/')
uri.host.split('.').first
# => "something"

URI is built into Ruby. It's not the most full-featured library, but it is plenty capable of this task for most URLs. If you have IRIs, then look at Addressable::URI.
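
The question also asks to ignore strings that don't start with "http://". A minimal sketch combining that check with the URI approach above might look like this; the method name extract_subdomain is hypothetical, and rescuing URI::InvalidURIError (so unparseable strings are ignored rather than raising) is an added assumption.

```ruby
require 'uri'

# Hypothetical helper: "http://" check from the question + URI host parsing.
def extract_subdomain(str)
  # Only strings that start with "http://" count as valid here.
  return nil unless str.start_with?("http://")

  host = URI.parse(str).host
  return nil if host.nil?

  host.split(".").first
rescue URI::InvalidURIError
  # Starts with "http://" but is not a parseable URL (e.g. contains a space).
  nil
end

p extract_subdomain("http://something.example.com/directory/") # prints "something"
p extract_subdomain("something.example.com/directory/")        # prints nil
```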

Answer 2

Answered by oldergod

You could use URI like this:

require 'uri'

uri = URI.parse("http://something.example.com/directory/")
puts uri.host
# prints: something.example.com

and then work with the host. Alternatively, there is the domainatrix gem, mentioned in the question "Remove subdomain from string in ruby":

require 'rubygems'
require 'domainatrix'

url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix       # => "co.uk"
url.domain              # => "pauldix"
url.subdomain           # => "foo.bar"
url.path                # => "/asdf.html?q=arg"
url.canonical           # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"

Then you can just take url.subdomain.
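
Without a gem, a rough approximation of that subdomain can be computed from URI's host alone. This is a sketch under a loud assumption: the registered domain is exactly the last two labels (as in "example.com"). That assumption breaks for multi-label public suffixes such as "co.uk", which is the case domainatrix handles. The method name naive_subdomain is hypothetical.

```ruby
require 'uri'

# Hypothetical helper. ASSUMPTION: the registered domain is the last two
# host labels, so everything before them is treated as the subdomain.
def naive_subdomain(url)
  host = URI.parse(url).host
  return nil if host.nil?

  labels = host.split(".")
  return nil if labels.length <= 2  # no subdomain, e.g. "example.com"

  labels[0...-2].join(".")
end

p naive_subdomain("http://something.example.com/directory/") # prints "something"
p naive_subdomain("http://foo.bar.pauldix.co.uk/asdf.html")  # prints "foo.bar.pauldix" (domainatrix gives "foo.bar")
```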

Answer 3

Answered by resilva87

Well, you can use regular expressions. Something like /http:\/\/([^\.]+)/, that is, the first run of non-'.' characters after "http://".
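
A minimal sketch of that regex in use. It adds a \A anchor (an addition, not part of the answer's pattern) so that only strings starting with "http://" match; String#[] with a capture index returns nil when there is no match. The names SUBDOMAIN_RE and subdomain_by_regex are hypothetical.

```ruby
# The answer's pattern, anchored with \A so "http://" must be at the start.
SUBDOMAIN_RE = %r{\Ahttp://([^.]+)}

# Hypothetical helper: returns capture group 1, or nil if the string doesn't match.
def subdomain_by_regex(str)
  str[SUBDOMAIN_RE, 1]
end

p subdomain_by_regex("http://something.example.com/directory/") # prints "something"
p subdomain_by_regex("https://something.example.com/")          # prints nil
```

Note that [^.]+ stops only at a dot, so a dot-less host such as http://localhost/dir would capture "localhost/dir"; [^./]+ avoids that.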

Check out http://rubular.com/, where you can test your regular expressions against a set of test strings; it's great for learning regular expressions.
