Ruby: how to parse a URL and extract the required substring
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/13243195/
How to parse a URL and extract the required substring
Asked by marcamillion
Say I have a string like this: "http://something.example.com/directory/"
What I want to do is parse this string and extract the "something" from it.
The first step is obviously to check that the string contains "http://"; otherwise, the string should be ignored.
But how do I then extract just the "something" from that string? Assume that all the strings being evaluated have a similar structure (i.e. I am trying to extract the subdomain of the URL, if the string being examined is indeed a valid URL, where valid means it starts with "http://").
Thanks.
P.S. I know how to check the first part, i.e. I can simply split the string at the "http://", but that doesn't solve the full problem because that will produce "http://something.example.com/directory/". All I want is the "something", nothing else.
Answered by the Tin Man
I'd do it this way:
require 'uri'
uri = URI.parse('http://something.example.com/directory/')
uri.host.split('.').first # => "something"
URI is built into Ruby. It's not the most full-featured library, but it's plenty capable of doing this task for most URLs. If you have IRIs, then look at Addressable::URI.
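To combine the "http://" check from the question with the host split above, here is a minimal sketch. The helper name `first_host_label` is hypothetical (it is not part of URI or of the original answer), and it assumes that anything that is not a parseable http:// URL should simply yield nil.

```ruby
require 'uri'

# Hypothetical helper: returns the first dot-separated label of the host
# for http:// URLs, or nil for anything else.
def first_host_label(string)
  uri = URI.parse(string)
  # Ignore strings that are not http:// URLs or that have no host part.
  return nil unless uri.scheme == 'http' && uri.host

  uri.host.split('.').first
rescue URI::InvalidURIError
  # URI.parse raises on strings it cannot parse at all (e.g. containing spaces).
  nil
end

first_host_label('http://something.example.com/directory/') # => "something"
first_host_label('ftp://something.example.com/directory/')  # => nil
first_host_label('not a url')                                # => nil
```

Note that this returns the first label of the host, which is only the subdomain when the host actually has one; for "http://example.com/" it would return "example".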
Answered by oldergod
You could use URI like this:
require 'uri'
uri = URI.parse("http://something.example.com/directory/")
puts uri.host
# prints: something.example.com
You could then just work on the host.
Alternatively, there is the domainatrix gem, mentioned in the question "Remove subdomain from string in ruby":
require 'rubygems'
require 'domainatrix'
url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix # => "co.uk"
url.domain # => "pauldix"
url.subdomain # => "foo.bar"
url.path # => "/asdf.html?q=arg"
url.canonical # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"
Then you could just take the subdomain.
Answered by resilva87
Well, you can use regular expressions.
Something like /http:\/\/([^\.]+)/, that is, the first group of characters other than '.' after "http://".
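As a rough sketch of how that pattern might be applied in Ruby: the helper name `label_via_regex` is hypothetical, and the pattern is a slight variation on the one above (anchored to the start of the string with \A, and also stopping at "/"), which is an assumption about the desired behavior rather than part of the original answer.

```ruby
# Hypothetical helper: returns the text between a leading "http://" and the
# first "." or "/", or nil when the string does not start with "http://".
def label_via_regex(string)
  # String#[] with a regexp and a capture index returns that capture or nil.
  string[%r{\Ahttp://([^./]+)}, 1]
end

label_via_regex('http://something.example.com/directory/') # => "something"
label_via_regex('https://something.example.com/')          # => nil
label_via_regex('see http://something.example.com/')       # => nil
```

Unlike URI.parse, this does not validate the rest of the URL; it only looks at the prefix.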
Check out http://rubular.com/. You can test your regular expressions against a set of test strings there, which is great for learning regular expressions.

