Ruby: How to parse a URL and extract the required substring

Notice: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/13243195/

Time: 2020-09-06 05:30:40  Source: igfitidea

Tags: ruby, parsing

Asked by marcamillion

Say I have a string like this: "http://something.example.com/directory/"

What I want to do is parse this string and extract the "something" from it.

The first step, obviously, is to check that the string starts with "http://"; otherwise, the string should be ignored.

But how do I then extract just the "something" from that string? Assume that all the strings being evaluated have a similar structure (i.e. I am trying to extract the subdomain of the URL, provided the string being examined is indeed a valid URL, where "valid" means it starts with "http://").

Thanks.

P.S. I know how to handle the first part, i.e. I can simply split the string at "http://", but that doesn't solve the full problem because it still leaves "something.example.com/directory/". All I want is the "something", nothing else.

Answered by the Tin Man

I'd do it this way:

require 'uri'

uri = URI.parse('http://something.example.com/directory/')
uri.host.split('.').first  # => "something"

URI is built into Ruby. It's not the most full-featured, but it's plenty capable of doing this task for most URLs. If you have IRIs, then look at Addressable::URI.
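As a rough sketch of how this answer could be combined with the "http://" check from the question, the snippet below wraps the same URI calls in a small helper. The method name `first_subdomain` is made up for illustration, and treating unparseable or non-http strings as `nil` is an assumption about what "ignore the string" should mean.

```ruby
require 'uri'

# Hypothetical helper (not from the original answer): returns the first
# label of the host for an http:// URL, or nil for anything else.
def first_subdomain(str)
  uri = URI.parse(str)
  # Assumption: only the plain "http" scheme counts as valid, per the question.
  return nil unless uri.scheme == 'http' && uri.host

  uri.host.split('.').first
rescue URI::InvalidURIError
  # URI.parse raises on strings it cannot parse at all; ignore those too.
  nil
end

first_subdomain('http://something.example.com/directory/')  # => "something"
first_subdomain('ftp://something.example.com/directory/')   # => nil
first_subdomain('not a url')                                # => nil
```

Returning `nil` keeps the caller's check simple; raising instead would be an equally reasonable choice if invalid input should be treated as an error.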

Answered by oldergod

You could use URI like this:

require 'uri'

uri = URI.parse("http://something.example.com/directory/")
puts uri.host
# prints: something.example.com

You could then just work on the host.
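One possible way to "work on the host" using only the standard library is sketched below. The helper name `naive_subdomain` is hypothetical, and it rests on a loudly naive assumption: that the registrable domain is exactly the last two labels of the host. Hosts under multi-label suffixes such as co.uk break that assumption.

```ruby
require 'uri'

# Hypothetical helper: everything before the last two labels of the host.
# Assumption: the domain is exactly two labels (e.g. "example.com").
def naive_subdomain(host)
  labels = host.split('.')
  labels[0...-2].join('.')
end

host = URI.parse('http://something.example.com/directory/').host
naive_subdomain(host)           # => "something"
naive_subdomain('example.com')  # => ""
```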
Alternatively, there is the domainatrix gem, mentioned in the question "Remove subdomain from string in ruby":

require 'rubygems'
require 'domainatrix'

url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix       # => "co.uk"
url.domain              # => "pauldix"
url.subdomain           # => "foo.bar"
url.path                # => "/asdf.html?q=arg"
url.canonical           # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"

You could then just take the subdomain.

Answered by resilva87

Well, you can use regular expressions. Something like /http:\/\/([^\.]+)/, that is, the first group of non-'.' characters after "http://".
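A minimal sketch of applying that pattern in Ruby follows. The helper name `subdomain_by_regex` is invented for illustration, and the `\A` anchor is an addition not present in the answer's pattern, assuming the string must start with "http://" as the question requires.

```ruby
# Hypothetical helper: returns the first capture group, or nil if the
# string does not start with "http://".
# %r{} is used so the slashes need no escaping; \A is an added anchor.
def subdomain_by_regex(str)
  str[%r{\Ahttp://([^.]+)}, 1]
end

subdomain_by_regex('http://something.example.com/directory/')  # => "something"
subdomain_by_regex('ftp://something.example.com/directory/')   # => nil
```

Note that for a host with no dot at all, the capture runs on past the host into the path, so this only suits inputs shaped like the one in the question.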

Check out http://rubular.com/. You can test your regular expressions against a set of test strings there, which makes it great for learning regular expressions.