Ruby-on-rails Rails: What's a good way to validate links (URLs)?

Notice: this page is a Chinese-English bilingual translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/7167895/

Date: 2020-09-03 01:50:48  Source: igfitidea

Rails: What's a good way to validate links (URLs)?

ruby-on-rails ruby regex validation url

Asked by jay

I was wondering how I would best validate URLs in Rails. I was thinking of using a regular expression, but am not sure if this is the best practice.

And, if I were to use a regex, could someone suggest one to me? I am still new to Regex.

Answered by Simone Carletti

Validating a URL is a tricky job. It's also a very broad request.

What do you want to do, exactly? Do you want to validate the format of the URL, the existence, or what? There are several possibilities, depending on what you want to do.

A regular expression can validate the format of the URL. But even a complex regular expression cannot ensure you are dealing with a valid URL.

For instance, if you take a simple regular expression, it will probably reject the following host

http://invalid##host.com

but it will allow

http://invalid-host.foo

that is a valid host, but not a valid domain if you consider the existing TLDs. Indeed, the solution would work if you want to validate the hostname, not the domain because the following one is a valid hostname

http://host.foo

as well as the following one

http://localhost

Now, let me give you some solutions.

If you want to validate a domain, then you need to forget about regular expressions. The best solution available at the moment is the Public Suffix List, a list maintained by Mozilla. I created a Ruby library to parse and validate domains against the Public Suffix List, and it's called PublicSuffix.

If you want to validate the format of a URI/URL, then you might want to use regular expressions. Instead of searching for one, use the built-in Ruby URI.parse method.

require 'uri'

def valid_url?(url)
  # Parse first, then check the host; the original one-liner called #host on the String argument.
  uri = URI.parse(url)
  !uri.host.nil?
rescue URI::InvalidURIError
  false
end

You can even decide to make it more restrictive. For instance, if you want the URL to be an HTTP/HTTPS URL, then you can make the validation more accurate.

require 'uri'

def valid_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end

Of course, there are tons of improvements you can apply to this method, including checking for a path or a scheme.
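
As one possible direction for such improvements, here is a minimal sketch that checks the scheme against an allow-list and requires a non-empty host. The `schemes:` keyword argument is an assumption made for illustration; it is not part of the original answer.

```ruby
require 'uri'

# Hypothetical variant (not from the original answer): accept only the
# listed schemes and require a non-empty host.
def valid_url?(url, schemes: %w[http https])
  uri = URI.parse(url)
  schemes.include?(uri.scheme) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

puts valid_url?('https://example.com/path?q=1')            # HTTPS URL with a host
puts valid_url?('ftp://example.com')                       # scheme not in the allow-list
puts valid_url?('example.com')                             # no scheme, so no host is parsed
puts valid_url?('http://exa mple.com')                     # the space makes URI.parse raise
puts valid_url?('ftp://example.com', schemes: %w[ftp])     # allow-list overridden
```

Comparing `uri.scheme` against a list, rather than checking `is_a?(URI::HTTP)`, makes it easy to allow other schemes later without touching the method body.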

Last but not least, you can also package this code into a validator:

class HttpUrlValidator < ActiveModel::EachValidator

  def self.compliant?(value)
    uri = URI.parse(value)
    uri.is_a?(URI::HTTP) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end

  def validate_each(record, attribute, value)
    unless value.present? && self.class.compliant?(value)
      record.errors.add(attribute, "is not a valid HTTP URL")
    end
  end

end

# in the model
validates :example_attribute, http_url: true

Answered by Matteo Collina

I use a one-liner inside my models:

validates :url, format: URI::regexp(%w[http https])

I think it is good enough and simple to use. Moreover, it should be theoretically equivalent to Simone's method, as it uses the very same regexp internally.
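
One caveat worth checking before relying on this: the pattern is built for finding URLs inside text, so it is not anchored. The sketch below assumes that `URI::RFC2396_Parser#make_regexp` produces the same kind of pattern that `URI::regexp` has historically returned; the constant names are made up for illustration.

```ruby
require 'uri'

# Assumption: this is the RFC 2396 pattern that URI::regexp has historically returned.
URL_PATTERN = URI::RFC2396_Parser.new.make_regexp(%w[http https])

# Wrapping the pattern in \A...\z requires the whole value to be a URL.
ANCHORED_URL_PATTERN = /\A#{URL_PATTERN}\z/

puts URL_PATTERN.match?('see http://example.com now')            # unanchored: finds the URL inside the text
puts ANCHORED_URL_PATTERN.match?('see http://example.com now')   # anchored: surrounding text is rejected
puts ANCHORED_URL_PATTERN.match?('http://example.com')           # anchored: a bare URL still matches
```

In a model, the anchored pattern could be passed as `format: { with: ANCHORED_URL_PATTERN }` if whole-value matching is what you want.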

Answered by jlfenaux

Following Simone's idea, you can easily create your own validator.

class UrlValidator < ActiveModel::EachValidator
  def validate_each(record, attribute, value)
    return if value.blank?
    begin
      uri = URI.parse(value)
      resp = uri.kind_of?(URI::HTTP)
    rescue URI::InvalidURIError
      resp = false
    end
    unless resp == true
      # errors.add works across Rails versions; appending with << was deprecated in Rails 6.1
      record.errors.add(attribute, options[:message] || "is not an url")
    end
  end
end

and then use

validates :url, :presence => true, :url => true

in your model.

Answered by dolzenko

There is also the validate_url gem (which is just a nice wrapper for the Addressable::URI.parse solution).

Just add

gem 'validate_url'

to your Gemfile, and then in models you can

validates :click_through_url, url: true

Answered by Stefan Pettersson

This question is already answered, but what the heck, I'll propose the solution I'm using.

The regexp works fine with all the URLs I've met. The setter method takes care of the case where no protocol is given (let's assume http://).

And finally, we try to fetch the page. Maybe I should accept redirects and not only HTTP 200 OK.

# app/models/my_model.rb
validates :website, :allow_blank => true, :uri => { :format => /(^$)|(^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$)/ix }

def website=(url_str)
  unless url_str.blank?
    unless url_str.split(':')[0] == 'http' || url_str.split(':')[0] == 'https'
      url_str = "http://" + url_str
    end
  end
  write_attribute :website, url_str
end

and...

# app/validators/uri_validator.rb
require 'net/http'

# Thanks Ilya! http://www.igvita.com/2006/09/07/validating-url-in-ruby-on-rails/
# Original credits: http://blog.inquirylabs.com/2006/04/13/simple-uri-validation/
# HTTP Codes: http://www.ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTPResponse.html

class UriValidator < ActiveModel::EachValidator
  def validate_each(object, attribute, value)
    raise(ArgumentError, "A regular expression must be supplied as the :format option of the options hash") unless options[:format].nil? or options[:format].is_a?(Regexp)
    configuration = { :message => I18n.t('errors.events.invalid_url'), :format => URI::regexp(%w(http https)) }
    configuration.update(options)

    if value =~ configuration[:format]
      begin # check header response
        case Net::HTTP.get_response(URI.parse(value))
          when Net::HTTPSuccess then true
          else object.errors.add(attribute, configuration[:message]) and false
        end
      rescue # Recover on DNS failures..
        object.errors.add(attribute, configuration[:message]) and false
      end
    else
      object.errors.add(attribute, configuration[:message]) and false
    end
  end
end
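
On the remark about accepting redirects: a minimal sketch of how the response check could be widened, assuming that any 2xx or 3xx answer should count as "the page exists". The helper name `reachable_response?` is hypothetical and not part of the validator above.

```ruby
require 'net/http'

# Hypothetical helper (not in the original validator): treat both success (2xx)
# and redirection (3xx) responses as acceptable.
def reachable_response?(response)
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
end

# Response objects built by hand, so no network access is needed for the demo.
puts reachable_response?(Net::HTTPOK.new('1.1', '200', 'OK'))
puts reachable_response?(Net::HTTPFound.new('1.1', '302', 'Found'))
puts reachable_response?(Net::HTTPNotFound.new('1.1', '404', 'Not Found'))
```

Inside the validator, the same effect could be had by extending the `case` branch to `when Net::HTTPSuccess, Net::HTTPRedirection then true`.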

Answered by Roman Ralovets

You can also try the valid_url gem, which allows URLs without the scheme and checks the domain zone and IP hostnames.

Add it to your Gemfile:

gem 'valid_url'

And then in the model:

class WebSite < ActiveRecord::Base
  validates :url, :url => true
end

Answered by heriberto perez

The solution that worked for me was:

validates_format_of :url, :with => /\A(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?\Z/i

I tried some of the examples posted here, but I need to support URLs like the ones listed below.

Note the use of \A and \Z: if you use ^ and $ instead, the Rails validators raise a security warning.

 Valid ones:
 'www.crowdint.com'
 'crowdint.com'
 'http://crowdint.com'
 'http://www.crowdint.com'

 Invalid ones:
  'http://www.crowdint. com'
  'http://fake'
  'http:fake'
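
The lists above can be checked against the pattern in plain Ruby, outside of Rails. This is only a quick sketch; the constant and variable names are made up for illustration.

```ruby
# The pattern from the validates_format_of call above.
URL_FORMAT = /\A(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w\.-]*)*\/?\Z/i

valid_urls   = %w[www.crowdint.com crowdint.com http://crowdint.com http://www.crowdint.com]
invalid_urls = ['http://www.crowdint. com', 'http://fake', 'http:fake']

# Print each candidate with the result of matching it against the pattern.
(valid_urls + invalid_urls).each do |url|
  puts "#{url.inspect}: #{URL_FORMAT.match?(url)}"
end
```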

Answered by lafeber

Just my 2 cents:

before_validation :format_website
validate :website_validator

private

def format_website
  self.website = "http://#{self.website}" unless self.website[/^https?/]
end

def website_validator
  # errors.add works across Rails versions; appending with << was deprecated in Rails 6.1
  errors.add(:website, I18n.t("activerecord.errors.messages.invalid")) unless website_valid?
end

def website_valid?
  !!website.match(/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-=\?]*)*\/?$/)
end

EDIT: changed regex to match parameter urls.
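
A side note on the anchors in this pattern: in Ruby, ^ and $ match at line boundaries, so a multi-line value can pass as long as one of its lines looks like a URL. The sketch below compares the pattern as posted with a variant that swaps in \A and \z; the variant and the constant names are assumptions for illustration, not part of the original answer.

```ruby
# The pattern as posted, with line anchors (^ and $).
LINE_ANCHORED = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-=\?]*)*\/?$/

# Hypothetical variant: same body, but anchored to the whole string.
STRING_ANCHORED = /\A(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-=\?]*)*\/?\z/

multiline = "http://example.com\njavascript:alert(1)"

puts LINE_ANCHORED.match?(multiline)               # the first line alone satisfies ^...$
puts STRING_ANCHORED.match?(multiline)             # the embedded newline is rejected
puts STRING_ANCHORED.match?('http://example.com')  # a single-line URL still matches
```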

Answered by severin

I ran into the same problem lately (I needed to validate urls in a Rails app) but I had to cope with the additional requirement of unicode urls (e.g. http://кц.рф)...

I researched a couple of solutions and came across the following:

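  • The first and most suggested thing is using URI.parse. Check the answer of Simone Carletti for details. This works OK, but not for unicode URLs.
  • The second method I saw was the one by Ilya Grigorik: http://www.igvita.com/2006/09/07/validating-url-in-ruby-on-rails/ Basically, he tries to make a request to the URL; if it works, it is valid...
  • The third method I found (and the one I prefer) is an approach similar to URI.parse but using the addressable gem instead of the URI stdlib. This approach is detailed here: http://rawsyntax.com/blog/url-validation-in-rails-3-and-ruby-in-general/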
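
The limitation noted for the first approach is easy to reproduce with the standard library alone. A minimal sketch, where the helper name `parses_with_stdlib?` is made up for illustration:

```ruby
require 'uri'

# Hypothetical helper: true when the stdlib parser accepts the string and finds a host.
def parses_with_stdlib?(url)
  !URI.parse(url).host.nil?
rescue URI::InvalidURIError
  false
end

puts parses_with_stdlib?('http://example.com')  # ASCII host: parsed
puts parses_with_stdlib?('http://кц.рф')        # non-ASCII host: URI.parse raises InvalidURIError
```

This is the gap the addressable-based approach in the third bullet is meant to cover.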

Answered by JJD

Here is an updated version of the validator posted by David James. It has been published by Benjamin Fleischer. Meanwhile, I pushed an updated fork which can be found here.

require 'addressable/uri'

# Source: http://gist.github.com/bf4/5320847
# Accepts options[:message] and options[:allowed_protocols]
# spec/validators/uri_validator_spec.rb
class UriValidator < ActiveModel::EachValidator

  def validate_each(record, attribute, value)
    uri = parse_uri(value)
    if !uri
      record.errors.add(attribute, generic_failure_message)
    elsif !allowed_protocols.include?(uri.scheme)
      record.errors.add(attribute, "must begin with #{allowed_protocols_humanized}")
    end
  end

private

  def generic_failure_message
    options[:message] || "is an invalid URL"
  end

  def allowed_protocols_humanized
    allowed_protocols.to_sentence(:two_words_connector => ' or ')
  end

  def allowed_protocols
    @allowed_protocols ||= [(options[:allowed_protocols] || ['http', 'https'])].flatten
  end

  def parse_uri(value)
    uri = Addressable::URI.parse(value)
    uri.scheme && uri.host && uri
  rescue URI::InvalidURIError, Addressable::URI::InvalidURIError, TypeError
  end

end

...

require 'spec_helper'

# Source: http://gist.github.com/bf4/5320847
# spec/validators/uri_validator_spec.rb
describe UriValidator do
  subject do
    Class.new do
      include ActiveModel::Validations
      attr_accessor :url
      validates :url, uri: true
    end.new
  end

  it "should be valid for a valid http url" do
    subject.url = 'http://www.google.com'
    subject.valid?
    subject.errors.full_messages.should == []
  end

  ['http://google', 'http://.com', 'http://ftp://ftp.google.com', 'http://ssh://google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is a invalid http url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.full_messages.should == []
    end
  end

  ['http:/www.google.com','<>hi'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("is an invalid URL")
    end
  end

  ['www.google.com','google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("is an invalid URL")
    end
  end

  ['ftp://ftp.google.com','ssh://google.com'].each do |invalid_url|
    it "#{invalid_url.inspect} is an invalid url" do
      subject.url = invalid_url
      subject.valid?
      subject.errors.should have_key(:url)
      subject.errors[:url].should include("must begin with http or https")
    end
  end
end

Please note that there are still strange HTTP URIs that are parsed as valid addresses.

http://google  
http://.com  
http://ftp://ftp.google.com  
http://ssh://google.com

Here is an issue for the addressable gem which covers the examples.