How to get the HTML source of a webpage in Ruby
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/4217223/
Asked by Eric
In browsers such as Firefox or Safari, with a website open, I can right click the page, and select something like: "View Page Source" or "View Source." This shows the HTML source for the page.
In Ruby, is there a function (maybe a library) that allows me to store this HTML source as a variable? Something like this:
source = view_source('http://stackoverflow.com')
where source would be this text:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Stack Overflow</title>
etc
Answered by robbrit
Answered by Nakilon
require 'open-uri'
# Kernel#open no longer handles URLs as of Ruby 3.0; use URI.open
source = URI.open(url) { |f| f.read }
UPD: more modern syntax
require 'open-uri'
source = URI.open(url, &:read)
Answered by Matt Rose
require 'open-uri'
source = URI.open(url).read
Short, simple, sweet.
Answered by Skilldrick
Yes, like this:
require 'open-uri'
URI.open('http://stackoverflow.com') do |file|
  # use the source, Eric
  # e.g. file.each_line { |line| puts line }
end
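The open-uri snippets above raise an exception when the server answers with an error status. As a minimal sketch (the helper name `page_source` is hypothetical and not part of open-uri), the call can be wrapped so that an HTTP error yields `nil` instead:

```ruby
require 'open-uri'

# Hypothetical helper (not part of open-uri): return the page source
# as a String, or nil if the server responds with an HTTP error
# status such as 404 or 500.
def page_source(url)
  URI.open(url, &:read)
rescue OpenURI::HTTPError => e
  # e.message holds the status line text, e.g. "404 Not Found"
  warn "Could not fetch #{url}: #{e.message}"
  nil
end
```

It would be called as `source = page_source('http://stackoverflow.com')`. Only HTTP error statuses are rescued here; connection failures and timeouts still propagate to the caller.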
Answered by Josh Lee
You could use the built-in Net::HTTP:
>> require 'net/http'
>> Net::HTTP.get 'stackoverflow.com', '/'
Or one of the several libraries suggested in "Equivalent of cURL for Ruby?".
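One thing to keep in mind with Net::HTTP is that it does not follow redirects on its own, so a site that redirects (for example from http to https) returns a redirect response rather than the page. A rough sketch of one way to handle that, assuming a hypothetical helper named `fetch_html` and an arbitrary default limit of 5 redirects:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Net::HTTP): fetch a page body,
# following at most `limit` redirects before giving up.
def fetch_html(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  response = Net::HTTP.get_response(URI.parse(url))

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # The Location header may be relative, so resolve it against
    # the URL that was just requested.
    fetch_html(URI.join(url, response['location']).to_s, limit - 1)
  else
    raise "request failed: #{response.code} #{response.message}"
  end
end
```

It would be called as `source = fetch_html('http://stackoverflow.com')`.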
Answered by Beanish
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://google.com/')
puts page.body
You can then do a lot of other cool stuff with Mechanize as well.
Answered by Topher Fangio
Answered by Phrogz
If you have cURL installed, you could simply:
url = 'http://stackoverflow.com'
html = `curl #{url}`
If you want to use pure Ruby, look at the Net::HTTP library:
require 'net/http'
stack = Net::HTTP.new 'stackoverflow.com'
# ...later...
page = '/questions/4217223/how-to-get-the-html-source-of-a-webpage-in-ruby'
html = stack.get(page).body