How to get the HTML source of a webpage in Ruby

Question

Asked by Eric

In browsers such as Firefox or Safari, with a website open, I can right-click the page and select something like "View Page Source" or "View Source." This shows the HTML source for the page.

In Ruby, is there a function (or maybe a library) that lets me store this HTML source in a variable? Something like this:

source = view_source('http://stackoverflow.com')

where source would be this text:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Stack Overflow</title>
etc

Answer 1

Answered by robbrit

Use Net::HTTP:

require 'net/http'

source = Net::HTTP.get('stackoverflow.com', '/index.html')
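
The one-liner above returns whatever the server sends first, which for a site that redirects (for example from HTTP to HTTPS) may be a redirect notice rather than the page itself. A minimal sketch of one way to handle that, assuming a helper named fetch_source and a redirect limit of 5 (both hypothetical, not part of the original answer):

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not from the original answer): fetch a page's HTML,
# following up to `limit` redirects before giving up.
def fetch_source(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit <= 0

  uri = URI.parse(url)
  # When given a URI, get_response handles both http and https schemes.
  response = Net::HTTP.get_response(uri)

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    # Resolve a possibly relative Location header against the current URL.
    fetch_source(URI.join(url, response['location']).to_s, limit - 1)
  else
    raise "Request failed: #{response.code} #{response.message}"
  end
end
```

Calling fetch_source('http://stackoverflow.com/') should then give the final page body as a String, provided the site is reachable and the redirect chain is shorter than the limit.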

Answer 2

Answered by Nakilon

require 'open-uri'
# Kernel#open no longer accepts URLs as of Ruby 3.0; use URI.open instead.
source = URI.open(url) { |f| f.read }

UPD: more modern syntax

require 'open-uri'
source = URI.open(url, &:read)

Answer 3

Answered by Matt Rose

require 'open-uri'
source = URI.open(url).read

Short, simple, sweet.
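
The open-uri approach raises an exception when the request fails, for example OpenURI::HTTPError on a 404 response. A minimal sketch of wrapping the call, assuming a hypothetical helper named read_source that returns nil on failure (the name and the nil convention are illustrative, not part of the original answer):

```ruby
require 'open-uri'

# Hypothetical helper: returns the page source as a String,
# or nil if the page could not be fetched.
def read_source(url)
  # The block form closes the underlying IO once the read finishes.
  URI.open(url, &:read)
rescue OpenURI::HTTPError, SocketError, SystemCallError => e
  # HTTPError covers 4xx/5xx responses; SocketError covers DNS failures;
  # SystemCallError covers low-level errors such as a refused connection.
  warn "Could not fetch #{url}: #{e.message}"
  nil
end
```

Returning nil keeps the calling code short; re-raising a more specific error may be the better choice when the caller needs to know why the fetch failed.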

Answer 4

Answered by Skilldrick

Yes, like this:

require 'open-uri'

URI.open('http://stackoverflow.com') do |file|
  # use the source, Eric
  # e.g. file.each_line { |line| puts line }
end

Answer 5

Answered by Josh Lee

You could use the built-in Net::HTTP:

>> require 'net/http'
>> Net::HTTP.get 'stackoverflow.com', '/'

Or one of the several libraries suggested in "Equivalent of cURL for Ruby?".

Answer 6

Answered by Beanish

require 'mechanize'

agent = Mechanize.new
page = agent.get('http://google.com/')

puts page.body

You can then do a lot of other cool stuff with Mechanize as well.

Answer 7

Answered by Topher Fangio

Another thing you might be interested in is Nokogiri. It is a parser for HTML, XML, and similar formats that is very easy to use. Its front page has some example code that should get you started and help you see whether it's right for what you need.

Answer 8

Answered by Phrogz

If you have cURL installed, you could simply do this:

url = 'http://stackoverflow.com'
html = `curl #{url}`

If you want to use pure Ruby, look at the Net::HTTP library:

require 'net/http'
stack = Net::HTTP.new 'stackoverflow.com'
# ...later...
page = '/questions/4217223/how-to-get-the-html-source-of-a-webpage-in-ruby'
html = stack.get(page).body
