Parsing a local HTML file

Question

Asked by Steven Penny

I can use PowerShell to parse an HTML page

PS > $foo = Invoke-WebRequest http://example.com

PS > $foo.Links.Count
1

However, if I download the page

PS > Invoke-WebRequest -OutFile example.htm http://example.com

and then try to parse the downloaded copy, I get an unexpected result:

PS > $foo = Invoke-WebRequest file://$pwd/example.htm

PS > $foo.Links.Count
0

How can I parse the local downloaded page?

Answer 1

Accepted answer by Steven Penny

You can serve the file from a local web server to get around the dumb limitation of Invoke-WebRequest:

PS > $foo = Invoke-WebRequest http://localhost:8080/example.htm

PS > $foo.Links.Count
1

Note that this works even with no network connection, when a request to a remote host fails. For example:

PS > Invoke-WebRequest http://example.com
Invoke-WebRequest : The remote name could not be resolved: 'example.com'
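
The answer does not say which web server was listening on port 8080. As one possible way to get there without installing anything, here is a minimal sketch that uses .NET's System.Net.HttpListener in a background job. The job variable $server and the one-request-then-exit behavior are assumptions made for illustration, not part of the original answer, and there is no error handling for missing files.

```powershell
# Hypothetical one-shot server: answers a single request for a file in the
# directory it was started from, then shuts down. Port 8080 matches the answer.
$server = Start-Job -ArgumentList $pwd.Path -ScriptBlock {
    param($root)   # jobs start in a different directory, so pass the path in
    $listener = New-Object System.Net.HttpListener
    $listener.Prefixes.Add('http://localhost:8080/')
    $listener.Start()
    try {
        $context = $listener.GetContext()   # blocks until a request arrives
        # Keep only the file name so requests cannot leave $root.
        $name  = Split-Path -Leaf $context.Request.Url.LocalPath
        $bytes = [System.IO.File]::ReadAllBytes((Join-Path $root $name))
        $context.Response.ContentType = 'text/html'
        $context.Response.ContentLength64 = $bytes.Length
        $context.Response.OutputStream.Write($bytes, 0, $bytes.Length)
        $context.Response.Close()
    } finally {
        $listener.Stop()
    }
}
```

Give the job a second or two to start listening before sending the request. If the request fails, Receive-Job $server should show the error raised inside the job.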

Answer 2

Answered by PeterK

It appears that Invoke-WebRequest loads file protocol URIs just fine, but fails to parse them, even in PowerShell 4.0 (where it is officially supported).

An alternative that does not require setting up a website would be to load and parse HTML directly into MSHTML.

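# Create an MSHTML document through COM (Windows only).
$html = New-Object -ComObject "HTMLFile"
# -Raw reads the whole file as one string instead of an array of lines.
$source = Get-Content -Path "file.html" -Raw
$html.IHTMLDocument2_write($source)

# Number of links found in the parsed document.
$html.links.length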

Note that when I tested this, a single

<meta http-equiv="X-UA-Compatible" content="IE=edge" />

header prevented my HTML from parsing, and I have no idea why: the document had similar XHTML-style headers, and MSHTML had no issues with those.

Answer 3

Answered by user2235686

Use the file URI format:

$foo = Invoke-WebRequest "file://<path-to-file>"

Edit (fixing my mistake above):

If the HTML is valid XML, you can use Select-Xml:

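# Cast the file contents to an XML document; this throws if the markup is not well-formed XML.
[xml]$html = Get-Content '<path_to_html_file>'
# Select every <a> element and emit the underlying XML nodes.
Select-Xml -Xml $html -XPath '//a' | ForEach-Object { $_.Node }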
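
To see the Select-Xml approach end to end, here is a small sketch on an inline string. The sample markup and the variable names $source and $hrefs are made up for illustration. One caveat: if the document declares the XHTML namespace (xmlns="http://www.w3.org/1999/xhtml"), a plain '//a' will not match anything unless a prefix is mapped through Select-Xml's -Namespace parameter.

```powershell
# Hypothetical well-formed sample; real-world HTML is often not valid XML.
$source = '<html><body><a href="http://example.com">More information</a></body></html>'
[xml]$html = $source

# PowerShell exposes XML attributes as properties, so .href reads the attribute.
$hrefs = Select-Xml -Xml $html -XPath '//a' | ForEach-Object { $_.Node.href }
$hrefs
```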
