How do you parse HTML in VB.NET?
Note: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/516811/
Asked by tooleb
I would like to know if there is a simple way to parse HTML in VB.NET. I know that HTML is not a strict subset of XML, but it would be nice if it could be treated that way. Is there anything out there that would let me parse HTML in an XML-like way in VB.NET?
Accepted answer by TcKs
I like Html Agility Pack: it's very developer friendly, it's free, and the source code is available.
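To make the recommendation concrete, here is a minimal sketch of pulling link targets out of sloppy markup with Html Agility Pack. It assumes the HtmlAgilityPack NuGet package is referenced; the module name, the ExtractLinks helper, and the sample markup are made up for illustration and are not from the original answer.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module AgilityPackSketch
    ' Hypothetical helper: returns the href value of every <a> element that has one.
    Function ExtractLinks(ByVal html As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim links As New List(Of String)()
        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim anchors As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@href]")
        If anchors IsNot Nothing Then
            For Each anchor As HtmlNode In anchors
                links.Add(anchor.GetAttributeValue("href", String.Empty))
            Next
        End If
        Return links
    End Function

    Sub Main()
        ' Deliberately sloppy sample: the <p> element is never closed.
        Dim sample As String = "<p>See <a href=""http://example.com/a"">first</a> and <a href=""/b"">second</a>"
        For Each link As String In ExtractLinks(sample)
            Console.WriteLine(link)
        Next
    End Sub
End Module
```

The XPath query is what gives the "XML-like" feel the question asks for, even though the input would not pass an XML parser.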
Answer by TripleHelix Tech
' Add a project reference to Microsoft.mshtml,
' then in the code file:
Imports mshtml

Function parseMyHtml(ByVal htmlToParse As String) As String
    ' Load the markup into an in-memory MSHTML document.
    Dim htmlDocument As IHTMLDocument2 = New HTMLDocumentClass()
    htmlDocument.write(htmlToParse)
    htmlDocument.close()

    ' Collect every <a> element in the body.
    Dim allElements As IHTMLElementCollection = CType(htmlDocument.body.all, IHTMLElementCollection)
    Dim allLinks As IHTMLElementCollection = CType(allElements.tags("a"), IHTMLElementCollection)

    ' Copy each link's text into its title attribute.
    For Each element As IHTMLElement In allLinks
        element.title = element.innerText
    Next

    ' Return the modified body markup.
    Return htmlDocument.body.innerHTML
End Function
As found here:
Answer by Erx_VB.NExT.Coder
Don't use Agility Pack; just use the mshtml library to access the DOM. This is what IE uses, and it is great for going through HTML elements.
Agility Pack is nasty and unnecessarily hacky if you ask me; mshtml is the way to go. Look it up on MSDN.
Answer by Yes - that Jake.
If your HTML follows XHTML standards, you can do a lot of the parsing and processing using the System.Xml namespace classes.
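As a rough sketch of the XHTML case, the following uses XDocument from System.Xml.Linq to list the links in a well-formed document. The module name, the LinkTargets helper, and the sample markup are assumptions made for illustration. Note that XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so queries have to qualify element names with it.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Xml.Linq

Module XhtmlSketch
    ' Hypothetical helper: returns the href of every XHTML <a> element that has one.
    Function LinkTargets(ByVal xhtml As String) As List(Of String)
        ' Parse throws an XmlException if the markup is not well-formed XML.
        Dim doc As XDocument = XDocument.Parse(xhtml)
        Dim ns As XNamespace = "http://www.w3.org/1999/xhtml"

        Dim targets As New List(Of String)()
        For Each anchor As XElement In doc.Descendants(ns + "a")
            Dim href As XAttribute = anchor.Attribute("href")
            If href IsNot Nothing Then
                targets.Add(href.Value)
            End If
        Next
        Return targets
    End Function

    Sub Main()
        Dim sample As String =
            "<html xmlns=""http://www.w3.org/1999/xhtml"">" &
            "<body><p>Intro</p><a href=""/one"">One</a><a href=""/two"">Two</a></body>" &
            "</html>"

        For Each target As String In LinkTargets(sample)
            Console.WriteLine(target)   ' prints /one, then /two
        Next
    End Sub
End Module
```

Real-world XHTML that carries a DOCTYPE or named entities such as &nbsp; may need extra reader settings; this sketch leaves that out.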
If, on the other hand, what you're parsing is what web developers refer to as "tag soup," you'll need a third-party parser like HTML Agility Pack.
This may be only a partial solution to your problem if you're trying to figure out how a browser will interpret your HTML, as each browser parses tag soup slightly differently.
Answer by Andrew Hare
Is it well formed? If the HTML is in fact well formed, then it can be parsed as XML. If it is tag soup, with unclosed elements and such, I would think you would have to hunt around for a third-party solution.
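One way to answer the "is it well formed?" question in code is simply to try the XML parser and see whether it objects. The sketch below does that; the module and the IsWellFormedXml helper are hypothetical names, and what to do when the check fails (for example, falling back to a third-party parser) is left to the caller.

```vb
Imports System
Imports System.Xml
Imports System.Xml.Linq

Module WellFormedSketch
    ' Hypothetical helper: True if the markup parses as XML, False otherwise.
    Function IsWellFormedXml(ByVal markup As String) As Boolean
        Try
            XDocument.Parse(markup)
            Return True
        Catch ex As XmlException
            ' Unclosed or mismatched tags, missing root element, and so on.
            Return False
        End Try
    End Function

    Sub Main()
        Console.WriteLine(IsWellFormedXml("<p>closed <br /></p>"))   ' True
        Console.WriteLine(IsWellFormedXml("<p>unclosed <br></p>"))   ' False
    End Sub
End Module
```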