Get the final generated HTML source using C# or VB.NET
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/14847656/
Asked by Hello-World
Using VB.NET or C#, how do I get the generated HTML source?
To get the HTML source of a page I can use the code below, but it won't get the generated source: it won't contain any of the HTML that was added dynamically by JavaScript in the browser. How do I get the final generated HTML source?
Thanks
WebRequest req = WebRequest.Create("http://www.asp.net");
WebResponse res = req.GetResponse();
StreamReader sr = new StreamReader(res.GetResponseStream());
string html = sr.ReadToEnd();
If I try the code below, it returns the document without the content injected by the JavaScript code:
Public Class Form1
    Dim WB As WebBrowser = Nothing

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        WB = New WebBrowser()
        Me.Controls.Add(WB)
        AddHandler WB.DocumentCompleted, AddressOf WebBrowser1_DocumentCompleted
        WB.Navigate("mysite/Default.aspx")
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        'Dim htmlcode As String = WebBrowser1.Document.Body.OuterHtml()
        Dim s As String = WB.DocumentText
    End Sub
End Class
HTML returned:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
<title></title>
</head>
<body>
<form id="form1" runat="server">
<div id="center_text_panel">
//test text this text should be here
</div>
</form>
</body>
</html>
<script type="text/javascript">
document.getElementById("center_text_panel").innerText = "test text";
</script>
Answered by Brian Webster
You can use WebKit.NET
Look here for official tutorials
This can not only grab the source, but also process JavaScript through the page load event.
webKitBrowser1.Navigate(MyURL)
Then, handle the DocumentCompleted event, and read the document text:
Dim documentContent As String = webKitBrowser1.DocumentText
Edit: this might be the better open source WebKit option: http://code.google.com/p/open-webkit-sharp/
Answered by KF2
Just put a WebBrowser control on your form and use the following code:
webBrowser1.Navigate("YourLink");
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Alternatively, read each field or element, or use webBrowser1.DocumentText.
    string htmlcode = webBrowser1.Document.Body.InnerHtml;
}
Edited
To also get the HTML that is generated dynamically by JavaScript code, you have two options:
- Run the following code after the webBrowser1_DocumentCompleted event:
StringBuilder htmlcode = new StringBuilder();
foreach (HtmlElement item in webBrowser1.Document.All)
{
    htmlcode.Append(item.InnerHtml);
}
- Write JavaScript code that returns document.documentElement.innerHTML and use the InvokeScript function to return the result:
var htmlcode = webBrowser1.Document.InvokeScript("javascriptcode");
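The InvokeScript option can be made a little more concrete. The following is a minimal sketch, not a tested recipe: it assumes the page's script engine lets you call the global eval function through InvokeScript, which is a commonly used trick with the WinForms WebBrowser control. The class name GeneratedSourceForm, the handler name OnDocumentCompleted, and the URL are placeholders chosen for illustration.

```csharp
using System;
using System.Windows.Forms;

// Sketch only: class name, handler name, and URL are illustrative placeholders.
public class GeneratedSourceForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };

    public GeneratedSourceForm()
    {
        Controls.Add(webBrowser1);
        webBrowser1.ScriptErrorsSuppressed = true;
        webBrowser1.DocumentCompleted += OnDocumentCompleted;
        webBrowser1.Navigate("http://www.asp.net");
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted can also fire for frames; skip anything but the top-level page.
        if (e.Url != webBrowser1.Url) return;

        // Assumption: calling the global "eval" function is permitted on this page.
        // The expression serializes the live DOM, including script-made changes.
        object result = webBrowser1.Document.InvokeScript(
            "eval", new object[] { "document.documentElement.outerHTML" });

        // InvokeScript returns null if the call fails, so fall back to an empty string.
        string html = result as string ?? string.Empty;
        Console.WriteLine(html.Length);
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new GeneratedSourceForm());
    }
}
```

Scripts that modify the page after a delay (timers, AJAX callbacks) may not have run yet when DocumentCompleted fires, so in that case the call would need to be made later.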
Answered by ngochoaitn
You can use this code:
webBrowser1.Document.Body.OuterHtml

