Get the final generated HTML source using C# or VB.NET

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/14847656/

Time: 2020-09-17 12:19:23  Source: igfitidea

Tags: c#, vb.net

Asked by Hello-World

Using VB.NET or C#, how do I get the generated HTML source?

To get the HTML source of a page I can use the code below, but it won't return the generated source: it won't contain any of the HTML that was added dynamically by JavaScript in the browser. How do I get the final generated HTML source?

Thanks.

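using System.IO;
using System.Net;

// Downloads the markup exactly as the server sends it; no scripts are executed.
WebRequest req = WebRequest.Create("http://www.asp.net");
WebResponse res = req.GetResponse();
StreamReader sr = new StreamReader(res.GetResponseStream());
string html = sr.ReadToEnd();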

If I try the code below instead, it returns the document without the content injected by JavaScript:

Public Class Form1

    Dim WB As WebBrowser = Nothing

    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load

        WB = New WebBrowser()
        Me.Controls.Add(WB)
        AddHandler WB.DocumentCompleted, AddressOf WebBrowser1_DocumentCompleted


        WB.Navigate("mysite/Default.aspx")

    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)


        'Dim htmlcode As String = WebBrowser1.Document.Body.OuterHtml()
        Dim s As String = WB.DocumentText

    End Sub
End Class

HTML returned:

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>

</head>
<body>
    <form id="form1" runat="server">
    <div id="center_text_panel">
    //test text  this text should be here
    </div>
    </form>
</body>
</html>

    <script type="text/javascript">

        document.getElementById("center_text_panel").innerText = "test text";


    </script>

Answered by Brian Webster

You can use WebKit.NET.

See the official tutorials for details.

It not only grabs the source but also runs the page's JavaScript as part of the page load.

webKitBrowser1.Navigate(MyURL)

Then handle the DocumentCompleted event and read the document text:

Dim documentContent As String = webKitBrowser1.DocumentText

Edit: this might be a better open-source WebKit option: http://code.google.com/p/open-webkit-sharp/

Answered by KF2

Just put a WebBrowser control on your form and use the following code:

webBrowser1.Navigate("YourLink");

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Or read each field or element, or use webBrowser1.DocumentText
    string htmlcode = webBrowser1.Document.Body.InnerHtml;
}

Edit:

To also get the HTML that is generated dynamically by JavaScript, you have two options:

  1. Run the following code after the webBrowser1_DocumentCompleted event:
    StringBuilder htmlcode = new StringBuilder();
    foreach (HtmlElement item in webBrowser1.Document.All)
    {
        htmlcode.Append(item.InnerHtml);
    }
  2. Write JavaScript that returns document.documentElement.innerHTML and use the InvokeScript function to get the result:
   var htmlcode = webBrowser1.Document.InvokeScript("javascriptcode");
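
The InvokeScript option can be sketched roughly as follows. This is only an illustration under some assumptions: it calls the built-in JavaScript eval function by name instead of a custom page function, which generally works only when the page's script engine is already active, and the names RenderedHtml and FromBrowser are made up for this example.

```csharp
using System.Windows.Forms;

public static class RenderedHtml
{
    // Hypothetical helper: asks the page's script engine for the live DOM markup.
    // Returns null if the script call yields nothing (for example, when the
    // page has no active script engine and "eval" cannot be resolved).
    public static string FromBrowser(WebBrowser browser)
    {
        object result = browser.Document.InvokeScript(
            "eval", new object[] { "document.documentElement.outerHTML" });
        return result as string;
    }
}
```

It would typically be called from the DocumentCompleted handler, for example string htmlcode = RenderedHtml.FromBrowser(webBrowser1);. Content that scripts add later (timers, AJAX callbacks) will only show up if the call is made after those scripts have run.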

Answered by ngochoaitn

You can use this code:

webBrowser1.Document.Body.OuterHtml
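
For context, here is a minimal sketch of how that property might be read in a complete WinForms program. It assumes, as far as I know, that OuterHtml reflects the current DOM while DocumentText tends to return the markup as it was downloaded. The class name RenderedHtmlForm and the output file rendered.html are hypothetical, and the URL is the one from the question.

```csharp
using System;
using System.IO;
using System.Windows.Forms;

public class RenderedHtmlForm : Form
{
    private readonly WebBrowser browser = new WebBrowser { Dock = DockStyle.Fill };

    public RenderedHtmlForm(string url)
    {
        Controls.Add(browser);
        browser.DocumentCompleted += OnDocumentCompleted;
        browser.Navigate(url);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event can fire once per frame; skip until the whole page is ready.
        if (browser.ReadyState != WebBrowserReadyState.Complete) return;

        // Prefer the <html> element so the <head> is included; fall back to the body.
        HtmlElementCollection roots = browser.Document.GetElementsByTagName("html");
        string html = roots.Count > 0
            ? roots[0].OuterHtml
            : browser.Document.Body.OuterHtml;

        // Hypothetical output location, used only for this illustration.
        File.WriteAllText("rendered.html", html);
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new RenderedHtmlForm("http://www.asp.net"));
    }
}
```

Reading the root element instead of Body keeps the head section in the result. If the page fills in content asynchronously after loading, the read may need to be delayed, for example with a Timer.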