C#: How to programmatically log in to a website to screen scrape?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/975426/

Time: 2020-08-06 04:39:44  Source: igfitidea

c# forms login web-scraping

Asked by Tamara Wijsman

I need some information from a website that's not mine. To get it, I have to log in to the website, which happens through an HTML form. How can I do this authenticated screen scraping in C#?

Extra information:

  • Cookie-based authentication.
  • POST action needed.

Accepted answer by dlamblin

You'd make the request as though you'd just filled out the form. Assuming it's a POST, for example, you make a POST request with the correct data. If you can't log in directly to the same page you want to scrape, you will have to track whatever cookies are set after your login request and include them in your scraping request so that you stay logged in.

It might look like:

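// Needs: using System; using System.IO; using System.Net; using System.Text;

// Log in by POSTing the form fields, as a browser would on submit.
HttpWebRequest http = WebRequest.Create(url) as HttpWebRequest;
http.KeepAlive = true;
http.Method = "POST";
http.ContentType = "application/x-www-form-urlencoded";
// httpResponse.Cookies is only filled in when the request has a CookieContainer.
http.CookieContainer = new CookieContainer();
// URL-encode the values so characters such as '&' or '=' don't corrupt the body.
string postData = "FormNameForUserId=" + Uri.EscapeDataString(strUserId) + "&FormNameForPassword=" + Uri.EscapeDataString(strPassword);
byte[] dataBytes = Encoding.UTF8.GetBytes(postData);
http.ContentLength = dataBytes.Length;
using (Stream postStream = http.GetRequestStream())
{
    postStream.Write(dataBytes, 0, dataBytes.Length);
}
HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;
// Probably want to inspect httpResponse.Headers here first

// Request the page to scrape, sending back the cookies set at login.
http = WebRequest.Create(url2) as HttpWebRequest;
http.CookieContainer = new CookieContainer();
http.CookieContainer.Add(httpResponse.Cookies);
HttpWebResponse httpResponse2 = http.GetResponse() as HttpWebResponse;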

Maybe.
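
When the form has more than a couple of fields, building the POST body by hand gets error-prone, since every name and value in an application/x-www-form-urlencoded body has to be percent-encoded. A minimal sketch of a helper for that follows; the class name FormBody and the sample values are made up for illustration, and the field names are the same placeholders used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormBody
{
    // Joins name/value pairs into a form-encoded body,
    // percent-encoding each name and each value.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    public static void Main()
    {
        // Hypothetical credentials; note the '&' and '=' inside the password.
        var fields = new[]
        {
            new KeyValuePair<string, string>("FormNameForUserId", "me@example.com"),
            new KeyValuePair<string, string>("FormNameForPassword", "p&ss=w0rd"),
        };

        // Prints: FormNameForUserId=me%40example.com&FormNameForPassword=p%26ss%3Dw0rd
        Console.WriteLine(Build(fields));
    }
}
```

Uri.EscapeDataString encodes a space as %20 rather than +; servers that decode form data normally accept both.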

Answer by BFree

You need to use HttpWebRequest and do a POST. These links should help you get started. The key is that you need to look at the HTML form of the page you're trying to post from to see all the parameters the form needs in order to submit the post.

http://www.netomatix.com/httppostdata.aspx

http://geekswithblogs.net/rakker/archive/2006/04/21/76044.aspx
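
To see which parameters a login form expects, including hidden ones, you can list the input elements of the downloaded page. The following is only a rough sketch: it uses regular expressions, handles just double-quoted attributes, and ignores select and textarea elements, so a real HTML parser is the safer choice. The class name FormFields and the sample HTML are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class FormFields
{
    static readonly Regex InputTag = new Regex(@"<input\b[^>]*>", RegexOptions.IgnoreCase);

    // Returns the value of a double-quoted attribute inside one tag, or null if absent.
    static string Attr(string tag, string name)
    {
        Match m = Regex.Match(tag, @"\s" + name + @"\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Maps each named <input> to its default value (empty string when none is given).
    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in InputTag.Matches(html))
        {
            string name = Attr(m.Value, "name");
            if (name != null)
            {
                fields[name] = Attr(m.Value, "value") ?? "";
            }
        }
        return fields;
    }

    public static void Main()
    {
        // Hypothetical login form, standing in for the page you downloaded.
        string html = @"<form action=""/login"" method=""post"">
  <input type=""hidden"" name=""csrf"" value=""abc123"">
  <input type=""text"" name=""user"">
  <input type=""password"" name=""pass"">
  <input type=""submit"" value=""Log in"">
</form>";

        foreach (KeyValuePair<string, string> field in Extract(html))
        {
            Console.WriteLine(field.Key + " = '" + field.Value + "'");
        }
    }
}
```

Hidden fields found this way usually have to be sent back unchanged together with the user name and password.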

Answer by eran

You can use a WebBrowser control. Just feed it the URL of the site, then use the DOM to set the username and password into the right fields, and eventually send a click to the submit button. This way you don't care about anything but the two input fields and the submit button. No cookie handling, no raw HTML parsing, no HTTP sniffing - all that is done by the browser control.

If you go that way, a few more suggestions:

  1. You can prevent the control from loading add-ins such as Flash - could save you some time.
  2. Once you log in, you can obtain whatever information you need from the DOM - no need to parse raw HTML.
  3. If you want to make the tool even more portable in case the site changes in the future, you can replace your explicit DOM manipulation with an injection of JavaScript. The JS can be obtained from an external resource, and once called it can populate the fields and submit the form.

Answer by Eugeniu Torica

In addition to dlamblin's answer, it is necessary to set

http.AllowAutoRedirect = false;

Otherwise

HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;

will make another request to the initial URL, and you won't be able to retrieve url2.
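
With automatic redirects turned off, a 3xx login response is handed back to your code, so you can read its Location header yourself. Here is a small sketch of that step, under the assumption that the site answers a successful login with a redirect; the class name RedirectHelper, its method names, and the example URLs are hypothetical.

```csharp
using System;
using System.Net;

public static class RedirectHelper
{
    // Resolves a Location header value (absolute or relative)
    // against the URL that was requested.
    public static Uri ResolveLocation(Uri requestUri, string location)
    {
        return new Uri(requestUri, location);
    }

    // Sends the prepared request without following redirects.
    // Returns the redirect target, or null if the response is not a redirect.
    public static Uri GetRedirectTarget(HttpWebRequest http)
    {
        http.AllowAutoRedirect = false;
        using (var response = (HttpWebResponse)http.GetResponse())
        {
            int status = (int)response.StatusCode;
            string location = response.Headers["Location"];
            bool isRedirect = status >= 300 && status < 400 && location != null;
            return isRedirect ? ResolveLocation(response.ResponseUri, location) : null;
        }
    }

    public static void Main()
    {
        // Offline demonstration of the URL resolution only; no request is sent.
        var loginUrl = new Uri("https://example.com/account/login");
        Console.WriteLine(ResolveLocation(loginUrl, "/home").AbsoluteUri);
        Console.WriteLine(ResolveLocation(loginUrl, "welcome?first=1").AbsoluteUri);
    }
}
```

GetRedirectTarget expects a request that already has its method, body, and CookieContainer set up as in the login code above.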

Answer by Ppp

In some cases, httpResponse.Cookies will be blank. Use the CookieContainer instead.

// One container shared by both requests; it collects the cookies the server sets.
CookieContainer cc = new CookieContainer();

HttpWebRequest http = WebRequest.Create(url) as HttpWebRequest;
http.KeepAlive = true;
http.Method = "POST";
http.ContentType = "application/x-www-form-urlencoded";

http.CookieContainer = cc;

// URL-encode the values so characters such as '&' or '=' don't corrupt the body.
string postData = "FormNameForUserId=" + Uri.EscapeDataString(strUserId) + "&FormNameForPassword=" + Uri.EscapeDataString(strPassword);
byte[] dataBytes = Encoding.UTF8.GetBytes(postData);
http.ContentLength = dataBytes.Length;
using (Stream postStream = http.GetRequestStream())
{
    postStream.Write(dataBytes, 0, dataBytes.Length);
}
HttpWebResponse httpResponse = http.GetResponse() as HttpWebResponse;
// Probably want to inspect httpResponse.Headers here first
http = WebRequest.Create(url2) as HttpWebRequest;

// Reuse the same container so the login cookies are sent with this request.
http.CookieContainer = cc;

HttpWebResponse httpResponse2 = http.GetResponse() as HttpWebResponse;
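
To check what a shared CookieContainer would send to a given URL, you can query it directly. The sketch below runs offline: it feeds the container a Set-Cookie value by hand, which stands in for what happens when a response arrives on a request that has the container attached. The class name CookieDemo, the cookie, and the URLs are all made up for illustration.

```csharp
using System;
using System.Net;

public static class CookieDemo
{
    // Returns the Cookie header the container would attach to a request for this URL.
    public static string HeaderFor(CookieContainer cc, string url)
    {
        return cc.GetCookieHeader(new Uri(url));
    }

    public static void Main()
    {
        var cc = new CookieContainer();

        // Stand-in for a login response that sets a session cookie.
        cc.SetCookies(new Uri("https://example.com/login"), "session=abc123; Path=/");

        // Same host, different page: the session cookie is included.
        Console.WriteLine("members page: '" + HeaderFor(cc, "https://example.com/members/data") + "'");

        // Different host: nothing is sent.
        Console.WriteLine("other host:   '" + HeaderFor(cc, "https://other.example.org/") + "'");
    }
}
```

Printing the header for url2 right after the login request is a quick way to confirm that the login actually produced a session cookie.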