Original URL: http://stackoverflow.com/questions/930807/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Login to website, via C#
Question
I'm relatively new to using C#, and I have an application that reads parts of the source code of a website. That all works, but the page in question requires the user to be logged in to access this source code. What my program needs is a way to log the user into the website first; once that is done, I'll be able to access and read the source code.
The website that needs to be logged into is: mmoinn.com/index.do?PageModule=UsersLogin
I've spent the entire day searching for how to do this and trying examples, but have had no luck.
Thanks in advance
Answer by Matt Brindley
You can continue using WebClient and send a POST (instead of a GET, which is the HTTP verb you're currently using with DownloadString), but I think you'll find it easier to work with the (slightly) lower-level classes WebRequest and WebResponse.
There are two parts to this: the first is to POST the login form, the second is to recover the "Set-cookie" header and send it back to the server as "Cookie" along with your GET request. The server will use this cookie to identify you from then on (assuming it uses cookie-based authentication, which I'm fairly confident it does, since that page returns a Set-cookie header that includes "PHPSESSID").
POSTing to the login form
Form posts are easy to simulate; it's just a matter of formatting your POST data as follows:
field1=value1&field2=value2
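Both the names and the values have to be percent-encoded; otherwise a password containing &, = or + would be split into extra fields or altered by the server's form decoder (an unescaped + is read as a space). A minimal sketch of such a helper, assuming you build the body yourself as in the WebRequest code; FormEncoding is a hypothetical name, not a .NET API. It escapes each part with Uri.EscapeDataString before joining:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormEncoding
{
    // Builds an application/x-www-form-urlencoded body such as
    // "field1=value1&field2=value2", percent-encoding every name and value
    // so characters like '&', '=', '+' and '@' cannot break the format.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? "")));
    }
}
```

For example, the pair email_address / me+1@example.com becomes email_address=me%2B1%40example.com. Since the result is plain ASCII, Encoding.ASCII.GetBytes can safely be applied to it.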
Using WebRequest and code I adapted from Scott Hanselman, here's how you'd POST form data to your login form:
using System;
using System.IO;
using System.Net;
using System.Text;

// NOTE: This is the URL the form POSTs to, not the URL of the form itself
// (you can find it in the "action" attribute of the HTML form tag).
string formUrl = "http://www.mmoinn.com/index.do?PageModule=UsersAction&Action=UsersLogin";
// Escape the values so characters such as '&', '+' or '@' can't corrupt the body
string formParams = string.Format("email_address={0}&password={1}",
    Uri.EscapeDataString("your email"), Uri.EscapeDataString("your password"));
string cookieHeader;
WebRequest req = WebRequest.Create(formUrl);
req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
byte[] bytes = Encoding.ASCII.GetBytes(formParams);
req.ContentLength = bytes.Length;
using (Stream os = req.GetRequestStream())
{
    os.Write(bytes, 0, bytes.Length);
}
WebResponse resp = req.GetResponse();
// Session cookie(s) issued by the login; sent back with later requests
cookieHeader = resp.Headers["Set-cookie"];
Here's an example of what you should see in the Set-cookie header for your login form:
PHPSESSID=c4812cffcf2c45e0357a5a93c137642e; path=/; domain=.mmoinn.com,wowmine_referer=directenter; path=/; domain=.mmoinn.com,lang=en; path=/;domain=.mmoinn.com,adt_usertype=other,adt_host=-
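Echoing this raw value back as the Cookie header also sends attributes such as path=/ and domain=.mmoinn.com, which a Cookie header is not meant to carry (it should hold only name=value pairs separated by "; "). A hedged sketch that lets CookieContainer parse the comma-joined Set-cookie string and produce a proper Cookie header for a given URL; the CookieHeaders class and the example.com URLs in the usage are illustrative assumptions:

```csharp
using System;
using System.Net;

public static class CookieHeaders
{
    // Parses a (possibly comma-joined) Set-Cookie header received from
    // responseUri and returns the Cookie header value to send with a later
    // request to requestUri, e.g. "PHPSESSID=abc123; lang=en".
    public static string ToCookieHeader(Uri responseUri, string setCookieHeader, Uri requestUri)
    {
        var container = new CookieContainer();
        if (!string.IsNullOrEmpty(setCookieHeader))
        {
            // SetCookies accepts several cookies separated by commas
            container.SetCookies(responseUri, setCookieHeader);
        }
        // Only name=value pairs whose domain and path match requestUri are returned
        return container.GetCookieHeader(requestUri);
    }
}
```

In the GET code, the result could be sent instead of the raw value, e.g. CookieHeaders.ToCookieHeader(new Uri(formUrl), cookieHeader, new Uri(getUrl)).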
GETting the page behind the login form
Now you can perform your GET request to a page that you need to be logged in for.
string pageSource;
string getUrl = "the url of the page behind the login";
WebRequest getRequest = WebRequest.Create(getUrl);
// Send the session cookie back so the server recognises the logged-in user
if (cookieHeader != null)
{
    getRequest.Headers.Add("Cookie", cookieHeader);
}
using (WebResponse getResponse = getRequest.GetResponse())
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream()))
{
    pageSource = sr.ReadToEnd();
}
EDIT:
If you need to view the results of the first POST, you can recover the HTML it returned with:
using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
{
    pageSource = sr.ReadToEnd();
}
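Place this directly below cookieHeader = resp.Headers["Set-cookie"]; (declaring string pageSource; before it, since that declaration otherwise only appears later in the GET code), and then inspect the string held in pageSource.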
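Instead of inspecting pageSource by hand each time, the check can be automated by looking for text that only appears once you are logged in. A sketch, assuming you pick a suitable marker (such as a "Logout" link) from the site's real HTML; LoginCheck is a hypothetical helper name:

```csharp
using System;
using System.IO;
using System.Text;

public static class LoginCheck
{
    // Reads the whole body from the stream (e.g. resp.GetResponseStream())
    // and reports whether it contains the marker text, ignoring case.
    public static bool ContainsMarker(Stream body, string marker, Encoding encoding = null)
    {
        using (var reader = new StreamReader(body, encoding ?? Encoding.UTF8))
        {
            string html = reader.ReadToEnd();
            return html.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0;
        }
    }
}
```

It would be called as LoginCheck.ContainsMarker(resp.GetResponseStream(), "Logout"). A response stream can only be read once, so use it in place of the StreamReader snippet, not in addition to it.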
Answer by Josh
You can simplify things quite a bit by creating a class that derives from WebClient, overriding its GetWebRequest method and setting a CookieContainer object on it. If you always set the same CookieContainer instance, then cookie management will be handled automatically for you.
Deriving is necessary because the only way to get at the HttpWebRequest before it is sent is to inherit from WebClient and override that method.
using System;
using System.Collections.Specialized;
using System.Net;

public class CookieAwareWebClient : WebClient
{
    // One container reused for every request this client makes,
    // so cookies from the login response are sent with later requests.
    private readonly CookieContainer cookie = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        if (request is HttpWebRequest httpRequest)
        {
            httpRequest.CookieContainer = cookie;
        }
        return request;
    }
}

var client = new CookieAwareWebClient();
client.BaseAddress = @"https://www.site.com/any/base/url/";
var loginData = new NameValueCollection();
loginData.Add("login", "YourLogin");
loginData.Add("password", "YourPassword");
client.UploadValues("login.php", "POST", loginData);
// Now you are logged in and can request pages
string htmlSource = client.DownloadString("index.php");
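On .NET 6 and later, WebClient and WebRequest are marked obsolete in favour of HttpClient, and the same idea (one shared CookieContainer for every request) carries over through HttpClientHandler.CookieContainer. A minimal sketch, reusing the placeholder base URL, field names and login.php / index.php paths from the example above; the LoginSession class is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public sealed class LoginSession : IDisposable
{
    // Shared by every request sent through this client, so cookies set by
    // the login response are attached to later requests automatically.
    public CookieContainer Cookies { get; } = new CookieContainer();
    private readonly HttpClient client;

    public LoginSession(Uri baseAddress)
    {
        var handler = new HttpClientHandler { CookieContainer = Cookies, UseCookies = true };
        client = new HttpClient(handler) { BaseAddress = baseAddress };
    }

    // Builds the login POST without sending it; FormUrlEncodedContent sets the
    // application/x-www-form-urlencoded content type and escapes the values.
    public static HttpRequestMessage BuildLogin(string path, string login, string password)
    {
        return new HttpRequestMessage(HttpMethod.Post, path)
        {
            Content = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                ["login"] = login,
                ["password"] = password,
            })
        };
    }

    public async Task<string> LoginAndGetAsync(string loginPath, string login, string password, string pagePath)
    {
        using (var request = BuildLogin(loginPath, login, password))
        using (var response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
        }
        // The cookies stored during login are sent with this GET
        return await client.GetStringAsync(pagePath);
    }

    public void Dispose() => client.Dispose();
}
```

Keep the trailing slash on the base URL, as in the WebClient example, so that relative paths such as login.php resolve beneath it rather than replacing its last segment.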
Answer by TN.
Sometimes it helps to switch off AllowAutoRedirect and to give both the login POST request and the page GET request the same user agent.
request.UserAgent = userAgent;
request.AllowAutoRedirect = false;
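One plausible explanation (an assumption, not something the answer states): login forms often answer the POST with a 302 redirect that carries the Set-Cookie header, and when that redirect is followed automatically without a CookieContainer, resp.Headers belongs to the final page, so the session cookie can be missed. Some sites also tie the session to the User-Agent. A sketch of a factory that gives every request the same settings; RequestFactory and the user-agent string are hypothetical:

```csharp
using System;
using System.Net;

public static class RequestFactory
{
    // Placeholder value; what matters is that every request in the
    // session sends the same string.
    public const string UserAgent = "Mozilla/5.0 (compatible; MyLoginClient/1.0)";

    // Creates an HttpWebRequest that does not follow redirects automatically,
    // so the Set-Cookie header of a 302 login response stays visible.
    public static HttpWebRequest Create(string url, string method, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = method;
        request.UserAgent = UserAgent;
        request.AllowAutoRedirect = false;
        request.CookieContainer = cookies; // may be null if you send the Cookie header yourself
        return request;
    }
}
```

With AllowAutoRedirect = false you have to read the Location header of the 302 response and request that URL yourself.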
Answer by WhySoSerious
Matthew Brindley, your code worked very well for a website I needed (with a login), but I had to change to HttpWebRequest and HttpWebResponse, otherwise I got a 404 Bad Request from the remote server. I'd also like to share a workaround based on your code: I tried it to log in to a Moodle-based website, but it failed at your step "GETting the page behind the login form", because after successfully POSTing the login, the 'Set-Cookie' header didn't return anything, even though it does on other websites.
So I think this is where we need to store cookies for the next requests, so I added this to the "POSTing to the login form" code block:
var cookies = new CookieContainer();
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(formUrl);
req.CookieContainer = cookies;
And this to the "GETting the page behind the login form" code block:
HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(getUrl);
getRequest.CookieContainer = new CookieContainer();
getRequest.CookieContainer.Add(resp.Cookies);
getRequest.Headers.Add("Cookie", cookieHeader);
Doing this lets me log in and get the source code of the "page behind login" (on the Moodle-based website). I know this is a loose use of CookieContainer and HTTP cookies, because we should arguably first check whether a set of cookies was already saved before sending the request to the server. It works without problems anyway, but here is some good reading about WebRequest and WebResponse, with sample projects and a tutorial:
Retrieving HTTP content in .NET
How to use HttpWebRequest and HttpWebResponse in .NET
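Creating a second container and copying resp.Cookies into it works, but a shorter route is to give both requests the same CookieContainer instance: HttpWebRequest stores the cookies from the login response in it and attaches the matching ones to the later GET, so the manual Cookie header is no longer needed. A sketch; the SharedCookies helper, the moodle.example.edu URLs and the MoodleSession cookie used to simulate the server are illustrative assumptions:

```csharp
using System;
using System.Net;

public static class SharedCookies
{
    // Both requests point at the same container: cookies stored while the
    // login response is read are attached automatically to the later GET.
    public static (HttpWebRequest Login, HttpWebRequest Page) CreatePair(
        string loginUrl, string pageUrl, CookieContainer cookies)
    {
        var login = (HttpWebRequest)WebRequest.Create(loginUrl);
        login.Method = "POST";
        login.CookieContainer = cookies;

        var page = (HttpWebRequest)WebRequest.Create(pageUrl);
        page.Method = "GET";
        page.CookieContainer = cookies;
        return (login, page);
    }
}
```

The login request is then written and sent exactly as in the POST code; after its response has been read, calling GetResponse() on the page request sends the stored cookies without any extra code.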