C#: How to check if System.Net.WebClient.DownloadData is downloading a binary file?

Note: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/153451/

Date: 2020-08-03 15:51:43  Source: igfitidea

Asked by

I am trying to use WebClient to download a file from the web in a WinForms application. However, I only want to download HTML files. I want to ignore any other type.

I checked the WebResponse.ContentType, but its value is always null.

Does anyone have any idea what could be the cause?

Accepted answer by Marc Gravell

Given your update, you can do this by changing the .Method in GetWebRequest:

using System;
using System.Net;
static class Program
{
    static void Main()
    {
        using (MyClient client = new MyClient())
        {
            client.HeadOnly = true;
            string uri = "http://www.google.com";
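            byte[] body = client.DownloadData(uri); // a HEAD response has no body, so this should be 0-length
            string type = client.ResponseHeaders["content-type"];
            client.HeadOnly = false;
            // check 'tis not binary... we'll use text/, but could
            // check for text/html
            // (the server may omit the header, in which case 'type' is null)
            if (type != null && type.StartsWith("text/", StringComparison.OrdinalIgnoreCase))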
            {
                string text = client.DownloadString(uri);
                Console.WriteLine(text);
            }
        }
    }

}

class MyClient : WebClient
{
    public bool HeadOnly { get; set; }
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest req = base.GetWebRequest(address);
        if (HeadOnly && req.Method == "GET")
        {
            req.Method = "HEAD";
        }
        return req;
    }
}

Alternatively, you can check the header when overriding GetWebResponse(), perhaps throwing an exception if it isn't what you wanted:

protected override WebResponse GetWebResponse(WebRequest request)
{
    WebResponse resp = base.GetWebResponse(request);
    string type = resp.Headers["content-type"];
    // do something with type
    return resp;
}
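
To make the "throw an exception if it isn't what you wanted" idea concrete, here is a minimal sketch of a complete subclass. The names HtmlOnlyWebClient, IsHtml and Check are hypothetical, and the choice of InvalidOperationException is an assumption rather than anything the answer prescribes. Both GetWebResponse overloads are overridden so that the synchronous and asynchronous download methods go through the same check.

```csharp
using System;
using System.Net;

// Hypothetical WebClient subclass: rejects any response whose
// Content-Type header is not text/html.
class HtmlOnlyWebClient : WebClient
{
    // True when the header's media type is text/html. Parameters such as
    // "; charset=utf-8" are ignored. A missing header counts as "not HTML".
    public static bool IsHtml(string contentType)
    {
        if (string.IsNullOrEmpty(contentType)) return false;
        string mediaType = contentType.Split(';')[0].Trim();
        return string.Equals(mediaType, "text/html", StringComparison.OrdinalIgnoreCase);
    }

    // Synchronous path (DownloadData, DownloadString, ...).
    protected override WebResponse GetWebResponse(WebRequest request)
    {
        return Check(base.GetWebResponse(request));
    }

    // Asynchronous path (DownloadDataAsync, ...).
    protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
    {
        return Check(base.GetWebResponse(request, result));
    }

    private static WebResponse Check(WebResponse resp)
    {
        string type = resp.Headers["content-type"];
        if (!IsHtml(type))
        {
            resp.Close(); // release the connection before bailing out
            throw new InvalidOperationException("Unexpected content type: " + (type ?? "(none)"));
        }
        return resp;
    }
}
```

Callers would wrap DownloadData in a try/catch. WebClient may wrap the exception in a WebException, so it is worth inspecting InnerException as well; how the failure surfaces on the asynchronous path (most likely through the Error property of the completed event args) should be verified against your framework version.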

Answer by Marc Gravell

You could issue the first request with the HEAD verb, and check the content-type response header? [edit] It looks like you'll have to use HttpWebRequest for this, though.
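
As a rough illustration of the HEAD approach with HttpWebRequest, the sketch below fetches only the headers and returns the Content-Type value. The class and method names (HeadProbe, GetContentType, LooksLikeText) are hypothetical, and the code assumes an http or https URL so that the cast to HttpWebRequest succeeds.

```csharp
using System;
using System.Net;

static class HeadProbe
{
    // Sends a HEAD request and returns the Content-Type header,
    // or null if the server did not send one.
    // Assumes 'url' is an http/https URL.
    public static string GetContentType(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "HEAD";
        using (WebResponse response = request.GetResponse())
        {
            return response.Headers["Content-Type"];
        }
    }

    // True for any text/* media type; a missing header counts as "not text".
    public static bool LooksLikeText(string contentType)
    {
        return contentType != null
            && contentType.StartsWith("text/", StringComparison.OrdinalIgnoreCase);
    }
}
```

A caller could then download only when HeadProbe.LooksLikeText(HeadProbe.GetContentType(url)) is true. Some servers reject HEAD or answer it differently from GET, in which case GetResponse throws a WebException, so a fallback may be needed.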

Answer by Micah

WebResponse is an abstract class, and the ContentType property is defined in inheriting classes. For instance, HttpWebResponse overrides this property to provide the content-type header. I'm not sure which WebResponse implementation WebClient uses. If you ONLY want HTML files, your best bet is to use the HttpWebRequest object directly.

Answer by mdb

Your question is a bit confusing: if you're using an instance of the Net.WebClient class, the Net.WebResponse doesn't enter into the equation (apart from the fact that it's indeed an abstract class, and you'd be using a concrete implementation such as HttpWebResponse, as pointed out in another response).

Anyway, when using WebClient, you can achieve what you want by doing something like this:

Dim wc As New Net.WebClient()
Dim LocalFile As String = IO.Path.Combine(Environment.GetEnvironmentVariable("TEMP"), Guid.NewGuid.ToString)
wc.DownloadFile("http://example.com/somefile", LocalFile)
If wc.ResponseHeaders("Content-Type") IsNot Nothing AndAlso Not wc.ResponseHeaders("Content-Type").StartsWith("text/html", StringComparison.OrdinalIgnoreCase) Then
    IO.File.Delete(LocalFile)
Else
    '//Process the file
End If

Note that you do have to check for the existence of the Content-Type header, as the server is not guaranteed to return it (although most modern HTTP servers will always include it). If no Content-Type header is present, you can fall back to another HTML detection method, for example opening the file, reading the first 1K characters or so into a string, and seeing if that contains the substring <html>.
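
The fallback described above could look roughly like the following sketch. HtmlSniffer and its methods are hypothetical names. As a deliberate deviation, it searches for "<html" without the closing bracket and ignores case, on the assumption that real pages often write the tag with attributes or in upper case. It is only a heuristic.

```csharp
using System;
using System.IO;

static class HtmlSniffer
{
    // Reads up to maxChars characters from the reader and reports whether
    // they contain an opening <html tag (case-insensitive).
    public static bool LooksLikeHtml(TextReader reader, int maxChars = 1024)
    {
        char[] buffer = new char[maxChars];
        int total = 0;
        int read;
        // Read may return fewer characters than requested, so loop
        // until the buffer is full or the input ends.
        while (total < maxChars && (read = reader.Read(buffer, total, maxChars - total)) > 0)
        {
            total += read;
        }
        string head = new string(buffer, 0, total);
        return head.IndexOf("<html", StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Convenience wrapper for a file that has already been downloaded.
    public static bool FileLooksLikeHtml(string path)
    {
        using (StreamReader reader = new StreamReader(path))
        {
            return LooksLikeHtml(reader);
        }
    }
}
```

With the VB snippet above, this check would run on LocalFile only when the Content-Type header is missing.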

Also note that this is a bit wasteful, as you'll always transfer the full file, prior to deciding whether you want it or not. To work around that, switching to the Net.HttpWebRequest/Response classes might help, but whether the extra code is worth it depends on your application...
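
One way the switch to HttpWebRequest/HttpWebResponse might look is sketched below: the headers are inspected first, and the body is read only for HTML. HtmlDownloader, IsHtml and DownloadIfHtml are hypothetical names. The sketch assumes an http/https URL and, for brevity, decodes the body with StreamReader's default encoding instead of honoring the charset in the header.

```csharp
using System;
using System.IO;
using System.Net;

static class HtmlDownloader
{
    // True when the Content-Type value starts with text/html.
    public static bool IsHtml(string contentType)
    {
        return !string.IsNullOrEmpty(contentType)
            && contentType.Trim().StartsWith("text/html", StringComparison.OrdinalIgnoreCase);
    }

    // Returns the page text, or null when the response is not text/html.
    public static string DownloadIfHtml(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            // The headers are available here; the body is only pulled
            // when the response stream is read.
            if (!IsHtml(response.ContentType))
            {
                request.Abort(); // drop the connection instead of reading the body
                return null;
            }
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Part of the body may already have been received by the time the check runs, so this reduces the wasted transfer rather than strictly eliminating it.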

Answer by mdb

I apologize for not being very clear. I wrote a wrapper class that extends WebClient. In this wrapper class, I added a cookie container and exposed the timeout property of the WebRequest.

I was using DownloadDataAsync() from this wrapper class, and I wasn't able to retrieve the content-type from the WebResponse of this wrapper class. My main intention is to intercept the response and determine whether it is text/html. If it isn't, I will abort the request.

I managed to obtain the content-type after overriding the WebClient.GetWebResponse(WebRequest, IAsyncResult) method.

The following is a sample of my wrapper class:

using System;
using System.Net;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

public class MyWebClient : WebClient
{
    private CookieContainer _cookieContainer;
    private string _userAgent;
    private int _timeout;
    private WebRequest _request;   // last request issued
    private WebResponse _response; // last response received

    public MyWebClient()
    {
        this._cookieContainer = new CookieContainer();
        this.SetTimeout(60 * 1000);
    }

    // Timeout in milliseconds, applied to each HttpWebRequest.
    public MyWebClient SetTimeout(int timeout)
    {
        this._timeout = timeout;
        return this;
    }

    public WebResponse Response
    {
        get { return this._response; }
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP requests carry cookies, a user agent and this timeout.
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = this._cookieContainer;
            httpRequest.UserAgent = this._userAgent;
            httpRequest.Timeout = this._timeout;
        }

        this._request = request;
        return request;
    }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        this._response = base.GetWebResponse(request);
        return this._response;
    }

    protected override WebResponse GetWebResponse(WebRequest request, IAsyncResult result)
    {
        this._response = base.GetWebResponse(request, result);
        return this._response;
    }

    public MyWebClient ServerCertValidation(bool validate)
    {
        if (!validate) ServicePointManager.ServerCertificateValidationCallback += delegate(object sender, X509Certificate certificate, X509Chain chain, SslPolicyErrors sslPolicyErrors) { return true; };
        return this;
    }
}

Answer by RandomInsano

I'm not sure of the cause, but perhaps you hadn't downloaded anything yet. This is the lazy way to get the content type of a remote file/page. (I haven't checked whether this is efficient on the wire. For all I know, it may download huge chunks of content.)

        Stream connection = null;
        WebClient wc = new WebClient();
        string contentType = null;
        try
        {
            connection = wc.OpenRead(current.Url);
            contentType = wc.ResponseHeaders["content-type"];
        }
        catch (Exception)
        {
            // 404 or what have you; contentType stays null
        }
        finally
        {
            if (connection != null) connection.Close();
        }

Answer by Greg

Here is a method using TCP, which HTTP is built on top of. It returns once connected or after the timeout (in milliseconds), so the value may need to be changed depending on your situation.

var result = false;
try {
    using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp)) {
        var asyncResult = socket.BeginConnect(yourUri.Host, 80, null, null); // host name only, not the full URI
        result = asyncResult.AsyncWaitHandle.WaitOne(100, true);
        socket.Close();
    }
}
catch { }
return result;