C# HttpWebRequest command to get a directory listing

Question

Asked by

I need a short code snippet to get a directory listing from an HTTP server.

Thanks

Answer 1

Accepted answer by Jorge Ferreira

A few important considerations before the code:

1. The HTTP server has to be configured to allow directory listings for the directories you want.
2. Because directory listings are normal HTML pages, there is no standard that defines the format of a directory listing.
3. Because of consideration 2, you have to write specific parsing code for each server.

My choice is to use regular expressions, which allow rapid parsing and easy customization. You can keep a specific regular expression pattern per site, which gives you a very modular approach. If you plan to add support for new sites without changing the source code, use an external source that maps each URL to its regular expression pattern.
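
The external mapping described above could be sketched roughly as follows. This is only an illustration: the ListingPatterns class, its method names, and the tab-separated "URL prefix, then pattern" line format are assumptions made for this example, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ListingPatterns
{
    // Parses lines of the (assumed) form "urlPrefix<TAB>pattern".
    // The lines could come from File.ReadAllLines on a hypothetical patterns file.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            // Skip blank lines and comment lines starting with '#'.
            if (string.IsNullOrWhiteSpace(line) || line.StartsWith("#", StringComparison.Ordinal))
            {
                continue;
            }
            int tab = line.IndexOf('\t');
            if (tab <= 0)
            {
                continue; // malformed line: no prefix or no separator
            }
            map[line.Substring(0, tab)] = line.Substring(tab + 1);
        }
        return map;
    }

    // Returns the regex registered for the longest URL prefix that matches.
    public static Regex GetRegexForUrl(Dictionary<string, string> map, string url)
    {
        string best = null;
        foreach (string prefix in map.Keys)
        {
            if (url.StartsWith(prefix, StringComparison.OrdinalIgnoreCase)
                && (best == null || prefix.Length > best.Length))
            {
                best = prefix;
            }
        }
        if (best == null)
        {
            throw new NotSupportedException("No listing pattern for " + url);
        }
        return new Regex(map[best], RegexOptions.IgnoreCase);
    }
}
```

With a mapping like this, supporting a new server only means adding one line to the patterns source; the parsing code itself stays unchanged.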

Example to print directory listing from http://www.ibiblio.org/pub/

namespace Example
{
    using System;
    using System.Net;
    using System.IO;
    using System.Text.RegularExpressions;

    public class MyExample
    {
        // Returns the listing pattern for a known server; each server needs its own pattern.
        public static string GetDirectoryListingRegexForUrl(string url)
        {
            if (url.Equals("http://www.ibiblio.org/pub/"))
            {
                // [^\"]* and .*? keep each match inside a single <a> element,
                // even when several links share one line of HTML.
                return "<a href=\"[^\"]*\">(?<name>.*?)</a>";
            }
            throw new NotSupportedException();
        }

        public static void Main(String[] args)
        {
            string url = "http://www.ibiblio.org/pub/";
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                string html = reader.ReadToEnd();
                Regex regex = new Regex(GetDirectoryListingRegexForUrl(url));
                // Every Match in the collection is successful, so no extra checks are needed.
                foreach (Match match in regex.Matches(html))
                {
                    Console.WriteLine(match.Groups["name"].Value);
                }
            }

            Console.ReadLine();
        }
    }
}

Answer 2

Answered by Brian R. Bondy

Basic understanding:

Directory listings are just HTML pages generated by a web server. Each web server generates these HTML pages in its own way because there is no standard way for a web server to list these directories.

The best way to get a directory listing is to make an HTTP request to the URL you'd like the listing for, then parse the returned HTML and extract all of its links.

To parse the HTML links please try to use the HTML Agility Pack.

Directory Browsing:

The web server you'd like to list directories from must have directory browsing turned on to get this HTML representation of the files in its directories. So you can only get the directory listing if the HTTP server wants you to be able to.

A quick example of the HTML Agility Pack:

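// Requires the HtmlAgilityPack NuGet package (using HtmlAgilityPack;).
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(strURL); // HtmlWeb.Load fetches a URL; HtmlDocument.Load reads local files or streams
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        // do something with att.Value
    }
}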

Cleaner alternative:

If it is possible in your situation, a cleaner method is to use a protocol designed for directory listings, such as the File Transfer Protocol (FTP), SFTP (an FTP-like protocol over SSH) or FTPS (FTP over SSL).

What if directory browsing is not turned on:

If the web server does not have directory browsing turned on, then there is no easy way to get the directory listing.

The best you could do in this case is to start at a given URL, follow all HTML links on the same page, and try to build a virtual listing of directories yourself based on the relative paths of the resources on these HTML pages. This will not give you a complete listing of what files are actually on the web server though.
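
As a rough sketch of that idea, the code below groups the same-host links found on one page by directory path. The VirtualListing class and Build method are hypothetical names, and a simple href regex stands in for a real HTML parser; crawling further pages would repeat the same step for each discovered link.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VirtualListing
{
    // Maps each directory path (e.g. "/pub/") to the file names linked from the page.
    public static SortedDictionary<string, SortedSet<string>> Build(Uri pageUri, string html)
    {
        var listing = new SortedDictionary<string, SortedSet<string>>(StringComparer.Ordinal);
        var hrefRegex = new Regex("href\\s*=\\s*\"(?<href>[^\"]*)\"", RegexOptions.IgnoreCase);

        foreach (Match match in hrefRegex.Matches(html))
        {
            // Resolve relative links against the page URL.
            Uri target;
            if (!Uri.TryCreate(pageUri, match.Groups["href"].Value, out target))
            {
                continue;
            }
            // Ignore links that leave the server (other hosts, mailto:, and so on).
            if (target.Scheme != pageUri.Scheme || target.Host != pageUri.Host)
            {
                continue;
            }

            string path = target.AbsolutePath;
            int slash = path.LastIndexOf('/');
            string directory = path.Substring(0, slash + 1);
            string file = path.Substring(slash + 1);

            SortedSet<string> files;
            if (!listing.TryGetValue(directory, out files))
            {
                files = new SortedSet<string>(StringComparer.Ordinal);
                listing[directory] = files;
            }
            if (file.Length > 0)
            {
                files.Add(file); // a link ending in "/" only registers the directory
            }
        }
        return listing;
    }
}
```

The result only contains resources that some page links to, so files without any inbound link stay invisible.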

Answer 3

Answered by roryf

You can't, unless the particular directory you want has directory listing enabled and no default file (usually index.htm, index.html or default.html but always configurable). Only then will you be presented with a directory listing, which will usually be marked up with HTML and require parsing.

Answer 4

Answered by Frank Krueger

You can alternatively set the server up for WebDAV.
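
If the server does speak WebDAV, a listing request might look roughly like the sketch below. It assumes the server accepts an unauthenticated PROPFIND with an empty body and a Depth header of 1; the WebDavListing class and its method names are made up for this example, and real servers often require credentials.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Xml.Linq;

public static class WebDavListing
{
    // Extracts every DAV: href element from a WebDAV multistatus response body.
    public static List<string> ParseHrefs(string multistatusXml)
    {
        XNamespace dav = "DAV:";
        return XDocument.Parse(multistatusXml)
            .Descendants(dav + "href")
            .Select(e => e.Value.Trim())
            .ToList();
    }

    // Sends PROPFIND with Depth: 1 to list the collection and its direct children.
    public static List<string> List(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "PROPFIND";
        request.Headers.Add("Depth", "1");

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return ParseHrefs(reader.ReadToEnd());
        }
    }
}
```

Unlike an HTML directory listing, the multistatus XML has a standardized structure, so no per-server pattern is needed.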

Answer 5

Answered by Seyed

Thanks for the great post. For me, the pattern below worked better:

<A HREF=\"\S+\">(?<name>\S+)</A>

I also tested it at http://regexhero.net/tester.

To use it in your C# code, you have to add a backslash (\) before every backslash and double quote in the pattern. For instance, in the GetDirectoryListingRegexForUrl method you should use something like this:

return "<A HREF=\\\"\\S+\\\">(?<name>\\S+)</A>";

Cheers!

Answer 6

Answered by Avinash patil

I just modified the code above and found that this works best:

// Requires: using System; using System.IO; using System.Net; using System.Text.RegularExpressions;
public static class GetallFilesFromHttp
{
    public static string GetDirectoryListingRegexForUrl(string url)
    {
        if (url.Equals("http://ServerDirPath/"))
        {
            // Matches any double-quoted value, such as the value of an href attribute.
            return "\"([^\"]*)\"";
        }
        throw new NotSupportedException();
    }

    public static void ListDirectory()
    {
        string url = "http://ServerDirPath/";
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd();

            Regex regex = new Regex(GetDirectoryListingRegexForUrl(url));
            foreach (Match match in regex.Matches(html))
            {
                // Group 1 is the text between the quotes.
                Console.WriteLine(match.Groups[1].Value);
            }
        }
        Console.ReadLine();
    }
}

Answer 7

Answered by Jake Drew

The following code works well for me when I do not have access to the FTP server:

// Requires: using System.Collections.Generic; using System.IO; using System.Net; using System.Text.RegularExpressions;
public static string[] GetFiles(string url)
{
    List<string> files = new List<string>(500);
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        string html = reader.ReadToEnd();

        // Capture the href value directly instead of splitting the whole match on quotes.
        // [^\"]* and .*? keep each match inside a single <a> element.
        Regex regex = new Regex("<a href=\"(?<href>[^\"]*)\">(?<name>.*?)</a>");
        foreach (Match match in regex.Matches(html))
        {
            files.Add(match.Groups["href"].Value);
        }
    }
    return files.ToArray();
}

However, when I do have access to the FTP server, the following code works much faster:

// Requires: using System; using System.IO; using System.Net;
public static string[] getFtpFolderItems(string ftpURL)
{
    FtpWebRequest request = (FtpWebRequest)WebRequest.Create(ftpURL);
    request.Method = WebRequestMethods.Ftp.ListDirectory;

    // You could add credentials, if needed:
    // request.Credentials = new NetworkCredential("anonymous", "password");

    // Dispose the response and reader so the FTP connection is released.
    using (FtpWebResponse response = (FtpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        return reader.ReadToEnd().Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    }
}
