Java: download all files and folders in a directory

Question

Asked by Kyle

I am trying to download all the files from this directory. However, I can only get it to download the URL as one file. What can I do? I tried searching for this problem, but the results were confusing and people started suggesting HTTP clients instead. Thanks for your help; this is my code so far. It has been suggested that I use an input stream to obtain all the files in the directory. Would that then go into an array? I tried the tutorial at http://docs.oracle.com/javase/tutorial/networking/urls/ but it didn't help me understand.

//ProgressBar/Install
            String URL_LOCATION = "http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/";
            String LOCAL_FILE = filelocation.getText() + "\\ProfessorPhys\\";
            try {
                java.net.URL url = new URL(URL_LOCATION);
                HttpURLConnection connection = (HttpURLConnection) url.openConnection(); 
                connection.addRequestProperty("User-Agent", "Mozilla/4.76"); 
                //URLConnection connection = url.openConnection();
                BufferedInputStream stream = new BufferedInputStream(connection.getInputStream());
                int available = stream.available();
                byte b[]= new byte[available];
                stream.read(b);
                File file = new File(LOCAL_FILE);
                OutputStream out  = new FileOutputStream(file);
                out.write(b);
            } catch (Exception e) {
                System.err.println(e);
            }

I also found this code which will return a List of files to download. Can someone help me combine the two codes?

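import java.io.File;
import java.io.IOException;
import java.util.Collection;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.TrueFileFilter;

public class GetAllFilesInDirectory {

    public static void main(String[] args) throws IOException {

        // Note: this lists a directory on the local disk, not a URL
        File dir = new File("dir");

        System.out.println("Getting all files in " + dir.getCanonicalPath() + " including those in subdirectories");
        // listFiles returns a Collection, so avoid casting it to List
        Collection<File> files = FileUtils.listFiles(dir, TrueFileFilter.INSTANCE, TrueFileFilter.INSTANCE);
        for (File file : files) {
            System.out.println("file: " + file.getCanonicalPath());
        }

    }

}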

Answer 1

Accepted answer by MadProgrammer

You need to download the page, which is the directory listing, parse it, and then download the individual files linked in the page...

You could do something like...

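import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

URL url = new URL("http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/");
InputStream is = null;
try {
    is = url.openStream();
    byte[] buffer = new byte[1024];
    int bytesRead = -1;
    StringBuilder page = new StringBuilder(1024);
    // Read the directory listing (an HTML page) into memory
    while ((bytesRead = is.read(buffer)) != -1) {
        page.append(new String(buffer, 0, bytesRead));
    }
    // Spend the rest of your life using String methods
    // to parse the result...
} catch (IOException ex) {
    ex.printStackTrace();
} finally {
    try {
        // openStream() may have failed, leaving the stream null
        if (is != null) {
            is.close();
        }
    } catch (IOException e) {
        // nothing useful can be done if close() fails
    }
}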

Or, you can download Jsoup and use it to do all the hard work...

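import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

try {
    // Fetch and parse the directory listing page
    Document doc = Jsoup.connect("http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/").get();
    // Every entry in the listing is an anchor tag
    Elements links = doc.getElementsByTag("a");
    for (Element link : links) {
        System.out.println(link.attr("href") + " - " + link.text());
    }
} catch (IOException ex) {
    ex.printStackTrace();
}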

Which output...

?C=N;O=D - Name
?C=M;O=A - Last modified
?C=S;O=A - Size
?C=D;O=A - Description
/gamefiles/ - Parent Directory
Assembly-CSharp-Editor-firstpass-vs.csproj - Assembly-CSharp-Edit..>
Assembly-CSharp-Editor-firstpass.csproj - Assembly-CSharp-Edit..>
Assembly-CSharp-Editor-firstpass.pidb - Assembly-CSharp-Edit..>
Assembly-CSharp-firstpass-vs.csproj - Assembly-CSharp-firs..>
Assembly-CSharp-firstpass.csproj - Assembly-CSharp-firs..>
Assembly-CSharp-firstpass.pidb - Assembly-CSharp-firs..>
Assembly-CSharp-vs.csproj - Assembly-CSharp-vs.c..>
Assembly-CSharp.csproj - Assembly-CSharp.csproj
Assembly-CSharp.pidb - Assembly-CSharp.pidb
Assembly-UnityScript-Editor-firstpass-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-Editor-firstpass.pidb - Assembly-UnityScript..>
Assembly-UnityScript-Editor-firstpass.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-firstpass-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-firstpass.pidb - Assembly-UnityScript..>
Assembly-UnityScript-firstpass.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript.pidb - Assembly-UnityScript..>
Assembly-UnityScript.unityproj - Assembly-UnityScript..>
Assets/ - Assets/
Library/ - Library/
Professor%20Phys-csharp.sln - Professor Phys-cshar..>
Professor%20Phys.exe - Professor Phys.exe
Professor%20Phys.sln - Professor Phys.sln
Professor%20Phys.userprefs - Professor Phys.userp..>
Professor%20Phys_Data/ - Professor Phys_Data/
Script.doc - Script.doc
~$Script.doc - ~$Script.doc
~WRL0392.tmp - ~WRL0392.tmp
~WRL1966.tmp - ~WRL1966.tmp

You would then need to build a new URL for each file and read as you have already done...

For example, the href for Assembly-CSharp-Edit..> is Assembly-CSharp-Editor-firstpass-vs.csproj, which appears to be a relative link, so you would need to prefix it with http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/ to make a new URL of http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/Assembly-CSharp-Editor-firstpass-vs.csproj
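
As a rough illustration of that prefixing step, the sketch below resolves raw href values against the directory URL with java.net.URI and drops the entries that are not files or subfolders of the directory (the "?C=N;O=D" sort links and the "Parent Directory" link from the output above). The class and method names are made up for this example, and the filtering rules are assumptions based on the listing shown here, not something every server's listing page will follow. Note that the base URL needs its trailing slash, otherwise relative links resolve against /gamefiles/ instead. Jsoup can also do the resolving itself through link.absUrl("href").

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class ListingLinks {

    /**
     * Turns raw href values from a directory listing into absolute URIs.
     * Skips column-sort links ("?C=N;O=D"), fragments, and anything that
     * resolves outside the base directory (such as "Parent Directory").
     * Assumes the hrefs are already percent-encoded, as in the listing above;
     * URI.resolve(String) throws IllegalArgumentException for raw spaces.
     */
    static List<URI> resolve(URI base, List<String> hrefs) {
        List<URI> result = new ArrayList<>();
        for (String href : hrefs) {
            if (href.isEmpty() || href.startsWith("?") || href.startsWith("#")) {
                continue; // sort links and fragments are not files
            }
            URI target = base.resolve(href);
            // Keep only entries that live below the base directory
            if (target.toString().startsWith(base.toString()) && !target.equals(base)) {
                result.add(target);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/");
        List<String> hrefs = List.of("?C=N;O=D", "/gamefiles/", "Assets/",
                "Professor%20Phys.exe", "Script.doc");
        for (URI uri : resolve(base, hrefs)) {
            System.out.println(uri);
        }
    }
}
```

Entries whose URL ends in "/" (Assets/, Library/) are subfolders, so they point at further listing pages rather than at files to save.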

You would need to do this for each element you want to grab
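
For the per-file download itself, one possible shape is sketched below. It streams the response to disk with Files.copy instead of sizing a buffer from available(), which only reports the bytes readable without blocking and so can truncate a file. FileFetcher, save, localName and download are hypothetical names chosen for this example; the User-Agent header is simply the one from the question.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the original answer.
class FileFetcher {

    /** Streams everything from 'in' into 'target', creating parent folders first. */
    static long save(InputStream in, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        // Copies until end of stream and returns the number of bytes written
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Local file name taken from the last path segment of a file URL. */
    static String localName(URI fileUri) {
        String path = fileUri.getPath(); // getPath() decodes %20 to a space
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("Not a file URL: " + fileUri);
        }
        return name;
    }

    /** Downloads one file URL into targetDir. */
    static void download(URI fileUri, Path targetDir) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) fileUri.toURL().openConnection();
        connection.addRequestProperty("User-Agent", "Mozilla/4.76"); // header from the question
        try (InputStream in = connection.getInputStream()) {
            save(in, targetDir.resolve(localName(fileUri)));
        } finally {
            connection.disconnect();
        }
    }
}
```

Calling FileFetcher.download(fileUri, targetDir) once per file URL from the listing would then cover "each element you want to grab"; URLs ending in "/" are folders and are rejected by localName.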

Answer 2

Answer by gerrytan

Have you considered a tool like HTTrack? It can detect anchor tags in HTML and download an entire website (limited by tree depth). You can also specify filters for which files should be downloaded, etc.

If this doesn't suit your requirements, you can still use a hand-written Java program, but the problem then becomes obtaining the list of files at the URL (and in all subfolders within it). You need to parse the HTML, gather all the anchor tags, and traverse them (which is what HTTrack does).
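
A hand-written traversal along those lines might look like the sketch below. It assumes that subfolders show up as links ending in "/", as in the listing for this directory, and it takes the page parsing as a pluggable piece: LinkSource is a hypothetical interface that returns the raw href values for one listing page, so it could be backed by Jsoup or by any other HTML parser. The depth limit mirrors the tree-level limit mentioned for HTTrack.

```java
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a recursive listing walk; names are illustrative only.
class ListingWalker {

    /** Hypothetical hook: returns the raw href values found on one listing page. */
    interface LinkSource {
        List<String> hrefs(URI directory) throws IOException;
    }

    /** Collects file URLs below 'root', descending at most 'maxDepth' folder levels. */
    static List<URI> collectFiles(URI root, LinkSource source, int maxDepth) throws IOException {
        List<URI> files = new ArrayList<>();
        walk(root, root, source, maxDepth, new HashSet<>(), files);
        return files;
    }

    private static void walk(URI root, URI dir, LinkSource source, int depthLeft,
                             Set<URI> visited, List<URI> files) throws IOException {
        if (!visited.add(dir)) {
            return; // already processed, avoids loops between listing pages
        }
        for (String href : source.hrefs(dir)) {
            if (href.isEmpty() || href.startsWith("?") || href.startsWith("#")) {
                continue; // sort links and fragments
            }
            URI target = dir.resolve(href).normalize();
            if (!target.toString().startsWith(root.toString()) || target.equals(dir)) {
                continue; // never climb above the starting directory
            }
            if (target.getPath().endsWith("/")) {
                // Assumption: a trailing slash marks a subfolder listing
                if (depthLeft > 0) {
                    walk(root, target, source, depthLeft - 1, visited, files);
                }
            } else {
                files.add(target);
            }
        }
    }
}
```

Each URL in the returned list can then be downloaded individually, keeping the part of its path after the root as the relative location on disk so the folder structure is preserved.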
