java: Download all files and folders in a directory with Java
Original question: http://stackoverflow.com/questions/17101276/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Java download all files and folders in a directory
Asked by Kyle
I am trying to download all the files from this directory. However, I can only get it to download the URL as one file. What can I do? I tried searching for this problem, but the results were confusing, and people started suggesting HTTP clients instead. Thanks for your help; this is my code so far. It has been suggested that I use an input stream to obtain all the files in the directory. Would those then go into an array? I tried the tutorial at http://docs.oracle.com/javase/tutorial/networking/urls/ but it didn't help me understand.
//ProgressBar/Install
String URL_LOCATION = "http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/";
String LOCAL_FILE = filelocation.getText() + "\\ProfessorPhys\\";
try {
    java.net.URL url = new URL(URL_LOCATION);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.addRequestProperty("User-Agent", "Mozilla/4.76");
    //URLConnection connection = url.openConnection();
    BufferedInputStream stream = new BufferedInputStream(connection.getInputStream());
    int available = stream.available();
    byte b[] = new byte[available];
    stream.read(b);
    File file = new File(LOCAL_FILE);
    OutputStream out = new FileOutputStream(file);
    out.write(b);
} catch (Exception e) {
    System.err.println(e);
}
I also found this code, which returns a List of the files in a directory. Can someone help me combine the two pieces of code?
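import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.TrueFileFilter;

public class GetAllFilesInDirectory {
    public static void main(String[] args) throws IOException {
        // Note: this walks a directory on the local file system, not a URL
        File dir = new File("dir");
        System.out.println("Getting all files in " + dir.getCanonicalPath() + " including those in subdirectories");
        List<File> files = (List<File>) FileUtils.listFiles(dir, TrueFileFilter.INSTANCE, TrueFileFilter.INSTANCE);
        for (File file : files) {
            System.out.println("file: " + file.getCanonicalPath());
        }
    }
}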
Accepted answer by MadProgrammer
You need to download the page, which is the directory listing, parse it, and then download the individual files linked in the page...
You could do something like...
URL url = new URL("http://www.futureretrogaming.tk/gamefiles/ProfessorPhys");
// try-with-resources closes the stream even if reading fails
try (InputStream is = url.openStream()) {
    byte[] buffer = new byte[1024];
    int bytesRead;
    StringBuilder page = new StringBuilder(1024);
    while ((bytesRead = is.read(buffer)) != -1) {
        page.append(new String(buffer, 0, bytesRead));
    }
    // Spend the rest of your life using String methods
    // to parse the result...
} catch (IOException ex) {
    ex.printStackTrace();
}
Or, you can download Jsoup and use it to do all the hard work...
try {
    Document doc = Jsoup.connect("http://www.futureretrogaming.tk/gamefiles/ProfessorPhys").get();
    // Every <a> element in the directory listing is a candidate link
    Elements links = doc.getElementsByTag("a");
    for (Element link : links) {
        System.out.println(link.attr("href") + " - " + link.text());
    }
} catch (IOException ex) {
    ex.printStackTrace();
}
Which outputs...
?C=N;O=D - Name
?C=M;O=A - Last modified
?C=S;O=A - Size
?C=D;O=A - Description
/gamefiles/ - Parent Directory
Assembly-CSharp-Editor-firstpass-vs.csproj - Assembly-CSharp-Edit..>
Assembly-CSharp-Editor-firstpass.csproj - Assembly-CSharp-Edit..>
Assembly-CSharp-Editor-firstpass.pidb - Assembly-CSharp-Edit..>
Assembly-CSharp-firstpass-vs.csproj - Assembly-CSharp-firs..>
Assembly-CSharp-firstpass.csproj - Assembly-CSharp-firs..>
Assembly-CSharp-firstpass.pidb - Assembly-CSharp-firs..>
Assembly-CSharp-vs.csproj - Assembly-CSharp-vs.c..>
Assembly-CSharp.csproj - Assembly-CSharp.csproj
Assembly-CSharp.pidb - Assembly-CSharp.pidb
Assembly-UnityScript-Editor-firstpass-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-Editor-firstpass.pidb - Assembly-UnityScript..>
Assembly-UnityScript-Editor-firstpass.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-firstpass-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-firstpass.pidb - Assembly-UnityScript..>
Assembly-UnityScript-firstpass.unityproj - Assembly-UnityScript..>
Assembly-UnityScript-vs.unityproj - Assembly-UnityScript..>
Assembly-UnityScript.pidb - Assembly-UnityScript..>
Assembly-UnityScript.unityproj - Assembly-UnityScript..>
Assets/ - Assets/
Library/ - Library/
Professor%20Phys-csharp.sln - Professor Phys-cshar..>
Professor%20Phys.exe - Professor Phys.exe
Professor%20Phys.sln - Professor Phys.sln
Professor%20Phys.userprefs - Professor Phys.userp..>
Professor%20Phys_Data/ - Professor Phys_Data/
Script.doc - Script.doc
~$Script.doc - ~$Script.doc
~WRL0392.tmp - ~WRL0392.tmp
~WRL1966.tmp - ~WRL1966.tmp
You would then need to build a new URL for each file and read it as you have already done...
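Not every link in that listing is a file to download. Judging from the output above, hrefs starting with ? are column-sort links, /gamefiles/ points back to the parent directory, and hrefs ending in / (Assets/, Library/) are subdirectories. A minimal sketch of sorting the hrefs into those groups follows; the class name ListingLinks and the Kind enum are hypothetical, and the rules assume an auto-generated listing shaped like the one above.

```java
import java.util.Arrays;
import java.util.List;

class ListingLinks {
    // How a link found in the directory listing should be treated
    enum Kind { FILE, DIRECTORY, SKIP }

    // Classifies an href by its shape alone (assumption: the listing
    // looks like the output shown above).
    static Kind classify(String href) {
        if (href.isEmpty() || href.startsWith("?") || href.startsWith("#")) {
            return Kind.SKIP;            // column-sort links such as ?C=N;O=D
        }
        if (href.startsWith("/") || href.startsWith("../") || href.contains("://")) {
            return Kind.SKIP;            // parent directory or a link to another site
        }
        return href.endsWith("/") ? Kind.DIRECTORY : Kind.FILE;
    }

    public static void main(String[] args) {
        List<String> hrefs = Arrays.asList(
                "?C=N;O=D", "/gamefiles/", "Assets/", "Professor%20Phys.exe", "Script.doc");
        for (String href : hrefs) {
            System.out.println(classify(href) + "  " + href);
        }
    }
}
```

With Jsoup, the href passed to classify would be link.attr("href") from the loop above.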
For example, the href for Assembly-CSharp-Edit..> is Assembly-CSharp-Editor-firstpass-vs.csproj, which appears to be a relative link, so you would need to prefix it with http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/ to make a new URL of http://www.futureretrogaming.tk/gamefiles/ProfessorPhys/Assembly-CSharp-Editor-firstpass-vs.csproj
You would need to do this for each element you want to grab.
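As a rough sketch of that step, the standard java.net.URI class can do the prefixing, and Files.copy can write the response to disk. The class name LinkDownloader and its helper methods are hypothetical, and the code assumes the base URL refers to a directory (a trailing slash is added if missing, otherwise the last path segment would be replaced rather than extended).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class LinkDownloader {
    // Turns a relative href from the listing into an absolute URI.
    static URI resolve(String baseUrl, String href) {
        String base = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        return URI.create(base).resolve(href);
    }

    // Local file name: last path segment, with %20 and similar escapes decoded.
    static String localName(URI fileUri) {
        String path = fileUri.getPath(); // getPath() returns the decoded path
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Streams one remote file into targetDir, replacing any existing copy.
    static void download(URI fileUri, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(localName(fileUri));
        try (InputStream in = fileUri.toURL().openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) {
        String base = "http://www.futureretrogaming.tk/gamefiles/ProfessorPhys";
        URI file = resolve(base, "Professor%20Phys.exe");
        System.out.println(file + " -> " + localName(file));
        // To actually fetch it: download(file, java.nio.file.Paths.get("ProfessorPhys"));
    }
}
```

Streaming with Files.copy also sidesteps sizing a buffer from available(), which only reports the bytes readable without blocking and not the length of the file.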
Answer by gerrytan
Have you considered a tool like HTTrack? It can detect anchor tags in HTML and download an entire website (limited by tree depth). You can also specify filters for which files should be downloaded, etc.
If this doesn't suit your requirements, you can still use a hand-written Java program; the difficulty is obtaining the list of files at the URL (and in all the subfolders within it). You need to parse the HTML, gather all the anchor tags, and traverse them (which is what HTTrack does).
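A hand-written traversal could look roughly like the sketch below. It is not how HTTrack is implemented, just one way to walk the subfolders. The class name ListingCrawler is hypothetical, and fetching and parsing a page is left to a listLinks function supplied by the caller (for instance the Jsoup call from the accepted answer), so the traversal itself can be tried out against an in-memory fake listing. It assumes that directory URLs and directory hrefs end with a slash.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class ListingCrawler {
    // Returns the URIs of all files reachable from root (which must end with "/").
    // listLinks returns the raw href values found on one listing page.
    static List<URI> collectFiles(URI root, Function<URI, List<String>> listLinks) {
        List<URI> files = new ArrayList<>();
        Set<URI> visited = new HashSet<>();   // guards against link cycles
        Deque<URI> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            URI dir = pending.pop();
            if (!visited.add(dir)) {
                continue;                     // already listed this directory
            }
            for (String href : listLinks.apply(dir)) {
                if (href.isEmpty() || href.startsWith("?") || href.startsWith("#")) {
                    continue;                 // sort links and fragments
                }
                URI target = dir.resolve(href);
                if (!target.toString().startsWith(root.toString())) {
                    continue;                 // parent directory or another site
                }
                if (href.endsWith("/")) {
                    pending.push(target);     // subdirectory: list it later
                } else {
                    files.add(target);
                }
            }
        }
        return files;
    }

    public static void main(String[] args) {
        // Fake listing standing in for real HTTP requests
        Map<String, List<String>> pages = new HashMap<>();
        pages.put("http://example.com/root/", Arrays.asList("?C=N;O=D", "/", "a.txt", "sub/"));
        pages.put("http://example.com/root/sub/", Arrays.asList("/root/", "b%20c.txt"));
        List<URI> found = collectFiles(URI.create("http://example.com/root/"),
                dir -> pages.getOrDefault(dir.toString(), Collections.emptyList()));
        found.forEach(System.out::println);
    }
}
```

Each URI in the returned list can then be downloaded individually, recreating the subfolder structure locally from the part of the path that follows the root.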