List all files from a directory recursively with Java

Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license, cite the original source, and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/2534632/

List all files from a directory recursively with Java

java, file-io

Asked by Hultner

I have this function that prints the names of all the files in a directory recursively. The problem is that my code is very slow because it has to access a remote network device with every iteration.

My plan is to first load all the files from the directory recursively, and then go through them with the regex to filter out all the files I don't want. Does anyone have a better suggestion?

public static void printFnames(String sDir){
    File[] faFiles = new File(sDir).listFiles();
    for(File file: faFiles){
        if(file.getName().matches("^(.*?)")){
            System.out.println(file.getAbsolutePath());
        }
        if(file.isDirectory()){
            printFnames(file.getAbsolutePath());
        }
    }
}

This is just a test; later on I'm not going to use the code like this. Instead, I'm going to add the path and modification date of every file which matches an advanced regex to an array.

Accepted answer by skaffman

Assuming this is actual production code you'll be writing, I suggest using a solution that already exists for this sort of thing: Apache Commons IO, specifically FileUtils.listFiles(). It handles nested directories and filters (based on name, modification time, etc.).

For example, for your regex:

Collection<File> files = FileUtils.listFiles(
  dir, 
  new RegexFileFilter("^(.*?)"), 
  DirectoryFileFilter.DIRECTORY
);

This will recursively search for files matching the ^(.*?) regex, returning the results as a collection.

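As a small usage sketch (this loop is not part of the original answer), the returned collection can then be iterated to get the path and modification date the question asks for:

for (File file : files) {
    System.out.println(file.getAbsolutePath() + " " + file.lastModified());
}
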
It's worth noting that this will be no faster than rolling your own code; it's doing the same thing, and trawling a filesystem in Java is just slow. The difference is that the Apache Commons version will have no bugs in it.

Answer by Michael Borgwardt

"It feels like it's stupid to access the filesystem and get the contents for every subdirectory instead of getting everything at once."

Your feeling is wrong. That's how filesystems work. There is no faster way (except that when you have to do this repeatedly or for different patterns, you can cache all the file paths in memory, but then you have to deal with cache invalidation, i.e. what happens when files are added/removed/renamed while the app runs).

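A very rough sketch of that caching idea (the class and method names are purely illustrative, not from the answer, and it simply re-scans on demand rather than solving the invalidation problem):

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class CachedFileList {
    private final List<String> cachedPaths = new ArrayList<>();

    // Walk the tree once and remember every path; call again when the cache is known to be stale.
    void refresh(File root) {
        cachedPaths.clear();
        collect(root);
    }

    // Filtering with a different pattern reuses the cached paths and touches no filesystem.
    List<String> matching(Pattern pattern) {
        List<String> result = new ArrayList<>();
        for (String path : cachedPaths) {
            if (pattern.matcher(path).matches()) {
                result.add(path);
            }
        }
        return result;
    }

    private void collect(File dir) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory or unreadable
        }
        for (File child : children) {
            cachedPaths.add(child.getAbsolutePath());
            if (child.isDirectory()) {
                collect(child);
            }
        }
    }
}
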
Answer by Kevin Day

Java's interface for reading filesystem folder contents is not very performant (as you've discovered). JDK 7 fixes this with a completely new interface for this sort of thing, which should bring native level performance to these sorts of operations.

The core issue is that Java makes a native system call for every single file. On a low-latency interface, this is not that big of a deal, but on a network with even moderate latency, it really adds up. If you profile your algorithm above, you'll find that the bulk of the time is spent in the pesky isDirectory() call; that's because you are incurring a round trip for every single call to isDirectory(). Most modern OSes can provide this sort of information when the list of files/folders is originally requested (as opposed to querying each individual file path for its properties).

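For reference, the JDK 7 interface mentioned above hands the file attributes to the visitor along with each entry, so your own code does not have to issue a separate isDirectory() call per file. A minimal sketch (the root path is a placeholder):

import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

...

Path root = Paths.get("path/to/remote/share"); // placeholder
Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        // attrs is supplied by the walker, so no extra per-file round trip is issued here
        System.out.println(file + " " + attrs.lastModifiedTime());
        return FileVisitResult.CONTINUE;
    }
});
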
If you can't wait for JDK7, one strategy for addressing this latency is to go multi-threaded and use an ExecutorService with a maximum # of threads to perform your recursion. It's not great (you have to deal with locking of your output data structures), but it'll be a heck of a lot faster than doing this single threaded.

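A rough sketch of that multi-threaded idea (the class name, the Phaser-based completion tracking and the thread-safe result queue are illustrative choices, not from the answer):

import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;

public class ParallelFileLister {
    private final ExecutorService pool;
    private final Queue<String> results = new ConcurrentLinkedQueue<>(); // thread-safe collector
    private final Phaser pending = new Phaser(1); // party 1 is the calling thread

    public ParallelFileLister(int maxThreads) {
        this.pool = Executors.newFixedThreadPool(maxThreads);
    }

    public List<String> list(File root) {
        submit(root);
        pending.arriveAndAwaitAdvance(); // blocks until every submitted task has finished
        pool.shutdown();
        return new ArrayList<>(results);
    }

    private void submit(File dir) {
        pending.register(); // one extra party per outstanding directory task
        pool.execute(() -> {
            try {
                File[] children = dir.listFiles();
                if (children == null) {
                    return; // not a directory or unreadable
                }
                for (File child : children) {
                    if (child.isDirectory()) {
                        submit(child); // recurse by handing the subdirectory to the pool
                    } else {
                        results.add(child.getAbsolutePath());
                    }
                }
            } finally {
                pending.arriveAndDeregister();
            }
        });
    }
}

Usage would be along the lines of new ParallelFileLister(8).list(new File("path/to/share")).
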
In all of your discussions about this sort of thing, I highly recommend that you compare against the best you could do using native code (or even a command-line script that does roughly the same thing). Saying that it takes an hour to traverse a network structure doesn't really mean that much. Telling us that you can do it natively in 7 seconds, but it takes an hour in Java, will get people's attention.

Answer by Daniel Ryan

Just so you know, isDirectory() is quite a slow method. I'm finding it quite slow in my file browser. I'll be looking into a library to replace it with native code.

Answer by Kiran

The more efficient way I found for dealing with millions of folders and files is to capture the directory listing through a DOS command into a file and parse it. Once you have the parsed data, you can do analysis and compute statistics.

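A minimal sketch of that approach on Windows (the class name is a placeholder; the answer captures the listing in a file first, while this sketch reads the command's output directly, which involves the same parsing step):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class DirListingCapture {
    public static List<String> capture(String rootDir) throws IOException, InterruptedException {
        // /s recurses into subdirectories, /b prints one bare path per line
        ProcessBuilder pb = new ProcessBuilder("cmd", "/c", "dir", "/s", "/b", rootDir);
        pb.redirectErrorStream(true);
        Process process = pb.start();

        List<String> paths = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                paths.add(line); // each line is one file or directory path
            }
        }
        process.waitFor();
        return paths;
    }
}
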
Answer by Vishal Mokal

This function will list all the file names and their paths from the given directory and its subdirectories.

public void listFile(String pathname) {
    File f = new File(pathname);
    File[] listfiles = f.listFiles();
    if (listfiles == null) {
        // pathname is not a directory or could not be read
        return;
    }
    for (File file : listfiles) {
        System.out.println(file);
        if (file.isDirectory()) {
            listFile(file.getAbsolutePath());
        }
    }
}

Answer by RealHowTo

The fast way to get the contents of a directory using Java 7 NIO:

import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.FileSystems;
import java.nio.file.Path;

...

Path dir = FileSystems.getDefault().getPath( filePath );
try (DirectoryStream<Path> stream = Files.newDirectoryStream( dir )) {
    for (Path path : stream) {
       System.out.println( path.getFileName() );
    }
}

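If only the immediate directory contents need to be filtered by name, newDirectoryStream also accepts a glob pattern (the "*.java" pattern here is just an illustration, not part of the original answer):

try (DirectoryStream<Path> stream = Files.newDirectoryStream( dir, "*.java" )) {
    for (Path path : stream) {
       System.out.println( path.getFileName() );
    }
}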

Answer by jboi

With Java 7, a faster way to walk through a directory tree was introduced with the Paths and Files functionality. They're much faster than the "old" File way.

This would be the code to walk through the tree and check path names with a regular expression:

public final void test() throws IOException, InterruptedException {
    final Path rootDir = Paths.get("path to your directory where the walk starts");

    // Walk through the rootDir directory tree
    Files.walkFileTree(rootDir, new FileVisitor<Path>() {
        // First (minor) speed up. Compile regular expression pattern only one time.
        private Pattern pattern = Pattern.compile("^(.*?)");

        @Override
        public FileVisitResult preVisitDirectory(Path path,
                BasicFileAttributes atts) throws IOException {

            boolean matches = pattern.matcher(path.toString()).matches();

            // TODO: Put here your business logic when matches equals true/false

            return (matches)? FileVisitResult.CONTINUE:FileVisitResult.SKIP_SUBTREE;
        }

        @Override
        public FileVisitResult visitFile(Path path, BasicFileAttributes mainAtts)
                throws IOException {

            boolean matches = pattern.matcher(path.toString()).matches();

            // TODO: Put here your business logic when matches equals true/false

            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult postVisitDirectory(Path path,
                IOException exc) throws IOException {
            // Nothing special to do after a directory has been visited
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFileFailed(Path path, IOException exc)
                throws IOException {
            exc.printStackTrace();

            // If the root directory has failed it makes no sense to continue
            return path.equals(rootDir)? FileVisitResult.TERMINATE:FileVisitResult.CONTINUE;
        }
    });
}

Answer by Dan

This is a very simple recursive method to get all files from a given root.

It uses the Java 7 NIO Path class.

private List<String> getFileNames(List<String> fileNames, Path dir) {
    try(DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
        for (Path path : stream) {
            if(path.toFile().isDirectory()) {
                getFileNames(fileNames, path);
            } else {
                fileNames.add(path.toAbsolutePath().toString());
                System.out.println(path.getFileName());
            }
        }
    } catch(IOException e) {
        e.printStackTrace();
    }
    return fileNames;
} 
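
A hypothetical call (the starting directory is a placeholder):

List<String> fileNames = getFileNames(new ArrayList<String>(), Paths.get("path/to/dir"));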

Answer by Prathamesh sawant

This will work just fine ... and it's recursive.

File root = new File("ROOT PATH");
getFilesRecursive(root);


private static void getFilesRecursive(File pFile)
{
    File[] children = pFile.listFiles();
    if (children == null)
    {
        // pFile is not a directory or could not be read
        return;
    }
    for (File file : children)
    {
        if (file.isDirectory())
        {
            getFilesRecursive(file);
        }
        else
        {
            // do your thing 
            // you can either save in HashMap and use it as
            // per your requirement
        }
    }
}