使用Java计算目录中的文件数
声明:本页面是 StackOverFlow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,并将其归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/687444/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverFlow
Counting the number of files in a directory using Java
提问 by euphoria83
How do I count the number of files in a directory using Java? For simplicity, let's assume that the directory doesn't have any sub-directories.
如何使用 Java 计算目录中的文件数?为简单起见,我们假设该目录没有任何子目录。
I know the standard method of:
我知道标准方法:
new File(<directory path>).listFiles().length
But this will effectively go through all the files in the directory, which might take a long time if the number of files is large. Also, I don't care about the actual files in the directory unless their number is greater than some fixed large number (say 5000).
但这实际上会遍历目录中的所有文件,如果文件数量很大,可能需要很长时间。另外,我并不关心目录中具体有哪些文件,除非它们的数量大于某个固定的大数字(比如 5000)。
I am guessing, but doesn't the directory (or its i-node in case of Unix) store the number of files contained in it? If I could get that number straight away from the file system, it would be much faster. I need to do this check for every HTTP request on a Tomcat server before the back-end starts doing the real processing. Therefore, speed is of paramount importance.
我在猜测,但是目录(或在 Unix 的情况下是它的 i-node)不存储其中包含的文件数吗?如果我可以直接从文件系统中获取该数字,速度会快得多。在后端开始进行真正的处理之前,我需要对 Tomcat 服务器上的每个 HTTP 请求进行此检查。因此,速度至关重要。
I could run a daemon every once in a while to clear the directory. I know that, so please don't give me that solution.
我可以每隔一段时间运行一个守护进程来清除目录。我知道,所以请不要给我那个解决方案。
采纳答案 by Marty Lamb
This might not be appropriate for your application, but you could always try a native call (using JNI or JNA), or exec a platform-specific command and read the output before falling back to list().length. On *nix, you could exec ls -1a | wc -l (note: that's dash-one-a for the first command, and dash-lowercase-L for the second). Not sure what would be right on Windows - perhaps just a dir and look for the summary.
这可能不适合您的应用程序,但您始终可以尝试本地调用(使用 JNI 或 JNA),或者先执行特定于平台的命令并读取其输出,不行再回退到 list().length。在 *nix 上,您可以执行 ls -1a | wc -l(注意:第一个命令里是「减号、数字 1、字母 a」,第二个里是「减号、小写字母 L」)。不确定在 Windows 上该怎么做,也许只需执行 dir 并查看末尾的汇总行。
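A rough sketch of that idea on *nix (a hypothetical helper, not from the original answer): it shells out to count the entries and falls back to list().length if anything goes wrong.

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;

class NativeCountSketch {
    static int countEntries(File dir) {
        try {
            // run the pipeline with the target directory as the working directory
            Process p = new ProcessBuilder("/bin/sh", "-c", "ls -1a | wc -l")
                    .directory(dir)
                    .start();
            try (BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
                int count = Integer.parseInt(out.readLine().trim());
                if (p.waitFor() == 0) {
                    return count; // note: ls -a includes the "." and ".." entries
                }
            }
        } catch (Exception e) {
            // fall through to the portable approach
        }
        String[] listed = dir.list();
        return listed == null ? 0 : listed.length;
    }
}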
Before bothering with something like this I'd strongly recommend you create a directory with a very large number of files and just see if list().length really does take too long. As this blogger suggests, you may not want to sweat this.
在考虑这样的事情之前,我强烈建议您创建一个包含大量文件的目录,然后看看 list().length 是否真的花费了太长时间。正如这位博主所建议的那样,您可能不想为此烦恼。
I'd probably go with Varkhan's answer myself.
我自己可能会采用 Varkhan 的答案。
回答 by Varkhan
Ah... the rationale for not having a straightforward method in Java to do that is file storage abstraction: some filesystems may not have the number of files in a directory readily available... that count may not even have any meaning at all (see for example distributed, P2P filesystems, fs that store file lists as a linked list, or database-backed filesystems...). So yes,
啊……Java 中之所以没有提供直接的方法,原因在于文件存储抽象:有些文件系统可能无法随手提供目录中的文件数量……这个数字甚至可能根本没有意义(例如分布式的 P2P 文件系统、把文件列表存成链表的文件系统,或基于数据库的文件系统……)。所以,是的,
new File(<directory path>).list().length
is probably your best bet.
可能是你最好的选择。
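A minimal sketch of that approach (hypothetical class name; note that list() returns null when the path is not a readable directory, so a guard is worth adding):

import java.io.File;

public class ListCountSketch {
    public static void main(String[] args) {
        File dir = new File(args[0]);
        String[] entries = dir.list(); // null if the path is not a directory or cannot be read
        int count = (entries == null) ? 0 : entries.length;
        System.out.println(count + " entries in " + dir);
    }
}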
回答 by Sebastian Celis
Unfortunately, as mmyers said, File.list() is about as fast as you are going to get using Java. If speed is as important as you say, you may want to consider doing this particular operation using JNI. You can then tailor your code to your particular situation and filesystem.
不幸的是,正如 mmyers 所说,File.list() 差不多已经是用 Java 能达到的最快速度了。如果速度真像您说的那么重要,您可能需要考虑用 JNI 来执行这个特定操作,然后就可以针对您的具体情况和文件系统来定制代码。
回答 by Renaud
If you have directories containing really many (>100'000) files, here is a (non-portable) way to go:
如果您的目录中确实包含非常多(超过 100'000 个)的文件,这里有一个(不可移植的)办法:
String directoryPath = "a path";
// -f flag is important, because this way ls does not sort its output,
// which is way faster
String[] params = { "/bin/sh", "-c",
        "ls -f " + directoryPath + " | wc -l" };
Process process = Runtime.getRuntime().exec(params);
BufferedReader reader = new BufferedReader(new InputStreamReader(
        process.getInputStream()));
// parse the count and subtract 2 to account for the "." and ".." entries
int fileCount = Integer.parseInt(reader.readLine().trim()) - 2;
reader.close();
System.out.println(fileCount);
回答 by mateuscb
Since you don't really need the total number, and in fact want to perform an action after a certain number (in your case 5000), you can use java.nio.file.Files.newDirectoryStream. The benefit is that you can exit early instead of having to go through the entire directory just to get a count.
由于您实际上并不需要总数,而只是想在数量超过某个值(您的情况是 5000)后执行操作,您可以使用 java.nio.file.Files.newDirectoryStream。好处是您可以提前退出,而不必为了得到计数而遍历整个目录。
public boolean isOverMax() {
    Path dir = Paths.get("C:/foo/bar");
    int i = 0;
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
        for (Path p : stream) {
            // more entries than the maximum: exit early
            if (++i > MAX_FILES) {
                return true;
            }
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    return false;
}
The interface doc for DirectoryStream also has some good examples.
DirectoryStream 的接口文档中也有一些很好的例子。
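As a side note (not part of the original answer), Files.newDirectoryStream also has an overload that takes a glob pattern, which can be handy if only certain files should count toward the limit. A small sketch, with "*.csv" as an assumed example pattern:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class GlobCountSketch {
    public static void main(String[] args) throws IOException {
        Path dir = Paths.get("C:/foo/bar");
        int count = 0;
        // the stream only returns entries whose names match the glob
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.csv")) {
            for (Path p : stream) {
                count++;
            }
        }
        System.out.println(count + " matching entries");
    }
}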
回答 by superbob
Since Java 8, you can do that in three lines:
从 Java 8 开始,您可以在三行中完成:
try (Stream<Path> files = Files.list(Paths.get("your/path/here"))) {
    long count = files.count();
}
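If, as in the question, you only need to know whether the count exceeds some threshold, a short-circuiting variant of the same idea could be used (a sketch; MAX_FILES is an assumed constant, e.g. 5000):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class DirectoryThresholdSketch {
    static final long MAX_FILES = 5000; // assumed threshold from the question

    static boolean isOverMax(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            // limit() stops pulling entries once MAX_FILES + 1 have been read,
            // so a huge directory is not fully traversed just to get a count
            return entries.limit(MAX_FILES + 1).count() > MAX_FILES;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isOverMax(Paths.get("your/path/here")));
    }
}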
Regarding the 5000 child nodes and inode aspects:
关于 5000 个子节点和 inode 方面:
This method will iterate over the entries, but as Varkhan suggested, you probably can't do better besides playing with JNI or direct system command calls, and even then, you can never be sure these methods don't do the same thing!
此方法会遍历这些条目,但正如 Varkhan 所建议的,除了使用 JNI 或直接调用系统命令之外,您可能无法做得更好;而且即便如此,您也永远无法确定这些方法不会做同样的事情!
However, let's dig into this a little:
但是,让我们深入研究一下:
Looking at the JDK8 source, Files.list exposes a stream that uses an Iterable from Files.newDirectoryStream, which delegates to FileSystemProvider.newDirectoryStream.
查看 JDK8 源码,Files.list 暴露的流使用了来自 Files.newDirectoryStream 的 Iterable,后者委托给 FileSystemProvider.newDirectoryStream。
On UNIX systems (decompiled sun.nio.fs.UnixFileSystemProvider.class), it loads an iterator: a sun.nio.fs.UnixSecureDirectoryStream is used (with file locks while iterating through the directory).
在 UNIX 系统上(反编译 sun.nio.fs.UnixFileSystemProvider.class 可见),它会加载一个迭代器:使用的是 sun.nio.fs.UnixSecureDirectoryStream(遍历目录时会持有文件锁)。
So, there is an iterator that will loop through the entries here.
因此,有一个迭代器将遍历此处的条目。
Now, let's look to the counting mechanism.
现在,让我们看看计数机制。
The actual count is performed by the count/sum reducing API exposed by Java 8 streams. In theory, this API can perform parallel operations without much effort (with multithreading). However, the stream is created with parallelism disabled, so it's a no-go...
实际的计数由 Java 8 流公开的 count/sum 归约 API 执行。理论上,这个 API 可以毫不费力地并行执行(利用多线程)。但是,该流在创建时禁用了并行,所以这条路走不通……
The good side of this approach is that it won't load the array in memory, as the entries are counted by an iterator while they are read by the underlying (filesystem) API.
这种方法的好处是它不会把数组加载到内存中,因为条目是在底层(文件系统)API 读取时由迭代器逐个计数的。
Finally, for the information, conceptually in a filesystem, a directory node is not required to hold the number of files that it contains; it can just contain the list of its child nodes (list of inodes). I'm not an expert on filesystems, but I believe that UNIX filesystems work just like that. So you can't assume there is a way to have this information directly (i.e. there can always be some list of child nodes hidden somewhere).
最后,补充一点:从概念上讲,文件系统中的目录节点并不需要保存它所包含的文件数,它可以只保存子节点列表(inode 列表)。我不是文件系统方面的专家,但我相信 UNIX 文件系统就是这样工作的。所以您不能假设有办法直接获得这个信息(也就是说,很可能始终只有一份藏在某处的子节点列表)。
回答 by Sergii Povzaniuk
// Recursively counts the files under every filesystem root, in parallel.
// Assumes static imports: java.io.File.listRoots and java.util.stream.Stream.of
public void shouldGetTotalFilesCount() {
    Integer totalCount = of(listRoots()).parallel()
            .map(this::getFilesCount)
            .reduce(0, (a, b) -> a + b);
}

private int getFilesCount(File directory) {
    File[] files = directory.listFiles();
    // listFiles() returns null for plain files (and unreadable directories), which count as 1
    return Objects.isNull(files) ? 1 : Stream.of(files)
            .parallel()
            .reduce(0, (Integer acc, File p) -> acc + getFilesCount(p), (a, b) -> a + b);
}
回答 by Santhosh Hirekerur
In Spring Batch I did the following:
在 Spring Batch 中,我是这样做的:
private int getFilesCount() throws IOException {
    ResourcePatternResolver resolver = new PathMatchingResourcePatternResolver();
    Resource[] resources = resolver.getResources(
            "file:" + projectFilesFolder + "/**/input/splitFolder/*.csv");
    return resources.length;
}