How expensive is File.exists in Java

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6321180/

Time: 2020-10-30 15:21:19  Source: igfitidea

java file-io directory operating-system filesystems

Asked by Franz Kafka

I am wondering how File.exists() works. I don't know much about how filesystems work, so maybe I should start reading there first.

But for some quick preliminary information:

Is a call to File.exists() a single action for the filesystem, if that path and filename are registered in some journal? Or does the OS get the content of the directory and then scan through it for matches?

I presume this will be filesystem dependent, but maybe all filesystems use the quick approach?

I'm not talking about network and tape systems. Let's keep it to ntfs, extX, zfs, jfs :-)

Accepted answer by Peter Lawrey

How this operation is performed the first time depends entirely on the filesystem. This is done by the OS; Java doesn't play any part.

In terms of performance, a read from disk is required in all cases. This typically takes 8-12 ms. @Sven points out that some storage could be slower, but this is relatively rare in cases where performance is important. You may have an additional delay if this is a network file system (usually relatively small, but it depends on your network latency).

Everything else the OS and Java do is very short by comparison.

However, if you check whether the file exists repeatedly, a disk access may not be required because the information can be cached. In this case, what matters is the time the OS takes and the resources used. One of the largest of these costs is the objects File.exists() creates (you wouldn't think it would create any): it encodes the file's name on every call, creating a lot of objects. If you put File.exists() in a tight loop, it can create 400 MB of garbage per second. :(
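
To get a rough feel for the cached case described above, a minimal sketch like the following times many repeated calls on the same path. The class name RepeatedExists, the helper averageNanosPerCall, the default path, and the iteration counts are illustrative assumptions, not part of the original answer; results will vary with OS, filesystem, and JVM.

```java
import java.io.File;

class RepeatedExists {

    // Calls exists() `iterations` times on the same File and returns the
    // average wall-clock cost per call in nanoseconds.
    static double averageNanosPerCall(File file, int iterations) {
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (file.exists()) {
                hits++; // use the result so the loop is not trivially dead code
            }
        }
        long elapsed = System.nanoTime() - start;
        if (hits != 0 && hits != iterations) {
            System.out.println("file appeared or disappeared during the run");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        // Hypothetical default: the current directory, which should exist.
        File file = new File(args.length > 0 ? args[0] : ".");
        averageNanosPerCall(file, 10_000); // warm-up so the JIT can compile the loop
        double avg = averageNanosPerCall(file, 100_000);
        System.out.printf("average per exists() call: %.1f ns%n", avg);
    }
}
```

This only measures elapsed time; it does not measure how much garbage is allocated.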

Journaling filesystems work differently in that they keep track of all the changes you make to a file system; however, they don't change how you read the filesystem.

Answer by Costis Aivalis

Measure the time it takes and see for yourself. As you say, it is absolutely filesystem dependent.

        long t1 = System.currentTimeMillis();
        // ... your File.exists call goes here ...
        long t2 = System.currentTimeMillis();
        System.out.println("time: " + (t2 - t1) + " ms");

You will see that it always gives you different results, since it also depends on the way your OS caches data, on its load, etc.
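
Because a single measurement is noisy, it can help to take several. The following is a rough sketch, assuming System.nanoTime() for finer resolution than currentTimeMillis(); the class name ExistsTiming, the method timeCalls, and the default path are hypothetical choices for illustration.

```java
import java.io.File;

class ExistsTiming {

    // Times `samples` individual exists() calls and returns each duration in nanoseconds.
    static long[] timeCalls(File file, int samples) {
        long[] nanos = new long[samples];
        for (int i = 0; i < samples; i++) {
            long t1 = System.nanoTime();
            file.exists();
            long t2 = System.nanoTime();
            nanos[i] = t2 - t1;
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Hypothetical default: the current directory; pass a path to test another file.
        File file = new File(args.length > 0 ? args[0] : ".");
        long[] nanos = timeCalls(file, 5);
        for (int i = 0; i < nanos.length; i++) {
            // The first sample is often the slowest, since little is cached or compiled yet.
            System.out.printf("call %d: %.3f ms%n", i + 1, nanos[i] / 1_000_000.0);
        }
    }
}
```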

Answer by Vineet Reynolds

Most of the file-related operations are not performed in Java; native code exists to perform these activities. In reality, most of the work done depends on the nature of the FileSystem object (that is backing the File object) and the underlying implementation of the native IO operations in the OS.

I'll present the case of the implementation in OpenJDK 6, for clarity. The File.exists() implementation defers the actual checks to the FileSystem class:

public boolean exists() {
    // ... calls to SecurityManager have been omitted for brevity ...
    return ((fs.getBooleanAttributes(this) & FileSystem.BA_EXISTS) != 0);
}
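
The bitmask check above can be illustrated in isolation. This is a simplified sketch, not the OpenJDK source: the class AttributeMaskDemo and the method describe are invented for illustration, and the flag values are assumed to mirror the BA_* constants in java.io.FileSystem. The idea is that one attribute lookup returns an int, and callers such as exists() test individual bits.

```java
class AttributeMaskDemo {

    // Flag values assumed to mirror java.io.FileSystem's BA_* constants.
    static final int BA_EXISTS    = 0x01;
    static final int BA_REGULAR   = 0x02;
    static final int BA_DIRECTORY = 0x04;

    // Interprets a bitmask by testing one bit per question, as exists() does for BA_EXISTS.
    static String describe(int attrs) {
        boolean exists = (attrs & BA_EXISTS) != 0;
        boolean regular = (attrs & BA_REGULAR) != 0;
        boolean directory = (attrs & BA_DIRECTORY) != 0;
        return "exists=" + exists + ", file=" + regular + ", directory=" + directory;
    }

    public static void main(String[] args) {
        System.out.println(describe(0));                         // nothing found at the path
        System.out.println(describe(BA_EXISTS | BA_REGULAR));    // a regular file
        System.out.println(describe(BA_EXISTS | BA_DIRECTORY));  // a directory
    }
}
```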

The FileSystem class is abstract, and an implementation exists for all supported filesystems:

package java.io;


/**
 * Package-private abstract class for the local filesystem abstraction.
 */

abstract class FileSystem

Notice the package-private nature. A Java Runtime Environment will provide concrete classes that extend the FileSystem class. In the OpenJDK implementation, there are:

  • java.io.WinNTFileSystem, for NTFS
  • java.io.Win32FileSystem, for FAT32
  • java.io.UnixFileSystem, for *nix filesystems (this is a class with a very broad responsibility).

All of the above classes delegate to native code for the getBooleanAttributes method. This implies that performance is not constrained by the managed (Java) code in this case; the implementation of the file system and the nature of the native calls being made have a greater bearing on performance.

Update #2

Based on the updated question -

I'm not talking about network and tape systems. Let's keep it to ntfs, extX, zfs, jfs

Well, that still doesn't matter. Different operating systems will implement support for different file systems in different ways. For example, NTFS support in Windows will be different from the one in *nix, because the operating system will also have to do its share of bookkeeping, in addition to communicating with devices via their drivers; not all the work is done in the device.

In Windows, you will almost always find the concept of file system filter drivers, which manage the task of communicating with other file system filter drivers or the file system. This is necessary to support various operations; one example would be the use of filter drivers by anti-virus engines and other software (on-the-fly encryption and compression products) that intercept IO calls.

In *nix, you will have the stat() system call, which performs the necessary activity of reading the inode information for the file.
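
From Java 7 onward, the java.nio.file API exposes a comparable single attribute lookup. The sketch below is an illustration, not part of the original answer; the class name StatLikeLookup and the method lookup are hypothetical. On *nix, Files.readAttributes is typically backed by a stat-family call, though the exact native call is an implementation detail and is assumed here rather than guaranteed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

class StatLikeLookup {

    // Performs one attribute lookup and returns a short description,
    // or "missing" if nothing exists at the path.
    static String lookup(Path path) {
        try {
            BasicFileAttributes attrs = Files.readAttributes(path, BasicFileAttributes.class);
            String kind = attrs.isDirectory() ? "directory"
                        : attrs.isRegularFile() ? "regular file"
                        : "other";
            return kind + ", " + attrs.size() + " bytes";
        } catch (NoSuchFileException e) {
            return "missing";
        } catch (IOException e) {
            // Any other I/O failure (for example, permission problems) is rethrown unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup(Paths.get(".")));
        // Hypothetical file name that is assumed not to exist.
        System.out.println(lookup(Paths.get("no-such-file-for-demo.txt")));
    }
}
```

One lookup answers several questions at once (existence, type, size), which avoids separate exists(), isDirectory(), and length() calls.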

Answer by Travis May

It's super fast on any modern machine: my tests show 0.028 millis (28 microseconds) per call on my 2013 Mac w/SSD (28 millis for 1,000 calls).

1,000 files created in 307 millis, 0.307 millis per file

1,000 .exists() done in 28 millis, 0.028 millis per file

Here's a test in Groovy (Java)

def index() {
    File fileWrite
    // Make sure the target directory exists before writing into it
    new File("/tmp/fileSpeedTest").mkdirs()

    // Create 1,000 small files and time the whole loop
    long start = System.currentTimeMillis()
    (1..1000).each {
        fileWrite = new File("/tmp/fileSpeedTest/${it}.txt")
        fileWrite.write('Some nice text')
    }
    long diff = System.currentTimeMillis() - start
    // Divide by the number of files (1,000) to get the per-file time
    println "1,000 files created in $diff millis, ${diff/1000.0} millis per file"

    // Check that each of the 1,000 files exists and time the whole loop
    start = System.currentTimeMillis()
    (1..1000).each {
        fileWrite = new File("/tmp/fileSpeedTest/${it}.txt")
        if ( ! fileWrite.exists() )
            throw new Exception("where's the file")
    }
    diff = System.currentTimeMillis() - start
    println "1,000 .exists()   done in  $diff millis, ${diff/1000.0} millis per file"
}