Notice: the content of this page comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/10029365/

Time: 2020-10-30 23:20:12  Source: igfitidea

How to test if a file is "complete" (completely written) with Java

java

Asked by Zugdud

Let's say you had an external process writing files to some directory, and you had a separate process periodically trying to read files from this directory. The problem to avoid is reading a file that the other process is currently in the middle of writing out, so it would be incomplete. Currently, the process that reads uses a minimum file age timer check, so it ignores all files unless their last modified date is more than XX seconds old.
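
For reference, the minimum-age check described above could look roughly like the sketch below. The class and method names (`FileAgeCheck`, `isOlderThan`) are hypothetical, and the threshold is left as a parameter because the question does not state the actual value.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of the original question.
final class FileAgeCheck {
    // Returns true if the file was last modified at least minAge ago.
    static boolean isOlderThan(Path file, Duration minAge) throws IOException {
        Instant lastModified = Files.getLastModifiedTime(file).toInstant();
        Duration age = Duration.between(lastModified, Instant.now());
        return age.compareTo(minAge) >= 0;
    }
}
```

A reader would skip any file for which `FileAgeCheck.isOlderThan(file, minAge)` returns false.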

I'm wondering if there is a cleaner way to solve this problem. If the filetype is unknown (could be a number of different formats) is there some reliable way to check the file header for the number of bytes that should be in the file, vs the number of bytes currently in the file to confirm they match?

Thanks for any thoughts or ideas!

Accepted answer by Michał Kosmulski

You could use an external marker file. The writing process could create a file XYZ.lock before it starts creating file XYZ, and delete XYZ.lock after XYZ is completed. The reader would then easily know that it can consider a file complete only if the corresponding .lock file is not present.
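
A minimal sketch of this convention, assuming both processes can agree on the `.lock` suffix and that the writer is under your control. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating the marker-file convention.
final class LockMarker {
    // Marker path for a data file, e.g. XYZ -> XYZ.lock in the same directory.
    static Path markerFor(Path dataFile) {
        return dataFile.resolveSibling(dataFile.getFileName() + ".lock");
    }

    // Writer side: create the marker, write the data, then remove the marker.
    // If the write fails, the marker is left in place, so the file stays "incomplete".
    static void writeWithMarker(Path dataFile, byte[] content) throws IOException {
        Path marker = markerFor(dataFile);
        Files.createFile(marker);
        Files.write(dataFile, content);
        Files.delete(marker); // only reached if the write succeeded
    }

    // Reader side: the file counts as complete only if it exists and has no marker.
    static boolean isComplete(Path dataFile) {
        return Files.exists(dataFile) && Files.notExists(markerFor(dataFile));
    }
}
```

The marker has to be created before the data file appears. Otherwise the reader could see the data file without a marker and treat it as complete.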

Answer by John Farrelly

The way I've done this in the past is that the process writing the file writes to a "temp" file, and then moves the file to the read location when it has finished writing the file.

So the writing process would write to info.txt.tmp. When it's finished, it renames the file to info.txt. The reading process then just has to check for the existence of info.txt, and it knows that if the file exists, it has been written completely.

Alternatively, if you don't like using weird file extensions, you could have the write process write info.txt to a different directory and then move it to the read directory.
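
The writer side of this approach might be sketched as follows, assuming the temp file and the final file are in the same directory (and so on the same file system). The class and method names are hypothetical. `StandardCopyOption.ATOMIC_MOVE` is used here as one way to request an atomic rename; it is not something the answer itself specifies.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical writer-side helper for the "write to temp, then rename" approach.
final class TempThenRename {
    static void writeThenPublish(Path target, byte[] content) throws IOException {
        // e.g. info.txt -> info.txt.tmp in the same directory
        Path temp = target.resolveSibling(target.getFileName() + ".tmp");
        Files.write(temp, content);
        // Request an atomic rename; throws AtomicMoveNotSupportedException
        // if the file system cannot perform the move atomically.
        Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With `ATOMIC_MOVE`, what happens when the target already exists depends on the implementation, so this sketch assumes each target name is written only once.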

Answer by wired00

I had no option of using temp markers etc., as the files are being uploaded by clients over keypair SFTP. They can be very large in size.

It's quite hacky, but I compare the file size before and after sleeping a few seconds.

It's obviously not ideal to block the thread, but in our case it merely runs as a background system process, so it seems to work fine.

import java.io.File;

// Heuristic: treats the file as complete if its size does not change during a 3-second wait.
private boolean isCompletelyWritten(File file) throws InterruptedException {
    long fileSizeBefore = file.length();
    Thread.sleep(3000); // blocks the calling thread while waiting
    long fileSizeAfter = file.length();

    System.out.println("comparing file size " + fileSizeBefore + " with " + fileSizeAfter);

    return fileSizeBefore == fileSizeAfter;
}

Note: as mentioned below, this might not work on Windows. This was used in a Linux environment.

Answer by JoshDM

One simple solution I've used in the past for this scenario on Windows is to use boolean File.renameTo(File) and attempt to move the original file to a separate staging folder:

boolean success = potentiallyIncompleteFile.renameTo(stagingAreaFile);

If success is false, then potentiallyIncompleteFile is still being written to.

Answer by Taras Melnyk

This can be done with the FileUtils.copyFile() method from the Apache Commons IO library (available via Maven). If you try to copy the file and get an IOException, it means that the file is not completely saved.

Example:

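import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

// Retries the copy until it succeeds. The delayThreadPeriod parameter (milliseconds)
// and the sleep before retrying are assumptions based on the original recursive call.
public static void copyAndDeleteFile(File file, String destinationFile, long delayThreadPeriod)
        throws InterruptedException {
    try {
        FileUtils.copyFile(file, new File(destinationFile));
    } catch (IOException e) {
        // The answer treats a failed copy as "file not completely saved yet".
        e.printStackTrace();
        Thread.sleep(delayThreadPeriod);
        copyAndDeleteFile(file, destinationFile, delayThreadPeriod);
    }
}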

Or periodically check, with some delay, the size of the folder that contains this file:

FileUtils.sizeOfDirectory(folder);

Answer by shem

Two options that seem to solve this issue:

  1. The best option: the writer process somehow notifies the reading process that writing has finished.
  2. Write the file to {id}.tmp, then rename it to {id}.java when finished, and have the reading process run only on *.java files. Renaming takes much less time than writing, so the chance that the two processes work on the same file at once decreases.
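
The reader side of option 2 could be sketched like this: list only the files whose names match the "finished" pattern, so that `.tmp` files are never picked up. The class and method names are hypothetical, and the glob is passed in rather than hard-coded.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader-side helper: returns only files matching the given glob.
final class FinishedFileLister {
    static List<Path> listFinished(Path dir, String glob) throws IOException {
        List<Path> finished = new ArrayList<>();
        // The glob is matched against file names, e.g. "*.java" skips "{id}.tmp".
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                finished.add(p);
            }
        }
        return finished;
    }
}
```

For the naming scheme above, the reader would call `FinishedFileLister.listFinished(dir, "*.java")` on each polling pass.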

Answer by Will Hartung

First, there's "Why doesn't OS X lock files like windows does when copying to a Samba share?", but that's a variation of what you're already doing.

As far as reading arbitrary files and looking for sizes, some files have that information and some do not, and even those that do have no common way of representing it. You would need specific information about each format, and would have to manage each one independently.

If you absolutely must act on the file the "instant" it's done, then your writing process would need to send some kind of notification. Otherwise, you're pretty much stuck polling the files, and reading the directory is quite cheap in terms of I/O compared to reading random blocks from random files.

Answer by Mert Akcakaya

Even if the number of bytes is equal, the content of the file may be different.

So I think you have to compare the old and the new file byte by byte.
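
One possible reading of this answer is to keep an earlier snapshot of the file and compare it with the current file. The sketch below assumes that reading. The class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: compares two files byte by byte.
final class ByteCompare {
    static boolean sameContent(Path a, Path b) throws IOException {
        // Different sizes can never have identical content.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int x;
            while ((x = inA.read()) != -1) {
                if (x != inB.read()) {
                    return false;
                }
            }
            // Both streams must be exhausted at the same point.
            return inB.read() == -1;
        }
    }
}
```

This reads both files in full, so for very large files it is much more expensive than a size or timestamp check.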
