Java: Watching a directory to move large files

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverFlow

Original question: http://stackoverflow.com/questions/3369383/
Asked by nite
I have been writing a program that watches a directory and when files are created in it, it changes the name and moves them to a new directory. In my first implementation I used Java's Watch Service API which worked fine when I was testing 1kb files. The problem that came up is that in reality the files getting created are anywhere from 50-300mb. When this happened the watcher API would find the file right away but could not move it because it was still being written. I tried putting the watcher in a loop (which generated exceptions until the file could be moved) but this seemed pretty inefficient.
Since that didn't work, I tried using a timer that checks the folder every 10 seconds and then moves files when it can. This is the method I ended up going with.
Question: Is there any way to signal when a file is done being written, without doing an exception check or continually comparing sizes? I like the idea of using the Watcher API just once for each file instead of continually checking with a timer (and running into exceptions).
All responses are greatly appreciated!
nt
Accepted answer by stacker
Write another file as an indication that the original file is complete. E.g. while 'fileorg.dat' is still growing, create 'fileorg.done' once it is finished, and check only for 'fileorg.done'.
With clever naming conventions you should not have problems.
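As a sketch of this marker-file idea (the `fileorg.dat` / `.done` suffix convention and the temp-directory setup here are just assumptions for illustration), the watcher can ignore everything except the marker and derive the data file's name from it:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.TimeUnit;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;

public class DoneFileWatcher {

    /** Derives the data file's name from its marker, e.g. "fileorg.dat.done" -> "fileorg.dat". */
    static String dataFileFor(String markerName) {
        return markerName.substring(0, markerName.length() - ".done".length());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Files.createTempDirectory("watched");
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, ENTRY_CREATE);

            // Simulate the producer: the data file may still be growing while it
            // exists; the marker is only created once the data file is complete.
            Files.write(dir.resolve("fileorg.dat"), new byte[]{1, 2, 3});
            Files.createFile(dir.resolve("fileorg.dat.done"));

            // React only to ".done" markers; creation events for the data file are ignored.
            long deadline = System.currentTimeMillis() + 10_000;
            while (System.currentTimeMillis() < deadline) {
                WatchKey key = watcher.poll(1, TimeUnit.SECONDS);
                if (key == null) {
                    continue;
                }
                for (WatchEvent<?> event : key.pollEvents()) {
                    String name = ((Path) event.context()).getFileName().toString();
                    if (name.endsWith(".done")) {
                        System.out.println("Ready to move: " + dataFileFor(name));
                        return;
                    }
                }
                key.reset();
            }
        }
    }
}
```

The only contract the producer and consumer need to agree on is that the marker is created strictly after the data file is fully written.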
Answered by Jasper Krijgsman
I ran into the same problem today. In my use case, a small delay before the file is actually imported was not a big problem, and I still wanted to use the NIO2 API. The solution I chose was to wait until a file has not been modified for 10 seconds before performing any operations on it.
The important part of the implementation is as follows. The program waits until the wait time expires or a new event occurs. The expiration time is reset every time a file is modified. If a file is deleted before the wait time expires, it is removed from the list. I use the poll method with a timeout equal to the expected expiration time, that is (lastModified + waitTime) - currentTime.
private final Map<Path, Long> expirationTimes = new HashMap<>();
private final long newFileWait = 10000L;

public void run() {
    try {
        for (;;) {
            // Retrieves and removes the next watch key, waiting if none are present.
            WatchKey k = watchService.take();
            for (;;) {
                long currentTime = System.currentTimeMillis();

                if (k != null) {
                    handleWatchEvents(k);
                }
                handleExpiredWaitTimes(currentTime);

                // If there are no files left, stop polling and block on take() again.
                if (expirationTimes.isEmpty()) {
                    break;
                }

                // Wake up when the earliest entry expires,
                // i.e. after (lastModified + waitTime) - currentTime millis.
                long minExpiration = Collections.min(expirationTimes.values());
                long timeout = minExpiration - currentTime;
                logger.debug("timeout: " + timeout);
                k = watchService.poll(timeout, TimeUnit.MILLISECONDS);
            }
        }
    } catch (InterruptedException | IOException e) {
        throw new RuntimeException(e);
    }
}

private void handleExpiredWaitTimes(long currentTime) {
    // Start the import for files whose expiration time has passed.
    // Use an iterator so entries can be removed while iterating
    // (removing through the map itself would throw ConcurrentModificationException).
    Iterator<Entry<Path, Long>> it = expirationTimes.entrySet().iterator();
    while (it.hasNext()) {
        Entry<Path, Long> entry = it.next();
        if (entry.getValue() <= currentTime) {
            logger.debug("expired " + entry);
            // do something with the file
            it.remove();
        }
    }
}

private void handleWatchEvents(WatchKey k) throws IOException {
    for (WatchEvent<?> event : k.pollEvents()) {
        handleWatchEvent(event, keys.get(k));
    }
    // Reset the watch key to allow it to be reported again by the watch service.
    k.reset();
}

private void handleWatchEvent(WatchEvent<?> event, Path dir) throws IOException {
    Kind<?> kind = event.kind();
    WatchEvent<Path> ev = cast(event);
    Path name = ev.context();
    Path child = dir.resolve(name);

    if (kind == ENTRY_MODIFY || kind == ENTRY_CREATE) {
        // Update the modification time; key by the resolved path so the
        // ENTRY_DELETE branch below removes the same entry.
        FileTime lastModified = Files.readAttributes(child, BasicFileAttributes.class, NOFOLLOW_LINKS)
                .lastModifiedTime();
        expirationTimes.put(child, lastModified.toMillis() + newFileWait);
    } else if (kind == ENTRY_DELETE) {
        expirationTimes.remove(child);
    }
}
Answered by Sean Patrick Floyd
Two solutions:
The first is a slight variation of the answer by stacker:
Use a unique suffix for incomplete files, something like myhugefile.zip.inc instead of myhugefile.zip. Rename the files when the upload / creation is finished, and exclude .inc files from the watch.
The second is to use a different folder on the same drive to create / upload / write the files and move them to the watched folder once they are ready. Moving should be an atomic action if they are on the same drive (file system dependent, I guess).
Either way, the clients that create the files will have to do some extra work.
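A sketch of the second approach on the producer side (the directory and file names are hypothetical): with `java.nio.file.Files.move` and `ATOMIC_MOVE`, the move either happens in one step or throws `AtomicMoveNotSupportedException`, instead of silently falling back to a copy the watcher could observe half-done.

```java
import java.io.IOException;
import java.nio.file.*;

public class StagedWriter {

    /**
     * Writes data into a staging directory on the same drive as the watched
     * directory, then moves the finished file into the watched directory in
     * one atomic step.
     */
    static Path publish(Path stagingDir, Path watchedDir, String fileName, byte[] data) throws IOException {
        Path tmp = stagingDir.resolve(fileName);
        Files.write(tmp, data); // the watcher never sees this partial file
        return Files.move(tmp, watchedDir.resolve(fileName), StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Both temp directories live on the same file system, so ATOMIC_MOVE works.
        Path staging = Files.createTempDirectory("staging");
        Path watched = Files.createTempDirectory("watched");
        Path result = publish(staging, watched, "bigfile.dat", new byte[]{42});
        System.out.println("Published: " + result);
    }
}
```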
Answered by user1322265
I know it's an old question but maybe it can help somebody.
I had the same issue, so what I did was the following:
if (kind == ENTRY_CREATE) {
    System.out.println("Creating file: " + child);
    boolean isGrowing;
    do {
        // Compare the file size one second apart; keep looping while it grows.
        long initialWeight = child.toFile().length();
        Thread.sleep(1000);
        long finalWeight = child.toFile().length();
        isGrowing = initialWeight < finalWeight;
    } while (isGrowing);
    System.out.println("Finished creating file!");
}
While the file is being created, it keeps getting bigger. So what I did was compare its size one second apart; the app stays in the loop until the two sizes are the same.
Answered by Flint O'Brien
Looks like Apache Camel handles the file-not-done-uploading problem by trying to rename the file (java.io.File.renameTo). If the rename fails, no read lock, but keep trying. When the rename succeeds, they rename it back, then proceed with intended processing.
See operations.renameFile below. Here are the links to the Apache Camel source: GenericFileRenameExclusiveReadLockStrategy.java and FileUtil.java
public boolean acquireExclusiveReadLock( ... ) throws Exception {
    LOG.trace("Waiting for exclusive read lock to file: {}", file);
    // the trick is to try to rename the file, if we can rename then we have exclusive read
    // since its a Generic file we cannot use java.nio to get a RW lock
    String newName = file.getFileName() + ".camelExclusiveReadLock";
    // make a copy as result and change its file name
    GenericFile<T> newFile = file.copyFrom(file);
    newFile.changeFileName(newName);
    StopWatch watch = new StopWatch();

    boolean exclusive = false;
    while (!exclusive) {
        // timeout check
        if (timeout > 0) {
            long delta = watch.taken();
            if (delta > timeout) {
                CamelLogger.log(LOG, readLockLoggingLevel,
                        "Cannot acquire read lock within " + timeout + " millis. Will skip the file: " + file);
                // we could not get the lock within the timeout period, so return false
                return false;
            }
        }

        exclusive = operations.renameFile(file.getAbsoluteFilePath(), newFile.getAbsoluteFilePath());
        if (exclusive) {
            LOG.trace("Acquired exclusive read lock to file: {}", file);
            // rename it back so we can read it
            operations.renameFile(newFile.getAbsoluteFilePath(), file.getAbsoluteFilePath());
        } else {
            boolean interrupted = sleep();
            if (interrupted) {
                // we were interrupted while sleeping, we are likely being shutdown so return false
                return false;
            }
        }
    }

    return true;
}
Answered by Felipe Guimaraes
While it's not possible to be notified by the Watch Service API when the OS finishes copying, all the options seem to be workarounds (including this one!).
As commented above,
1) Moving or copying is not an option on UNIX;

2) File.canWrite always returns true if you have permission to write, even if the file is still being copied;

3) Waiting until a timeout or a new event occurs would be an option, but what if the system is overloaded and the copy has not finished? If the timeout is a big value, the program would wait that long.

4) Writing another file to 'flag' that the copy finished is not an option if you are just consuming the file, not creating it.
An alternative is to use the code below:
boolean locked = true;
while (locked) {
    RandomAccessFile raf = null;
    try {
        // Throws FileNotFoundException while the file is still locked by the copy.
        // Use "r", not "rw": if the file were deleted while copying, "rw" would create an empty file.
        raf = new RandomAccessFile(file, "r");
        raf.seek(file.length()); // go to the last byte, just to make sure everything was copied
        locked = false;
    } catch (IOException e) {
        locked = file.exists();
        if (locked) {
            System.out.println("File locked: '" + file.getAbsolutePath() + "'");
            Thread.sleep(1000); // waits some time before retrying
        } else {
            System.out.println("File was deleted while copying: '" + file.getAbsolutePath() + "'");
        }
    } finally {
        if (raf != null) {
            raf.close();
        }
    }
}
Answered by Eric B.
Depending on how urgently you need to move the file once it is done being written, you can also check for a stable last-modified timestamp and only move the file once it has quiesced. The amount of time it needs to be stable can be implementation dependent, but I would presume that something whose last-modified timestamp hasn't changed for 15 seconds should be stable enough to be moved.
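A minimal sketch of that stability check (the method name and the demo's short thresholds are just illustrative; in production you would use something like 15000 ms of stability with a 1000 ms poll interval):

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.FileTime;

public class QuiesceCheck {

    /**
     * Blocks until the file's last-modified timestamp has been unchanged for
     * stableMillis, polling every pollMillis.
     */
    static void awaitQuiesced(Path file, long stableMillis, long pollMillis)
            throws IOException, InterruptedException {
        FileTime last = Files.getLastModifiedTime(file);
        long stableSince = System.currentTimeMillis();
        while (System.currentTimeMillis() - stableSince < stableMillis) {
            Thread.sleep(pollMillis);
            FileTime now = Files.getLastModifiedTime(file);
            if (!now.equals(last)) { // file was touched: restart the clock
                last = now;
                stableSince = System.currentTimeMillis();
            }
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path f = Files.createTempFile("demo", ".dat");
        Files.write(f, new byte[]{1});
        awaitQuiesced(f, 500, 100); // short values for the demo only
        System.out.println("File quiesced: " + f);
    }
}
```

Note that timestamp granularity varies by file system, which is another reason to keep the stability window generous.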
Answered by Ramcis
I had to deal with a similar situation when I implemented a file system watcher to transfer uploaded files. The solution I implemented to solve this problem consists of the following:
1- First of all, maintain a map of unprocessed files. (As long as a file is still being copied, the file system generates Modify events, so you can ignore them if the flag is false.)
2- In your fileProcessor, pick up a file from the list and check whether it's locked by the file system. If it is, you will get an exception; just catch it, put your thread into a wait state (e.g. 10 seconds), and then retry until the lock is released. After processing the file, you can either change the flag to true or remove it from the map.
This solution will not be efficient if many versions of the same file are transferred during the wait interval.
Cheers, Ramzi
Answered by Pawan Kumar
For large files on Linux, the files get copied with a .filepart extension. You just need to check the extension using the Commons IO API and register for the ENTRY_CREATE event. I tested this with my .csv files (1GB) and it worked.
public void run()
{
    try
    {
        WatchKey key = myWatcher.take();
        while (key != null)
        {
            for (WatchEvent<?> event : key.pollEvents())
            {
                if (FilenameUtils.isExtension(event.context().toString(), "filepart"))
                {
                    System.out.println("Inside the PartFile " + event.context().toString());
                } else
                {
                    System.out.println("Full file Copied " + event.context().toString());
                    // Do whatever you want to do with this file.
                }
            }
            key.reset();
            key = myWatcher.take();
        }
    } catch (InterruptedException e)
    {
        e.printStackTrace();
    }
}
Answered by enigma969
If you don't have control over the write process, log all ENTRY_CREATE events and observe whether there are patterns.
In my case, the files are created via WebDav (Apache) and a lot of temporary files are created, but two ENTRY_CREATE events are also triggered for the same file. The second ENTRY_CREATE event indicates that the copy process is complete.
Here are my example ENTRY_CREATE events. The absolute file path is printed (your log may differ, depending on the application that writes the file):
[info] application - /var/www/webdav/.davfs.tmp39dee1 was created
[info] application - /var/www/webdav/document.docx was created
[info] application - /var/www/webdav/.davfs.tmp054fe9 was created
[info] application - /var/www/webdav/document.docx was created
[info] application - /var/www/webdav/.DAV/__db.document.docx was created
As you can see, I get two ENTRY_CREATE events for document.docx. After the second event I know the file is complete. Temporary files are obviously ignored in my case.
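If the two-events pattern holds for a given writer, the bookkeeping could be sketched as below. The class name and the temporary-file filters are assumptions specific to this WebDav setup; whether a second ENTRY_CREATE event fires at all depends entirely on the application writing the file, so verify the pattern in your own logs first.

```java
import java.util.HashMap;
import java.util.Map;

public class CreateEventCounter {

    private final Map<String, Integer> createCounts = new HashMap<>();

    /**
     * Returns true when the given path has now been seen in two ENTRY_CREATE
     * events, which in this setup signals that the copy is complete.
     * WebDav temporary files are filtered out before counting.
     */
    boolean isSecondCreate(String path) {
        if (path.contains("/.davfs.tmp") || path.contains("/.DAV/")) {
            return false; // ignore WebDav temporary files
        }
        int count = createCounts.merge(path, 1, Integer::sum);
        return count == 2;
    }

    public static void main(String[] args) {
        CreateEventCounter c = new CreateEventCounter();
        System.out.println(c.isSecondCreate("/var/www/webdav/document.docx"));     // false: first event
        System.out.println(c.isSecondCreate("/var/www/webdav/.davfs.tmp054fe9"));  // false: temp file
        System.out.println(c.isSecondCreate("/var/www/webdav/document.docx"));     // true: second event
    }
}
```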

