Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/7421440/

Time: 2020-09-15 18:04:34  Source: igfitidea

How can I detect only deleted, changed, and created files on a volume?

Tags: c++, windows, backup, ntfs

Asked by roymustang86

I need to know if there is an easy way of detecting only the files that were deleted, modified or created on an NTFS volume.

I have written a program for offsite backup in C++. After the first backup, I check the archive bit of each file to see if there was any change made, and back up only the files that were changed. Also, it backs up from the VSS snapshot in order to prevent file locks.

This seems to work fine on most file systems, but on some with lots of files and directories the process takes too long, and the backup often needs more than a day to finish.

I tried using the change journal to easily detect changes made on an NTFS volume, but the change journal would show a lot of records, most of them relating to small temporary files created and destroyed. Also, I could get the file name, file reference number, and the parent file reference number, but I could not get the full file path. The parent file reference number is somehow supposed to give you the parent directory path.

EDIT: This needs to run every day, so at the beginning of every scan, it should record only the changes that took place since the last scan. Or at least, there should be a way to ask for the changes since a given time and date.

Answered by Harry Johnston

You can enumerate all the files on a volume using FSCTL_ENUM_USN_DATA. This is a fast process (my tests returned better than 6000 records per second even on a very old machine, and 20000+ is more typical) and only includes files that currently exist.

The data returned includes the file flags as well as the USNs so you could check for changes whichever way you prefer.
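
One way to use the USNs for change detection, as a rough sketch only: keep a map from file reference number to USN from the previous scan and compare it with the current enumeration. This assumes the change journal is active, so that any modification moves a file's USN. `FileState`, `ScanDiff` and `diff_scans` are hypothetical names, and the records are simplified stand-ins for USN_RECORD; filling them from FSCTL_ENUM_USN_DATA is left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified view of one enumerated record.
struct FileState {
    std::uint64_t frn;  // FileReferenceNumber
    std::int64_t  usn;  // Usn: last journal entry that touched the file
};

// File reference numbers grouped by what happened since the previous scan.
struct ScanDiff {
    std::vector<std::uint64_t> created, changed, deleted;
};

// previous: FRN -> USN as saved by the last backup run.
// current:  the records returned by the enumeration in this run.
ScanDiff diff_scans(const std::unordered_map<std::uint64_t, std::int64_t>& previous,
                    const std::vector<FileState>& current)
{
    ScanDiff diff;
    std::unordered_set<std::uint64_t> seen;

    for (const FileState& f : current) {
        seen.insert(f.frn);
        auto it = previous.find(f.frn);
        if (it == previous.end())
            diff.created.push_back(f.frn);   // not known last time
        else if (it->second != f.usn)
            diff.changed.push_back(f.frn);   // USN moved, so the file was touched
    }

    for (const auto& entry : previous)
        if (seen.count(entry.first) == 0)
            diff.deleted.push_back(entry.first);  // known last time, gone now

    // Sort so the result does not depend on hash-table iteration order.
    std::sort(diff.deleted.begin(), diff.deleted.end());
    return diff;
}
```

After each backup you would save the current FRN-to-USN map and pass it as `previous` on the next run.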

You will still need to work out the full path for the files by matching the parent IDs with the file IDs of the directories. One approach would be to use a buffer large enough to hold all the file records simultaneously, and search through the records to find the matching parent for each file you need to back up. For large volumes you would probably need to process the directory records into a more efficient data structure, perhaps a hash table.
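
The hash-table approach could look roughly like this. It is a portable sketch, not part of the program below: `DirEntry`, `DirTable` and `build_path` are hypothetical names, and the table is assumed to have been filled from the directory records of one enumeration pass. An ID that is missing from the table is treated as the volume root.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for the USN_RECORD fields needed here.
struct DirEntry {
    std::uint64_t parent;  // ParentFileReferenceNumber of the directory
    std::wstring  name;    // FileName of the directory
};

// Maps a directory's file reference number to its parent and name.
using DirTable = std::unordered_map<std::uint64_t, DirEntry>;

// Walks the parent links upward from `parent` and returns the volume-relative
// path of `name`. Stops at an ID that is not in the table or at a
// self-parented entry (both treated as the root), and gives up after
// `maxDepth` steps so a cyclic table cannot loop forever.
std::wstring build_path(const DirTable& dirs, std::uint64_t parent,
                        const std::wstring& name, int maxDepth = 1024)
{
    std::wstring path = name;
    for (int depth = 0; depth < maxDepth; ++depth) {
        auto it = dirs.find(parent);
        if (it == dirs.end()) break;             // unknown ID: assume root
        if (it->second.parent == parent) break;  // self-parented: assume root
        path = it->second.name + L"\\" + path;
        parent = it->second.parent;
    }
    return L"\\" + path;
}
```

Each lookup is a hash-table probe, so resolving a path costs one probe per directory level instead of a search through the whole buffer.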

Alternately, you can read/reread the records for the parent directories as needed. This would be less efficient, but the performance might still be satisfactory depending on how many files are being backed up. Windows does appear to cache the data returned by FSCTL_ENUM_USN_DATA.

This program searches the C volume for files named test.txt and returns information about any files found, as well as about their parent directories.

#include <Windows.h>

#include <stdio.h>

#define BUFFER_SIZE (1024 * 1024)

HANDLE drive;
USN maxusn;

void show_record (USN_RECORD * record)
{
    void * buffer;
    MFT_ENUM_DATA mft_enum_data;
    DWORD bytecount = 1;
    USN_RECORD * parent_record;

    WCHAR * filename;
    WCHAR * filenameend;

    printf("=================================================================\n");
    printf("RecordLength: %u\n", record->RecordLength);
    printf("MajorVersion: %u\n", (DWORD)record->MajorVersion);
    printf("MinorVersion: %u\n", (DWORD)record->MinorVersion);
    printf("FileReferenceNumber: %llu\n", (unsigned long long)record->FileReferenceNumber);
    printf("ParentFRN: %llu\n", (unsigned long long)record->ParentFileReferenceNumber);
    printf("USN: %lld\n", (long long)record->Usn);
    printf("Timestamp: %lld\n", (long long)record->TimeStamp.QuadPart);
    printf("Reason: %u\n", record->Reason);
    printf("SourceInfo: %u\n", record->SourceInfo);
    printf("SecurityId: %u\n", record->SecurityId);
    printf("FileAttributes: %x\n", record->FileAttributes);
    printf("FileNameLength: %u\n", (DWORD)record->FileNameLength);

    filename = (WCHAR *)(((BYTE *)record) + record->FileNameOffset);
    filenameend= (WCHAR *)(((BYTE *)record) + record->FileNameOffset + record->FileNameLength);

    printf("FileName: %.*ls\n", (int)(filenameend - filename), filename);

    buffer = VirtualAlloc(NULL, BUFFER_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    if (buffer == NULL)
    {
        printf("VirtualAlloc: %u\n", GetLastError());
        return;
    }

    mft_enum_data.StartFileReferenceNumber = record->ParentFileReferenceNumber;
    mft_enum_data.LowUsn = 0;
    mft_enum_data.HighUsn = maxusn;

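    if (!DeviceIoControl(drive, FSCTL_ENUM_USN_DATA, &mft_enum_data, sizeof(mft_enum_data), buffer, BUFFER_SIZE, &bytecount, NULL))
    {
        printf("FSCTL_ENUM_USN_DATA (show_record): %u\n", GetLastError());
        VirtualFree(buffer, 0, MEM_RELEASE);
        return;
    }

    // The output starts with the next file ID (one USN wide); records follow it.
    parent_record = (USN_RECORD *)((USN *)buffer + 1);

    if (bytecount <= sizeof(USN) || parent_record->FileReferenceNumber != record->ParentFileReferenceNumber)
    {
        printf("=================================================================\n");
        printf("Couldn't retrieve FileReferenceNumber %llu\n", (unsigned long long)record->ParentFileReferenceNumber);
        VirtualFree(buffer, 0, MEM_RELEASE);
        return;
    }

    show_record(parent_record);

    // Release the per-call buffer only after the recursive call has used it.
    VirtualFree(buffer, 0, MEM_RELEASE);
}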

void check_record(USN_RECORD * record)
{
    WCHAR * filename;
    WCHAR * filenameend;

    filename = (WCHAR *)(((BYTE *)record) + record->FileNameOffset);
    filenameend= (WCHAR *)(((BYTE *)record) + record->FileNameOffset + record->FileNameLength);

    if (filenameend - filename != 8) return;

    if (wcsncmp(filename, L"test.txt", 8) != 0) return;

    show_record(record);
}

int main(int argc, char ** argv)
{
    MFT_ENUM_DATA mft_enum_data;
    DWORD bytecount = 1;
    void * buffer;
    USN_RECORD * record;
    USN_RECORD * recordend;
    USN_JOURNAL_DATA * journal;
    DWORDLONG nextid = 0;
    DWORDLONG filecount = 0;
    DWORD starttick, endtick;

    starttick = GetTickCount();

    printf("Allocating memory.\n");

    buffer = VirtualAlloc(NULL, BUFFER_SIZE, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    if (buffer == NULL)
    {
        printf("VirtualAlloc: %u\n", GetLastError());
        return 0;
    }

    printf("Opening volume.\n");

    drive = CreateFile(L"\\\\?\\c:", GENERIC_READ, FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_ALWAYS, FILE_FLAG_NO_BUFFERING, NULL);

    if (drive == INVALID_HANDLE_VALUE)
    {
        printf("CreateFile: %u\n", GetLastError());
        return 0;
    }

    printf("Calling FSCTL_QUERY_USN_JOURNAL\n");

    if (!DeviceIoControl(drive, FSCTL_QUERY_USN_JOURNAL, NULL, 0, buffer, BUFFER_SIZE, &bytecount, NULL))
    {
        printf("FSCTL_QUERY_USN_JOURNAL: %u\n", GetLastError());
        return 0;
    }

    journal = (USN_JOURNAL_DATA *)buffer;

    printf("UsnJournalID: %llu\n", (unsigned long long)journal->UsnJournalID);
    printf("FirstUsn: %lld\n", (long long)journal->FirstUsn);
    printf("NextUsn: %lld\n", (long long)journal->NextUsn);
    printf("LowestValidUsn: %lld\n", (long long)journal->LowestValidUsn);
    printf("MaxUsn: %lld\n", (long long)journal->MaxUsn);
    printf("MaximumSize: %llu\n", (unsigned long long)journal->MaximumSize);
    printf("AllocationDelta: %llu\n", (unsigned long long)journal->AllocationDelta);

    maxusn = journal->MaxUsn;

    mft_enum_data.StartFileReferenceNumber = 0;
    mft_enum_data.LowUsn = 0;
    mft_enum_data.HighUsn = maxusn;

    for (;;)
    {
//      printf("=================================================================\n");
//      printf("Calling FSCTL_ENUM_USN_DATA\n");

        if (!DeviceIoControl(drive, FSCTL_ENUM_USN_DATA, &mft_enum_data, sizeof(mft_enum_data), buffer, BUFFER_SIZE, &bytecount, NULL))
        {
            printf("=================================================================\n");
            printf("FSCTL_ENUM_USN_DATA: %u\n", GetLastError());
            printf("Final ID: %llu\n", (unsigned long long)nextid);
            printf("File count: %llu\n", (unsigned long long)filecount);
            endtick = GetTickCount();
            printf("Ticks: %u\n", endtick - starttick);
            return 0;
        }

//      printf("Bytes returned: %u\n", bytecount);

        nextid = *((DWORDLONG *)buffer);
//      printf("Next ID: %lu\n", nextid);

        record = (USN_RECORD *)((USN *)buffer + 1);
        recordend = (USN_RECORD *)(((BYTE *)buffer) + bytecount);

        while (record < recordend)
        {
            filecount++;
            check_record(record);
            record = (USN_RECORD *)(((BYTE *)record) + record->RecordLength);
        }

        mft_enum_data.StartFileReferenceNumber = nextid;
    }
}

Additional notes

  • As discussed in the comments, you may need to replace MFT_ENUM_DATA with MFT_ENUM_DATA_V0 on versions of Windows later than Windows 7. (This may also depend on what compiler and SDK you are using.)

  • The 64-bit file reference numbers and USNs need a 64-bit format specifier such as %llu. My original listing printed them with %lu as if they were 32-bit, which was just a mistake on my part. Probably in production code you won't be printing them anyway, but FYI.

Answered by Ben Voigt

The change journal is your best bet. You can use the file reference numbers to match file creation/deletion pairs and thus ignore temporary files, without having to process them any further.
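
A minimal sketch of that pairing idea, assuming the journal records have already been read into a simplified list. `JournalEntry`, `NetChange` and `net_changes` are hypothetical names; the two constants copy the values of USN_REASON_FILE_CREATE and USN_REASON_FILE_DELETE from winioctl.h so the example compiles without the Windows headers.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Same values as USN_REASON_FILE_CREATE / USN_REASON_FILE_DELETE in winioctl.h.
constexpr std::uint32_t kReasonFileCreate = 0x00000100;
constexpr std::uint32_t kReasonFileDelete = 0x00000200;

// Hypothetical, simplified journal record: only the fields used here.
struct JournalEntry {
    std::uint64_t frn;     // FileReferenceNumber
    std::uint32_t reason;  // Reason flags of this record
};

enum class NetChange { Created, Modified, Deleted };

// Collapses all records for the same file into one net change and drops
// files that were both created and deleted inside the scanned range.
std::map<std::uint64_t, NetChange> net_changes(const std::vector<JournalEntry>& entries)
{
    // OR together every reason flag seen for each file reference number.
    std::map<std::uint64_t, std::uint32_t> reasons;
    for (const JournalEntry& e : entries)
        reasons[e.frn] |= e.reason;

    std::map<std::uint64_t, NetChange> result;
    for (const auto& item : reasons) {
        const bool created = (item.second & kReasonFileCreate) != 0;
        const bool deleted = (item.second & kReasonFileDelete) != 0;
        if (created && deleted)
            continue;  // temporary file: appeared and vanished, nothing to back up
        result[item.first] = created ? NetChange::Created
                           : deleted ? NetChange::Deleted
                                     : NetChange::Modified;
    }
    return result;
}
```

Only the files left in the result need their paths resolved, which keeps the later lookup work small.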

I think you have to scan the Master File Table to make sense of ParentFileReferenceNumber. Of course you only need to keep track of directories when doing this, and use a data structure that will allow you to quickly lookup the information, so you only need to scan the MFT once.

Answered by AJG85

You can use ReadDirectoryChangesW and the surrounding Windows API.

Answered by ronnie

I know how to achieve this in Java. It will help you if you can run Java code from within your C++ program.

In Java you can achieve this using the JNotify API. It also looks for changes in subdirectories.