Python: Recursively compare two directories to ensure they have the same files and subdirectories

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/4187564/

Time: 2020-08-18 14:41:52  Source: igfitidea

Tags: python, recursion

Question by Gregg Lind

From what I observe, filecmp.dircmp is recursive, but inadequate for my needs, at least in py2. I want to compare two directories and all their contained files. Does this exist, or do I need to build it (using os.walk, for example)? I prefer pre-built, where someone else has already done the unit-testing :)

The actual 'comparison' can be sloppy (ignore permissions, for example), if that helps.

I would like something boolean, and report_full_closure is a printed report. It also only goes down common subdirs. AFAIC, if they have anything in only the left or only the right dir, those are different dirs. I built this using os.walk instead.

Answer by Katriel

dircmp can be recursive: see report_full_closure.

As far as I know, dircmp does not offer a boolean directory comparison function. It would be very easy to write your own, though: use left_only and right_only on dircmp to check that the files in the directories are the same, and then recurse on the subdirs attribute.
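A minimal sketch of that recipe, assuming a boolean result is all you need (the function name `dircmp_equal` is hypothetical). It also checks `common_funny` (a name that is a file on one side and a directory on the other) and `funny_files`. Note that `diff_files` comes from dircmp's default shallow comparison, so files with identical `os.stat` signatures are not read.

```python
import filecmp


def dircmp_equal(dcmp):
    """Return True if the trees compared by `dcmp` contain the same names and
    no common file is reported as different (dircmp's shallow comparison)."""
    # Anything present on only one side means the trees differ.
    if dcmp.left_only or dcmp.right_only:
        return False
    # Same name but different kind (file vs. directory), or stat() failed.
    if dcmp.common_funny:
        return False
    # Common files that differ, or that could not be compared.
    if dcmp.diff_files or dcmp.funny_files:
        return False
    # subdirs maps each name in common_dirs to its own dircmp instance.
    return all(dircmp_equal(sub) for sub in dcmp.subdirs.values())


# Usage (hypothetical paths):
# dircmp_equal(filecmp.dircmp('/path/to/dir1', '/path/to/dir2'))
```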

Answer by asthasr

The report_full_closure() method is recursive:

comparison = filecmp.dircmp('/directory1', '/directory2')
comparison.report_full_closure()

Edit: After the OP's edit, I would say that it's best to just use the other functions in filecmp. I think os.walk is unnecessary; it is better to simply recurse through the lists produced by common_dirs, etc., although in some cases (very deeply nested directory trees) this might risk a maximum recursion depth error if implemented poorly.
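To sidestep the recursion-depth concern, the walk over `common_dirs` can use an explicit stack instead of recursion. A minimal sketch (the function name `trees_equal_iterative` is hypothetical) that compares common files by content with `shallow=False`; passing `ignore=[]` disables dircmp's default ignore list so entries such as `.git` or `__pycache__` are compared too, which is an assumption about what you want.

```python
import filecmp
import os


def trees_equal_iterative(dir1, dir2):
    """Compare two directory trees without recursion, using an explicit stack."""
    stack = [(dir1, dir2)]
    while stack:
        left, right = stack.pop()
        # ignore=[] turns off dircmp's default ignore list.
        dcmp = filecmp.dircmp(left, right, ignore=[])
        # Names on one side only, or a file on one side and a directory on the
        # other (common_funny), mean the trees differ.
        if dcmp.left_only or dcmp.right_only or dcmp.common_funny:
            return False
        # Compare the common files by content, not just os.stat signatures.
        _, mismatch, errors = filecmp.cmpfiles(
            left, right, dcmp.common_files, shallow=False)
        if mismatch or errors:
            return False
        # Queue the common subdirectories instead of recursing into them.
        for name in dcmp.common_dirs:
            stack.append((os.path.join(left, name), os.path.join(right, name)))
    return True
```

The stack only grows with the number of pending subdirectories, so nesting depth is no longer limited by the interpreter's recursion limit.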

Answer by Gregg Lind

Here is my solution: gist

import filecmp
import os

def dirs_same_enough(dir1, dir2, report=False):
    ''' use os.walk and filecmp.cmpfiles to
    determine if two dirs are 'same enough'.

    Args:
        dir1, dir2:  two directory paths
        report:  if True, print the filecmp.dircmp(dir1,dir2).report_full_closure()
                 before returning

    Returns:
        bool

    '''
    # os walk:  root, list(dirs), list(files)
    # those lists won't have consistent ordering,
    # os.walk also has no guaranteed ordering, so have to sort.
    walk1 = sorted(os.walk(dir1))
    walk2 = sorted(os.walk(dir2))

    def report_and_exit(report, bool_):
        # optionally print the full dircmp report, then return the result
        if report:
            filecmp.dircmp(dir1, dir2).report_full_closure()
        return bool_

    if len(walk1) != len(walk2):
        return report_and_exit(report, False)

    for (p1, d1, fl1), (p2, d2, fl2) in zip(walk1, walk2):
        d1, fl1, d2, fl2 = set(d1), set(fl1), set(d2), set(fl2)
        if d1 != d2 or fl1 != fl2:
            return report_and_exit(report, False)
        # compare every file of this directory by content in one call
        same, diff, weird = filecmp.cmpfiles(p1, p2, fl1, shallow=False)
        if diff or weird:
            return report_and_exit(report, False)

    return report_and_exit(report, True)

Answer by Mateusz Kobos

Here's an alternative implementation of the comparison function with the filecmp module. It uses recursion instead of os.walk, so it is a little simpler. However, it does not recurse simply by using the common_dirs and subdirs attributes, since in that case we would be implicitly using the default "shallow" implementation of file comparison, which is probably not what you want. In the implementation below, when comparing files with the same name, we always compare their contents.
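To see why the shallow default matters, here is a small demonstration (assuming a filesystem where `os.utime` can set modification times; the file name `a.txt` and the helper `shallow_vs_deep` are arbitrary). Two files with the same size and the same modification time but different contents are listed in `dircmp.same_files`, while `cmpfiles(..., shallow=False)` reads the bytes and reports a mismatch.

```python
import filecmp
import os
import tempfile


def shallow_vs_deep():
    """Return (dircmp's same_files, cmpfiles' mismatches) for two files that
    share size and mtime but differ in content."""
    with tempfile.TemporaryDirectory() as left, tempfile.TemporaryDirectory() as right:
        for root, text in ((left, 'aaaa'), (right, 'bbbb')):
            path = os.path.join(root, 'a.txt')
            with open(path, 'w') as fh:
                fh.write(text)  # same length, different bytes
            os.utime(path, (1_000_000_000, 1_000_000_000))  # identical mtime
        # dircmp compares common files shallowly by default
        same_files = filecmp.dircmp(left, right).same_files
        # shallow=False forces a byte-by-byte comparison
        _, mismatch, _ = filecmp.cmpfiles(left, right, ['a.txt'], shallow=False)
        return same_files, mismatch


print(shallow_vs_deep())  # (['a.txt'], ['a.txt'])
```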

import filecmp
import os.path

def are_dir_trees_equal(dir1, dir2):
    """
    Compare two directories recursively. Files in each directory are
    assumed to be equal if their names and contents are equal.

    @param dir1: First directory path
    @param dir2: Second directory path

    @return: True if the directory trees are the same and 
        there were no errors while accessing the directories or files, 
        False otherwise.
    """

    dirs_cmp = filecmp.dircmp(dir1, dir2)
    if (dirs_cmp.left_only or dirs_cmp.right_only or
            dirs_cmp.funny_files):
        return False
    (_, mismatch, errors) = filecmp.cmpfiles(
        dir1, dir2, dirs_cmp.common_files, shallow=False)
    if mismatch or errors:
        return False
    for common_dir in dirs_cmp.common_dirs:
        new_dir1 = os.path.join(dir1, common_dir)
        new_dir2 = os.path.join(dir2, common_dir)
        if not are_dir_trees_equal(new_dir1, new_dir2):
            return False
    return True

Answer by NotAUser

import filecmp
import os

def same(dir1, dir2):
    """Returns True if recursively identical, False otherwise

    """
    # note: diff_files comes from dircmp's default (shallow) file comparison
    c = filecmp.dircmp(dir1, dir2)
    if c.left_only or c.right_only or c.diff_files or c.funny_files:
        return False
    else:
        same_so_far = True
        for i in c.common_dirs:
            same_so_far = same_so_far and same(os.path.join(dir1, i), os.path.join(dir2, i))
            if not same_so_far:
                break
        return same_so_far

Answer by Raullen Chai

Another solution: compare the layout of dir1 and dir2, ignoring the contents of files.

See gist here: https://gist.github.com/4164344

Edit: here's the code, in case the gist gets lost for some reason:

import os

def compare_dir_layout(dir1, dir2):
    def _compare_dir_layout(dir1, dir2):
        # print every file under dir1 that has no counterpart at the same
        # relative path under dir2
        for (dirpath, dirnames, filenames) in os.walk(dir1):
            relative_path = os.path.relpath(dirpath, dir1)
            for filename in filenames:
                if not os.path.exists(os.path.join(dir2, relative_path, filename)):
                    print(relative_path, filename)

    print('files in "' + dir1 + '" but not in "' + dir2 + '"')
    _compare_dir_layout(dir1, dir2)
    print('files in "' + dir2 + '" but not in "' + dir1 + '"')
    _compare_dir_layout(dir2, dir1)


compare_dir_layout('xxx', 'yyy')
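If a boolean is preferred over a printed report, the same layout check can collect each tree's relative paths into a set and compare the sets. A minimal sketch (the helper names `_layout` and `same_layout` are hypothetical). Unlike the code above, it also records directories, tagged separately from files, so an empty subdirectory on one side, or a name that is a file in one tree and a directory in the other, counts as a difference. File contents are still ignored.

```python
import os


def _layout(root):
    """Return a set of ('dir' | 'file', relative path) entries under root."""
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # normpath turns './name' into 'name' for entries directly under root
        for name in dirnames:
            entries.add(('dir', os.path.normpath(os.path.join(rel, name))))
        for name in filenames:
            entries.add(('file', os.path.normpath(os.path.join(rel, name))))
    return entries


def same_layout(dir1, dir2):
    """True if both trees contain the same relative paths; contents are ignored."""
    return _layout(dir1) == _layout(dir2)
```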

Answer by Philippe Ombredanne

filecmp.dircmp is the way to go. But by default it does not reliably compare the content of files found at the same path in the two compared directories: its file comparison is shallow, so files whose os.stat signatures (file type, size and modification time) match are taken to be equal without their contents being read. Since dircmp is a class, you can fix that with a dircmp subclass that overrides its phase3 method, which compares the common files, so that contents are compared instead of only os.stat attributes.

import filecmp

class dircmp(filecmp.dircmp):
    """
    Compare the content of dir1 and dir2. In contrast with filecmp.dircmp, this
    subclass compares the content of files with the same path.
    """
    def phase3(self):
        """
        Find out differences between common files.
        Ensure we are using content comparison with shallow=False.
        """
        fcomp = filecmp.cmpfiles(self.left, self.right, self.common_files,
                                 shallow=False)
        self.same_files, self.diff_files, self.funny_files = fcomp

Then you can use this to return a boolean:

import os.path

def is_same(dir1, dir2):
    """
    Compare two directory trees content.
    Return False if they differ, True if they are the same.
    """
    compared = dircmp(dir1, dir2)
    if (compared.left_only or compared.right_only or compared.diff_files 
        or compared.funny_files):
        return False
    for subdir in compared.common_dirs:
        if not is_same(os.path.join(dir1, subdir), os.path.join(dir2, subdir)):
            return False
    return True

In case you want to reuse this code snippet, it is hereby dedicated to the Public Domain or the Creative Commons CC0 at your choice (in addition to the default license CC-BY-SA provided by SO).

Answer by Guillaume Vincent

Here is a simple solution with a recursive function:

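import filecmp

def same_folders(dcmp):
    # names present on one side only, files that differ, or files that
    # could not be compared all mean the folders are not the same
    if dcmp.left_only or dcmp.right_only or dcmp.diff_files or dcmp.funny_files:
        return False
    for sub_dcmp in dcmp.subdirs.values():
        if not same_folders(sub_dcmp):
            return False
    return True

same_folders(filecmp.dircmp('/tmp/archive1', '/tmp/archive2'))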

Answer by alzix

Based on Python issue 12932 and the filecmp documentation, you may use the following example:

import os
import filecmp

# force content compare instead of os.stat attributes only comparison
# (note: this changes the default of filecmp.cmpfiles for the whole process)
filecmp.cmpfiles.__defaults__ = (False,)

def _is_same_helper(dircmp):
    if dircmp.left_only or dircmp.right_only or dircmp.diff_files or dircmp.funny_files:
        return False
    for sub_dircmp in dircmp.subdirs.values():
        if not _is_same_helper(sub_dircmp):
            return False
    return True

def is_same(dir1, dir2):
    """
    Recursively compare two directories
    :param dir1: path to first directory 
    :param dir2: path to second directory
    :return: True in case directories are the same, False otherwise
    """
    if not os.path.isdir(dir1) or not os.path.isdir(dir2):
        return False
    dircmp = filecmp.dircmp(dir1, dir2)
    return _is_same_helper(dircmp)

Answer by Rok

This will check whether files are in the same locations and whether their contents are the same. It does not detect differences in empty subfolders, since only files are compared.
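If empty subfolders should count too, one option is to collect directory paths as well as file paths with os.walk and only then compare contents. A minimal sketch under that assumption (`folders_equal_with_dirs` and `_relative_entries` are hypothetical names). Unlike `glob` with `'**'`, `os.walk` also lists hidden (dot) files.

```python
import filecmp
import os


def _relative_entries(root):
    """Return (dirs, files): sets of paths relative to root."""
    dirs, files = set(), set()
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        dirs.update(os.path.normpath(os.path.join(rel, d)) for d in dirnames)
        files.update(os.path.normpath(os.path.join(rel, f)) for f in filenames)
    return dirs, files


def folders_equal_with_dirs(f1, f2):
    dirs1, files1 = _relative_entries(f1)
    dirs2, files2 = _relative_entries(f2)
    # Same directory structure (including empty folders) and same file paths.
    if dirs1 != dirs2 or files1 != files2:
        return False
    # Compare contents byte by byte; shallow=False skips the os.stat shortcut.
    return all(
        filecmp.cmp(os.path.join(f1, rel), os.path.join(f2, rel), shallow=False)
        for rel in files1
    )
```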

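import filecmp
import glob
import os

path_1 = '.'
path_2 = '.'

def list_files(folder):
    # all regular files below folder, as paths relative to folder, sorted so
    # both trees line up (note: glob's '**' skips hidden dot files)
    return sorted(
        os.path.relpath(x, folder)
        for x in glob.iglob(os.path.join(folder, '**'), recursive=True)
        if os.path.isfile(x)
    )

def folders_equal(f1, f2):
    files_1 = list_files(f1)
    files_2 = list_files(f2)

    # every file must exist at the same relative location on both sides
    locations_equal = files_1 == files_2
    # then compare contents pairwise (shallow=False reads the bytes)
    files_equal = locations_equal and all(
        filecmp.cmp(os.path.join(f1, x), os.path.join(f2, x), shallow=False)
        for x in files_1
    )

    return locations_equal and files_equal

folders_equal(path_1, path_2)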