Python: find the oldest file in a directory (recursively)

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original address and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/837606/

Time: 2020-11-03 20:57:17. Source: igfitidea

Find the oldest file (recursively) in a directory

Tags: python, linux, file-io

Asked by Rowan Parker

I'm writing a Python backup script and I need to find the oldest file in a directory (and its sub-directories). I also need to filter it down to *.avi files only.

The script will always be running on a Linux machine. Is there some way to do it in Python or would running some shell commands be better?

At the moment I'm running df to get the free space on a particular partition, and if there is less than 5 gigabytes free, I want to start deleting the oldest *.avi files until that condition is met.

Answer by tzot

Hm. Nadia's answer is closer to what you meant to ask; however, for finding the (single) oldest file in a tree, try this:

import os
def oldest_file_in_tree(rootfolder, extension=".avi"):
    return min(
        (os.path.join(dirname, filename)
        for dirname, dirnames, filenames in os.walk(rootfolder)
        for filename in filenames
        if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime)
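
One caveat with the function above: min() raises ValueError when its input is empty, which happens here when no file matches. Below is a minimal sketch of a variant that returns None instead. It assumes Python 3.4 or later (for the default argument of min()), and the name oldest_file_or_none is made up for this illustration.

```python
import os

def oldest_file_or_none(rootfolder, extension=".avi"):
    """Return the path of the oldest matching file, or None if nothing matches."""
    candidates = (
        os.path.join(dirname, filename)
        for dirname, _dirnames, filenames in os.walk(rootfolder)
        for filename in filenames
        if filename.endswith(extension)
    )
    # default=None avoids the ValueError that min() raises on an empty iterable
    return min(candidates, key=lambda fn: os.stat(fn).st_mtime, default=None)
```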

With a little modification, you can get the n oldest files (similar to Nadia's answer):

import os, heapq
def oldest_files_in_tree(rootfolder, count=1, extension=".avi"):
    return heapq.nsmallest(count,
        (os.path.join(dirname, filename)
        for dirname, dirnames, filenames in os.walk(rootfolder)
        for filename in filenames
        if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime)

Note that using the .endswith method allows calls such as:

oldest_files_in_tree("/home/user", 20, (".avi", ".mov"))

to select more than one extension.
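
This works because str.endswith accepts a tuple of suffixes and returns True if any of them matches. A short illustration, with file names invented for the example (note that the argument must be a tuple; a list raises TypeError):

```python
# str.endswith accepts a tuple of suffixes and returns True if any of them matches
extensions = (".avi", ".mov")
names = ["clip.avi", "trailer.mov", "notes.txt"]  # hypothetical file names
matching = [name for name in names if name.endswith(extensions)]
print(matching)  # ['clip.avi', 'trailer.mov']
```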

Finally, should you want the complete list of files, ordered by modification time, in order to delete as many as required to free space, here's some code:

import os
def files_to_delete(rootfolder, extension=".avi"):
    return sorted(
        (os.path.join(dirname, filename)
         for dirname, dirnames, filenames in os.walk(rootfolder)
         for filename in filenames
         if filename.endswith(extension)),
        key=lambda fn: os.stat(fn).st_mtime,
        reverse=True)

Note that reverse=True puts the oldest files at the end of the list, so that to get the next file to delete, you just call file_list.pop().

By the way, for a complete solution to your issue, since you are running on Linux, where os.statvfs is available, you can do:

import os
def free_space_up_to(free_bytes_required, rootfolder, extension=".avi"):
    file_list= files_to_delete(rootfolder, extension)
    while file_list:
        statv= os.statvfs(rootfolder)
        if statv.f_bfree*statv.f_bsize >= free_bytes_required:
            break
        os.remove(file_list.pop())

statvfs.f_bfree is the number of free blocks on the device and statvfs.f_bsize is the block size. We take the statvfs of rootfolder, so mind any symbolic links pointing to other devices, where we could delete many files without actually freeing up space on this device.
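
One possible way to guard against that symbolic-link caveat is to compare the st_dev field reported by os.stat for each candidate with that of the root folder. This is only a sketch, and the helper name on_same_device is an assumption made for this example:

```python
import os

def on_same_device(path, rootfolder):
    """Return True if path resides on the same device as rootfolder."""
    # os.stat follows symbolic links, so a link that points to another
    # device reports that other device's st_dev
    return os.stat(path).st_dev == os.stat(rootfolder).st_dev
```

A list of candidate paths could then be filtered with something like [fn for fn in file_list if on_same_device(fn, rootfolder)] before any file is deleted.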

UPDATE (copying a comment by Juan):

Depending on the OS and filesystem implementation, you may want to multiply f_bfree by f_frsize rather than f_bsize. In some implementations, the latter is the preferred I/O request size. For example, on a FreeBSD 9 system I just tested, f_frsize was 4096 and f_bsize was 16384. POSIX says the block count fields are "in units of f_frsize" (see http://pubs.opengroup.org/onlinepubs/9699919799//basedefs/sys_statvfs.h.html).
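
Following that comment, a free-space helper based on f_frsize might look like the sketch below. The function name free_bytes and its for_unprivileged_user parameter are assumptions for illustration. The f_bavail field counts the free blocks available to unprivileged users, while f_bfree also includes blocks reserved for the superuser.

```python
import os

def free_bytes(path, for_unprivileged_user=True):
    """Return the free space, in bytes, of the filesystem containing path."""
    st = os.statvfs(path)  # available on Unix-like systems only
    # f_bavail: free blocks usable by unprivileged users
    # f_bfree:  all free blocks, including those reserved for root
    blocks = st.f_bavail if for_unprivileged_user else st.f_bfree
    # POSIX defines the block counts in units of f_frsize
    return blocks * st.f_frsize
```

A check such as free_bytes(rootfolder) >= 5 * 1024**3 would then express the 5-gigabyte threshold from the question.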

Answer by dF.

To do it in Python, you can use os.walk(path) to iterate recursively over the files, and the st_size and st_mtime attributes of os.stat(filename) to get the file sizes and modification times.
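
As a rough sketch of that approach, the generator below walks a tree and yields the modification time, size, and full path of each file. The name file_info is invented for this example.

```python
import os

def file_info(path):
    """Yield (modification_time, size_in_bytes, full_path) for every file under path."""
    for dirpath, _dirnames, filenames in os.walk(path):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            st = os.stat(full_path)
            yield st.st_mtime, st.st_size, full_path
```

Because the modification time comes first in each tuple, sorted(file_info(path)) lists the oldest files first.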

Answer by Nadia Alramli

You can use the stat and fnmatch modules together to find the files.

ST_MTIME refers to the last modification time. You can choose another value if you want.

import os, stat, fnmatch
file_list = []
for filename in os.listdir('.'):
    if fnmatch.fnmatch(filename, '*.avi'):
        file_list.append((os.stat(filename)[stat.ST_MTIME], filename))

Then you can order the list by time and delete according to it.

file_list.sort(key=lambda a: a[0])
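
The snippet above only looks at the current directory. Since the question asks about sub-directories as well, one way to extend it is to combine os.walk with fnmatch.filter, as in the sketch below. The function name matching_files_by_mtime is made up for this example.

```python
import fnmatch
import os

def matching_files_by_mtime(root, pattern="*.avi"):
    """Return (mtime, path) tuples for files under root matching pattern, oldest first."""
    file_list = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names that match the shell-style pattern
        for filename in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, filename)
            file_list.append((os.stat(path).st_mtime, path))
    file_list.sort()  # tuples sort by mtime first, so the oldest file comes first
    return file_list
```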

Answer by John T

I think the easiest way to do this would be to use find along with ls -t (sort files by time).

Something along these lines should do the trick (it deletes the oldest avi file under the specified directory):

find / -name "*.avi" | xargs ls -t | tail -n 1 | xargs rm

Step by step:

find / -name "*.avi" - find all avi files recursively, starting at the root directory

xargs ls -t - sort all files found by modification time, from newest to oldest

tail -n 1 - grab the last file in the list (the oldest)

xargs rm - and remove it
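
Keep in mind that xargs splits its input on whitespace by default, so the pipeline can misbehave on paths that contain spaces, and that with a very long file list xargs may run ls more than once, with each batch sorted separately. For comparison, here is a rough Python counterpart of the same steps, offered as a sketch rather than a drop-in replacement. The function name oldest_avi is an assumption, and it needs Python 3.4 or later for pathlib and for the default argument of min().

```python
from pathlib import Path

def oldest_avi(root):
    """Return the oldest *.avi file under root as a Path, or None if there is none."""
    # rglob("*.avi") searches root recursively, similar to: find root -name "*.avi"
    files = (p for p in Path(root).rglob("*.avi") if p.is_file())
    # min() by modification time plays the role of: ls -t | tail -n 1
    return min(files, key=lambda p: p.stat().st_mtime, default=None)
```

Calling unlink() on the returned Path, after checking that it is not None, would correspond to the final xargs rm step.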

Answer by tom10

Here's another Python formulation, which is a bit old-school compared to some others, but is easy to modify, and handles the case of no matching files without raising an exception.

import os

def find_oldest_file(dirname="..", extension=".avi"):
    oldest_file, oldest_time = None, None
    for dirpath, dirs, files in os.walk(dirname):
        for filename in files:
            file_path = os.path.join(dirpath, filename)
            file_time = os.stat(file_path).st_mtime
            # check "oldest_time is None" first so None is never compared with a number
            if file_path.endswith(extension) and (oldest_time is None or file_time < oldest_time):
                oldest_file, oldest_time = file_path, file_time
    return oldest_file, oldest_time

print(find_oldest_file())

Answer by Michael Haren

Check out the Linux command find.

Alternatively, this post pipes together ls and tail to delete the oldest file in a directory. That could be done in a loop while there isn't enough free space.

For reference, here's the shell code that does it (follow the link for more alternatives and a discussion):

ls -t -r -1 /path/to/files | head --lines 1 | xargs rm

Answer by Parappa

The os module provides the functions that you need to get directory listings and file info in Python. I've found os.walk to be especially useful for walking directories recursively, and os.stat will give you detailed info (including modification time) on each entry.

You may be able to do this more easily with a simple shell command. Whether that works better for you or not depends on what you want to do with the results.
