Python: How do you sort files numerically?
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/4623446/
How do you sort files numerically?
Asked by Zach Young
First off, I'm posting this because when I was looking for a solution to the problem below, I could not find one on stackoverflow. So, I'm hoping to add a little bit to the knowledge base here.
I need to process some files in a directory and need the files to be sorted numerically. I found some examples on sorting (specifically using the lambda pattern) at wiki.python.org, and I put this together:
#!env/python
import re
tiffFiles = """ayurveda_1.tif
ayurveda_11.tif
ayurveda_13.tif
ayurveda_2.tif
ayurveda_20.tif
ayurveda_22.tif""".split('\n')
numPattern = re.compile('_(\d{1,2})\.', re.IGNORECASE)
tiffFiles.sort(cmp, key=lambda tFile:
               int(numPattern.search(tFile).group(1)))
print tiffFiles
I'm still rather new to Python and would like to ask the community if there are any improvements that can be made to this: shortening the code up (removing lambda), performance, style/readability?
Thank you, Zachary
Accepted answer by Daniel DiPaolo
This is called "natural sorting" or "human sorting" (as opposed to lexicographical sorting, which is the default). Ned B wrote up a quick version of one.
import re

def tryint(s):
    # Return s as an int if it is all digits, otherwise return it unchanged.
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks.
        "z23a" -> ["z", 23, "a"]
    """
    return [ tryint(c) for c in re.split(r'([0-9]+)', s) ]

def sort_nicely(l):
    """ Sort the given list in the way that humans expect.
    """
    l.sort(key=alphanum_key)
It's similar to what you're doing, but perhaps a bit more generalized.
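As a rough illustration of the natural-sort key above applied to the file names from the question, here is a minimal sketch. It assumes Python 3; the variable names `tiff_files` and `natural` are made up for this example.

```python
import re

def tryint(s):
    # Digit chunks become ints; everything else stays a string.
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    # "z23a" -> ["z", 23, "a"]
    return [tryint(c) for c in re.split(r'([0-9]+)', s)]

# Sample names taken from the question (hypothetical variable name).
tiff_files = ["ayurveda_1.tif", "ayurveda_11.tif", "ayurveda_13.tif",
              "ayurveda_2.tif", "ayurveda_20.tif", "ayurveda_22.tif"]

natural = sorted(tiff_files, key=alphanum_key)
print(natural)
# ['ayurveda_1.tif', 'ayurveda_2.tif', 'ayurveda_11.tif',
#  'ayurveda_13.tif', 'ayurveda_20.tif', 'ayurveda_22.tif']
```

Because `re.split` with a capturing group always alternates text and digit chunks, the keys line up position by position, so strings are only ever compared with strings and ints with ints.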
Answer by Don O'Donnell
If you are using key= in your sort method, you shouldn't use cmp, which has been removed in Python 3. key should be set to a function which takes a record as input and returns any object which will compare in the order you want your list sorted. It doesn't need to be a lambda function and might be clearer as a stand-alone function. Also, regular expressions can be slow to evaluate.
You could try something like the following to isolate and return the integer part of the file name:
def getint(name):
    basename = name.partition('.')
    alpha, num = basename.split('_')
    return int(num)

tiffiles.sort(key=getint)
Answer by Prabhath Kota
partition returns a tuple, so its result has to be unpacked:
def getint(name):
    (basename, part, ext) = name.partition('.')
    (alpha, num) = basename.split('_')
    return int(num)
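To see why the unpacking matters, here is a small sketch (Python 3 assumed) using one file name from the question. The variable names `parts` and `error` are only for illustration.

```python
name = "ayurveda_11.tif"

# str.partition always returns a 3-tuple: (head, separator, tail).
parts = name.partition('.')
print(parts)  # ('ayurveda_11', '.', 'tif')

# A tuple has no split method, so calling it directly fails.
error = None
try:
    parts.split('_')
except AttributeError as exc:
    error = str(exc)
    print(error)

# Unpacking the tuple first gives a string that can be split.
basename, _, ext = parts
alpha, num = basename.split('_')
print(int(num))  # 11
```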
Answer by dkmatt0
Just use:
tiffFiles.sort(key=lambda var:[int(x) if x.isdigit() else x for x in re.findall(r'[^0-9]|[0-9]+', var)])
This is faster than using try/except.
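One way to check the speed claim on your own machine is a quick `timeit` comparison. This is only a sketch under assumptions: Python 3, the six sample names from the question, and the hypothetical helper names `key_try` and `key_isdigit`. Timings vary by machine and input, so no numbers are shown.

```python
import re
import timeit

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def key_try(s):
    # try/except approach: split into text and digit chunks.
    return [tryint(c) for c in re.split(r'([0-9]+)', s)]

def key_isdigit(s):
    # isdigit approach: single non-digit characters or runs of digits.
    return [int(x) if x.isdigit() else x
            for x in re.findall(r'[^0-9]|[0-9]+', s)]

names = ["ayurveda_1.tif", "ayurveda_11.tif", "ayurveda_13.tif",
         "ayurveda_2.tif", "ayurveda_20.tif", "ayurveda_22.tif"]

# Both keys give the same order for these names.
same_order = sorted(names, key=key_try) == sorted(names, key=key_isdigit)
print(same_order)

for fn in (key_try, key_isdigit):
    seconds = timeit.timeit(lambda: sorted(names, key=fn), number=10000)
    print(fn.__name__, round(seconds, 4))
```

Note that on Python 3 the character-by-character key can raise TypeError when a digit run lines up against a letter (for example "a1" versus "ab"), since ints and strings cannot be ordered against each other.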
Answer by StatsSorceress
This is a modified version of @Don O'Donnell's answer, because I couldn't get it working as-is, but I think it's the best answer here as it's well-explained.
def getint(name):
    _, num = name.split('_')
    num, _ = num.split('.')
    return int(num)

print(sorted(tiffFiles, key=getint))
Changes:
1) The alpha string isn't stored, as it's not needed (hence _, num)
2) Use num.split('.') to separate the number from the .tif extension
3) Use sorted instead of list.sort, per https://docs.python.org/2/howto/sorting.html
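The difference between sorted and list.sort in point 3 can be sketched as follows. This assumes Python 3 and file names with exactly one underscore and one dot, as in the question; `tiff_files`, `original`, `ordered`, `unchanged` and `result` are illustrative names.

```python
def getint(name):
    _, num = name.split('_')   # drop the text before the underscore
    num, _ = num.split('.')    # drop the extension
    return int(num)

tiff_files = ["ayurveda_1.tif", "ayurveda_11.tif", "ayurveda_13.tif",
              "ayurveda_2.tif", "ayurveda_20.tif", "ayurveda_22.tif"]
original = list(tiff_files)

# sorted() builds a new list and leaves the input untouched.
ordered = sorted(tiff_files, key=getint)
unchanged = tiff_files == original
print(ordered)

# list.sort() reorders the list in place and returns None.
result = tiff_files.sort(key=getint)
print(result)      # None
print(tiff_files)  # now in the same order as `ordered`
```

A name with more than one underscore or dot would make the two-way unpacking in getint raise ValueError, so this key only fits names shaped like the ones in the question.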

