Python: non-alphanumeric list order from os.listdir()

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/4813061/

Time: 2020-08-18 17:38:40  Source: igfitidea

Tags: python, list, directory-listing, listdir

Asked by marshall.ward

I often use Python to process directories of data. Recently, I have noticed that the default order of the lists has changed to something almost nonsensical. For example, if I am in a directory containing the subdirectories run01, run02, ... run19, run20, and I generate a list with the following command:

dir = os.listdir(os.getcwd())

then I usually get a list in this order:

dir = ['run01', 'run18', 'run14', 'run13', 'run12', 'run11', 'run08', ... ]

and so on. The order used to be alphanumeric. But this new order has remained with me for a while now.

What is determining the (displayed) order of these lists?

Accepted answer by Nowayz

I think the order has to do with the way the files are indexed on your filesystem. If you really want it to adhere to some order, you can always sort the list after getting the files.

Answer by Mike DeSimone

It's probably just the order that C's readdir() returns. Try running this C program:

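#include <dirent.h>
#include <stdio.h>

int main(void)
{
    DIR *dirp = opendir(".");
    struct dirent *de;

    if (dirp == NULL) {   /* opendir() failed, e.g. no permission */
        perror("opendir");
        return 1;
    }
    /* Print entries in the raw order readdir() returns them. */
    while ((de = readdir(dirp)) != NULL)
        printf("%s\n", de->d_name);
    closedir(dirp);
    return 0;
}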

The build line should be something like gcc -o foo foo.c.

P.S. Just ran this and your Python code, and they both gave me sorted output, so I can't reproduce what you're seeing.
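
For readers who would rather stay in Python, a rough analogue of the C program can be sketched with os.scandir(), which yields entries as the operating system returns them. The helper name raw_listing is made up for this illustration and is not part of the original answer.

```python
import os

def raw_listing(path="."):
    """Return entry names in the order the OS hands them back (no sorting)."""
    # os.scandir() skips the special entries '.' and '..', like os.listdir().
    with os.scandir(path) as entries:
        return [entry.name for entry in entries]

names = raw_listing(".")
print(names)          # order is whatever the filesystem yields
print(sorted(names))  # an explicit sort gives a predictable order
```

Comparing the two printed lists on your own machine shows whether your filesystem happens to return names already sorted.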

Answer by Mark Tolonen

Per the documentation:

os.listdir(path)

Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory.

Order cannot be relied upon and is an artifact of the filesystem.

To sort the result, use sorted(os.listdir(path)).
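
As a minimal sketch of that advice, the snippet below recreates the run01 ... run20 layout from the question in a temporary directory (an assumption made only so the example is self-contained) and sorts the listing explicitly.

```python
import os
import tempfile

# Build a throwaway directory with run01 ... run20 subdirectories.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(1, 21):
        os.mkdir(os.path.join(tmp, f"run{i:02d}"))

    unsorted_names = os.listdir(tmp)       # arbitrary order
    sorted_names = sorted(unsorted_names)  # lexicographic order

print(sorted_names[:3])  # ['run01', 'run02', 'run03']
```

Because the names are zero-padded, plain lexicographic sorting is enough here.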

Answer by mgilson

You can use the built-in sorted function to sort the strings however you want. Based on what you describe,

sorted(os.listdir(whatever_directory))

Alternatively, you can use the .sort method of a list:

lst = os.listdir(whatever_directory)
lst.sort()

I think that should do the trick.

Note that the order in which os.listdir returns the filenames is probably completely dependent on your filesystem.

Answer by Denis

In [6]: os.listdir?

Type:       builtin_function_or_method
String Form:<built-in function listdir>
Docstring:
listdir(path) -> list_of_strings
Return a list containing the names of the entries in the directory.
path: path of directory to list
The list is in **arbitrary order**.  It does not include the special
entries '.' and '..' even if they are present in the directory.

Answer by Jue

I found that "sort" does not always do what I expected. E.g., I have a directory as below, and "sort" gives me a very strange result:

>>> os.listdir(pathon)
['2', '3', '4', '5', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472']
>>> sorted([ f for f in os.listdir(pathon)])
['2', '3', '4', '403', '404', '407', '408', '410', '411', '412', '413', '414', '415', '416', '472', '5']

It seems it compares the first character first; if that character is the largest, the name is placed last.
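
That is how string comparison works: names are compared character by character, so '403' sorts before '5'. When every name is purely numeric, as in the listing above, one possible fix is to pass key=int so the names are compared as integers. The shortened list below is only for illustration.

```python
names = ['2', '3', '4', '5', '403', '404', '472']

# Default: strings compare character by character, so '403' < '5'.
lexicographic = sorted(names)
print(lexicographic)  # ['2', '3', '4', '403', '404', '472', '5']

# Numeric: each name is converted to an int for comparison only.
numeric = sorted(names, key=int)
print(numeric)        # ['2', '3', '4', '5', '403', '404', '472']
```

Note that key=int raises ValueError as soon as a name contains a non-digit character.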

Answer by funk

The proposed combination of os.listdir and sorted generates the same result as the ls -l command under Linux. The following example verifies this assumption:

user@user-PC:/tmp/test$ touch 3a 4a 5a b c d1 d2 d3 k l p0 p1 p3 q 410a 409a 408a 407a
user@user-PC:/tmp/test$ ls -l
total 0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 3a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 407a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 408a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 409a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 410a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 4a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 5a
-rw-rw-r-- 1 user user 0 Feb  15 10:31 b
-rw-rw-r-- 1 user user 0 Feb  15 10:31 c
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d2
-rw-rw-r-- 1 user user 0 Feb  15 10:31 d3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 k
-rw-rw-r-- 1 user user 0 Feb  15 10:31 l
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p0
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p1
-rw-rw-r-- 1 user user 0 Feb  15 10:31 p3
-rw-rw-r-- 1 user user 0 Feb  15 10:31 q

user@user-PC:/tmp/test$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir( './' )
['d3', 'k', 'p1', 'b', '410a', '5a', 'l', 'p0', '407a', '409a', '408a', 'd2', '4a', 'p3', '3a', 'q', 'c', 'd1']
>>> sorted( os.listdir( './' ) )
['3a', '407a', '408a', '409a', '410a', '4a', '5a', 'b', 'c', 'd1', 'd2', 'd3', 'k', 'l', 'p0', 'p1', 'p3', 'q']
>>> exit()
user@user-PC:/tmp/test$ 

So, for someone who wants to reproduce the result of the well-known ls -l command in their Python code, sorted( os.listdir( DIR ) ) works pretty well.
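
One caveat worth hedging: ls orders names using the current locale's collation rules, while sorted() compares Unicode code points, so the two can disagree once names mix upper and lower case. The names below are invented for illustration; key=str.lower is just one simple way to get a case-insensitive order and is not claimed to match ls exactly.

```python
names = ['b', 'A', 'c', 'B', 'a']

# Code-point order: every uppercase ASCII letter sorts before lowercase ones.
by_code_point = sorted(names)
print(by_code_point)     # ['A', 'B', 'a', 'b', 'c']

# Case-insensitive order: compare lowercased copies of the names.
case_insensitive = sorted(names, key=str.lower)
print(case_insensitive)  # ['A', 'a', 'b', 'B', 'c']
```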

Answer by rajeshcis

aaa = ['row_163.pkl', 'row_394.pkl', 'row_679.pkl', 'row_202.pkl', 'row_1449.pkl', 'row_247.pkl', 'row_1353.pkl', 'row_749.pkl', 'row_1293.pkl', 'row_1304.pkl', 'row_78.pkl', 'row_532.pkl', 'row_9.pkl', 'row_1435.pkl']
sorted(aaa, key=lambda x: int(os.path.splitext(x.split('_')[1])[0]))

In my case the names look like row_163.pkl. Here os.path.splitext('row_163.pkl') splits the name into ('row_163', '.pkl'), so I also need to split on '_'.

But for your requirement you can do something like

sorted(aa, key=lambda x: (int(re.sub(r'\D', '', x)), x))

where

aa = ['run01', 'run08', 'run11', 'run12', 'run13', 'run14', 'run18']

And to retrieve a sorted directory listing you can use sorted(os.listdir(path)).

And for names like 'run01.txt' or 'run01.csv' you can do this:

sorted(files, key=lambda x: int(re.sub(r'\D', '', os.path.splitext(x)[0])))
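
The snippets in this answer omit their imports, so here is a self-contained sketch of the same idea: drop the extension, keep only the digits, and sort on the resulting integer. The helper name digits_key, the sample file names, and the fallback of 0 for names without digits are assumptions made for this illustration.

```python
import os
import re

def digits_key(filename):
    """Sort key: the digits in the name (extension ignored), then the name itself."""
    stem = os.path.splitext(filename)[0]   # 'run01.txt' -> 'run01'
    digits = re.sub(r'\D', '', stem)       # 'run01' -> '01'
    # Names without any digits sort first (assumed fallback value 0).
    return (int(digits) if digits else 0, filename)

files = ['run10.txt', 'run2.csv', 'run01.txt']
ordered = sorted(files, key=digits_key)
print(ordered)  # ['run01.txt', 'run2.csv', 'run10.txt']
```

Returning the file name as the second tuple element keeps the order deterministic when two names carry the same number.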

Answer by user136036

Python, for whatever reason, does not come with a built-in way to do natural sorting (meaning 1, 2, 10 instead of 1, 10, 2), so you have to write it yourself:

import re
def sorted_alphanumeric(data):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(data, key=alphanum_key)

You can now use this function to sort a list:

dirlist = sorted_alphanumeric(os.listdir(...))
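
To make the difference concrete, here is a small self-contained run. The function is restated with a nested def instead of lambdas so the block can be pasted on its own; the sample names are invented.

```python
import re

def sorted_alphanumeric(data):
    """Natural sort: digit runs compare as integers, other text case-insensitively."""
    def alphanum_key(name):
        # re.split with a capturing group keeps the digit runs in the result.
        return [int(part) if part.isdigit() else part.lower()
                for part in re.split(r'([0-9]+)', name)]
    return sorted(data, key=alphanum_key)

names = ['run10', 'run2', 'run01', 'Run3']
plain = sorted(names)
natural = sorted_alphanumeric(names)
print(plain)    # ['Run3', 'run01', 'run10', 'run2']
print(natural)  # ['run01', 'run2', 'Run3', 'run10']
```

The split always alternates text and digit runs starting with a (possibly empty) text part, so the key lists never compare an int against a str.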


Problems: If you use the above function to sort strings (for example folder names) and want them sorted the way Windows Explorer does, it will not work properly in some edge cases.
This sorting function will return incorrect results on Windows if your folder names contain certain 'special' characters. For example, this function sorts 1, !1, !a, a, whereas Windows Explorer sorts !1, 1, !a, a.

So if you want to sort exactly like Windows Explorer does in Python, you have to use the Windows built-in function StrCmpLogicalW via ctypes (this of course won't work on Unix):

from ctypes import wintypes, windll
from functools import cmp_to_key
def winsort(data):
    _StrCmpLogicalW = windll.Shlwapi.StrCmpLogicalW
    _StrCmpLogicalW.argtypes = [wintypes.LPWSTR, wintypes.LPWSTR]
    _StrCmpLogicalW.restype  = wintypes.INT

    cmp_fnc = lambda psz1, psz2: _StrCmpLogicalW(psz1, psz2)
    return sorted(data, key=cmp_to_key(cmp_fnc))

This function is slightly slower than sorted_alphanumeric().

Bonus: winsort can also sort full paths on Windows.

Alternatively, especially if you use Unix, you can use the natsort library (pip install natsort) to sort by full paths in a correct way (meaning subfolders at the correct position).

You can use it like this to sort full paths:

from natsort import natsorted, ns
dirlist = natsorted(dirlist, alg=ns.PATH | ns.IGNORECASE)

Don't use it for normal sorting of just folder names (or strings in general), as it's quite a bit slower than the sorted_alphanumeric() function above.
The natsort library will give you incorrect results if you expect Windows Explorer sorting, so use winsort() for that.

Answer by rocksyne

Elliot's answer solves it perfectly, but because it is a comment it goes unnoticed, so with the aim of helping someone, I am reiterating it as a solution.

Use the natsort library:

Install the library with the following command on Ubuntu and other Debian-based systems:

Python 2

sudo pip install natsort

Python 3

sudo pip3 install natsort

Details of how to use this library can be found here.