How do I parse a listing of files to get just the filenames in Python?
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow CC BY-SA: cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/237699/
Asked by Lawrence Johnston
So let's say I'm using Python's ftplib to retrieve a list of log files from an FTP server. How would I parse that list of files to get just the file names (the last column) into a list? See the link above for example output.
Answer by James Bennett
Using retrlines() probably isn't the best idea here: by default it just prints each line to the console (its return value is only the server's final status reply), so you'd have to do tricky things to even get at that output. A likely better bet is the nlst() method, which returns exactly what you want: a list of the file names.
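A minimal sketch of the nlst() route, assuming `ftp` is an already-connected ftplib.FTP instance. The helper name remote_filenames, the placeholder host ftp.example.com and the "logs" directory are hypothetical. NLST returns the name of every entry, so subdirectories can show up too, and some servers echo the requested directory back as a prefix ("logs/a.log"), which is why the sketch strips it with posixpath.basename().

```python
import posixpath
from ftplib import FTP


def remote_filenames(ftp: FTP, path: str = "") -> list[str]:
    """Return just the entry names in *path* (default: the current directory)."""
    # nlst() sends NLST and collects each returned line into a list.
    names = ftp.nlst(path) if path else ftp.nlst()
    # Some servers prefix the directory onto each name; keep only the last part.
    return [posixpath.basename(name) for name in names]


# Usage (host and directory are placeholders):
# with FTP("ftp.example.com") as ftp:
#     ftp.login()
#     print(remote_filenames(ftp, "logs"))
```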
Answer by e-satis (best answer)
You may want to use ftp.nlst() instead of ftp.retrlines(). It will give you exactly what you want.
If you can't, read the following:
Generators for sysadmin processes
In his now-famous presentation, "Generator Tricks for Systems Programmers: An Introduction", David M. Beazley gives a lot of recipes for solving this kind of data problem with quick, reusable code.
E.g.:
# assumes ftp is a connected ftplib.FTP instance
# empty list that will receive every log entry
log = []
# pass log.append as the callback, replacing the default print_line that retrlines would call
# we only do this because, in this scenario, we cannot use something better than retrlines
ftp.retrlines('LIST', callback=log.append)
# rsplit(None, 1) splits only once, from the right, so it does less work than a full split() on long lines
files = (line.rsplit(None, 1)[1] for line in log)
# get your file list
files_list = list(files)
Why don't we generate the list immediately?
Well, it's because doing it this way offers you a lot of flexibility: you can apply any intermediate generator to filter the files before turning them into files_list. It's just like a pipe: add a line and you add a processing step without overhead (since they are generators). And if you get rid of retrlines, it still works, and it's even better because you never store the raw listing at all.
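To illustrate the pipe idea, here is a minimal sketch that chains two extra generators before the final list() call: one drops directory lines and one keeps only names ending in ".log". It assumes Unix-style `ls -l` lines, where a leading "d" in the permissions field marks a directory; the helper names skip_directories, only_suffix and log_file_names are hypothetical.

```python
from typing import Iterable, Iterator


def skip_directories(lines: Iterable[str]) -> Iterator[str]:
    # In ls-style output the permissions field starts with "d" for directories.
    return (line for line in lines if not line.startswith("d"))


def only_suffix(names: Iterable[str], suffix: str) -> Iterator[str]:
    return (name for name in names if name.endswith(suffix))


def log_file_names(lines: Iterable[str]) -> list[str]:
    files = skip_directories(lines)
    names = (line.rsplit(None, 1)[1] for line in files)
    logs = only_suffix(names, ".log")
    # Nothing above runs until list() pulls items through the chain.
    return list(logs)
```

With a live connection, the input would be the `log` list filled by `ftp.retrlines('LIST', log.append)`; each stage stays a one-line change.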
EDIT: well, I read the comment on the other answer, and it says this won't work if a file name contains a space.
Cool, this illustrates why this method is handy. If you want to change something in the process, you just change one line. Replace:
files = (line.rsplit(None, 1)[1] for line in log)
with
# split the line, take every field from index 8 onward (the name), then join them back with single spaces
files = (' '.join(line.split()[8:]) for line in log)
OK, this may not be obvious here, but for huge batch-processing scripts, it's nice :-)
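Joining `split()[8:]` still collapses runs of spaces inside a name (`my  report.log` becomes `my report.log`). A sketch of one alternative, assuming the usual nine-field Unix-style LIST format (permissions, links, owner, group, size, month, day, time or year, name), is to cap the splits with maxsplit=8 so the ninth field comes back untouched. name_from_list_line is a hypothetical helper; symlinks will still come back as `name -> target`, and LIST output is not standardized across servers, so treat this as a heuristic.

```python
def name_from_list_line(line: str) -> str:
    """Return the file name from one Unix-style LIST line, spacing intact."""
    # maxsplit=8 stops after the eighth separator; the rest of the line is the name.
    return line.split(None, 8)[-1]


line = "-rw-r--r--   1 ftp-usr  pdmaint   1234 Mar 20 09:48 my  report.log"
print(name_from_list_line(line))    # my  report.log
print(" ".join(line.split()[8:]))   # my report.log
```

In the pipeline it would read `files = (name_from_list_line(line) for line in log)`.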
Answer by tzot
If the FTP server supports the MLSD command, then please see the section “single directory case” in that answer.
Use an instance (say ftpd) of the FTPDirectory class, call its .getdata method with a connected ftplib.FTP instance in the correct folder, and then you can:
directory_filenames= [ftpfile.name for ftpfile in ftpd.files]
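For comparison, ftplib itself has supported MLSD since Python 3.3 through FTP.mlsd(), which yields (name, facts) pairs where facts is a dict of server-reported facts, so no ls-style parsing is needed. A minimal sketch, assuming the server supports MLSD and reports the standard "type" fact ("file" for regular files, "dir", "cdir" or "pdir" for directories); mlsd_filenames is a hypothetical helper, not part of the FTPDirectory class from the linked answer.

```python
from ftplib import FTP


def mlsd_filenames(ftp: FTP, path: str = "") -> list[str]:
    """Return the names of regular files in *path* using MLSD."""
    # mlsd() yields (name, facts) tuples; fact names arrive lowercased.
    return [
        name
        for name, facts in ftp.mlsd(path)
        if facts.get("type", "").lower() == "file"
    ]
```

Filtering on the "type" fact also drops the "." and ".." entries that MLSD listings usually include.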
Answer by James Bennett
By the way, if you're stuck using retrlines() for some reason, a slightly less optimal method is to pass a function as the second argument to retrlines(); it will be called once for each line of the listing. So something like this (assuming you have an FTP object named 'ftp') would work as well:
filenames = []
ftp.retrlines('LIST', lambda line: filenames.append(line.split()[-1]))
The list 'filenames' will then be a list of the file names.
Answer by Jeremy Ruten
Since every filename in the output starts at the same column, all you have to do is get the position of the dot on the first line:
drwxrwsr-x 5 ftp-usr pdmaint 1536 Mar 20 09:48 .
Then slice the filename out of the other lines using the position of that dot as the starting index.
Since the dot is the last character on the line, you can use the length of the line minus 1 as the index. So the final code is something like this:
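# assumes ftp is a connected ftplib.FTP instance
# retrlines() returns only the server's status reply, so collect the listing via the callback
lines = []
ftp.retrlines('LIST', lines.append)
# the first line is assumed to be the "." entry, so its last index is where every name starts
filename_index = len(lines[0]) - 1
files = []
for line in lines:
    files.append(line[filename_index:])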
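A slightly more defensive sketch of the same column idea: search the listing for the "." entry instead of assuming it is the first line, and fail loudly if it is missing, since not every server includes "." in its LIST output. names_by_column is a hypothetical helper name, and the approach still assumes every line shares the same column layout.

```python
def names_by_column(lines: list[str]) -> list[str]:
    """Slice file names out of an ls-style listing by column position."""
    # Find the "." entry; its final character marks the column where names start.
    dot_line = next((line for line in lines if line.endswith(" .")), None)
    if dot_line is None:
        raise ValueError('no "." entry in the listing; name column unknown')
    start = len(dot_line) - 1
    names = [line[start:] for line in lines]
    # Drop the "." and ".." entries themselves.
    return [name for name in names if name not in (".", "..")]
```

Unlike splitting on whitespace, slicing by column keeps names that contain spaces intact.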
Answer by ayaz
Is there any reason why ftplib.FTP.nlst() won't work for you? I just checked, and it returns only the names of the files in a given directory.
Answer by Mohit Ranka
I believe it should work for you.
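# file_list_from_log: the raw LIST lines, e.g. collected with ftp.retrlines('LIST', file_list_from_log.append)
file_name_list = [' '.join(each_file_detail.split()).split()[-1] for each_file_detail in file_list_from_log]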
NOTES -
Here I am making the assumption that you want the data in the program (as a list), not on the console.
each_file_detail is each line that is being produced by the program.
' '.join(each_file_detail.split())
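This replaces multiple spaces with a single space. (Strictly, split() with no arguments already treats any run of whitespace as one separator, so this step does not change the final result.)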