Downloading text files with Python and ftplib.FTP from z/OS

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1184844/

Time: 2020-11-03 21:39:51  Source: igfitidea

Tags: python, ftp, mainframe, zos

Asked by Brent.Longborough

I'm trying to automate the download of some text files from a z/OS PDS, using Python and ftplib.

Since the host files are EBCDIC, I can't simply use FTP.retrbinary().

FTP.retrlines(), when used with open(file, 'w').writelines as its callback, doesn't, of course, provide EOLs.

So, for starters, I've come up with this piece of code which "looks OK to me", but as I'm a relative Python noob, can anyone suggest a better approach? Obviously, to keep this question simple, this isn't the final, bells-and-whistles thing.

Many thanks.

#!python.exe
from ftplib import FTP

class xfile (file):
    def writelineswitheol(self, sequence):
        for s in sequence:
            self.write(s+"\r\n")

sess = FTP("zos.server.to.be", "myid", "mypassword")
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
sess.cwd("'FOO.BAR.PDS'")
a = sess.nlst("RTB*")
for i in a:
    sess.retrlines("RETR "+i, xfile(i, 'w').writelineswitheol)
sess.quit()

Update: Python 3.0, platform is MingW under Windows XP.

z/OS PDSs have a fixed record structure, rather than relying on line endings as record separators. However, the z/OS FTP server, when transmitting in text mode, provides the record endings, which retrlines() strips off.

Closing update:

Here's my revised solution, which will be the basis for ongoing development (removing built-in passwords, for example):

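import ftplib
import os

# Credentials are inline only to keep the example short.
sess = ftplib.FTP("undisclosed.server.com", "userid", "password")
# Ask the server to translate IBM-1047 (EBCDIC) to ISO8859-1 in text mode.
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
for dir in ["ASM", "ASML", "ASMM", "C", "CPP", "DLLA", "DLLC", "DLMC", "GEN", "HDR", "MAC"]:
    sess.cwd("'ZLTALM.PREP.%s'" % dir)
    try:
        filelist = sess.nlst()
    except ftplib.error_perm as x:
        # A 550 reply is treated as "nothing to download"; anything else is re-raised.
        if (x.args[0][:3] != '550'):
            raise
    else:
        try:
            os.mkdir(dir)
        except OSError:
            # Skip this PDS if the local directory cannot be created (e.g. it already exists).
            continue
        for hostfile in filelist:
            lines = []
            # retrlines() strips the line ending, so it is added back on write.
            sess.retrlines("RETR "+hostfile, lines.append)
            with open("%s/%s" % (dir, hostfile), 'w') as pcfile:
                for line in lines:
                    pcfile.write(line+"\n")
        print("Done: " + dir)
sess.quit()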
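
As for removing the built-in password mentioned above, one possible approach is to read the connection details from the environment and prompt for the password if it is missing. This is only a sketch: the ZOS_FTP_* variable names and the helper names get_credentials and connect are made up for illustration.

```python
import ftplib
import getpass
import os


def get_credentials(environ=os.environ, prompt=getpass.getpass):
    """Return (host, user, password) without hard-coding any of them.

    The ZOS_FTP_* variable names are hypothetical, chosen for this sketch.
    """
    host = environ["ZOS_FTP_HOST"]
    user = environ["ZOS_FTP_USER"]
    # Fall back to an interactive prompt when no password is exported.
    password = environ.get("ZOS_FTP_PASSWORD") or prompt(
        "Password for %s@%s: " % (user, host))
    return host, user, password


def connect():
    """Open the FTP session with the credentials found above."""
    host, user, password = get_credentials()
    return ftplib.FTP(host, user, password)
```

With this in place, the script would call connect() instead of passing literal strings to ftplib.FTP().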

My thanks to both John and Vinay

Accepted answer by Dave Griffiths

I just came across this question while trying to figure out how to recursively download datasets from z/OS. I've been using a simple Python script for years now to download EBCDIC files from the mainframe. It effectively just does this:

def writeline(line):
    file.write(line + "\n")

file = open(filename, "w")
ftp.retrlines("retr " + filename, writeline)
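
A slightly fuller sketch of the same idea, wrapped in a function so the output file is always closed. The function name download_member and its parameters are my own, not part of ftplib; ftp is assumed to be an already logged-in ftplib.FTP session.

```python
import ftplib


def download_member(ftp, member, local_path):
    """Fetch one member in text mode and write it one line per record."""
    # In text mode, Python turns the newline character into the platform's
    # own line ending on write, so only a bare newline is appended.
    with open(local_path, "w") as out:
        def write_line(line):
            # retrlines() hands over each line without its terminator.
            out.write(line + "\n")
        ftp.retrlines("RETR " + member, write_line)


# Typical use (not run here):
#   ftp = ftplib.FTP(host, user, password)
#   download_member(ftp, "RTB01", "RTB01.txt")
```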

Answer by Vinay Sajip

You should be able to download the file as a binary (using retrbinary) and use the codecs module to convert from EBCDIC to whatever output encoding you want. You should know the specific EBCDIC code page being used on the z/OS system (e.g. cp500). If the files are small, you could even do something like this (for a conversion to UTF-8):

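# Read the raw EBCDIC bytes as downloaded with retrbinary().
with open(ebcdic_filename, "rb") as file:
    data = file.read()
# Decode from the EBCDIC code page, then re-encode as UTF-8.
converted = data.decode("cp500").encode("utf8")
with open(utf8_filename, "wb") as file:
    file.write(converted)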
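
Putting the download and the conversion together might look roughly like this. It is a sketch under two assumptions that you would need to check against your own system: that the code page is cp500 (the example used above), and that a binary transfer of fixed-length records arrives with no separators, so the text has to be cut every record_length characters (80 is only a guess). The name fetch_ebcdic_text is hypothetical.

```python
import io


def fetch_ebcdic_text(ftp, member, codepage="cp500", record_length=80):
    """Download a member in binary mode and decode it on the client."""
    buffer = io.BytesIO()
    # retrbinary() calls the callback with each raw block of bytes.
    ftp.retrbinary("RETR " + member, buffer.write)
    text = buffer.getvalue().decode(codepage)
    # Assumption: fixed-length records with no separators between them.
    records = [text[i:i + record_length]
               for i in range(0, len(text), record_length)]
    # Drop the trailing blank padding and add one line ending per record.
    return "\n".join(record.rstrip() for record in records) + "\n"
```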

Update: If you need to use retrlines to get the lines and your lines are coming back in the correct encoding, your approach will not work, because the callback is called once for each line. So in the callback, sequence will be the line, and your for loop will write individual characters in the line to the output, each on its own line. So you probably want to do self.write(sequence + "\r\n") rather than the for loop. It still doesn't feel especially right to subclass file just to add this utility method, though - it probably needs to be in a different class in your bells-and-whistles version.
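
To make the difference concrete, here is a small illustration with no FTP involved. The two function names are invented for the example; each returns the text that the corresponding callback would write for one line.

```python
def per_character(line):
    """Looping over a single line visits its characters one by one."""
    return "".join(ch + "\r\n" for ch in line)


def per_line(line):
    """What was intended: one terminator per line."""
    return line + "\r\n"


print(repr(per_character("AB")))
print(repr(per_line("AB")))
```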

Answer by John Machin

Your writelineswitheol method appends '\r\n' instead of '\n' and then writes the result to a file opened in text mode. The effect, no matter what platform you are running on, will be an unwanted '\r'. Just append '\n' and you will get the appropriate line ending.
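
The effect can be reproduced on any platform by passing newline explicitly to open(): newline="\r\n" mimics what text mode does by default on Windows, and newline="\n" mimics a Unix-like system. The helper name bytes_written is made up for this sketch.

```python
import os
import tempfile


def bytes_written(text, newline):
    """Write text in text mode and return the bytes that reach the disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # With newline="\r\n", every "\n" written becomes "\r\n";
        # with newline="\n", nothing is translated.
        with open(path, "w", newline=newline) as out:
            out.write(text)
        with open(path, "rb") as raw:
            return raw.read()
    finally:
        os.remove(path)


print(bytes_written("abc\r\n", "\r\n"))  # Windows-like: a doubled '\r'
print(bytes_written("abc\n", "\r\n"))    # Windows-like: the usual ending
print(bytes_written("abc\r\n", "\n"))    # Unix-like: a stray '\r'
```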

Proper error handling should not be relegated to a "bells and whistles" version. You should set up your callback so that your file open() is in a try/except and retains a reference to the output file handle, your write call is in a try/except, and you have a callback_obj.close() method which you use when retrlines() returns to explicitly call file_handle.close() (in a try/except). That way you get explicit error handling, e.g. messages like "can't (open|write to|close) file X because Y", and you no longer have to think about when your files are going to be implicitly closed and whether you risk running out of file handles.
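
One way this could be laid out is a small callable class. This is a sketch, not the only possible design; the class name LineWriter and the choice of RuntimeError for the reported errors are my own.

```python
class LineWriter:
    """Callback object for retrlines() that reports file errors explicitly."""

    def __init__(self, path):
        self.path = path
        try:
            # Keep a reference to the handle so it can be closed explicitly.
            self.handle = open(path, "w")
        except OSError as exc:
            raise RuntimeError("can't open file %s because %s" % (path, exc))

    def __call__(self, line):
        try:
            self.handle.write(line + "\n")
        except OSError as exc:
            raise RuntimeError(
                "can't write to file %s because %s" % (self.path, exc))

    def close(self):
        try:
            self.handle.close()
        except OSError as exc:
            raise RuntimeError(
                "can't close file %s because %s" % (self.path, exc))


# Typical use with an existing ftplib.FTP session named ftp (not run here):
#   writer = LineWriter(local_path)
#   try:
#       ftp.retrlines("RETR " + member, writer)
#   finally:
#       writer.close()
```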

Python 3.x ftplib.FTP.retrlines() should give you str objects which are in effect Unicode strings, and you will need to encode them before you write them -- unless the default encoding is latin1 which would be rather unusual for a Windows box. You should have test files with (1) all possible 256 bytes (2) all bytes that are valid in the expected EBCDIC codepage.
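
Both points can be sketched briefly. The first function checks that all 256 byte values survive a decode/encode round trip through a code page (cp500 is used only as an example; substitute the one your host really uses). The second writes str lines with an explicitly named encoding rather than relying on the platform default. Both function names are invented for illustration.

```python
def all_bytes_round_trip(codepage="cp500"):
    """Check that every byte value survives decode followed by encode."""
    original = bytes(range(256))
    text = original.decode(codepage)
    return text.encode(codepage) == original


def write_lines(path, lines, encoding="utf-8"):
    """Write str lines with an explicit encoding instead of the default."""
    with open(path, "w", encoding=encoding) as out:
        for line in lines:
            out.write(line + "\n")
```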

[a few "sanitation" remarks]

  1. You should consider upgrading your Python from 3.0 (a "proof of concept" release) to 3.1.

  2. To facilitate better understanding of your code, use "i" as an identifier only as a sequence index and only if you irredeemably acquired the habit from FORTRAN 3 or more decades ago :-)

  3. Two of the problems discovered so far (appending line terminator to each character, wrong line terminator) would have shown up the first time you tested it.
