Java: Transfer raw binary with apache commons-net FTPClient?

Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3145768/

Time: 2020-08-13 16:51:03  Source: igfitidea

Transfer raw binary with apache commons-net FTPClient?

Tags: java, ftp, binary-data, apache-commons-net

Asked by Chris Suter

UPDATE: Solved

I was calling FTPClient.setFileType() before I logged in, causing the FTP server to use the default mode (ASCII) no matter what I set it to. The client, on the other hand, was behaving as though the file type had been properly set. BINARY mode is now working exactly as desired, transporting the file byte-for-byte in all cases. All I had to do was a little traffic sniffing in wireshark and then mimicking the FTP commands using netcat to see what was going on. Why didn't I think of that two days ago!? Thanks, everyone, for your help!

I have an XML file, UTF-16 encoded, which I am downloading from an FTP site using the FTPClient from Apache's commons-net-2.0 Java library. It offers support for two transfer modes: ASCII_FILE_TYPE and BINARY_FILE_TYPE, the difference being that ASCII will replace line separators with the appropriate local line separator ('\r\n' or just '\n' -- in hex, 0x0d0a or just 0x0a). My problem is this: I have a test file, UTF-16 encoded, that contains the following:

<?xml version='1.0' encoding='utf-16'?>
<data>
    <blah>blah</blah>
</data>

Here's the hex:
0000000: 003c 003f 0078 006d 006c 0020 0076 0065 .<.?.x.m.l. .v.e
0000010: 0072 0073 0069 006f 006e 003d 0027 0031 .r.s.i.o.n.=.'.1
0000020: 002e 0030 0027 0020 0065 006e 0063 006f ...0.'. .e.n.c.o
0000030: 0064 0069 006e 0067 003d 0027 0075 0074 .d.i.n.g.=.'.u.t
0000040: 0066 002d 0031 0036 0027 003f 003e 000a .f.-.1.6.'.?.>..
0000050: 003c 0064 0061 0074 0061 003e 000a 0009 .<.d.a.t.a.>....
0000060: 003c 0062 006c 0061 0068 003e 0062 006c .<.b.l.a.h.>.b.l
0000070: 0061 0068 003c 002f 0062 006c 0061 0068 .a.h.<./.b.l.a.h
0000080: 003e 000a 003c 002f 0064 0061 0074 0061 .>...<./.d.a.t.a
0000090: 003e 000a                               .>..

When I use ASCII mode for this file it transfers correctly, byte-for-byte; the result has the same md5sum. Great. When I use BINARY transfer mode, which is not supposed to do anything but shuffle bytes from an InputStream into an OutputStream, the result is that the newlines (0x0a) are converted to carriage return + newline pairs (0x0d0a). Here's the hex after binary transfer:

0000000: 003c 003f 0078 006d 006c 0020 0076 0065 .<.?.x.m.l. .v.e
0000010: 0072 0073 0069 006f 006e 003d 0027 0031 .r.s.i.o.n.=.'.1
0000020: 002e 0030 0027 0020 0065 006e 0063 006f ...0.'. .e.n.c.o
0000030: 0064 0069 006e 0067 003d 0027 0075 0074 .d.i.n.g.=.'.u.t
0000040: 0066 002d 0031 0036 0027 003f 003e 000d .f.-.1.6.'.?.>..
0000050: 0a00 3c00 6400 6100 7400 6100 3e00 0d0a ..<.d.a.t.a.>...
0000060: 0009 003c 0062 006c 0061 0068 003e 0062 ...<.b.l.a.h.>.b
0000070: 006c 0061 0068 003c 002f 0062 006c 0061 .l.a.h.<./.b.l.a
0000080: 0068 003e 000d 0a00 3c00 2f00 6400 6100 .h.>....<./.d.a.
0000090: 7400 6100 3e00 0d0a                     t.a.>...

Not only does it convert the newline characters (which it shouldn't), but it doesn't respect the UTF-16 encoding (not that I would expect it to know that it should; it's just a dumb FTP pipe). The result is unreadable without further processing to realign the bytes. I would just use ASCII mode, but my application will also be moving real binary data (mp3 files and jpeg images) across the same pipe. Using the BINARY transfer mode on these binary files also causes them to have random 0x0d bytes injected into their contents, which can't safely be removed since the binary data often contains legitimate 0x0d0a sequences. If I use ASCII mode on these files, then the "clever" FTPClient converts these 0x0d0a sequences into 0x0a, leaving the file inconsistent no matter what I do.
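
To make the misalignment concrete, here is a minimal sketch (not the FTP server's or commons-net's actual code) that applies a naive byte-level LF to CRLF translation to a UTF-16BE fragment. The class and method names (CrlfInjectionDemo, lfToCrlf, hex) are made up for illustration. It reproduces the 003e 000d 0a00 3c pattern visible in the hex dump above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class CrlfInjectionDemo {
    // Naive byte-level translation: insert 0x0d in front of every 0x0a byte,
    // with no knowledge of the text encoding.
    static byte[] lfToCrlf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : in) {
            if (b == 0x0a) {
                out.write(0x0d);
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    // Renders a byte array as lowercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ">", newline, "<" as UTF-16BE: two bytes per character, no BOM.
        byte[] original = ">\n<".getBytes(StandardCharsets.UTF_16BE);
        byte[] mangled = lfToCrlf(original);

        System.out.println(hex(original)); // 003e000a003c
        System.out.println(hex(mangled));  // 003e000d0a003c
        // An odd byte count can no longer be split into 2-byte UTF-16 code units.
        System.out.println(mangled.length); // 7
    }
}
```

Because a single byte is inserted into a stream of 2-byte code units, everything after the first newline is shifted by one byte, which is why the rest of the file reads as 3c00 6400 ... instead of 003c 0064 ....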

I guess my question(s) is(are): does anyone know of any good FTP libraries for java that just move the damned bytes from there to here, or am I going to have to hack up apache commons-net-2.0 and maintain my own FTP client code just for this simple application? Has anyone else dealt with this bizarre behavior? Any suggestions would be appreciated.

I checked out the commons-net source code and it doesn't look like it's responsible for the weird behavior when BINARY mode is used. But the InputStream it's reading from in BINARY mode is just a java.io.BufferedInputStream wrapped around a socket InputStream. Do these lower-level Java streams ever do any weird byte manipulation? I would be shocked if they did, but I don't see what else could be going on here.

EDIT 1:

Here's a minimal piece of code that mimics what I'm doing to download the file. To compile, just do

javac -classpath /path/to/commons-net-2.0.jar Main.java

To run, you'll need directories /tmp/ascii and /tmp/binary for the file to download to, as well as an FTP site set up with the file sitting in it. The code will also need to be configured with the appropriate FTP host, username and password. I put the file on my testing FTP site under the test/ folder and called the file test.xml. The test file should have more than one line and be UTF-16 encoded (this may not be necessary, but will help to recreate my exact situation). I used vim's :set fileencoding=utf-16 command after opening a new file and entered the XML text referenced above. Finally, to run, just do

java -cp .:/path/to/commons-net-2.0.jar Main

Code:

(NOTE: this code modified to use custom FTPClient object, linked below under "EDIT 2")

import java.io.*;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;
import java.util.zip.CRC32;
import org.apache.commons.net.ftp.*;

public class Main implements java.io.Serializable
{
    public static void main(String[] args) throws Exception
    {
        Main main = new Main();
        main.doTest();
    }

    private void doTest() throws Exception
    {
        String host = "ftp.host.com";
        String user = "user";
        String pass = "pass";

        String asciiDest = "/tmp/ascii";
        String binaryDest = "/tmp/binary";

        String remotePath = "test/";
        String remoteFilename = "test.xml";

        System.out.println("TEST.XML ASCII");
        MyFTPClient client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.ASCII_FILE_TYPE);
        File path = new File("/tmp/ascii");
        downloadFTPFileToPath(client, "test/", "test.xml", path);
        System.out.println("");

        System.out.println("TEST.XML BINARY");
        client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.BINARY_FILE_TYPE);
        path = new File("/tmp/binary");
        downloadFTPFileToPath(client, "test/", "test.xml", path);
        System.out.println("");

        System.out.println("TEST.MP3 ASCII");
        client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.ASCII_FILE_TYPE);
        path = new File("/tmp/ascii");
        downloadFTPFileToPath(client, "test/", "test.mp3", path);
        System.out.println("");

        System.out.println("TEST.MP3 BINARY");
        client = createFTPClient(host, user, pass, org.apache.commons.net.ftp.FTP.BINARY_FILE_TYPE);
        path = new File("/tmp/binary");
        downloadFTPFileToPath(client, "test/", "test.mp3", path);
    }

    public static File downloadFTPFileToPath(MyFTPClient ftp, String remoteFileLocation, String remoteFileName, File path)
        throws Exception
    {
        // path to remote resource
        String remoteFilePath = remoteFileLocation + "/" + remoteFileName;

        // create local result file object
        File resultFile = new File(path, remoteFileName);

        // local file output stream
        CheckedOutputStream fout = new CheckedOutputStream(new FileOutputStream(resultFile), new CRC32());

        // try to read data from remote server
        if (ftp.retrieveFile(remoteFilePath, fout)) {
            System.out.println("FileOut: " + fout.getChecksum().getValue());
            return resultFile;
        } else {
            throw new Exception("Failed to download file completely: " + remoteFilePath);
        }
    }

    public static MyFTPClient createFTPClient(String url, String user, String pass, int type)
        throws Exception
    {
        MyFTPClient ftp = new MyFTPClient();
        ftp.connect(url);
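        // NOTE: this is the bug described in the UPDATE at the top: setFileType() is
        // called before login(), so the server keeps using its default (ASCII) mode.
        if (!ftp.setFileType( type )) {
            throw new Exception("Failed to set ftpClient object to BINARY_FILE_TYPE");
        }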

        // check for successful connection
        int reply = ftp.getReplyCode();
        if (!FTPReply.isPositiveCompletion(reply)) {
            ftp.disconnect();
            throw new Exception("Failed to connect properly to FTP");
        }

        // attempt login
        if (!ftp.login(user, pass)) {
            String msg = "Failed to login to FTP";
            ftp.disconnect();
            throw new Exception(msg);
        }

        // success! return connected MyFTPClient.
        return ftp;
    }

}
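
The CheckedOutputStream wrapper used in downloadFTPFileToPath above can be tried in isolation. The following is a minimal sketch using only the standard library, with in-memory streams standing in for the FTP socket and the local file; the names ChecksumCopyDemo and copyWithChecksums are made up for illustration. If the bytes are copied untouched, the CRC32 of what was read and the CRC32 of what was written should match.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

class ChecksumCopyDemo {
    // Copies everything from in to out and returns
    // { CRC32 of the bytes read, CRC32 of the bytes written }.
    static long[] copyWithChecksums(InputStream in, OutputStream out) throws IOException {
        CheckedInputStream cin = new CheckedInputStream(in, new CRC32());
        CheckedOutputStream cout = new CheckedOutputStream(out, new CRC32());
        byte[] buffer = new byte[8192];
        int n;
        while ((n = cin.read(buffer)) != -1) {
            cout.write(buffer, 0, n);
        }
        cout.flush();
        return new long[] { cin.getChecksum().getValue(), cout.getChecksum().getValue() };
    }

    public static void main(String[] args) throws IOException {
        // A few bytes that include 0x0a and 0x0d, the values at issue in this question.
        byte[] data = { 0x00, 0x3e, 0x00, 0x0a, 0x0d, 0x0a };
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long[] crcs = copyWithChecksums(new ByteArrayInputStream(data), sink);
        System.out.println("read CRC32:    " + crcs[0]);
        System.out.println("written CRC32: " + crcs[1]);
    }
}
```

A mismatch between the two values would point at a transformation happening between the read side and the write side, which is the comparison EDIT 2 makes.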

EDIT 2:

Okay, I followed the CheckedXputStream advice and here are my results. I made a copy of Apache's FTPClient called MyFTPClient, and I wrapped both the SocketInputStream and the BufferedInputStream in a CheckedInputStream using CRC32 checksums. Furthermore, I wrapped the FileOutputStream that I give to FTPClient to store the output in a CheckedOutputStream with a CRC32 checksum. The code for MyFTPClient is posted here and I've modified the above test code to use this version of the FTPClient (tried to post a gist URL to the modified code, but I need 10 reputation points to post more than one URL!). I ran it against test.xml and test.mp3, and the results were thus:

14:00:08,644 DEBUG [main,TestMain] TEST.XML ASCII
14:00:08,919 DEBUG [main,MyFTPClient] Socket CRC32: 2739864033
14:00:08,919 DEBUG [main,MyFTPClient] Buffer CRC32: 2739864033
14:00:08,954 DEBUG [main,FTPUtils] FileOut CRC32: 866869773

14:00:08,955 DEBUG [main,TestMain] TEST.XML BINARY
14:00:09,270 DEBUG [main,MyFTPClient] Socket CRC32: 2739864033
14:00:09,270 DEBUG [main,MyFTPClient] Buffer CRC32: 2739864033
14:00:09,310 DEBUG [main,FTPUtils] FileOut CRC32: 2739864033

14:00:09,310 DEBUG [main,TestMain] TEST.MP3 ASCII
14:00:10,635 DEBUG [main,MyFTPClient] Socket CRC32: 60615183
14:00:10,635 DEBUG [main,MyFTPClient] Buffer CRC32: 60615183
14:00:10,636 DEBUG [main,FTPUtils] FileOut CRC32: 2352009735

14:00:10,636 DEBUG [main,TestMain] TEST.MP3 BINARY
14:00:11,482 DEBUG [main,MyFTPClient] Socket CRC32: 60615183
14:00:11,482 DEBUG [main,MyFTPClient] Buffer CRC32: 60615183
14:00:11,483 DEBUG [main,FTPUtils] FileOut CRC32: 60615183

This makes basically zero sense whatsoever, because here are the md5sums of the corresponding files:

bf89673ee7ca819961442062eaaf9c3f  ascii/test.mp3
7bd0e8514f1b9ce5ebab91b8daa52c4b  binary/test.mp3
ee172af5ed0204cf9546d176ae00a509  original/test.mp3

104e14b661f3e5dbde494a54334a6dd0  ascii/test.xml
36f482a709130b01d5cddab20a28a8e8  binary/test.xml
104e14b661f3e5dbde494a54334a6dd0  original/test.xml

I'm at a loss. I swear I haven't permuted the filenames/paths at any point in this process, and I've triple-checked every step. It must be something simple, but I haven't the foggiest idea where to look next. In the interest of practicality I'm going to proceed by calling out to the shell to do my FTP transfers, but I intend to pursue this until I understand what the hell is going on. I'll update this thread with my findings, and I'll continue to appreciate any contributions anyone may have. Hopefully this will be useful to someone at some point!

Answered by Stephen C

It sounds to me as if your application code might have got the selection of ASCII and BINARY mode inverted. ASCII coming through unchanged and BINARY performing end-of-line character translations is the exact opposite of how FTP is supposed to work.

If that is not the problem, please edit your question to add the relevant part of your code.

EDIT

A couple of other possible (but IMO unlikely) explanations:

  • The FTP server is broken / misconfigured. (Can you successfully download the file in ASCII / BINARY mode using a non-Java command-line FTP utility?)
  • You are talking to the FTP server via a proxy that is broken or misconfigured.
  • You've somehow managed to get hold of a dodgy (hacked) copy of the Apache FTP client JAR file. (Yea, yea, very unlikely ...)

Answered by Sven

Set the file type after logging in to the FTP server:

ftp.setFileType(FTP.BINARY_FILE_TYPE);

The line below doesn't solve it:

//ftp.setFileTransferMode(org.apache.commons.net.ftp.FTP.BINARY_FILE_TYPE);

Answered by Vivek Kumar

I found that Apache's retrieveFile(...) sometimes did not work with file sizes exceeding a certain limit. To overcome that I used retrieveFileStream() instead. Prior to the download I set the correct file type and set the mode to passive mode.

So the code will look like

    ....
    ftpClientConnection.setFileType(FTP.BINARY_FILE_TYPE);
    ftpClientConnection.enterLocalPassiveMode();
    ftpClientConnection.setAutodetectUTF8(true);

    //Create an InputStream to the File Data and use FileOutputStream to write it
    InputStream inputStream = ftpClientConnection.retrieveFileStream(ftpFile.getName());
    FileOutputStream fileOutputStream = new FileOutputStream(directoryName + "/" + ftpFile.getName());
    //Using org.apache.commons.io.IOUtils
    IOUtils.copy(inputStream, fileOutputStream);
    fileOutputStream.flush();
    IOUtils.closeQuietly(fileOutputStream);
    IOUtils.closeQuietly(inputStream);
    boolean commandOK = ftpClientConnection.completePendingCommand();
    ....
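
If commons-io is not on the classpath, the copy-and-close part of the snippet above can be done with the standard library alone. This is a minimal sketch, assuming Java 9 or later for InputStream.transferTo; the names StreamCopyDemo and copyAndClose are made up for illustration, and in-memory streams stand in for the stream returned by retrieveFileStream() and for the FileOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopyDemo {
    // Copies all bytes from in to out unchanged and closes both streams,
    // even if the copy throws. Returns the number of bytes copied.
    static long copyAndClose(InputStream in, OutputStream out) throws IOException {
        try (InputStream source = in; OutputStream sink = out) {
            return source.transferTo(sink); // requires Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = { 0x0d, 0x0a, 0x00, (byte) 0xff };
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyAndClose(new ByteArrayInputStream(data), sink);
        System.out.println("copied " + copied + " bytes");
    }
}
```

With the FTP client, completePendingCommand() would still need to be called after the data stream has been closed, as in the snippet above.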