Java: How to store an HTTP response that may contain binary data?

Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/5777503/

Time: 2020-10-30 12:43:38  Source: igfitidea

How to store an Http Response that may contain binary data?

Tags: java, http, encoding, httpresponse

Asked by Amir Rachum

As I described in a previous question, I have an assignment to write a proxy server. It partially works now, but I still have a problem with handling of gzipped information. I store the HttpResponse in a String, and it appears I can't do that with gzipped content. However, the headers are text which I need to parse, and they all come from the same InputStream. My question is, what do I have to do in order to correctly handle binary responses, while still parsing the headers as strings?

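To make the symptom concrete, here is a minimal sketch of why binary content does not survive a trip through a String. It assumes UTF-8 decoding (the code in the question uses the platform default charset, which may differ), and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryRoundTrip {
    /** Decodes the bytes as UTF-8 text, then encodes that text back to bytes. */
    static byte[] viaUtf8String(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First bytes of a gzip stream: magic number 0x1f 0x8b, then method 8 (deflate).
        byte[] gzipStart = {(byte) 0x1f, (byte) 0x8b, 0x08, 0x00};
        byte[] roundTripped = viaUtf8String(gzipStart);

        System.out.println("original:     " + Arrays.toString(gzipStart));
        System.out.println("after String: " + Arrays.toString(roundTripped));
        // 0x8b is not valid UTF-8 on its own, so it is replaced during decoding.
        System.out.println("identical: " + Arrays.equals(gzipStart, roundTripped)); // prints "identical: false"
    }
}
```

The byte 0x8b is swapped for the Unicode replacement character, so the data that comes back is longer than and different from what went in.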

>> Please see the edit below before you look at the code.

Here's the Response class implementation:

public class Response {
    private String fullResponse = "";
    private BufferedReader reader;
    private boolean busy = true;
    private int responseCode;
    private CacheControl cacheControl;

    public Response(String input) {
        this(new ByteArrayInputStream(input.getBytes()));
    }

    public Response(InputStream input) {
        reader = new BufferedReader(new InputStreamReader(input));
        try {
            while (!reader.ready());//wait for initialization.

            String line;
            while ((line = reader.readLine()) != null) {
                fullResponse += "\r\n" + line;

                if (HttpPatterns.RESPONSE_CODE.matches(line)) {
                    responseCode = (Integer) HttpPatterns.RESPONSE_CODE.process(line);
                } else if (HttpPatterns.CACHE_CONTROL.matches(line)) {
                    cacheControl = (CacheControl) HttpPatterns.CACHE_CONTROL.process(line);
                }
            }
            reader.close();
            fullResponse = "\r\n" + fullResponse.trim() + "\r\n\r\n";
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } 
        busy = false;
    }

    public CacheControl getCacheControl() {
        return cacheControl;
    }

    public String getFullResponse() {
        return fullResponse;
    }

    public boolean isBusy() {
        return busy;
    }

    public int getResponseCode() {
        return responseCode;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result
                + ((fullResponse == null) ? 0 : fullResponse.hashCode());
        return result;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (!(obj instanceof Response))
            return false;
        Response other = (Response) obj;
        if (fullResponse == null) {
            if (other.fullResponse != null)
                return false;
        } else if (!fullResponse.equals(other.fullResponse))
            return false;
        return true;
    }

    @Override
    public String toString() {
        return "Response\n==============================\n" + fullResponse;
    }
}

And here's HttpPatterns:

public enum HttpPatterns {
    RESPONSE_CODE("^HTTP/1\\.1 (\\d+) .*$"),
    CACHE_CONTROL("^Cache-Control: (\\w+)$"),
    HOST("^Host: (\\w+)$"),
    REQUEST_HEADER("(GET|POST) ([^\\s]+) ([^\\s]+)$"),
    ACCEPT_ENCODING("^Accept-Encoding: .*$");

    private final Pattern pattern;

    HttpPatterns(String regex) {
        pattern = Pattern.compile(regex);
    }

    public boolean matches(String expression) {
        return pattern.matcher(expression).matches();
    }

    public Object process(String expression) {
        Matcher matcher = pattern.matcher(expression);
        if (!matcher.matches()) {
            throw new RuntimeException("Called `process`, but the expression doesn't match. Call `matches` first.");
        }

        if (this == RESPONSE_CODE) {
            return Integer.parseInt(matcher.group(1));
        } else if (this == CACHE_CONTROL) {
            return CacheControl.parseString(matcher.group(1));
        } else if (this == HOST) {
            return matcher.group(1);
        } else if (this == REQUEST_HEADER) {
            return new RequestHeader(RequestType.parseString(matcher.group(1)), matcher.group(2), matcher.group(3));
        } else { //never happens
            return null;
        }
    }


}


EDIT

I tried implementing the suggestions, but it's not working and I'm becoming desperate. When I try to view an image I get the following message from the browser:

The image “http://www.google.com/images/logos/ps_logo2.png” cannot be displayed because it contains errors.

Here's the log:

Request
==============================

GET http://www.google.com/images/logos/ps_logo2.png HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Cookie: PREF=ID=31f95dd7f42dfc7d:TM=1303507626:LM=1303507626:S=D4kIZ6rGFrlOUWlm


Not Reading from the Cache!!!!
I am going to try to connect to: www.google.com at port 80
Connected.
Writing to the server's buffer...
flushed.
Getting a response...
Got a binary response!


contentLength = 26209; headers.length() = 312; responseLength = 12136; fullResponse length = 12136


Got a response!

Writing to the Cache!!!!
I am going to write the following response:

HTTP/1.1 200 OK
Content-Type: image/png
Last-Modified: Thu, 05 Aug 2010 22:54:44 GMT
Date: Wed, 04 May 2011 15:05:30 GMT
Expires: Wed, 04 May 2011 15:05:30 GMT
Cache-Control: private, max-age=31536000
X-Content-Type-Options: nosniff
Server: sffe
Content-Length: 26209
X-XSS-Protection: 1; mode=block

 Response body is binary and was truncated.
Finished with request!

Here's the new Response class:

public class Response {
    private String headers = "";
    private BufferedReader reader;
    private boolean busy = true;
    private int responseCode;
    private CacheControl cacheControl;
    private InputStream fullResponse;
    private ContentEncoding encoding = ContentEncoding.TEXT;
    private ContentType contentType = ContentType.TEXT;
    private int contentLength;

    public Response(String input) {
        this(new ByteArrayInputStream(input.getBytes()));
    }

    public Response(InputStream input) {

        ByteArrayOutputStream tempStream = new ByteArrayOutputStream();
        InputStreamReader inputReader = new InputStreamReader(input);
        try {
            while (!inputReader.ready());
            int responseLength = 0;
            while (inputReader.ready()) {
                tempStream.write(inputReader.read());
                responseLength++;
            }
            /*
             * Read the headers
             */
            reader = new BufferedReader(new InputStreamReader(new ByteArrayInputStream(tempStream.toByteArray())));
            while (!reader.ready());//wait for initialization.

            String line;
            while ((line = reader.readLine()) != null) {
                headers += "\r\n" + line;

                if (HttpPatterns.RESPONSE_CODE.matches(line)) {
                    responseCode = (Integer) HttpPatterns.RESPONSE_CODE.process(line);
                } else if (HttpPatterns.CACHE_CONTROL.matches(line)) {
                    cacheControl = (CacheControl) HttpPatterns.CACHE_CONTROL.process(line);
                } else if (HttpPatterns.CONTENT_ENCODING.matches(line)) {
                    encoding = (ContentEncoding) HttpPatterns.CONTENT_ENCODING.process(line);
                } else if (HttpPatterns.CONTENT_TYPE.matches(line)) {
                    contentType = (ContentType) HttpPatterns.CONTENT_TYPE.process(line);
                } else if (HttpPatterns.CONTENT_LENGTH.matches(line)) {
                    contentLength = (Integer) HttpPatterns.CONTENT_LENGTH.process(line);
                } else if (line.isEmpty()) {
                    break;
                }
            }

            InputStreamReader streamReader = new InputStreamReader(new ByteArrayInputStream(tempStream.toByteArray()));
            while (!reader.ready());//wait for initialization.
            //Now let's get the rest
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int counter = 0;
            while (streamReader.ready() && counter < (responseLength - contentLength)) {
                out.write((char) streamReader.read());
                counter++;
            }
            if (encoding == ContentEncoding.BINARY || contentType == ContentType.BINARY) {
                System.out.println("Got a binary response!");
                while (streamReader.ready()) {
                    out.write(streamReader.read());
                }
            } else {
                System.out.println("Got a text response!");
                while (streamReader.ready()) {
                    out.write((char) streamReader.read());
                }
            }
            fullResponse = new ByteArrayInputStream(out.toByteArray());

            System.out.println("\n\ncontentLength = " + contentLength + 
                    "; headers.length() = " + headers.length() + 
                    "; responseLength = " + responseLength + 
                    "; fullResponse length = " + out.toByteArray().length + "\n\n");

        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } 
        busy = false;
    }

}

And here's the ProxyServer class:

class ProxyServer {
    public void start() {
        while (true) {
            Socket serverSocket;
            Socket clientSocket;
            OutputStreamWriter toClient;
            BufferedWriter toServer;
            try {
                //The client is meant to put data on the port, read the socket.
                clientSocket = listeningSocket.accept();
                Request request = new Request(clientSocket.getInputStream());
                //System.out.println("Accepted a request!\n" + request);
                while(request.busy);
                //Make a connection to a real proxy.
                //Host & Port - should be read from the request
                URL url = null;
                try {
                    url = new URL(request.getRequestURL());
                } catch (MalformedURLException e){
                    url = new URL("http://"+request.getRequestHost()+request.getRequestURL());
                }

                System.out.println(request);

                //remove entry from cache if needed
                if (!request.getCacheControl().equals(CacheControl.CACHE) && cache.containsRequest(request)) {
                    cache.remove(request);
                }

                Response response = null;

                if (request.getRequestType() == RequestType.GET && request.getCacheControl().equals(CacheControl.CACHE) && cache.containsRequest(request)) {
                    System.out.println("Reading from the Cache!!!!");
                    response = cache.get(request);
                } else {
                    System.out.println("Not Reading from the Cache!!!!");
                    //Get the response from the destination
                    int remotePort = (url.getPort() == -1) ? 80 : url.getPort();
                    System.out.println("I am going to try to connect to: " + url.getHost() + " at port " + remotePort);
                    serverSocket = new Socket(url.getHost(), remotePort);
                    System.out.println("Connected.");
                    serverSocket.setSoTimeout(50000);

                    //write to the server - keep it open.
                    System.out.println("Writing to the server's buffer...");
                    toServer = new BufferedWriter(new OutputStreamWriter(serverSocket.getOutputStream()));
                    toServer.write(request.getFullRequest());
                    toServer.flush();
                    System.out.println("flushed.");

                    System.out.println("Getting a response...");
                    response = new Response(serverSocket.getInputStream());
                    //System.out.println("Got a response!\n" + response);
                    System.out.println("Got a response!\n");
                    //wait for the response
                    while(response.isBusy());
                }

                if (request.getRequestType() == RequestType.GET && request.getCacheControl().equals(CacheControl.CACHE) && response.getResponseCode() == 200) {
                    System.out.println("Writing to the Cache!!!!");
                    cache.put(request, response);
                }
                else System.out.println("Not Writing to the Cache!!!!");
                response = filter.filter(response);

                // Return the response to the client
                toClient = new OutputStreamWriter(clientSocket.getOutputStream());
                System.out.println("I am going to write the following response:\n" + response);
                BufferedReader responseReader = new BufferedReader(new InputStreamReader(response.getFullResponse()));
                while (responseReader.ready()) {
                    toClient.write(responseReader.read());
                }
                toClient.flush();
                toClient.close();
                clientSocket.close();
                System.out.println("Finished with request!");

            } catch (IOException e) {
                e.printStackTrace();
                continue;
            }
        }
   }
}

I would appreciate any and all feedback/insight/suggestion regarding how to solve this, and would of course prefer some actual code.

Accepted answer by vbence

Store it in a byte array:

byte[] buffer = new byte[???];

A more detailed process:

  • Create a buffer large enough for the response header (and throw an exception if the header is bigger).
  • Read bytes into the buffer until you find \r\n\r\n in the buffer. You can write a helper function for this, for example static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle).
  • When you encounter the end of the header, create a string from the first n bytes of the buffer. You can then use RegEx on this string (also note that RegEx is not the best method to parse HTTP headers).
  • Be prepared that the buffer will contain additional data after the header, which are the first bytes of the response body. You have to copy these bytes to the output stream or output file or output buffer.
  • Read the rest of the response body (until Content-Length bytes have been read or the stream is closed).
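
The "copy these bytes to the output stream" step only works if the relay stays on byte streams end to end; the ProxyServer in the question sends the body through a BufferedReader and an OutputStreamWriter, which decode and re-encode it. A minimal sketch of a byte-only relay follows. The class and method names are hypothetical, and the in-memory streams stand in for the sockets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ByteRelay {
    /** Copies everything from in to out as raw bytes and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        // read() returns -1 only at end of stream; any other value is a byte count
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] body = {(byte) 0x89, 'P', 'N', 'G', 0x0d, 0x0a, 0x1a, 0x0a}; // PNG signature
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(body), sink);
        System.out.println("copied " + copied + " bytes");
    }
}
```

In the proxy this would presumably be called with the stored body (or the server socket's input stream) and clientSocket.getOutputStream(), with no Reader or Writer in between.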

Edit:

You are not following the steps I suggested. inputReader.ready() is the wrong way to detect the phases of the response. There is no guarantee that the header will be sent in a single burst.

I tried to sketch the steps in code (except for the arrayIndexOf function).

import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Place this method in the same class as arrayIndexOf.
static void readResponse(InputStream is) throws IOException {
    // Create a buffer large enough for the response header (and throw an exception if it is bigger).
    byte[] headEnd = {13, 10, 13, 10}; // \r \n \r \n
    byte[] buffer = new byte[10 * 1024];
    int length = 0;

    // Read bytes into the buffer until you find `\r\n\r\n` in the buffer.
    int bytes;
    int pos;
    while ((pos = arrayIndexOf(buffer, 0, length, headEnd)) == -1) {
        // buffer is full but the end signature has not been found
        if (length == buffer.length)
            throw new RuntimeException("Response header too long");

        bytes = is.read(buffer, length, buffer.length - length);
        // the stream ended before the header was complete
        if (bytes == -1)
            throw new EOFException("Stream closed before end of header");
        length += bytes;
    }

    // pos contains the starting index of the end signature (\r\n\r\n) so we add 4 bytes
    pos += 4;

    // When you encounter the end of header, create a string from the first *n* bytes.
    // ISO-8859-1 maps every byte to exactly one char, so no header byte is lost.
    String header = new String(buffer, 0, pos, StandardCharsets.ISO_8859_1);

    System.out.println(header);

    // Be prepared that the buffer will contain additional data after the header
    // ... so we process it
    System.out.write(buffer, pos, length - pos);

    // process the rest until connection is closed
    while ((bytes = is.read(buffer, 0, buffer.length)) > -1) {
        System.out.write(buffer, 0, bytes);
    }
    System.out.flush();
}

The arrayIndexOf method could look something like this (there are probably faster versions):

public static int arrayIndexOf(byte[] haystack, int offset, int length, byte[] needle) {
    // i <= last index at which the whole needle still fits inside the searched range
    for (int i = offset; i <= offset + length - needle.length; i++) {
        boolean match = false;
        for (int j = 0; j < needle.length; j++) {
            match = haystack[i + j] == needle[j];
            if (!match)
                break;
        }
        if (match)
            return i;
    }
    return -1;
}
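
The log in the question shows responseLength = 12136 against contentLength = 26209, which is what a ready()-driven loop produces when the body arrives in several bursts. Below is a hedged sketch of the same steps with the body read bounded by Content-Length instead. RawResponse and its members are hypothetical names; the sketch assumes the header fits in 10 KiB, that a Content-Length header is present, and that the body is not chunked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RawResponse {
    private static final byte[] HEAD_END = {13, 10, 13, 10}; // \r\n\r\n
    private static final Pattern CONTENT_LENGTH =
            Pattern.compile("(?im)^Content-Length:\\s*(\\d+)\\s*$");

    final String head; // status line and headers, decoded as text
    final byte[] body; // body, kept as raw bytes

    private RawResponse(String head, byte[] body) {
        this.head = head;
        this.body = body;
    }

    static RawResponse read(InputStream in) throws IOException {
        byte[] buffer = new byte[10 * 1024];
        int length = 0;
        int pos;
        // Keep reading until the blank line shows up, however many reads that takes.
        while ((pos = indexOf(buffer, length, HEAD_END)) == -1) {
            if (length == buffer.length) throw new IOException("Response header too long");
            int n = in.read(buffer, length, buffer.length - length);
            if (n == -1) throw new EOFException("Stream ended inside the header");
            length += n;
        }
        int bodyStart = pos + HEAD_END.length;
        String head = new String(buffer, 0, bodyStart, StandardCharsets.ISO_8859_1);

        Matcher m = CONTENT_LENGTH.matcher(head);
        if (!m.find()) throw new IOException("No Content-Length header");
        int contentLength = Integer.parseInt(m.group(1));

        ByteArrayOutputStream body = new ByteArrayOutputStream();
        // Body bytes that arrived in the same reads as the header
        body.write(buffer, bodyStart, Math.min(length - bodyStart, contentLength));
        // Block on read() until exactly contentLength bytes are stored
        while (body.size() < contentLength) {
            int n = in.read(buffer, 0, Math.min(buffer.length, contentLength - body.size()));
            if (n == -1) throw new EOFException("Stream ended inside the body");
            body.write(buffer, 0, n);
        }
        return new RawResponse(head, body.toByteArray());
    }

    /** Index of needle within the first length bytes of haystack, or -1. */
    private static int indexOf(byte[] haystack, int length, byte[] needle) {
        outer:
        for (int i = 0; i <= length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = "HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\n\u0001\u0002\u0003"
                .getBytes(StandardCharsets.ISO_8859_1);
        RawResponse r = read(new ByteArrayInputStream(wire));
        System.out.println(r.head.trim());
        System.out.println("body bytes: " + r.body.length);
    }
}
```

Because the loop blocks on read() rather than polling ready(), a slow or fragmented body simply takes more iterations instead of being cut short.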

Answer by Jon Skeet

You basically need to parse the response headers as text, and the rest as binary. It's slightly tricky to do so, as you can't just create an InputStreamReader around the stream: that will read more data than you want. You'll quite possibly need to read data into a byte array and then decode the header bytes manually, e.g. with new String(bytes, offset, length, charset) (the Java counterpart of .NET's Encoding.GetString). Alternatively, if you've read data into a byte array already you could always create a ByteArrayInputStream around that, then an InputStreamReader on top... but you'll need to work out how far the headers go before you get to the body of the response, which you should keep as binary data.

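As a rough illustration of the second variant, the sketch below assumes the whole response is already in a byte array. It wraps only the header bytes in a ByteArrayInputStream with an InputStreamReader on top, and copies the body out untouched. SplitResponse and its method names are hypothetical, and ISO-8859-1 is an assumed header charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

final class SplitResponse {
    /** Returns the index of the first body byte, or -1 if no blank line (\r\n\r\n) is present. */
    static int bodyOffset(byte[] response) {
        for (int i = 0; i + 3 < response.length; i++) {
            if (response[i] == '\r' && response[i + 1] == '\n'
                    && response[i + 2] == '\r' && response[i + 3] == '\n') {
                return i + 4;
            }
        }
        return -1;
    }

    /** Parses "Name: value" lines from the first headerLength bytes only. */
    static Map<String, String> parseHeaders(byte[] response, int headerLength) throws IOException {
        // Header names are case-insensitive, so the map is too.
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(response, 0, headerLength),
                StandardCharsets.ISO_8859_1))) {
            reader.readLine(); // skip the status line
            String line;
            while ((line = reader.readLine()) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon > 0) {
                    headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
                }
            }
        }
        return headers;
    }

    /** Returns the bytes after the header block, untouched by any charset decoding. */
    static byte[] body(byte[] response, int bodyOffset) {
        return Arrays.copyOfRange(response, bodyOffset, response.length);
    }
}
```

The reader can over-read as much as it likes here, because its source stops at the end of the headers; the body never passes through a charset decoder.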

Answer by yves amsellem

Jersey, a high-level web framework, may save your day. You no longer have to manage gzip content, headers, etc. yourself.

The following code gets the image used in your example and saves it to disk. Then it verifies that the saved image is equal to the downloaded one:

import static org.junit.Assert.assertArrayEquals; // JUnit 4 assumed

import java.io.File;
import java.io.IOException;
import java.io.InputStream;

import org.junit.Test;

import com.google.common.io.ByteStreams;
import com.google.common.io.Files;
import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.ClientResponse;
import com.sun.jersey.api.client.WebResource;

public class DownloadImageTest {
    @Test
    public void test() throws IOException {
        String filename = "ps_logo2.png";
        String url = "http://www.google.com/images/logos/" + filename;
        File file = new File(filename);

        // Fetch the image and read the whole entity as raw bytes
        WebResource resource = Client.create().resource(url);
        ClientResponse response = resource.get(ClientResponse.class);
        InputStream stream = response.getEntityInputStream();
        byte[] bytes = ByteStreams.toByteArray(stream);
        Files.write(bytes, file);

        assertArrayEquals(bytes, Files.toByteArray(file));
    }
}

You will need two maven dependencies to run it:

<dependency>
    <groupId>com.sun.jersey</groupId>
    <artifactId>jersey-client</artifactId>
    <version>1.6</version>
</dependency>
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>r08</version>
</dependency>

Answer by Mauricio Corrêa

I had the same problem. I commented out the line that adds the Accept-Encoding header for gzip:

con.setRequestProperty("Accept-Encoding","gzip, deflate");

...and it worked!

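The line above applies to a client that sets its own headers. For a proxy that forwards the browser's request as text, a comparable workaround might be to drop the Accept-Encoding line before forwarding. This is only a sketch under that assumption: RequestHeaders and stripAcceptEncoding are hypothetical names, and CRLF line endings are assumed.

```java
final class RequestHeaders {
    /** Removes every Accept-Encoding header line from a raw request head (CRLF line endings assumed). */
    static String stripAcceptEncoding(String requestHead) {
        // (?i) ignores case, (?m) lets ^ match at the start of each header line
        return requestHead.replaceAll("(?im)^Accept-Encoding:.*\\r\\n", "");
    }

    public static void main(String[] args) {
        String head = "GET / HTTP/1.1\r\n"
                + "Host: www.google.com\r\n"
                + "Accept-Encoding: gzip, deflate\r\n"
                + "\r\n";
        System.out.print(stripAcceptEncoding(head));
    }
}
```

This sidesteps gzip rather than handling it: the proxy still has to treat bodies such as images as bytes, and uncompressed responses cost more bandwidth.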

Answer by WhiteFang34

After reading the headers with BufferedReader you'll need to detect if the Content-Encoding header is set to gzip. If it is, to read the body you'll have to switch to using the InputStream and wrap it with a GZIPInputStream to decode the body. The tricky part, however, is that the BufferedReader will have buffered past the headers into the body, so the underlying InputStream will be ahead of where you need it.

What you could do is wrap the initial InputStream with a BufferedInputStream and call mark() on it before you begin processing the headers. When you're done processing the headers, call reset(). Then read that stream until you hit the empty line between the headers and the body. Now wrap it with the GZIPInputStream to process the body.

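The mark()/reset() sequence could be sketched roughly as follows. GzipBodyReader, readBody and MAX_HEAD are hypothetical names. The sketch assumes the reader consumes fewer than 64 KiB before reset() is called, and that the stream ends when the body ends (for example a Connection: close response).

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipBodyReader {
    private static final int MAX_HEAD = 64 * 1024; // assumed bound on bytes consumed before reset()
    private static final String ENCODING_HEADER = "Content-Encoding:";

    /** Returns the body, gunzipped if the headers declare Content-Encoding: gzip. */
    static byte[] readBody(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw, MAX_HEAD);
        in.mark(MAX_HEAD);

        // Pass 1: parse the headers as text. The reader buffers past the blank line.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1));
        boolean gzip = false;
        String line;
        while ((line = reader.readLine()) != null && !line.isEmpty()) {
            if (line.regionMatches(true, 0, ENCODING_HEADER, 0, ENCODING_HEADER.length())
                    && line.substring(ENCODING_HEADER.length()).trim().equalsIgnoreCase("gzip")) {
                gzip = true;
            }
        }

        // Pass 2: rewind, then skip header bytes until just after \r\n\r\n.
        in.reset();
        int matched = 0; // how many bytes of \r\n\r\n have been seen in a row
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            boolean expectCr = (matched % 2 == 0);
            if (b == (expectCr ? '\r' : '\n')) matched++;
            else matched = (b == '\r') ? 1 : 0;
        }

        // The stream is now positioned at the first body byte.
        InputStream body = gzip ? new GZIPInputStream(in) : in;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = body.read(chunk)) != -1) out.write(chunk, 0, n);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write("HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
                .getBytes(StandardCharsets.ISO_8859_1));
        try (GZIPOutputStream gz = new GZIPOutputStream(wire)) {
            gz.write("hello, proxy".getBytes(StandardCharsets.UTF_8));
        }
        byte[] body = readBody(new ByteArrayInputStream(wire.toByteArray()));
        System.out.println(new String(body, StandardCharsets.UTF_8)); // prints "hello, proxy"
    }
}
```

A proxy that only forwards the response would not need to decode the body at all; decoding matters when the proxy has to inspect or filter the content.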