Proper way to implement RESTful large file upload in Java

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/33889410/

Proper way to implement RESTful large file upload

Tags: java, file, rest, curl, file-upload

Asked by Aleksandar Stojadinovic

I've been making REST APIs for some time now, and one case still bugs me - large file upload. I've read a couple of other APIs, like Google Drive and Twitter, and other literature, and I came away with two ideas, but I'm not sure whether either of them is "proper". By proper I mean somewhat standardized, not requiring too much client logic (since other parties will be implementing that client), or even better, easy to call with cURL. The plan is to implement it in Java, preferably with the Play Framework.

Obviously I'll need some file partitioning and a server-side buffering mechanism since the files are large.

So, the first solution I came up with is a multipart upload (multipart/form-data). I understand this approach and have implemented it like this before, but it always feels strange to actually emulate a form on the client side, especially since the client has to set the file key name, and in my experience that is something clients tend to forget or not understand. Also, how is the chunk size/part size dictated? What keeps the client from putting the whole file in one chunk?

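For context, this is roughly how I have implemented the multipart variant before. It is a minimal sketch using Apache HttpClient's mime module (httpclient plus httpmime on the classpath); the endpoint URL, the "file" field name and the file name are placeholders:

    import java.io.File;
    import org.apache.http.HttpEntity;
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.entity.ContentType;
    import org.apache.http.entity.mime.MultipartEntityBuilder;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    public class MultipartUploadClient {
        public static void main(String[] args) throws Exception {
            try (CloseableHttpClient client = HttpClients.createDefault()) {
                // Build a multipart/form-data body; "file" is the form field name the
                // server expects -- exactly the key that clients tend to get wrong.
                HttpEntity body = MultipartEntityBuilder.create()
                        .addBinaryBody("file", new File("big.bin"),
                                ContentType.APPLICATION_OCTET_STREAM, "big.bin")
                        .build();

                HttpPost post = new HttpPost("https://example.com/upload"); // placeholder endpoint
                post.setEntity(body);
                client.execute(post).close();
            }
        }
    }

The cURL equivalent is a single -F option, e.g. curl -F "file=@big.bin" https://example.com/upload.
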
Solution two, at least as I understand it, but without having found an actual implementation, is that a "regular" POST request can work. The content would be chunked and the data buffered on the server side. However, I am not sure this is a correct understanding. How is the data actually chunked: does the upload span multiple HTTP requests, or is it chunked at the TCP level? And what is the Content-Type?

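For illustration, this is the kind of request I have in mind: a single streaming POST whose body is sent with Transfer-Encoding: chunked. A minimal sketch in plain Java; the URL and the Content-Type are placeholders:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class StreamingUploadClient {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://example.com/upload"); // placeholder endpoint
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/octet-stream");
            // Send the body with Transfer-Encoding: chunked so nothing is buffered in memory.
            conn.setChunkedStreamingMode(8 * 1024);

            try (OutputStream out = conn.getOutputStream();
                 InputStream in = Files.newInputStream(Paths.get("big.bin"))) {
                in.transferTo(out); // stream the file; the HTTP layer does the chunking
            }
            System.out.println("Server responded: " + conn.getResponseCode());
        }
    }
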
Bottom line: which of these two (or anything else?) would be a client-friendly, widely understood way of implementing a REST API for file upload?

Accepted answer by crawfobw

I would recommend taking a look at the Amazon S3 REST API's solution to multipart file upload; the procedure is documented in the S3 API reference under multipart upload.

To summarize the procedure Amazon uses (a rough client-side sketch follows the list):

  1. The client sends a request to initiate a multipart upload, and the API responds with an upload id.

  2. The client uploads each file chunk with a part number (to maintain the ordering of the file), the size of the part, the md5 hash of the part and the upload id; each of these requests is a separate HTTP request. The API validates the chunk by checking the md5 hash of the received chunk against the md5 hash the client supplied, and by checking that the size of the chunk matches the size the client supplied. The API responds with a tag (unique id) for the chunk. If you deploy your API across multiple locations, you will need to consider how to store the chunks and later access them in a way that is location transparent.

  3. The client issues a request to complete the upload, which contains a list of each chunk number and the associated chunk tag (unique id) received from the API. The API validates that there are no missing chunks and that the chunk numbers match the correct chunk tags, and then assembles the file or returns an error response.

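As a rough illustration of the three steps above, this is what the client-side flow could look like against such an API. The URL scheme (/uploads, /parts/{n}, /complete), the plain-text responses and the single uploaded part are assumptions made for the sketch, not part of Amazon's actual API:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.security.MessageDigest;
    import java.util.Base64;

    public class MultipartFlowClient {
        private static final String BASE = "https://api.example.com"; // placeholder base URL

        public static void main(String[] args) throws Exception {
            HttpClient http = HttpClient.newHttpClient();

            // 1. Initiate the upload; the API answers with an upload id.
            HttpRequest init = HttpRequest.newBuilder(URI.create(BASE + "/uploads"))
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();
            String uploadId = http.send(init, HttpResponse.BodyHandlers.ofString()).body();

            // 2. Upload one part with its number and md5; real code loops over all parts.
            byte[] part1 = new byte[]{ /* ... first chunk of the file ... */ };
            HttpRequest putPart = HttpRequest.newBuilder(
                            URI.create(BASE + "/uploads/" + uploadId + "/parts/1"))
                    .header("Content-MD5", md5Base64(part1))
                    .PUT(HttpRequest.BodyPublishers.ofByteArray(part1))
                    .build();
            String partTag = http.send(putPart, HttpResponse.BodyHandlers.ofString()).body();

            // 3. Complete the upload with the part-number -> tag manifest.
            HttpRequest complete = HttpRequest.newBuilder(
                            URI.create(BASE + "/uploads/" + uploadId + "/complete"))
                    .POST(HttpRequest.BodyPublishers.ofString("{\"1\":\"" + partTag + "\"}"))
                    .build();
            System.out.println(http.send(complete, HttpResponse.BodyHandlers.ofString()).statusCode());
        }

        // Base64-encoded MD5 of the part, as carried by the Content-MD5 header.
        private static String md5Base64(byte[] data) throws Exception {
            return Base64.getEncoder().encodeToString(
                    MessageDigest.getInstance("MD5").digest(data));
        }
    }
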
Amazon also supplies methods to abort the upload and to list the chunks associated with an upload. You may also want to put a timeout on the upload, so that the chunks are destroyed if the upload is not completed within a certain amount of time.

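One way to implement such a timeout, sketched here under the assumption that each pending upload keeps its chunks in a directory named after the upload id, is a small background task that removes uploads that have been idle for too long (call start() from your application bootstrap):

    import java.io.IOException;
    import java.nio.file.*;
    import java.time.Duration;
    import java.time.Instant;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.stream.Stream;

    public class StaleUploadReaper {
        private static final Path UPLOAD_ROOT = Paths.get("/var/uploads"); // assumed layout
        private static final Duration MAX_AGE = Duration.ofHours(24);      // assumed timeout

        public static void start() {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(StaleUploadReaper::sweep, 1, 1, TimeUnit.HOURS);
        }

        // Delete every upload directory whose last modification is older than MAX_AGE.
        static void sweep() {
            Instant cutoff = Instant.now().minus(MAX_AGE);
            try (Stream<Path> uploads = Files.list(UPLOAD_ROOT)) {
                uploads.filter(dir -> lastModified(dir).isBefore(cutoff))
                       .forEach(StaleUploadReaper::deleteRecursively);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        private static Instant lastModified(Path p) {
            try {
                return Files.getLastModifiedTime(p).toInstant();
            } catch (IOException e) {
                return Instant.MAX; // if unreadable, do not delete
            }
        }

        private static void deleteRecursively(Path dir) {
            try (Stream<Path> walk = Files.walk(dir)) {
                // Delete children before parents so non-empty directories can be removed.
                walk.sorted(java.util.Comparator.reverseOrder()).forEach(p -> {
                    try { Files.delete(p); } catch (IOException ignored) { }
                });
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
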
In terms of controlling the chunk sizes that the client uploads, you won't have much control over how the client decides to split up the upload. You could consider configuring a maximum chunk size for the upload and returning error responses for requests that contain chunks larger than that maximum.

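A minimal sketch of that check in a plain servlet; the 10 MiB limit and the reliance on the Content-Length header are assumptions, and a streaming implementation would also count the bytes it actually reads:

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class PartUploadServlet extends HttpServlet {
        private static final long MAX_PART_BYTES = 10L * 1024 * 1024; // assumed maximum part size

        @Override
        protected void doPut(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            long declared = req.getContentLengthLong(); // -1 if the client sent no Content-Length
            if (declared < 0 || declared > MAX_PART_BYTES) {
                resp.sendError(413, "Part exceeds the maximum allowed size");
                return;
            }
            // ... buffer the part, verify its md5 and respond with a part tag ...
            resp.setStatus(HttpServletResponse.SC_OK);
        }
    }
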
I've found that this procedure works very well for handling large file uploads in REST APIs and that it makes the many edge cases associated with file upload easier to handle. Unfortunately, I've yet to find a library that makes this easy to implement in any language, so you pretty much have to write all of the logic yourself.

Answered by Jiss Raphel

https://tus.io/ is a resumable upload protocol that supports chunked uploading and resuming an upload after a timeout. It is an open protocol and already has various open-source client and server implementations in different languages.

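To give a feel for the protocol, this is roughly what a tus 1.0.0 exchange looks like at the HTTP level: a POST carrying an Upload-Length header creates the upload and returns its URL in the Location header, and PATCH requests carrying an Upload-Offset header append the data. A minimal sketch in plain Java; the server URL is a placeholder, and the sketch assumes the server returns an absolute Location and that everything fits in one PATCH (see the tus specification for the authoritative details):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class TusUploadSketch {
        public static void main(String[] args) throws Exception {
            HttpClient http = HttpClient.newHttpClient();
            byte[] data = Files.readAllBytes(Path.of("big.bin"));

            // Create the upload; the server answers 201 with the upload URL in Location.
            HttpRequest create = HttpRequest.newBuilder(URI.create("https://tus.example.com/files")) // placeholder
                    .header("Tus-Resumable", "1.0.0")
                    .header("Upload-Length", Long.toString(data.length))
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();
            HttpResponse<Void> created = http.send(create, HttpResponse.BodyHandlers.discarding());
            URI uploadUrl = URI.create(created.headers().firstValue("Location").orElseThrow());

            // Append the data; a resuming client would first HEAD the URL to learn the current offset.
            HttpRequest patch = HttpRequest.newBuilder(uploadUrl)
                    .header("Tus-Resumable", "1.0.0")
                    .header("Upload-Offset", "0")
                    .header("Content-Type", "application/offset+octet-stream")
                    .method("PATCH", HttpRequest.BodyPublishers.ofByteArray(data))
                    .build();
            System.out.println(http.send(patch, HttpResponse.BodyHandlers.discarding()).statusCode());
        }
    }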