Is it possible to read only the first N bytes from an HTTP server using a Linux command?

Question

Asked by hahakubile

Here is the question.

Given the url http://www.example.com, can we read the first N bytes out of the page?

Using wget, we can download the whole page.
Using curl, there is the -r option; 0-499 specifies the first 500 bytes. This seems to solve the problem, but there is a caveat:
"You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document."
Using urllib in Python. There is a similar question here, but is Konstantin's comment on it really true?
"Last time I tried this technique it failed because it was actually impossible to read from the HTTP server only specified amount of data, i.e. you implicitly read all HTTP response and only then read first N bytes out of it. So at the end you ended up downloading the whole 1Gb malicious response."

So the question is: how can we read only the first N bytes from the HTTP server in practice?

Regards & Thanks

Answer 1

Accepted answer by sehe

curl <url> | head -c 499

or

curl <url> | dd bs=1 count=499

should do

There are also simpler utilities with perhaps broader availability, such as

netcat host 80 <<"HERE" | dd bs=1 count=499 of=output.fragment
GET /urlpath/query?string=more&bloddy=stuff

HERE

Or

GET /urlpath/query?string=more&bloddy=stuff

Answer 2

Answered by Adam Dymitruk

Make a socket connection. Read the bytes you want. Close, and you're done.

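As a rough illustration of this approach, here is a minimal Python sketch. The function name `read_first_bytes`, the 10-second timeout, and the choice of HTTP/1.1 with `Connection: close` are assumptions made for the example. Note that the N bytes counted here are raw response bytes, so they include the status line and headers, not just the body.

```python
import socket


def read_first_bytes(host: str, path: str, n: int, port: int = 80) -> bytes:
    """Return at most the first n raw bytes of the HTTP response (headers included)."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    received = bytearray()
    # Make a socket connection, read the bytes we want, then close.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        while len(received) < n:
            chunk = sock.recv(n - len(received))
            if not chunk:  # server closed the connection before n bytes arrived
                break
            received.extend(chunk)
    return bytes(received)
```

For example, `read_first_bytes("www.example.com", "/", 500)` would return up to 500 bytes. The server may already have sent more data into network buffers by the time the socket is closed, but the client stops reading after n bytes.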

Answer 3

Answered by Uxio

You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.

If the server ignores the range, it will send the whole page anyway, so you can fetch the page with curl and pipe it to head, for example.

head
-c, --bytes=[-]N print the first N bytes of each file; with the leading '-', print all but the last N bytes of each file

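The same "read N bytes, then stop" idea can be sketched in Python with urllib, which the question also mentions. This is only a sketch: the function name `fetch_head` and the timeout value are assumptions. Closing the response early means the rest of the body is not read by the client, although library and OS buffers may already hold somewhat more than N bytes.

```python
import urllib.request


def fetch_head(url: str, n: int) -> bytes:
    """Read at most n bytes of the response body, then close the connection."""
    with urllib.request.urlopen(url, timeout=10) as response:
        # read(n) returns up to n bytes; leaving the with-block closes the
        # connection without consuming the remainder of the body.
        return response.read(n)
```

For example, `fetch_head("http://www.example.com", 500)` plays the role of `curl <url> | head -c 500`.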

Answer 4

Answered by Anton Balashov

You can do it natively with the following curl command (no need to download the whole document). According to the curl man page:

RANGES
HTTP 1.1 introduced byte-ranges. Using this, a client can request to get only one or more subparts of a specified document. curl supports this with the -r flag.
Get the first 100 bytes of a document:
    curl -r 0-99 http://www.get.this/

Get the last 500 bytes of a document:  
    curl -r -500 http://www.get.this/

`curl` also supports simple ranges for FTP files as well.
Then you can only specify start and stop position.

Get the first 100 bytes of a document using FTP:
    curl -r 0-99 ftp://www.get.this/README

It works for me even with a Java web app deployed to GigaSpaces.

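For comparison, a byte-range request can also be sketched in Python by setting the `Range` header by hand. The function name `fetch_range` and the timeout are assumptions for illustration. Because a server may ignore the header and answer 200 with the whole document, the sketch returns the status code and caps the read either way.

```python
import urllib.request
from typing import Tuple


def fetch_range(url: str, first: int, last: int) -> Tuple[int, bytes]:
    """Request bytes first..last (inclusive); return (status, data)."""
    wanted = last - first + 1
    request = urllib.request.Request(
        url, headers={"Range": f"bytes={first}-{last}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        # 206 means the server honored the range. 200 means it ignored the
        # header and is sending the whole document from byte 0, so the data
        # matches the requested range only when first == 0.
        return response.status, response.read(wanted)
```

For example, `fetch_range("http://www.get.this/", 0, 99)` corresponds to `curl -r 0-99 http://www.get.this/`; checking for status 206 tells you whether the range was actually applied.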

Answer 5

Answered by Luc

I came here looking for a way to time the server's processing time, which I thought I could measure by telling curl to stop downloading after 1 byte or something.

For me, the better solution turned out to be to do a HEAD request, since this usually lets the server process the request as normal but does not return any response body:

time curl --head <URL>
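
A comparable measurement can be sketched in Python. The function name `time_head` and the timeout are assumptions; like `time curl --head`, the elapsed time includes connection setup as well as server processing.

```python
import time
import urllib.request
from typing import Tuple


def time_head(url: str) -> Tuple[int, float]:
    """Send a HEAD request and return (status, elapsed seconds)."""
    request = urllib.request.Request(url, method="HEAD")
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=10) as response:
        status = response.status  # a HEAD response carries no body
    elapsed = time.perf_counter() - start
    return status, elapsed
```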
