Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/1272675/

Time: 2020-09-17 21:07:53  Source: igfitidea

How to grab an arbitrary chunk from a file on unix/linux

Tags: bash, shell, unix

Asked by kevinm

I'm trying to copy a chunk from one binary file into a new file. I have the byte offset and length of the chunk I want to grab.

I have tried using the dd utility, but this seems to read and discard the data up to the offset, rather than just seeking (I guess because dd is for copying/converting blocks of data). This makes it quite slow (and slower the higher the offset). This is the command I tried:

dd if=inputfile ibs=1 skip=$offset count=$datalength of=outputfile

I guess I could write a small perl/python/whatever script to open the file, seek to the offset, then read and write the required amount of data in chunks.

Is there a utility that supports something like this?

Accepted answer by pixelbeat

Yes it's awkward to do this with dd today. We're considering adding skip_bytes and count_bytes params to dd in coreutils to help. The following should work though:

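#!/bin/sh
# Usage: script infile skip length

bs=100000
infile=$1   # input file
skip=$2     # byte offset of the chunk
length=$3   # length of the chunk in bytes

(
  # Seek to the offset without copying anything (count=0).
  dd bs=1 skip="$skip" count=0
  # Copy the bulk of the chunk in large blocks.
  dd bs="$bs" count=$((length / bs))
  # Copy the remaining bytes, if any (dd rejects bs=0).
  [ $((length % bs)) -eq 0 ] || dd bs=$((length % bs)) count=1
) < "$infile"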
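
The arithmetic behind the second and third dd calls can be sketched in Python. This is only an illustration: plan_reads is a hypothetical helper name, not part of dd or coreutils, and the default block size simply copies the bs value from the script.

```python
def plan_reads(length, bs=100000):
    """Split `length` bytes into full blocks of `bs` bytes plus a remainder.

    Mirrors count=$(($length / $bs)) and bs=$(($length % $bs)) in the script.
    """
    full_blocks, remainder = divmod(length, bs)
    return full_blocks, remainder


full, rest = plan_reads(250000)
# The two parts always add back up to the requested length.
assert full * 100000 + rest == 250000
print(full, rest)  # prints: 2 50000
```

When the length is an exact multiple of the block size the remainder is zero, so the last dd call would be given bs=0, which dd does not accept.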

Answer by Mark Rushakoff

You can use tail -c+N to output the input starting at byte N (counting from 1, so it trims the leading N-1 bytes), then you can use head -cM to output only the first M bytes from its input.

$ echo "hello world 1234567890" | tail -c+9 | head -c6
rld 12

So using your variables, with $offset being a zero-based byte offset, it would probably be:

tail -c+$((offset + 1)) inputfile | head -c$datalength > outputfile
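
The byte arithmetic of this pipeline can be checked with a small Python sketch. The helper names tail_from, head_bytes and grab are hypothetical and only model the two commands on an in-memory byte string; the point is that tail -c+N counts from 1, so a zero-based offset has to be passed as offset + 1.

```python
def tail_from(data, n):
    """Model `tail -c+N`: output starts at byte N, counting from 1."""
    return data[max(n, 1) - 1:]


def head_bytes(data, m):
    """Model `head -cM`: keep only the first M bytes."""
    return data[:m]


def grab(data, offset, length):
    """Extract `length` bytes at zero-based `offset`, as the pipeline would."""
    return head_bytes(tail_from(data, offset + 1), length)


sample = b"hello world 1234567890\n"
print(head_bytes(tail_from(sample, 9), 6))  # b'rld 12', as in the echo example
print(grab(sample, 8, 6))                   # b'rld 12', zero-based offset 8
```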



Ah, I didn't see that it had to seek. Leaving this up as CW.

Answer by kevinm

Thanks for the other answers. Unfortunately, I'm not in a position to install additional software, so the ddrescue option is out. The head/tail solution is interesting (I didn't realise you could supply + to tail), but scanning through the data makes it quite slow.

I ended up writing a small python script to do what I wanted. The buffer size should probably be tuned to be the same as some external buffer setting, but using the value below is performant enough on my system.

#!/usr/bin/env python3

import sys

BUFFER_SIZE = 100000

# Read args
if len(sys.argv) < 4:
    print("Usage: %s input_file start_pos length" % (sys.argv[0],), file=sys.stderr)
    sys.exit(1)
input_filename = sys.argv[1]
start_pos = int(sys.argv[2])
length = int(sys.argv[3])

# Open file in binary mode and seek to start pos
with open(input_filename, "rb") as infile:
    infile.seek(start_pos)

    # Read and write data in chunks
    while length > 0:
        # Read data
        buffer = infile.read(min(BUFFER_SIZE, length))
        amount_read = len(buffer)

        # Check for EOF
        if not amount_read:
            print("Reached EOF, exiting...", file=sys.stderr)
            sys.exit(1)

        # Write the raw bytes to stdout
        sys.stdout.buffer.write(buffer)
        length -= amount_read

Answer by mark4o

According to man dd on FreeBSD:

skip=n

Skip n blocks from the beginning of the input before copying. On input which supports seeks, an lseek(2) operation is used. Otherwise, input data is read and discarded. For pipes, the correct number of bytes is read. For all other devices, the correct number of blocks is read without distinguishing between a partial or complete block being read.

Using dtruss I verified that it does use lseek() on an input file on Mac OS X. If you just think that it is slow, then I agree with the comment that this would be due to the 1-byte block size.
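
To illustrate the difference between seeking and reading with a 1-byte block size, here is a minimal Python sketch that does one lseek() and then reads the chunk in large blocks. The function name read_chunk and the BLOCK_SIZE value are assumptions made for this example, not something taken from dd.

```python
import os
import tempfile

BLOCK_SIZE = 65536  # arbitrary read size chosen for this sketch


def read_chunk(path, offset, length):
    """Return up to `length` bytes starting at zero-based byte `offset`."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # One lseek() jumps to the offset; the bytes before it are never read.
        os.lseek(fd, offset, os.SEEK_SET)
        parts = []
        while length > 0:
            data = os.read(fd, min(BLOCK_SIZE, length))
            if not data:  # end of file reached before `length` bytes
                break
            parts.append(data)
            length -= len(data)
        return b"".join(parts)
    finally:
        os.close(fd)


# Small demonstration on a temporary 256-byte file holding bytes 0..255.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)))
print(read_chunk(tmp.name, 100, 4))  # b'defg'
os.unlink(tmp.name)
```

With a block size of 1, the same chunk would need one read call per byte, which is the likely source of the slowness rather than the skip itself.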

Answer by hlovdal

You can use the --input-position=POS option of ddrescue.

Answer by Saravanan Palanisamy

You can try the hexdump command:

hexdump -v <File Path> -c -n <No of bytes to read> -s <Start Offset> | awk '{$1=""; print $0}' | sed 's/ //g'

Ex.) Read 100 bytes from 'mycorefile' starting from offset 100.

# hexdump -v -c mycorefile -n 100 -s 100 | awk '{$1=""; print $0}' | sed 's/ //g'
\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0
\0\0\0\0001\0\0\0005\0\0\0\0020003\0
\0\0\0\0\0\0@\0\0\0\0\0\0\0\0\0
\0\0\0\0\0\0\0\0\0\0\0\0\0?003\0
\0\0\0\0\0020\0\0\0\0\0\0001\0\0\0
006\0\0\0\0020003\0\0\0\0\0\0220c\0
\0\0\0\0
#

Then, if you want, use another script to join all the lines of the output into a single line.

If you simply want to see the contents:

# /usr/bin/hexdump -v -C mycorefile -n 100 -s 100
00000064  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000074  00 00 00 00 01 00 00 00  05 00 00 00 00 10 03 00  |................|
00000084  00 00 00 00 00 00 40 00  00 00 00 00 00 00 00 00  |......@.........|
00000094  00 00 00 00 00 00 00 00  00 00 00 00 00 a0 03 00  |................|
000000a4  00 00 00 00 00 10 00 00  00 00 00 00 01 00 00 00  |................|
000000b4  06 00 00 00 00 10 03 00  00 00 00 00 00 90 63 00  |..............c.|
000000c4  00 00 00 00                                       |....|
000000c8
#
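
If the goal is a single-line dump of the chunk, a short Python sketch can produce it directly. Note that this prints hexadecimal digits rather than the character and octal rendering of hexdump -c, and that chunk_hex is a hypothetical helper name used only here.

```python
import os
import tempfile


def chunk_hex(path, offset, length):
    """Return `length` bytes at zero-based `offset` as one hex string."""
    with open(path, "rb") as f:
        f.seek(offset)  # jump straight to the chunk
        return f.read(length).hex()


# Small demonstration on a temporary 256-byte file holding bytes 0..255.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)))
print(chunk_hex(tmp.name, 100, 4))  # 64656667
os.unlink(tmp.name)
```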