Read a file by bytes in BASH

Question

Asked by michaeluskov

I need to read the first byte of a file I specify, then the second byte, the third, and so on. How can I do this in BASH? P.S. I need the hex value of each byte.

Answer 1

Answered by F. Hauri

Full rewrite: September 2019!

A lot shorter and simpler than previous versions! (Somewhat faster, but not by much.)

Yes, bash can read binary:

Syntax:

LANG=C IFS= read -r -d '' -n 1 foo

will populate $foo with 1 binary byte. Unfortunately, as bash strings can't hold null bytes ($'\0'), the input has to be read one byte at a time.
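As a minimal sketch of how that syntax answers the original question, the loop below prints every byte of standard input as hex. The function name hex_bytes is made up for this illustration, and it uses LC_ALL=C rather than LANG=C on the assumption that LC_ALL may already be set in the environment. Bytes of 0x80 and above depend on how the C locale of the system treats them.

```bash
#!/usr/bin/env bash
# hex_bytes (hypothetical name): print each byte of stdin as two hex digits,
# one per line.
hex_bytes() {
    local byte LC_ALL=C IFS=
    # read fails only at end of input. With -d '' a NUL byte ends the read
    # successfully and leaves $byte empty, so "empty" means 0x00 here.
    while read -r -d '' -n 1 byte; do
        if [[ -z $byte ]]; then
            printf '00\n'
        else
            printf '%02x\n' "'$byte"
        fi
    done
}

printf 'Hi\0\n' | hex_bytes    # prints 48, 69, 00, 0a on separate lines
```

To dump a file, redirect it into the function, for example hex_bytes < /path/to/input_file.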

But for the value of the byte read, I had missed this in man bash (have a look at the 2016 post, at the bottom of this answer):

 printf [-v var] format [arguments]
 ...
     Arguments to non-string format specifiers are treated as C constants,
     except that ..., and if  the leading character is a  single or double
     quote, the value is the ASCII value of the following character.
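A quick way to see that rule in action, as a small sketch (the helper name ord is an assumption, not part of bash):

```bash
# ord (hypothetical helper): print the numeric code of the first character
# of $1, in decimal and in hex, using the leading-quote rule quoted above.
ord() {
    printf '%d 0x%02x\n' "'$1" "'$1"
}

ord A      # prints: 65 0x41
ord '~'    # prints: 126 0x7e
```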

So:

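read8() {
    # $1: name of the result variable (default OUTBIN). LANG=C makes one
    # character one byte; the empty IFS keeps whitespace bytes intact.
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car             # a NUL byte leaves _r8_car empty
    printf -v "$_r8_var" %d "'$_r8_car"    # leading quote: numeric value of the char
}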

Will populate the given variable name (default: $OUTBIN) with the decimal ASCII value of the first byte from STDIN.

read16() {
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_lb &&
    read8 _r16_hb
    printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb ))
}

Will populate the given variable name (default: $OUTBIN) with the decimal value of the first 16-bit word from STDIN...

Of course, to switch endianness, you have to swap the order:

    read8 _r16_hb &&
    read8 _r16_lb
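To make the effect of the byte order concrete, here is a self-contained sketch that reads the same two bytes both ways. The helper names next_byte, u16_le and u16_be are invented for this example and are not the functions from this answer; unlike read8, next_byte reports end of input through its exit status.

```bash
#!/usr/bin/env bash
# next_byte VAR (hypothetical helper): store the next byte of stdin in VAR as
# a decimal number. Returns non-zero at end of input.
next_byte() {
    local _nb_char LC_ALL=C IFS=
    read -r -d '' -n 1 _nb_char || return 1
    printf -v "$1" %d "'$_nb_char"
}

# Read two bytes and combine them, low byte first or high byte first.
u16_le() { local _lo _hi; next_byte _lo && next_byte _hi &&
           printf -v "$1" %d $(( _hi << 8 | _lo )); }
u16_be() { local _hi _lo; next_byte _hi && next_byte _lo &&
           printf -v "$1" %d $(( _hi << 8 | _lo )); }

# The same byte pair 34 12, read once in each order.
{ u16_le a; u16_be b; } < <(printf '\x34\x12\x34\x12')
printf '%#06x %#06x\n' "$a" "$b"    # prints: 0x1234 0x3412
```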

And so on:

# Usage:
#       read[8|16|32|64] [varname] < binaryStdInput

read8() {  local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v $_r8_var %d \'$_r8_car ;}
read16() { local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8  _r16_lb && read8  _r16_hb
    printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb )) ;}
read32() { local _r32_var=${1:-OUTBIN} _r32_lw _r32_hw
    read16 _r32_lw && read16 _r32_hw
    printf -v $_r32_var %d $(( _r32_hw<<16| _r32_lw )) ;}
read64() { local _r64_var=${1:-OUTBIN} _r64_ll _r64_hl
    read32 _r64_ll && read32 _r64_hl
    printf -v $_r64_var %d $(( _r64_hl<<32| _r64_ll )) ;}

So you could source this, then if your /dev/sda is GPT partitioned,

read totsize < <(blockdev --getsz /dev/sda)
read64 gptbackup < <(dd if=/dev/sda bs=8 skip=68 count=1 2>/dev/null)
echo $[totsize-gptbackup]
1

The answer could be 1 (the 1st GPT header is at sector 1, and one sector is 512 bytes. The GPT backup location is at byte 32 of the header. With bs=8, 512 bytes -> 64 blocks, plus 32 bytes -> 4 blocks, so 544 bytes -> 68 blocks to skip... See GUID Partition Table at Wikipedia).
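The skip value can be recomputed in the shell. This is only the arithmetic from the paragraph above, with made-up variable names and an assumed 512-byte logical sector:

```bash
sector_size=512      # bytes per logical sector (assumed; some disks use 4096)
header_lba=1         # the primary GPT header sits in sector 1
field_offset=32      # offset of the backup-LBA field inside the header
block_size=8         # dd bs=8, i.e. one 64-bit field per block

byte_offset=$(( header_lba * sector_size + field_offset ))
skip_blocks=$(( byte_offset / block_size ))
printf 'byte offset %d -> skip=%d\n' "$byte_offset" "$skip_blocks"
# prints: byte offset 544 -> skip=68
```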

Quick small write function...

write () { 
    local i=$[${2:-64}/8] o= v r
    r=$[i-1]
    for ((;i--;)) {
        printf -vv '\%03o' $[($1>>8*(0${3+-1}?i:r-i))&255]
        o+=$v
    }
    printf "$o"
}

This function defaults to 64 bits, little endian.

Usage: write <integer> [bits:64|32|16|8] [switch to big endian]

With two parameters, the second must be one of 8, 16, 32 or 64: the bit length of the generated output.
With any dummy 3rd parameter (even an empty string), the function switches to big endian.

read64 foo < <(write -12345);echo $foo
-12345
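For readers who find the compact version hard to follow, here is a spelled-out sketch of the same idea. The name put_uint and the literal keyword big are choices made for this example; the original function switches on the mere presence of a third argument.

```bash
#!/usr/bin/env bash
# put_uint VALUE [BITS] [big] (hypothetical name): emit VALUE as BITS/8 raw
# bytes, little endian unless the third argument is "big".
put_uint() {
    local value=$1 bits=${2:-64} order=${3:-little}
    local nbytes=$(( bits / 8 )) i bitpos escaped out=
    for (( i = 0; i < nbytes; i++ )); do
        if [[ $order == big ]]; then
            bitpos=$(( 8 * (nbytes - 1 - i) ))   # most significant byte first
        else
            bitpos=$(( 8 * i ))                  # least significant byte first
        fi
        # Build an octal escape such as \064 for each byte.
        printf -v escaped '\\%03o' $(( (value >> bitpos) & 255 ))
        out+=$escaped
    done
    printf "$out"    # the escapes become raw bytes, NUL included
}

put_uint 4660 16     | od -An -t x1    # bytes 34 12
put_uint 4660 16 big | od -An -t x1    # bytes 12 34
```

Building the whole escape string first and printing it once is what lets NUL bytes through, since they never have to live in a bash variable as raw data.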

...

First post, 2015...

Upgrade: adding a bash-specific version (with bashisms)

With newer versions of the printf built-in, you can do a lot without having to fork ($(...)), which makes your script a lot faster.

First, let's see (by using seq and sed) how to parse hd output:

echo ;sed <(seq -f %02g 0 $[COLUMNS-1]) -ne '
    /0$/{s/^\(.*\)0$/\o0337\o033[A\o03380/;H;};
    /[1-9]$/{s/^.*\(.\)//;H};
    ${x;s/\n//g;p}';hd < <(echo Hello good world!)
0         1         2         3         4         5         6         7
012345678901234567890123456789012345678901234567890123456789012345678901234567
00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|
00000010  21 0a                                             |!.|
00000012

Here the hexadecimal part begins at col 10 and ends at col 56, with values spaced by 3 chars and one extra space at col 34.

So parsing this could be done by:

while read line ;do
    for x in ${line:10:48};do
        printf -v x '\\%o' "0x$x"    # hex pair -> octal escape such as \110
        printf "$x"                  # print the escape as a raw byte
      done
  done < <( ls -l --color | hd )
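The column arithmetic can be checked on a single fixed line without running hd at all. A minimal sketch, using the first line of the hd output shown earlier (the variable names are arbitrary):

```bash
# One line of hd output, copied from the example above.
line='00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|'

text=
for hex in ${line:10:48}; do      # unquoted on purpose: split on spaces
    printf -v byte "\\x$hex"      # e.g. \x48 becomes H
    text+=$byte
done
printf '%s\n' "$text"             # prints: Hello good world
```

Collecting the bytes in a variable only works for data without NUL bytes; for arbitrary binary data, print each byte directly as the loop above does.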

Old original post

Edit 2: for hexadecimal, you could use hd

echo Hello world | hd
00000000  48 65 6c 6c 6f 20 77 6f  72 6c 64 0a              |Hello world.|

or od

echo Hello world | od -t x1 -t c
0000000  48  65  6c  6c  6f  20  77  6f  72  6c  64  0a
          H   e   l   l   o       w   o   r   l   d  \n

In short:

while IFS= read -r -n1 car;do [ "$car" ] && echo -n "$car" || echo ; done

Try it:

while IFS= read -rn1 c;do [ "$c" ]&&echo -n "$c"||echo;done < <(ls -l --color)

Explanation:

while IFS= read -rn1 car  # unset InputFieldSeparator so read every chars
    do [ "$car" ] &&      # Test if there is ``something''?
        echo -n "$car" || # then echo them
        echo              # Else, there is an end-of-line, so print one
  done

Edit: the question was edited: need hex values!?

od -An -t x1 | while read line;do for char in $line;do echo $char;done ;done

Demo:

od -An -t x1 < <(ls -l --color ) |        # Translate binary to 1 byte hex 
    while read line;do                    # Read line of HEX pairs
        for char in $line;do              # For each pair
            printf "\x$char"              # Print translate HEX to binary
      done
  done
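One caveat worth knowing with this approach: by default od replaces runs of identical output lines with a single "*", which would silently drop bytes (and the unquoted * would then be glob-expanded by the for loop). A hedged variant that passes -v to od to disable that abbreviation; the function name hex_stream is made up for this sketch:

```bash
# hex_stream (hypothetical name): print every byte of stdin as a two-digit
# hex value, one per line.
hex_stream() {
    local line hex
    # -v keeps od from collapsing repeated lines into a single "*".
    od -An -v -t x1 | while read -r line; do
        for hex in $line; do
            printf '%s\n' "$hex"
        done
    done
}

head -c 48 /dev/zero | hex_stream | wc -l    # counts 48 values
```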

Demo 2: We have both hex and binary

od -An -t x1 < <(ls -l --color ) |        # Translate binary to 1 byte hex 
    while read line;do                    # Read line of HEX pairs
        for char in $line;do              # For each pair
            bin="$(printf "\x$char")"     # translate HEX to binary
            dec=$(printf "%d" 0x$char)    # translate to decimal
            [ $dec -lt 32  ] ||           # if character not printable
            ( [ $dec -gt 128 ] &&         # change bin to a single dot.
              [ $dec -lt 160 ] ) && bin="."
            str="$str$bin" 
            echo -n $char \               # Print HEX value and a space
            ((i++))                       # count printed values
            if [ $i -gt 15 ] ;then
                i=0
                echo "  -  $str"
                str=""
              fi
      done
  done

New post, September 2016:

This could be useful in very specific cases (I've used these to manually copy GPT partitions between two disks, at low level, without having /usr mounted...)

Yes, bash could read binary!

... but only one byte at a time... (because `char(0)' can't be stored in a bash string, the only way to read NUL bytes correctly is to take end-of-file into account: if no character is read and the end of file has not been reached, then the byte read is a char(0)).

This is more a proof of concept than a really useful tool: here is a pure bash version of hd (hexdump).

This uses recent bashisms and requires bash v4.3 or higher.

#!/bin/bash

printf -v ascii \\%o {32..126}
printf -v ascii "$ascii"

printf -v cntrl %-20sE abtnvfr

values=()
todisplay=
address=0
printf -v fmt8 %8s
fmt8=${fmt8// / %02x}

while LANG=C IFS= read -r -d '' -n 1 char ;do
    if [ "$char" ] ;then
        printf -v char "%q" "$char"
        ((${#char}==1)) && todisplay+=$char || todisplay+=.
        case ${#char} in
         1|2 ) char=${ascii%$char*};values+=($((${#char}+32)));;
           7 ) char=${char#*\'\\};values+=($((8#${char%\'})));;
           5 ) char=${char#*\'\\};char=${cntrl%${char%\'}*};
                values+=($((${#char}+7)));;
           * ) echo >&2 ERROR: $char;;
        esac
      else
        values+=(0)
      fi

    if [ ${#values[@]} -gt 15 ] ;then
        printf "%08x $fmt8 $fmt8  |%s|\n" $address ${values[@]} "$todisplay"
        ((address+=16))
        values=() todisplay=
      fi
  done

if [ "$values" ] ;then
        ((${#values[@]}>8))&&fmt="$fmt8 ${fmt8:0:(${#values[@]}%8)*5}"||
            fmt="${fmt8:0:${#values[@]}*5}"
        printf "%08x $fmt%$((
                50-${#values[@]}*3-(${#values[@]}>8?1:0)
            ))s |%s|\n" $address ${values[@]} ''""'' "$todisplay"
fi
printf "%08x (%d chars read.)\n" $((address+${#values[@]})){,}

You can try/use this, but don't try to compare performance!

time hd < <(seq 1 10000|gzip)|wc
   1415   25480  111711
real    0m0.020s
user    0m0.008s
sys     0m0.000s

time ./hex.sh < <(seq 1 10000|gzip)|wc
   1415   25452  111669
real    0m2.636s
user    0m2.496s
sys     0m0.048s

Same job: 20 ms for hd vs more than 2000 ms for my bash script.

... but if you want to read 4 bytes in a file header, or even a sector address on a hard drive, this could do the job...
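For that "few bytes of a header" use case, and when external tools are available, a shorter route is to let head and od do the work. This is a sketch with an invented function name (first_bytes_hex), not something taken from the answer:

```bash
#!/usr/bin/env bash
# first_bytes_hex FILE [COUNT] (hypothetical helper): print the first COUNT
# bytes of FILE (default 4) as space-separated hex pairs.
first_bytes_hex() {
    local file=$1 count=${2:-4} words
    # od prints padded, space-separated hex pairs; splitting them into an
    # array drops the padding and any line breaks.
    words=( $(head -c "$count" -- "$file" | od -An -v -t x1) )
    printf '%s\n' "${words[*]}"
}

sample=$(mktemp)
printf 'ABCDEF' > "$sample"
first_bytes_hex "$sample" 4     # prints: 41 42 43 44
rm -f "$sample"
```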

Answer 2

Answered by anishsane

Did you try xxd? It gives a hex dump directly, as you want.

For your case, the command would be:

xxd -c 1 /path/to/input_file | while read offset hex char; do
  #Do something with $hex
done

Note: extract the character from $hex rather than from a plain while read line loop. This is required because read will not capture white space properly.

Answer 3

Answered by Grijesh Chauhan

Using read, a single char can be read at a time as follows:

read -n 1 c
echo $c

[ANSWER]

Try this:

#!/bin/bash
# data file
INPUT=/path/to/input.txt

# while loop
while IFS= read -r -n1 char
do
        # display one character at a time
    echo  "$char"
done < "$INPUT"

From this link

Second method: using awk, loop through char by char

awk '{for(i=1;i<=length;i++) print substr($0, i, 1)}' /home/cscape/Desktop/table2.sql

Third way:

$ fold -1 /home/cscape/Desktop/table.sql  | awk '{print $0}'

EDIT: To print each char as a HEX number:

Suppose I have a file named file:

$ cat file
123A3445F 

I have written an awk script (named x.awk) that reads file char by char and prints each one in HEX:

$ cat x.awk
#!/bin/awk -f

BEGIN    { _ord_init() }

function _ord_init(    low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {    # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {        # ebcdic(!)
        low = 0
        high = 255
    }

    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}
function ord(str,    c)
{
    # only first character is of interest
    c = substr(str, 1, 1)
    return _ord_[c]
}

function chr(c)
{
    # force c to be numeric by adding 0
    return sprintf("%c", c + 0)
}

{ x=$0; printf("%s , %x\n", $0, ord(x) )}

To write this script I used the awk documentation.
Now you can use this awk script for your work as follows:

$ fold -1 /home/cscape/Desktop/file  | awk -f x.awk
1 , 31
2 , 32
3 , 33
A , 41
3 , 33
4 , 34
4 , 34
5 , 35
F , 46

NOTE: The value of A is 41 in hexadecimal. To print in decimal, change %x to %d in the last line of the script x.awk.

Give it a try!

Answer 4

Answered by Perleone

Yet another solution, using head, tail and printf:

for a in $( seq $( cat file.txt | wc -c ) ) ; do cat file.txt | head -c$a | tail -c1 | xargs -0 -I{} printf '%s %0X\n' {} "'{}" ; done

#!/bin/bash

function usage() {
    echo "Need file with size > 0"
    exit 1
}

test -s "$1" || usage

for a in $( seq $( cat "$1" | wc -c ) )
do
    cat "$1" | head -c$a | tail -c1 | \
    xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done

Answer 5

Answered by syntaxerror

Although I would rather have expanded Perleone's own post (as it was his basic concept!), my edit was rejected after all, and I was kindly advised that this should be posted as a separate answer. Fair enough, so I will do that.

In short, the considerations behind the improvements to Perleone's original script:

seq would be total overkill here. A simple while loop with a used as a (likewise simple) counter variable will do the job just fine (and much quicker too).
The max value, $(cat $1 | wc -c), must be assigned to a variable, otherwise it will be recalculated every time and make this alternate script run even slower than the one it was derived from.
There's no need to waste a function on a simple usage info line. However, it is necessary to know about the (mandatory) curly braces around the two commands, for without the { }, the exit 1 command will be executed in either case, and the script interpreter will never make it to the loop. (Last note: ( ) will work too, but not in the same way! Parentheses will spawn a subshell, whilst curly braces will execute the commands inside them in the current shell.)

#!/bin/bash

test -s "$1" || { echo "Need a file with size greater than 0!"; exit 1; }

a=0
max=$(cat "$1" | wc -c)
# -le rather than -lt, so that the last byte is processed as well
while [[ $((++a)) -le $max ]]; do
  cat "$1" | head -c$a | tail -c1 | \
  xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done

Answer 6

Answered by yasu

Use read with the -n option.

while read -n 1 ch; do
  echo $ch
done < moemoe.txt
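A caveat when using read -n 1 this way: with the default IFS, a space that is read on its own is stripped and the variable comes back empty, and a newline always comes back empty because it is the line delimiter. The sketch below contrasts the plain form with the IFS= read -r form used in other answers; the function names split_plain and split_safe are made up for the illustration.

```bash
# Both functions wrap every character they read in brackets.
split_plain() {
    local ch out=
    while read -n 1 ch; do out+="[$ch]"; done
    printf '%s\n' "$out"
}
split_safe() {
    local ch out=
    while IFS= read -r -n 1 ch; do out+="[$ch]"; done
    printf '%s\n' "$out"
}

printf 'a b\n' | split_plain    # prints: [a][][b][]
printf 'a b\n' | split_safe     # prints: [a][ ][b][]
```

In both cases the final empty pair of brackets is the newline, so an empty result still has to be interpreted by the caller.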

Answer 7

Answered by Willian Mainieri

I have a suggestion, but would like feedback from everybody, and mainly personal advice from the user syntaxerror.

I don't know much about bash, but I thought maybe it would be better to store the output of "cat $1" in a variable... but the problem is that the echo command will also bring a small overhead, right?

##代码##

In my opinion it would perform better, but I haven't perf-tested it.
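One possible reading of that idea, as a sketch only (this is not the code from this answer, and the function name bytes_from_var is invented): load the file once and index into the variable, which avoids spawning cat, head and tail for every byte. It assumes text-like data, because bash variables cannot hold NUL bytes and $(...) strips trailing newlines.

```bash
#!/usr/bin/env bash
# bytes_from_var FILE (hypothetical helper): load FILE once, then walk the
# variable by index, printing each character with its hex code.
bytes_from_var() {
    local LC_ALL=C data i      # C locale so that one character is one byte
    data=$(< "$1")             # NUL bytes and trailing newlines are lost here
    for (( i = 0; i < ${#data}; i++ )); do
        printf '%s %02x\n' "${data:i:1}" "'${data:i:1}"
    done
}

sample=$(mktemp)
printf 'AB' > "$sample"
bytes_from_var "$sample"    # prints "A 41" and "B 42"
rm -f "$sample"
```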
