Read a file by bytes in BASH
Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/13889659/
Asked by michaeluskov
I need to read the first byte of a file I specify, then the second byte, the third, and so on. How can I do this in BASH? P.S. I need to get the hex value of these bytes.
Answered by F. Hauri
Full rewrite: September 2019!
A lot shorter and simpler than previous versions! (Somewhat faster, but not by much)
Yes, bash can read binary:
Syntax:
LANG=C IFS= read -r -d '' -n 1 foo
will populate $foo with 1 binary byte. Unfortunately, as bash strings can't hold null bytes (\0), reading one byte at a time is required.
But for the value of the byte read, I had missed this in man bash (have a look at the 2016 post, at the bottom of this answer):
printf [-v var] format [arguments] ... Arguments to non-string format specifiers are treated as C constants, except that ..., and if the leading character is a single or double quote, the value is the ASCII value of the following character.
So:
read8() {
local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
read -r -d '' -n 1 _r8_car
printf -v $_r8_var %d \'$_r8_car
}
Will populate the submitted variable name (default: $OUTBIN) with the decimal ASCII value of the first byte from STDIN.
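Since the question asks for hex values, the same read and printf calls can be combined in a loop. The following is only a minimal sketch: the function name hexbytes is hypothetical, and LC_ALL=C is used here instead of LANG=C on the assumption that it overrides any other locale setting.

```shell
#!/bin/bash
# hexbytes (hypothetical name): print every byte of stdin as two hex digits, one per line.
hexbytes() {
    local byte LC_ALL=C IFS=
    # -d '' makes NUL the delimiter, so a NUL byte is returned as an empty string;
    # -n 1 limits each read to a single character.
    while read -r -d '' -n 1 byte; do
        # A leading quote makes printf use the numeric value of the next character;
        # an empty $byte (NUL) yields 0.
        printf '%02x\n' "'$byte"
    done
}

printf 'A\0B' | hexbytes
```

The loop ends when read reports end of file, so an empty input prints nothing.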
read16() {
local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
read8 _r16_lb &&
read8 _r16_hb
printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb ))
}
Will populate the submitted variable name (default: $OUTBIN) with the decimal value of the first 16-bit word from STDIN...
Of course, to switch endianness, you have to swap:
read8 _r16_hb &&
read8 _r16_lb
And so on:
# Usage:
# read[8|16|32|64] [varname] < binaryStdInput
read8() { local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
read -r -d '' -n 1 _r8_car
printf -v $_r8_var %d \'$_r8_car ;}
read16() { local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
read8 _r16_lb && read8 _r16_hb
printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb )) ;}
read32() { local _r32_var=${1:-OUTBIN} _r32_lw _r32_hw
read16 _r32_lw && read16 _r32_hw
printf -v $_r32_var %d $(( _r32_hw<<16| _r32_lw )) ;}
read64() { local _r64_var=${1:-OUTBIN} _r64_ll _r64_hl
read32 _r64_ll && read32 _r64_hl
printf -v $_r64_var %d $(( _r64_hl<<32| _r64_ll )) ;}
So you could source this; then, if your /dev/sda is GPT partitioned,
read totsize < <(blockdev --getsz /dev/sda)
read64 gptbackup < <(dd if=/dev/sda bs=8 skip=68 count=1 2>/dev/null)
echo $[totsize-gptbackup]
1
The answer could be 1 (the first GPT header is at sector 1, and one sector is 512 bytes. The backup GPT location is at byte 32 of that header. With bs=8, 512 bytes -> 64 blocks and 32 bytes -> 4 blocks, so byte offset 544 -> 64 + 4 = 68 blocks to skip... See GUID Partition Table on Wikipedia).
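The offset arithmetic and the read64 call can be tried without touching a real disk. This is a sketch under stated assumptions: the image file is a hypothetical stand-in (544 zero bytes followed by one 64-bit little-endian field holding 258), the variable names are illustrative, and the reader functions are restated from the answer with quoting added.

```shell
#!/bin/bash
# Little-endian readers, restated from the answer so the sketch runs on its own.
read8()  { local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v "$_r8_var" %d "'$_r8_car" ;}
read16() { local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_lb && read8 _r16_hb
    printf -v "$_r16_var" %d $(( _r16_hb<<8 | _r16_lb )) ;}
read32() { local _r32_var=${1:-OUTBIN} _r32_lw _r32_hw
    read16 _r32_lw && read16 _r32_hw
    printf -v "$_r32_var" %d $(( _r32_hw<<16 | _r32_lw )) ;}
read64() { local _r64_var=${1:-OUTBIN} _r64_ll _r64_hl
    read32 _r64_ll && read32 _r64_hl
    printf -v "$_r64_var" %d $(( _r64_hl<<32 | _r64_ll )) ;}

# Hypothetical test image: 544 zero bytes, then 0x0102 as a 64-bit little-endian field.
img=$(mktemp)
{ head -c 544 /dev/zero; printf '\x02\x01\0\0\0\0\0\0'; } > "$img"

# Header at sector 1 (512-byte sectors), field at byte 32 of the header, dd block size 8.
sector_size=512 header_lba=1 field_offset=32 bs=8
skip=$(( (header_lba * sector_size + field_offset) / bs ))

read64 field < <(dd if="$img" bs=$bs skip=$skip count=1 2>/dev/null)
echo "skip=$skip field=$field"
rm -f "$img"
```

Computing skip from named values rather than hard-coding 68 makes it easier to adapt the call to another field offset or block size.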
Quick small write function...
write () {
local i=$[${2:-64}/8] o= v r
r=$[i-1]
for ((;i--;)) {
printf -vv '\%03o' $[($1>>8*(0${3+-1}?i:r-i))&255]
o+=$v
}
printf "$o"
}
This function defaults to 64 bits, little endian.
Usage: write <integer> [bits:64|32|16|8] [switch to big endian]
- With two parameters, the second parameter must be one of 8, 16, 32 or 64: the bit length of the generated output.
- With any dummy third parameter (even an empty string), the function switches to big endian.
read64 foo < <(write -12345);echo $foo
-12345
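The effect of the second and third parameters can be made visible by piping the output through od. The sketch below restates the function under the hypothetical name write_int (so that it does not shadow the write utility) and uses $(( )) and do/done instead of the older $[ ] and brace forms; the byte-selection expression is the same.

```shell
#!/bin/bash
# write_int (hypothetical name): emit <integer> as raw bytes.
# Usage: write_int <integer> [bits:64|32|16|8] [any third argument = big endian]
write_int() {
    local i=$(( ${2:-64} / 8 )) o= v r
    r=$(( i - 1 ))
    for (( ; i--; )); do
        # Pick one byte per iteration: index i for big endian, r-i for little endian.
        printf -v v '\\%03o' $(( ($1 >> 8*(0${3+-1} ? i : r-i)) & 255 ))
        o+=$v
    done
    # $o is a string of octal escapes; using it as the format emits the raw bytes.
    printf "$o"
}

# 258 is 0x0102: low byte 02, high byte 01.
little=$(write_int 258 16 | od -An -tx1 | tr -d ' \n')
big=$(write_int 258 16 be | od -An -tx1 | tr -d ' \n')
echo "little endian: $little, big endian: $big"
```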
...
First post 2015...
Upgrade for adding specific bash version (with bashisms)
With recent versions of the printf built-in, you can do a lot without having to fork ($(...)), which makes your script a lot faster.
First, let's see (by using seq and sed) how to parse hd output:
echo ;sed <(seq -f %02g 0 $[COLUMNS-1]) -ne '
/0$/{s/^\(.*\)0$/\o0337\o033[A\o03380/;H;};
/[1-9]$/{s/^.*\(.\)//;H};
${x;s/\n//g;p}';hd < <(echo Hello good world!)
0 1 2 3 4 5 6 7
012345678901234567890123456789012345678901234567890123456789012345678901234567
00000000 48 65 6c 6c 6f 20 67 6f 6f 64 20 77 6f 72 6c 64 |Hello good world|
00000010 21 0a |!.|
00000012
The hexadecimal part begins at col 10 and ends at col 56, with values spaced by 3 chars and one extra space at col 34.
So parsing this could be done by:
while read line ;do
for x in ${line:10:48};do
printf -v x '\\%o' 0x$x
printf $x
done
done < <( ls -l --color | hd )
Old original post
Edit 2: for hexadecimal, you could use hd
echo Hello world | hd
00000000 48 65 6c 6c 6f 20 77 6f 72 6c 64 0a |Hello world.|
or od
echo Hello world | od -t x1 -t c
0000000 48 65 6c 6c 6f 20 77 6f 72 6c 64 0a
H e l l o w o r l d \n
In short:
while IFS= read -r -n1 car;do [ "$car" ] && echo -n "$car" || echo ; done
Try it:
while IFS= read -rn1 c;do [ "$c" ]&&echo -n "$c"||echo;done < <(ls -l --color)
Explanation:
while IFS= read -rn1 car # unset InputFieldSeparator so read every chars
do [ "$car" ] && # Test if there is ``something''?
echo -n "$car" || # then echo them
echo # Else, there is an end-of-line, so print one
done
Edit: the question was edited: hex values are needed!?
od -An -t x1 | while read line;do for char in $line;do echo $char;done ;done
Demo:
od -An -t x1 < <(ls -l --color ) | # Translate binary to 1 byte hex
while read line;do # Read line of HEX pairs
for char in $line;do # For each pair
printf "\x$char" # Print translate HEX to binary
done
done
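The od pipeline above can be wrapped into a small function for the original question (one hex value per byte). This is a sketch with a hypothetical function name; the -v flag is an addition, on the assumption that od would otherwise replace repeated output lines with a single * line, which the inner loop would then treat as a word.

```shell
#!/bin/bash
# hex_per_line (hypothetical name): print each byte of stdin as hex, one per line.
hex_per_line() {
    # -An: no address column, -v: never abbreviate repeated lines, -t x1: 1-byte hex.
    od -An -v -t x1 | while read -r line; do
        for byte in $line; do      # unquoted on purpose: split the line into hex pairs
            echo "$byte"
        done
    done
}

printf 'Hi\n' | hex_per_line
```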
Demo 2: We have both hex and binary
od -An -t x1 < <(ls -l --color ) | # Translate binary to 1 byte hex
while read line;do # Read line of HEX pairs
for char in $line;do # For each pair
bin="$(printf "\x$char")" # translate HEX to binary
dec=$(printf "%d" 0x$char) # translate to decimal
[ $dec -lt 32 ] || # if character is not printable
( [ $dec -gt 128 ] && # change bin to a single dot.
[ $dec -lt 160 ] ) && bin="."
str="$str$bin"
echo -n "$char " # Print HEX value and a space
((i++)) # count printed values
if [ $i -gt 15 ] ;then
i=0
echo " - $str"
str=""
fi
done
done
New post, September 2016:
This could be useful in very specific cases (I've used it to manually copy GPT partitions between two disks, at a low level, without having /usr mounted...)
Yes, bash can read binary!
... but only one byte at a time... (because `char(0)' can't be read correctly, the only way to handle it is to consider end-of-file: if no character is read and end of file has not been reached, then the character read is a char(0)).
This is more a proof of concept than a really useful tool: here is a pure bash version of hd (hexdump).
This uses recent bashisms and requires bash v4.3 or higher.
#!/bin/bash
printf -v ascii '\\%o' {32..126}
printf -v ascii "$ascii"
printf -v cntrl %-20sE abtnvfr
values=()
todisplay=
address=0
printf -v fmt8 %8s
fmt8=${fmt8// / %02x}
while LANG=C IFS= read -r -d '' -n 1 char ;do
if [ "$char" ] ;then
printf -v char "%q" "$char"
((${#char}==1)) && todisplay+=$char || todisplay+=.
case ${#char} in
1|2 ) char=${ascii%$char*};values+=($((${#char}+32)));;
7 ) char=${char#*\'\\};values+=($((8#${char%\'})));;
5 ) char=${char#*\'\\};char=${cntrl%${char%\'}*};
values+=($((${#char}+7)));;
* ) echo >&2 ERROR: $char;;
esac
else
values+=(0)
fi
if [ ${#values[@]} -gt 15 ] ;then
printf "%08x $fmt8 $fmt8 |%s|\n" $address ${values[@]} "$todisplay"
((address+=16))
values=() todisplay=
fi
done
if [ "$values" ] ;then
((${#values[@]}>8))&&fmt="$fmt8 ${fmt8:0:(${#values[@]}%8)*5}"||
fmt="${fmt8:0:${#values[@]}*5}"
printf "%08x $fmt%$((
50-${#values[@]}*3-(${#values[@]}>8?1:0)
))s |%s|\n" $address ${values[@]} ''""'' "$todisplay"
fi
printf "%08x (%d chars read.)\n" $((address+${#values[@]})){,}
You could try/use this, but don't try to compare performance!
time hd < <(seq 1 10000|gzip)|wc
1415 25480 111711
real 0m0.020s
user 0m0.008s
sys 0m0.000s
time ./hex.sh < <(seq 1 10000|gzip)|wc
1415 25452 111669
real 0m2.636s
user 0m2.496s
sys 0m0.048s
Same job: 20ms for hd vs 2000ms for my bash script.
... but if you want to read 4 bytes in a file header, or even a sector address on a hard drive, this could do the job...
Answered by anishsane
Did you try xxd? It gives a hex dump directly, as you want.
For your case, the command would be:
xxd -c 1 /path/to/input_file | while read offset hex char; do
#Do something with $hex
done
Note: derive the char from the hex value, rather than taking it from the char column of the line that was read. This is required because read will not capture white space properly.
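Deriving the character from the hex value, as the note suggests, can be done with printf. This is a minimal sketch: hex_to_char is a hypothetical helper, and the here-document holds hand-written lines shaped like xxd -c 1 output (an assumption about its layout) so that the example does not depend on xxd being installed.

```shell
#!/bin/bash
# hex_to_char (hypothetical helper): turn a two-digit hex value into the byte it names.
hex_to_char() { printf "\\x$1"; }

# Only the offset and hex columns are used; the char column is ignored,
# because read would drop a space or tab found there.
while read -r offset hex _; do
    char=$(hex_to_char "$hex"; echo x)   # the trailing x protects a newline byte
    char=${char%x}
    printf '%s %s [%s]\n' "$offset" "$hex" "$char"
done <<'EOF'
00000000: 48  H
00000001: 20   
00000002: 69  i
EOF
```

A NUL byte (00) still cannot be stored in the variable, so a real script would need to handle that value separately.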
Answered by Grijesh Chauhan
Using read, a single char can be read at a time as follows:
read -n 1 c
echo $c
[ANSWER]
Try this:
#!/bin/bash
# data file
INPUT=/path/to/input.txt
# while loop
while IFS= read -r -n1 char
do
# display one character at a time
echo "$char"
done < "$INPUT"
From this link
Second method: using awk, loop through char by char
awk '{for(i=1;i<=length;i++) print substr($0, i, 1)}' /home/cscape/Desktop/table2.sql
Third way:
$ fold -1 /home/cscape/Desktop/table.sql | awk '{print $0}'
EDIT: To print each char as a HEX number:
Suppose I have a file named file:
$ cat file
123A3445F
I have written an awk script (named x.awk) that reads char by char from file and prints in HEX:
$ cat x.awk
#!/bin/awk -f
BEGIN { _ord_init() }
function _ord_init( low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") { # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else { # ebcdic(!)
        low = 0
        high = 255
    }
    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}
function ord(str, c)
{
    # only first character is of interest
    c = substr(str, 1, 1)
    return _ord_[c]
}
function chr(c)
{
    # force c to be numeric by adding 0
    return sprintf("%c", c + 0)
}
{ x=$0; printf("%s , %x\n", $0, ord(x) )}
To write this script I used the awk documentation.
Now you can use this awk script for your work as follows:
$ fold -1 /home/cscape/Desktop/file | awk -f x.awk
1 , 31
2 , 32
3 , 33
A , 41
3 , 33
4 , 34
4 , 34
5 , 35
F , 46
NOTE: The value of A is 41 in hexadecimal. To print in decimal, change %x to %d in the last line of the script x.awk.
Give it a try!!
Answered by Perleone
Yet another solution, using head, tail and printf:
for a in $( seq $( cat file.txt | wc -c ) ) ; do cat file.txt | head -c$a | tail -c1 | xargs -0 -I{} printf '%s %0X\n' {} "'{}" ; done
More readable:
#!/bin/bash
function usage() {
    echo "Need file with size > 0"
    exit 1
}
test -s "$1" || usage
for a in $( seq $( cat $1 | wc -c ) )
do
    cat $1 | head -c$a | tail -c1 | \
    xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done
Answered by syntaxerror
Although I would rather have expanded Perleone's own post (as it was his basic concept!), my edit was rejected after all, and I was kindly advised that this should be posted as a separate answer. Fair enough, so I will do that.
Considerations, in short, for the improvements on Perleone's original script:
- seq would be total overkill here. A simple while loop with a used as a (likewise simple) counter variable will do the job just fine (and much quicker too).
- The max value, $(cat $1 | wc -c), must be assigned to a variable, otherwise it will be recalculated every time and make this alternate script run even slower than the one it was derived from.
- There's no need to waste a function on a simple usage info line. However, it is necessary to know about the (mandatory) curly braces around two commands, for without the { }, the exit 1 command will be executed in either case, and the script interpreter will never make it to the loop. (Last note: ( ) will work too, but not in the same way! Parentheses will spawn a subshell, whilst curly braces will execute the commands inside them in the current shell.)
#!/bin/bash
test -s "$1" || { echo "Need a file with size greater than 0!"; exit 1; }
a=0
max=$(cat $1 | wc -c)
while [[ $((++a)) -lt $max ]]; do
    cat $1 | head -c$a | tail -c1 | \
    xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done
Answered by yasu
Use read with the -n option.
while read -n 1 ch; do
    echo $ch
done < moemoe.txt
Answered by Willian Mainieri
I have a suggestion to offer, but would like feedback from everybody, and mainly personal advice from the user syntaxerror.
I don't know much about bash, but I thought maybe it would be better to have "cat $1" stored in a variable... but the problem is that the echo command will also bring a small overhead, right?
test -s "$1" || (echo "Need a file with size greater than 0!"; exit 1)
a=0
rfile=$(cat $1)
max=$(echo $rfile | wc -c)
while [[ $((++a)) -lt $max ]]; do
    echo $rfile | head -c$a | tail -c1 | \
    xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done
In my opinion it would have better performance, but I haven't perf-tested it.
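One point of feedback on storing the file in a variable: besides speed, the bytes themselves can change. The sketch below uses a hypothetical sample file and variable names, and assumes the default IFS. It compares the byte count of the file with what echo $rfile and a quoted printf deliver.

```shell
#!/bin/bash
# Hypothetical sample: "a", three spaces, "b", two newlines = 7 bytes.
sample=$(mktemp)
printf 'a   b\n\n' > "$sample"

orig_size=$(( $(wc -c < "$sample") ))

# Command substitution strips all trailing newlines (and cannot hold NUL bytes).
rfile=$(cat "$sample")

# Unquoted expansion is word-split, so the run of spaces collapses to one;
# echo then adds a newline of its own.
unquoted_size=$(( $(echo $rfile | wc -c) ))

# Quoted expansion through printf keeps the remaining bytes as they are.
quoted_size=$(( $(printf '%s' "$rfile" | wc -c) ))

echo "file: $orig_size  echo \$rfile: $unquoted_size  printf \"\$rfile\": $quoted_size"
rm -f "$sample"
```

For byte-exact work it therefore seems safer to keep reading from the file (or from od output) than from a shell variable.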