Original source: http://stackoverflow.com/questions/26621647/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Convert human readable to bytes in bash
Asked by Devon
I am trying to analyze very large log files on Linux. I have seen plenty of solutions for the reverse of this, but the program that records the data doesn't allow output formatting, so it only outputs sizes in human-readable format (I know, what a pain). So the question is: how can I convert human-readable sizes to bytes using something like awk?
So converting this:
937
1.43K
120.3M
to:
937
1464
126143693
I can tolerate, and in fact expect, some rounding errors.
Thanks in advance.
P.S. It doesn't have to be awk, as long as it can do the conversion in-line.
I found this, but the awk command given doesn't appear to work correctly. It outputs something like 534K"0".
I also found a solution using sed and bc, but because it relies on bc it is of limited use: it can only handle one column at a time, and all the data has to be valid input for bc or it fails.
sed -e 's/K/\*1024/g' -e 's/M/\*1048576/g' -e 's/G/\*1073741824/g' | bc
Accepted answer by tink
$ cat dehumanise
937
1.43K
120.3M
$ awk '/[0-9]$/{print $1;next};/[mM]$/{printf "%u\n", $1*(1024*1024);next};/[kK]$/{printf "%u\n", $1*1024;next}' dehumanise
937
1464
126143692
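The question asks for in-line conversion of a column inside a longer log line, which the answer above does not show. A minimal sketch of that idea follows, assuming whitespace-separated fields and binary (1024-based) K, M and G suffixes. The function name dehumanise_column, its column argument and the sample log lines are made up for illustration and are not part of the original answer.

```shell
# Sketch: convert one whitespace-separated column in place.
# dehumanise_column and its column argument are illustrative names.
dehumanise_column() {
  awk -v col="$1" '
    # Return the byte count for a value such as 937, 1.43K or 120.3M.
    function to_bytes(v,    unit, mult) {
      unit = toupper(substr(v, length(v)))
      if      (unit == "K") mult = 1024
      else if (unit == "M") mult = 1024 * 1024
      else if (unit == "G") mult = 1024 * 1024 * 1024
      else return v                      # no known suffix: leave unchanged
      # v + 0 keeps only the leading number; int() drops the fraction.
      return sprintf("%.0f", int((v + 0) * mult))
    }
    { $col = to_bytes($col); print }
  '
}

# Hypothetical log lines with the size in column 3.
printf 'GET /a 1.43K 200\nGET /b 120.3M 200\n' | dehumanise_column 3
```

Assigning to a field makes awk rebuild the line with single spaces between fields, so any original column alignment is lost.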
Answered by starfry
Here's a function that understands binary and decimal prefixes and is easily extendable to larger units should there be a need:
dehumanise() {
  # Use the arguments if given, otherwise read standard input.
  for v in "${@:-$(</dev/stdin)}"
  do
    # IGNORECASE is a gawk extension; other awks match case-sensitively.
    echo "$v" | awk \
      'BEGIN{IGNORECASE = 1}
       function printpower(n,b,p) {printf "%u\n", n*b^p; next}
       /[0-9]$/{print $1;next};
       /K(iB)?$/{printpower($1,  2, 10)};
       /M(iB)?$/{printpower($1,  2, 20)};
       /G(iB)?$/{printpower($1,  2, 30)};
       /T(iB)?$/{printpower($1,  2, 40)};
       /KB$/{    printpower($1, 10,  3)};
       /MB$/{    printpower($1, 10,  6)};
       /GB$/{    printpower($1, 10,  9)};
       /TB$/{    printpower($1, 10, 12)}'
  done
}
Example:
$ dehumanise 2K 2k 2KiB 2KB
2048
2048
2048
2000
$ dehumanise 2G 2g 2GiB 2GB
2147483648
2147483648
2147483648
2000000000
The suffixes are case-insensitive.
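Case-insensitive matching through IGNORECASE is a gawk feature, so on systems where awk is a different implementation the suffixes may be matched case-sensitively. Here is a rough sketch of the same idea in plain POSIX awk, extended by one unit (P) to show how larger units could be added. The name dehumanise_posix is invented for this example, and unlike the function above it only takes arguments and does not read standard input.

```shell
# Sketch of a portable variant: plain POSIX awk, no gawk-only IGNORECASE.
# dehumanise_posix is a hypothetical name, not from the answer.
dehumanise_posix() {
  printf '%s\n' "$@" | awk '
    {
      s = toupper($1)
      n = $1 + 0                          # leading number, suffix ignored
      if (s ~ /[0-9]$/) { print $1; next }
      # Decimal units are KB, MB, ...; K, KiB, M, MiB, ... are binary.
      if (s ~ /^[0-9.]+[KMGTP]B$/) { base = 1000 } else { base = 1024 }
      sub(/^[0-9.]+/, "", s)              # keep only the suffix
      p = index("KMGTP", substr(s, 1, 1)) # K=1, M=2, ... P=5
      if (p == 0) { print 0; next }       # unknown suffix
      printf "%.0f\n", int(n * base ^ p)
    }'
}

dehumanise_posix 2K 2kib 2KB 1.5M
```

Adding another unit in this sketch means appending its letter to both the "KMGTP" string and the character class in the decimal-unit test.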
Answered by brablc
Use numfmt --from=iec from GNU coreutils.
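As a rough illustration of how this might be applied to the data in the question, assuming a GNU coreutils version of numfmt is installed: by default numfmt rounds away from zero, so --round=down is used here to truncate the way the awk answers do. The wrapper name to_bytes_numfmt and the sample log line are assumptions made for this sketch.

```shell
# Sketch only: requires numfmt from GNU coreutils.
# to_bytes_numfmt is a hypothetical wrapper name.
to_bytes_numfmt() {
  # --from=iec reads K, M, G ... as powers of 1024.
  # --round=down truncates instead of rounding away from zero.
  numfmt --from=iec --round=down "$@"
}

# Convert values given one per line on standard input.
printf '937\n1.43K\n120.3M\n' | to_bytes_numfmt

# Convert only the third whitespace-separated field of each line.
printf 'GET /a 1.43K 200\n' | to_bytes_numfmt --field=3
```

The --field option addresses the one-column-at-a-time limitation mentioned in the question, since the other fields are passed through untouched.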
Answered by ThorSummoner
Python tools exist:
$ pip install humanfriendly  # Also available as a --user install in ~/.local/bin
$ humanfriendly --parse-size="2 KB"
2000
$ humanfriendly --parse-size="2 KiB"
2048
Answered by Yzmir Ramirez
awk 'function pp(p){printf "%u\n",$0*1024^p} /[0-9]$/{print $0}/K$/{pp(1)}/M$/{pp(2)}/G$/{pp(3)}/T$/{pp(4)}/[^0-9KMGT]$/{print 0}'
This is a modification of @starfry's answer.
Let's break it down:
function pp(p) { printf "%u\n", $0 * 1024^p }
Define a function named pp that takes a single parameter p and prints $0 multiplied by 1024 raised to the p-th power. The %u prints that number as an unsigned decimal integer, dropping any fractional part.
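To see pp on its own, it can be called on a single value. This is only a sketch: pp_demo is a hypothetical wrapper, and the power is passed in through an awk variable named pow, neither of which appears in the answer.

```shell
# Sketch: call pp directly on one value to see the truncation.
# pp_demo is a hypothetical wrapper, not part of the answer.
pp_demo() {
  # $1 = value such as 1.43K, $2 = power of 1024 to apply
  echo "$1" | awk -v pow="$2" 'function pp(p){printf "%u\n", $0*1024^p} {pp(pow)}'
}

pp_demo 1.43K 1    # 1.43 * 1024 = 1464.32, printed as 1464
pp_demo 120.3M 2   # 120.3 * 1048576 = 126143692.8, printed as 126143692
```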
/[0-9]$/ { print $0 }
Match lines that end with a digit (the $ matches the end of the line), then run the code inside the { and }. Print the entire line ($0).
/K$/ { pp(1) }
Match lines that end with the capital letter K, call the function pp() and pass 1 to it (p == 1). NOTE: When $0 (e.g. "1.43K") is used in arithmetic, only the leading number (i.e. "1.43") is used. Example with $0 = "1.43K":
$0 * 1024^p == 1.43K * 1024^1 == 1.43K * 1024 = 1.43 * 1024 = 1464.32
/M$/ { pp(2) }
Match lines that end with the capital letter M, call the function pp() and pass 2 to it (p == 2). Example with $0 == "120.3M":
$0 * 1024^p == 120.3M * 1024^2 == 120.3M * 1024*1024 = 120.3 * 1048576 = 126143692.8
And so on for G and T.
/[^0-9KMGT]$/ { print 0 }
Lines that do not end with a digit or one of the capital letters K, M, G, or T print "0".
Example:
$ cat dehumanise
937
1.43K
120.3M
5G
933G
12.2T
bad
<>
Results:
$ awk 'function pp(p){printf "%u\n",$0*1024^p} /[0-9]$/{print $0}/K$/{pp(1)}/M$/{pp(2)}/G$/{pp(3)}/T$/{pp(4)}/[^0-9KMGT]$/{print 0}' dehumanise
937
1464
126143692
5368709120
1001801121792
13414041858867
0
0
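The note in the K rule depends on how awk converts a string to a number: it uses the leading numeric part and ignores the rest, and a string with no leading number counts as 0. A small sketch to check this on your own awk, where coerce is a made-up helper name:

```shell
# Sketch: show how awk turns a string into a number in arithmetic.
# coerce is a hypothetical helper, not part of the answer.
coerce() {
  echo "$1" | awk '{ print $0 + 0 }'
}

coerce 1.43K   # the leading number 1.43 is used
coerce bad     # no leading number, so the value is 0
```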