Convert human-readable sizes to bytes in bash

Question

Asked by Devon

I am trying to analyze very large log files on Linux. I have seen plenty of solutions for the reverse problem, but the program that records the data doesn't allow output formatting, so it only writes sizes in human-readable form (I know, what a pain). So the question is: how can I convert human-readable sizes to bytes using something like awk?

So converting this:

937
1.43K
120.3M

to:

937
1464
126143693

I can tolerate, and in fact expect, some rounding error.

Thanks in advance.

P.S. It doesn't have to be awk, as long as it can do the conversion in-line.

I found this, but the awk command given doesn't appear to work correctly. It outputs something like 534K"0".

I also found a solution using sed and bc, but relying on bc limits it: it can only handle one column at a time, and every piece of input has to be valid for bc or it fails.

sed -e 's/K/\*1024/g' -e 's/M/\*1048576/g' -e 's/G/\*1073741824/g' | bc

Answer 1

Accepted answer by tink

$ cat dehumanise 
937
1.43K
120.3M

$ awk '/[0-9]$/{print $1;next};/[mM]$/{printf "%u\n", $1*(1024*1024);next};/[kK]$/{printf "%u\n", $1*1024;next}' dehumanise
937
1464
126143692

Answer 2

Answered by starfry

Here's a function that understands binary and decimal prefixes and is easily extended to larger units should there be a need:

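dehumanise() {
  # Use the arguments if any were given, otherwise read the values from stdin.
  for v in "${@:-$(</dev/stdin)}"
  do
    echo $v | awk \
      'BEGIN{IGNORECASE = 1}   # gawk extension: match the suffixes case-insensitively
       # Print n * b^p as an unsigned integer, then move on to the next record.
       function printpower(n,b,p) {printf "%u\n", n*b^p; next}
       /[0-9]$/{print $1;next};
       /K(iB)?$/{printpower($1,  2, 10)};
       /M(iB)?$/{printpower($1,  2, 20)};
       /G(iB)?$/{printpower($1,  2, 30)};
       /T(iB)?$/{printpower($1,  2, 40)};
       /KB$/{    printpower($1, 10,  3)};
       /MB$/{    printpower($1, 10,  6)};
       /GB$/{    printpower($1, 10,  9)};
       /TB$/{    printpower($1, 10, 12)}'
  done
}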

Example:

$ dehumanise 2K 2k 2KiB 2KB 
2048
2048
2048
2000

$ dehumanise 2G 2g 2GiB 2GB 
2147483648
2147483648
2147483648
2000000000

The suffixes are case-insensitive.

Answer 3

Answered by brablc

Use numfmt --from=iec from GNU coreutils.
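As a minimal sketch of how that might look, assuming a GNU coreutils installation recent enough to ship numfmt. The wrapper name to_bytes is made up for illustration and is not part of coreutils:

```shell
# to_bytes is a hypothetical wrapper name, not a coreutils command.
# --from=iec makes numfmt read K, M, G, T as powers of 1024.
to_bytes() {
  numfmt --from=iec "$@"
}

# Values can be passed as arguments...
to_bytes 2K 1M

# ...or read one per line from standard input.
printf '%s\n' 937 2K 1M | to_bytes
```

numfmt also has a --field option for converting one column of a line in place and a --round option for choosing the rounding mode. Its documented default rounds away from zero, so fractional inputs such as 1.43K may come out one higher than with the truncating awk versions.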

Answer 4

Answered by ThorSummoner

Python tools exist for this:

$ pip install humanfriendly  # Also available as a --user install in ~/.local/bin

$ humanfriendly --parse-size="2 KB"
2000
$ humanfriendly --parse-size="2 KiB"
2048

Answer 5

Answered by Yzmir Ramirez

awk 'function pp(p){printf "%u\n",$0*1024^p} /[0-9]$/{print $0}/K$/{pp(1)}/M$/{pp(2)}/G$/{pp(3)}/T$/{pp(4)}/[^0-9KMGT]$/{print 0}'

This is a modification on @starfry's answer.

Let's break it down:

function pp(p) { printf "%u\n", $0 * 1024^p }

Define a function named pp that takes a single parameter p and prints $0 multiplied by 1024 raised to the p-th power. The %u conversion prints the result as an unsigned decimal integer.
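The multiplication in pp works on lines like "1.43K" because awk, when it uses a string as a number, reads only the leading numeric part. A small sketch of that conversion on its own (leading_number is a hypothetical helper name used only here):

```shell
# leading_number is a hypothetical helper for illustration.
# Adding 0 forces awk to treat the string as a number, which keeps
# only its leading numeric prefix (0 if there is none).
leading_number() {
  awk -v s="$1" 'BEGIN { print s + 0 }'
}

leading_number 1.43K   # numeric prefix of "1.43K"
leading_number bad     # no numeric prefix at all
```

So the suffix letter never has to be stripped by hand before multiplying.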

/[0-9]$/ { print $0 }

Match lines that end with a digit (the $ matches the end of the line), then run the code inside the { and }, which prints the entire line ($0).

/K$/ { pp(1) }

Match lines that end with the capital letter K, call the function pp() and pass 1 to it (p == 1). NOTE: When $0 (e.g. "1.43K") is used in an arithmetic expression, only the leading number (i.e. "1.43") is used. Example with $0 == "1.43K":

$0 * 1024^p == 1.43K * 1024^1 == 1.43K * 1024 = 1.43 * 1024 = 1464.32

/M$/ { pp(2) }

Match lines that end with the capital letter M, call the function pp() and pass 2 to it (p == 2). Example with $0 == "120.3M":

$0 * 1024^p == 120.3M * 1024^2 == 120.3M * 1024*1024 = 120.3 * 1048576 = 126143692.8

The %u conversion then drops the fractional part, so this prints 126143692.

etc... for G and T.

/[^0-9KMGT]$/ { print 0 }

Lines that do not end with a digit or the capital letters K, M, G, or T print "0".

Example:

$ cat dehumanise
937
1.43K
120.3M
5G
933G
12.2T
bad
<>

Results:

$ awk 'function pp(p){printf "%u\n",$0*1024^p} /[0-9]$/{print $0}/K$/{pp(1)}/M$/{pp(2)}/G$/{pp(3)}/T$/{pp(4)}/[^0-9KMGT]$/{print 0}' dehumanise
937
1464
126143692
5368709120
1001801121792
13414041858867
0
0
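The question also mentions log lines with more than one size column. The same idea might be extended to every field of a line, roughly as sketched below. This is not part of the answer above: dehumanise_fields and to_bytes are hypothetical names, the sketch assumes sizes look like NUMBER followed by one of K, M, G, T (either case), and it rounds to the nearest byte instead of truncating.

```shell
# dehumanise_fields is a hypothetical helper: it rewrites every field that
# looks like NUMBER[KMGT] as a byte count and leaves other fields untouched.
dehumanise_fields() {
  awk '
    # Convert one value such as "1.43K"; anything else is returned unchanged.
    function to_bytes(s,    unit, p) {
      unit = toupper(substr(s, length(s)))   # last character, upper-cased
      p = index("KMGT", unit)                # K=1, M=2, G=3, T=4, otherwise 0
      if (p == 0) return s
      # Multiply the numeric part by 1024^p and round to the nearest integer.
      return sprintf("%.0f", substr(s, 1, length(s) - 1) * 1024 ^ p)
    }
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9]+(\.[0-9]+)?[KkMmGgTt]$/) $i = to_bytes($i)
      print
    }' "$@"
}

# Usage: both size columns are converted, the other fields pass through.
echo "GET /a 1.43K 120.3M" | dehumanise_fields
```

Assigning to a field makes awk rebuild the line with single spaces between fields, so the original column alignment is not preserved on lines that contain a converted value.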
