Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/16821435/

Time: 2020-09-06 06:00:32  Source: igfitidea

Write array of radix-2 numeric strings to binary file in Ruby

ruby io bindata

Asked by Ivan Kozlov

I've written a simple Huffman encoding in Ruby. As output I've got an array, for example:

["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

I need to write, and then read, it to and from a file. I tried several methods:

IO.binwrite("out.cake", array)

I get a simple text file and not binary.

Or:

File.open("out.cake", 'wb' ) do |output|
  array.each do | byte |
       output.print byte.chr
  end
end

This looks like it works, but then I can't read it back into an array.

Which encoding should I use?

Answered by M. Shiina

I think you can just use Array#pack and String#unpack, as in the following code:

# Writing
a = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]
File.open("out.cake", 'wb' ) do |output|
  output.write [a.join].pack("B*")
end

# Reading
s = File.binread("out.cake")
bits = s.unpack("B*")[0] # "01011111010110111000111000010011"

I don't know your preferred format for the result of reading, and I know the above method is inefficient. But anyway, you can take "0" or "1" sequentially from the result of unpack to traverse your Huffman tree.
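As a rough illustration of consuming the unpacked bits one at a time, here is a minimal sketch. The code table CODES and the helper decode_bits are hypothetical, invented for this example; a real decoder would walk the actual Huffman tree. Because pack("B*") pads the last byte with zero bits, the sketch assumes the decoder is told how many symbols to expect.

```ruby
# Hypothetical prefix-code table (symbol => bit string); not from the original post.
CODES  = { "a" => "0", "b" => "10", "c" => "110", "d" => "111" }.freeze
DECODE = CODES.invert.freeze

# Consume bits one at a time and emit a symbol whenever the bits read so far
# match a code. Stops after symbol_count symbols so padding bits are ignored.
def decode_bits(bits, symbol_count)
  out = []
  current = +""
  bits.each_char do |bit|
    break if out.size == symbol_count
    current << bit
    next unless DECODE.key?(current)
    out << DECODE[current]
    current = +""
  end
  out
end

encoded = %w[b a d c].map { |sym| CODES[sym] }  # 9 bits in total
packed  = [encoded.join].pack("B*")             # zero-padded to 2 bytes
bits    = packed.unpack1("B*")                  # 16 characters, 7 of them padding
decoded = decode_bits(bits, encoded.size)
```

Without the symbol count, the seven padding zeros here would be decoded as extra "a" symbols, which is why some length information has to travel with the data.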

Answered by Sergey Bolgov

If you want bits, then you have to do both packing and unpacking manually. Neither Ruby nor any other common-use language will do it for you.

Your array contains strings that are groups of characters, but you need to build an array of bytes and write those bytes into the file.

From this: ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

you should build these bytes: 01011111 01011011 10001110 00010011

Since it's just four bytes, you can put them into a single 32-bit number, 01011111010110111000111000010011, that is 5F5B8E13 hex.
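The conversion to a single 32-bit number can be checked in a few lines. This is only a sketch; the choice of big-endian byte order (the "N" directive) is an assumption made here so that the bytes come out in the same order as the bits.

```ruby
bits  = "01011111010110111000111000010011"
value = bits.to_i(2)       # parse the string as a base-2 integer
hex   = value.to_s(16)     # hexadecimal form of the same number
bytes = [value].pack("N")  # 32-bit unsigned integer, big-endian, as a 4-byte string
```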

Your two code samples do different things. The first one writes a string representation of a Ruby array into the file. The second one writes only the first character of each string (String#chr returns the first character), so it produces 11 bytes, each of which is either 48 ('0') or 49 ('1').

If you want bits, then your output file size should be just four bytes.

Read about bit operations to learn how to achieve that.
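For illustration, a minimal sketch of the bit-operation approach might look like the following. The helper name pack_bits is hypothetical, and padding the last byte with zero bits on the right is an assumption of this sketch rather than a requirement.

```ruby
# Pack an array of "0"/"1" strings into a binary string using shifts and ORs.
def pack_bits(bit_strings)
  bytes   = []
  current = 0  # bits accumulated for the byte in progress
  filled  = 0  # how many bits of `current` are in use

  bit_strings.each do |code|
    code.each_char do |ch|
      current = (current << 1) | (ch == "1" ? 1 : 0)
      filled += 1
      next unless filled == 8
      bytes << current
      current = 0
      filled  = 0
    end
  end

  # Left-align any leftover bits and pad the low end with zeros.
  bytes << (current << (8 - filled)) if filled > 0
  bytes.pack("C*")
end

a = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]
data = pack_bits(a)  # 4-byte binary string
```

The result can then be written with File.binwrite or with a file opened in 'wb' mode.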



Here is a draft. I didn't test it. Something may be wrong.

a = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

# Join all the characters together. Add 7 zeros to the end.
bit_sequence = a.join + "0" * 7  # "010111110101101110001110000100110000000"

# Split into 8-digit chunks.
chunks = bit_sequence.scan(/.{8}/)  # ["01011111", "01011011", "10001110", "00010011"]

# Convert every chunk into character with the corresponding code.
bytes = chunks.map { |chunk| chunk.to_i(2).chr }  # ["_", "[", "\x8E", "\x13"]

File.open("my_huffman.bin", 'wb' ) do |output|
  bytes.each { |b| output.write b }
end

Note: seven zeros are added to handle the case when the total number of characters is not divisible by 8. Without those zeros, bit_sequence.scan(/.{8}/) will drop the remaining characters.
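Reading the data back raises the opposite problem: the padding zeros are indistinguishable from real bits. One possible approach, sketched below, is to store the bit count ahead of the packed data. The file layout (a 4-byte big-endian length header) is an assumption made for this example, not part of the original answer, and the sketch writes to a temporary directory.

```ruby
require "tmpdir"

a = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]
bit_sequence = a.join

restored  = nil
file_size = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "my_huffman.bin")

  # Assumed layout: 4-byte big-endian bit count, then the zero-padded packed bits.
  header  = [bit_sequence.length].pack("N")
  payload = [bit_sequence].pack("B*")
  File.binwrite(path, header + payload)
  file_size = File.size(path)

  # Read the header first, then keep only as many bits as it announces.
  raw       = File.binread(path)
  bit_count = raw[0, 4].unpack1("N")
  restored  = raw[4..-1].unpack1("B*")[0, bit_count]
end
```

Splitting the restored bit string back into the original array still requires the Huffman code table, since the boundaries between codes are not stored.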
