Writing an array of radix-2 numeric strings to a binary file in Ruby

Question

Asked by Ivan Kozlov

I've written a simple Huffman encoding in Ruby. As output I've got an array, for example:

["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

I need to write it to a file and then read it back. I tried several methods:

IO.binwrite("out.cake", array)

This gives me a plain text file, not a binary one.

Or:

File.open("out.cake", 'wb' ) do |output|
  array.each do | byte |
       output.print byte.chr
  end
end

This looks like it works, but then I can't read it back into an array.

Which encoding should I use?

Answer 1

Answered by M. Shiina

I think you can just use Array#pack and String#unpack, as in the following code:

# Writing
a = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]
File.open("out.cake", 'wb' ) do |output|
  output.write [a.join].pack("B*")
end

# Reading
s = File.binread("out.cake")
bits = s.unpack("B*")[0] # "01011111010110111000111000010011"

I don't know your preferred format for the result of reading, and I know the above method is inefficient. But anyway, you can take "0" or "1" sequentially from the result of unpack to traverse your Huffman tree.
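
As a rough sketch of that sequential traversal, the snippet below decodes the unpacked bit string by growing a candidate prefix one bit at a time until it matches a known code word. The code-to-symbol table is purely hypothetical (the question never shows the real one); it only reuses the eight distinct code words from the example array, which happen to be prefix-free. A real decoder would walk the actual Huffman tree instead of a hash.

```ruby
# Hypothetical code table: the symbols are made up for illustration;
# only the code words come from the example array.
code_table = {
  "000" => "a", "001" => "b", "010" => "c", "011" => "d",
  "10" => "e", "110" => "f", "1110" => "g", "1111" => "h"
}

# Walk the bit string left to right, emitting a symbol whenever the
# accumulated prefix matches a code word.
def decode_bits(bits, table)
  symbols = []
  prefix = +""
  bits.each_char do |bit|
    prefix << bit
    if table.key?(prefix)
      symbols << table[prefix]
      prefix = +""
    end
  end
  symbols
end

codes = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]
packed = [codes.join].pack("B*")   # 32 bits -> 4 raw bytes
bits = packed.unpack("B*")[0]      # back to a string of "0"/"1"
decoded = decode_bits(bits, code_table)
```

Any bits left over in prefix at the end (for example, padding added by pack) are silently ignored in this sketch, so with real data you would probably also want to record how many bits are valid.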

Answer 2

Answered by Sergey Bolgov

If you want bits, then you have to do both packing and unpacking manually. Neither Ruby nor any other common-use language will do it for you.

Your array contains strings that are groups of characters, but you need to build an array of bytes and write those bytes into the file.

From this: ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

you should build these bytes: 01011111 01011011 10001110 00010011

Since it's just four bytes, you can put them into a single 32-bit number, 01011111010110111000111000010011, which is 5F5B8E13 in hex.
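
The arithmetic can be checked quickly in Ruby. This is only a sketch for the four-byte example: the variable names are illustrative, and the "N" directive (32-bit unsigned, big-endian) fits here only because the sequence is exactly 32 bits long.

```ruby
a = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

bit_string = a.join          # 32 characters of "0"/"1"
number = bit_string.to_i(2)  # parse as a base-2 integer
hex = number.to_s(16)        # "5f5b8e13"

# Pack the integer as one big-endian 32-bit value: four raw bytes.
packed = [number].pack("N")
byte_values = packed.bytes   # [0x5F, 0x5B, 0x8E, 0x13]
```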

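The two samples of your code do different things. The first one writes a string representation of the Ruby array into the file. The second one does not write bits either: String#chr returns only the first character of each string, so it writes one byte per array element (11 bytes here), each either 48 ('0') or 49 ('1').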

If you want bits, then your output file size should be just four bytes.

Read about bit operations to learn how to achieve that.
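
For illustration, here is one way the bit operations might look. It is a sketch, not a tested production encoder: pack_bits is a hypothetical helper name, and it assumes every string contains only the characters "0" and "1". Each bit is shifted into an accumulator, and a byte is emitted every time eight bits have been collected.

```ruby
# Hypothetical helper: packs strings of "0"/"1" into raw bytes using
# shifts and ORs. The final partial byte is padded with zero bits.
def pack_bits(codes)
  bytes = []
  acc = 0     # bits collected so far for the current byte
  count = 0   # how many bits are in acc
  codes.each do |code|
    code.each_char do |ch|
      acc = (acc << 1) | (ch == "1" ? 1 : 0)
      count += 1
      next unless count == 8
      bytes << acc
      acc = 0
      count = 0
    end
  end
  # Left-align any leftover bits in the last byte.
  bytes << (acc << (8 - count)) if count > 0
  bytes.pack("C*")
end

a = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]
binary = pack_bits(a)
```

The resulting string can then be written with File.binwrite. For this input it should be the same four bytes that [a.join].pack("B*") produces.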

Here is a draft. I didn't test it. Something may be wrong.

a = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

# Join all the characters together. Add 7 zeros to the end.
bit_sequence = a.join + "0" * 7  # "010111110101101110001110000100110000000"

# Split into 8-digit chunks.
chunks = bit_sequence.scan(/.{8}/)  # ["01011111", "01011011", "10001110", "00010011"]

# Convert every chunk into character with the corresponding code.
bytes = chunks.map { |chunk| chunk.to_i(2).chr }  # ["_", "[", "\x8E", "\x13"]

File.open("my_huffman.bin", 'wb' ) do |output|
  bytes.each { |b| output.write b }
end

Note: seven zeros are added to handle the case when the total number of characters is not divisible by 8. Without those zeros, bit_sequence.scan(/.{8}/) will drop the remaining characters.
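
One consequence of padding is that the reader cannot tell padding zeros from real data. A possible way around this, sketched below, is to store the number of valid bits in front of the packed data. The 4-byte big-endian length header and the helper names encode_with_length and decode_with_length are assumptions made for this example, not part of the answer above.

```ruby
# Prefix the packed bits with their count so the reader can drop padding.
# Layout (assumed for this sketch): 4-byte big-endian bit count, then the bits.
def encode_with_length(codes)
  bits = codes.join
  [bits.length].pack("N") + [bits].pack("B*")
end

# Read the bit count back, unpack the rest, and keep only the valid bits.
def decode_with_length(blob)
  bit_count = blob.unpack("N")[0]
  blob[4..-1].unpack("B*")[0][0, bit_count]
end

codes = ["010", "1111", "10"]   # 9 bits, so the last byte carries 7 padding bits
blob = encode_with_length(codes)
restored = decode_with_length(blob)
```

To persist the data, blob could be written with File.binwrite("my_huffman.bin", blob) and read back with File.binread("my_huffman.bin").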
