Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/6114189/

Time: 2020-09-02 08:44:46  Source: igfitidea

Programming novice: How to program my own data compression algorithm?

Tags: c, algorithm, compression

Asked by araisbec

It is summer, so I have decided to take it upon myself to write a data-compression program, preferably in C. I have a decent beginner's understanding of how compression works. I just have a few questions:

1) Would C be a suitable programming language to accomplish this task?
2) Should I be working in bytes with the input file? Or at a binary level somehow?

If someone could just give me a nudge in the right direction, I'd really appreciate it. However, I would like to code this myself rather than use a pre-existing compression library or anything like that.

Accepted answer by S.Lott

1) Would C be a suitable programming language to accomplish this task?

Yes.

2) Should I be working in bytes with the input file? Or at a binary level somehow?

They're the same thing (a byte is just a group of bits), so the question makes no sense.

not use a pre-existing compression library

Can you use a pre-existing compression algorithm? There are dozens, and searching Google for "compression algorithm" will reveal a great deal of helpful information.

Answer by Brian Lyttle

You could start by looking at Huffman encoding. A lot of computer science classes implement it as a project, so it should be manageable. C would be appropriate for Huffman encoding, but it might be easier to do it first in a higher-level language so that you understand the concepts. There are slides, hints, and an example project available in Java for a master's-level project at the University of Pennsylvania (search for "huff" on that page).

Answer by Ivan Z. Siu

To answer your questions:

  1. C is suitable.
  2. It depends on the algorithm, or on how you are thinking about 'compression'.

My advice is to first decide whether you want lossless or lossy compression, then pick an algorithm to implement. Here are a few pointers:

For lossless compression, some algorithms are very intuitive, such as run-length encoding: if there are 11 a's followed by 5 b's, you just encode them as 11a5b. Some algorithms use a dictionary; see LZW encoding. Finally, I do recommend Huffman encoding, since it is straightforward, simple, and a good way to gain experience implementing algorithms (for your educational purpose).


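For lossy compression, transform coding is the usual approach. Baseline JPEG uses the Discrete Cosine Transform (DCT), a close relative of the Discrete Fourier Transform (DFT), and JPEG 2000 uses a wavelet transform. These are useful for understanding multimedia compression.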

The Wikipedia page is a good starting point.

Answer by NPE

  1. Yes, C is well suited for this kind of work.

  2. Whether you work with bytes or bits will depend on the algorithm that you decide to implement. For example, Huffman coding is inherently bit-oriented whereas many other compression algorithms are not.

Answer by Carl Norum

  1. C is a great choice for writing a compression program. You can use plenty of other languages too, though.

  2. Your computer probably can't directly address units of memory smaller than a byte (pretty much by definition), so working with bytes is probably a good choice. How you work with the data will also depend in part on the compression algorithm you choose.

Good luck!
