C++ 数据压缩算法
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16469410/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverFlow
Data Compression Algorithms
提问by Veridian
I was wondering if anyone has a list of data compression algorithms. I know basically nothing about data compression and I was hoping to learn more about different algorithms and see which ones are the newest and have yet to be developed on a lot of ASICs.
我想知道是否有人有数据压缩算法的列表。我基本上对数据压缩一无所知,我希望更多地了解不同的算法,看看哪些是最新的,但尚未在许多 ASIC 上开发。
I'm hoping to implement a data compression ASIC which is independent of the type of data coming in (audio,video,images,etc.)
我希望实现一个数据压缩 ASIC,它与传入的数据类型(音频、视频、图像等)无关。
If my question is too open ended, please let me know and I'll revise. Thank you
如果我的问题太开放,请告诉我,我会修改。谢谢
回答by Mohamad Ali Baydoun
There are a ton of compression algorithms out there. What you need here is a lossless compression algorithm. A lossless compression algorithm compresses data such that it can be decompressed to achieve exactly what was given before compression. The opposite would be a lossy compression algorithm. Lossy compression can remove data from a file. PNG images use lossless compression while JPEG images can and often do use lossy compression.
有大量的压缩算法可供选择。您在这里需要的是无损压缩算法。无损压缩算法压缩数据后,解压缩可以完全还原压缩前的原始数据。与之相对的是有损压缩算法。有损压缩可能会从文件中丢弃部分数据。PNG 图像使用无损压缩,而 JPEG 图像可以并且经常使用有损压缩。
Some of the most widely known compression algorithms include:
一些最广为人知的压缩算法包括:
ZIP archives use a combination of Huffman coding and LZ77 to give fast compression and decompression times and reasonably good compression ratios.
ZIP 档案使用霍夫曼编码和 LZ77 的组合来提供快速的压缩和解压缩时间以及相当好的压缩率。
LZ77 is pretty much a generalized form of RLE and it will often yield much better results.
LZ77 几乎是 RLE 的一种广义形式,它通常会产生更好的结果。
Huffman coding assigns the fewest bits to the most frequently repeated bytes. Imagine a text file that looked like this:
霍夫曼编码为出现次数最多的字节分配最少的位数。想象一个看起来像这样的文本文件:
aaaaaaaabbbbbcccdd
A typical implementation of Huffman would result in the following map:
霍夫曼的典型实现将导致以下映射:
Bits Character
0    a
10   b
110  c
111  d
So the file would be compressed to this:
所以文件将被压缩为:
00000000 10101010 10110110 11011111 10000000
                                     ^^^^^^^
Padding bits required
18 bytes go down to 5. Of course, the table must be included in the file. This algorithm works better with more data :P
18 个字节减少到 5 个。当然,这张表也必须包含在文件中。数据越多,该算法的效果越好 :P
Alex Allain has a nice article on the Huffman compression algorithm in case the Wiki doesn't suffice.
Alex Allain 有一篇关于 Huffman 压缩算法的好文章,以防 Wiki 不够用。
Feel free to ask for more information. This topic is pretty darn wide.
随时询问更多信息。这个话题相当广泛。
回答by user984260
My paper A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems(permalink here) reviews many compression algorithms and also techniques for using them in modern processors. It reviews both research-grade and commercial-grade compression algorithms/techniques, so you may find one which has not yet been implemented in ASIC.
我的论文 A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems(此处为永久链接)回顾了许多压缩算法,以及在现代处理器中使用它们的技术。它涵盖了研究级和商业级的压缩算法/技术,因此您或许能从中找到尚未在 ASIC 中实现的算法/技术。
回答by sma
Here are some lossless algorithms (can perfectly recover the original data using these):
以下是一些无损算法(可以使用这些算法完美恢复原始数据):
- Huffman code
- LZ78 (and LZW variation)
- LZ77
- Arithmetic coding
- Sequitur
- Prediction by Partial Matching (PPM)
- 霍夫曼码
- LZ78(和 LZW 变体)
- LZ77
- 算术编码
- Sequitur
- 部分匹配预测(PPM)
Many of the well known formats like png or gif use variants or combinations of these.
许多众所周知的格式,如 png 或 gif 使用这些的变体或组合。
On the other hand there are lossy algorithms too (they trade away some accuracy to compress your data, but often work pretty well). State of the art lossy techniques combine ideas from differential coding, quantization, and DCT, among others.
另一方面,也有有损算法(牺牲准确性来压缩数据,但通常效果很好)。最先进的有损技术结合了来自差分编码、量化和 DCT 等的思想。
To learn more about data compression, I recommend https://www.elsevier.com/books/introduction-to-data-compression/sayood/978-0-12-809474-7. It is a very accessible introductory text. The 3rd edition is available online as a PDF.
要了解有关数据压缩的更多信息,我推荐 https://www.elsevier.com/books/introduction-to-data-compression/sayood/978-0-12-809474-7。这是一本非常易读的入门教材,其第 3 版可以在网上找到 PDF 版本。
回答by comingstorm
There are an awful lot of data compression algorithms around. If you're looking for something encyclopedic, I recommend the Handbook of Data Compressionby Salomon et al, which is about as comprehensive as you're likely to get (and has good sections on the principles and practice of data compression, as well).
现有的数据压缩算法非常多。如果您想要百科全书式的资料,我推荐 Salomon 等人的 Handbook of Data Compression,它大概是您能找到的最全面的参考书(其中还有关于数据压缩原理与实践的精彩章节)。
My best guess is that ASIC-based compression is usually implemented for a particular application, or as a specialized element of a SoC, rather than as a stand-alone compression chip. I also doubt that looking for a "latest and greatest" compression format is the way to go here -- I would expect standardization, maturity, and fitness for a particular purpose to be more important.
我最好的猜测是,基于 ASIC 的压缩通常是为特定应用实现的,或者作为 SoC 中的一个专用模块,而不是作为独立的压缩芯片。我也怀疑寻找"最新最好"的压缩格式并不是正确的思路,我认为标准化、成熟度以及对特定用途的适用性更为重要。
回答by schilippe
LZW or the Lempel-Ziv-Welch algorithm is a great lossless one. Pseudocode here: http://oldwww.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html
LZW(Lempel-Ziv-Welch)算法是一种很好的无损算法。伪代码见:http://oldwww.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html

