String compression in JavaScript
Original question: http://stackoverflow.com/questions/4570333/
Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Bambax
I'm looking for a JavaScript function that given a string returns a compressed (shorter) string.
I'm developing a Chrome web application that saves long strings (HTML) to a local database. For testing purposes I tried to zip the file storing the database, and it shrank by a factor of five, so I figured it would help keep the database smaller if I compressed the things it stores.
I've found an implementation of LZSS in JavaScript here: http://code.google.com/p/u-lzss/ ("U-LZSS").
It seemed to work when I tested it "by hand" with short example strings (decoding the encoded string returned the original), and it's reasonably fast too, in Chrome. But when given big strings (100 KB) it seems to garble the last half of the string.
Is it possible that U-LZSS expects short strings and can't deal with larger strings? And would it be possible to adjust some parameters in order to move that upper limit?
Accepted answer by Bambax
At Piskvor's suggestion, I tested the code found in an answer to this question: JavaScript implementation of Gzip (top-voted answer: LZW implementation) and found that:
- it works
- it reduces the size of the database by a factor of two
... which is less than 5 but better than nothing! So I used that.
(I wish I could have accepted an answer by Piskvor but it was only a comment).
Answer by pieroxy
I just released a small LZW implementation especially tailored for this very purpose, as none of the existing implementations met my needs.
That's what I'm using going forward, and I will probably try to improve the library at some point.
Answer by Dave Brown
Here are encode (276 bytes, function en) and decode (195 bytes, function de) functions I modded from LZW in a fully working demo. There is no smaller or faster routine available on the internet than what I am giving you here.
function en(c){var x='charCodeAt',b,e={},f=c.split(""),d=[],a=f[0],g=256;for(b=1;b<f.length;b++)c=f[b],null!=e[a+c]?a+=c:(d.push(1<a.length?e[a]:a[x](0)),e[a+c]=g,g++,a=c);d.push(1<a.length?e[a]:a[x](0));for(b=0;b<d.length;b++)d[b]=String.fromCharCode(d[b]);return d.join("")}
function de(b){var a,e={},d=b.split(""),f=d[0],c=f,g=[c],h=256,o=h;for(b=1;b<d.length;b++)a=d[b].charCodeAt(0),a=h>a?d[b]:e[a]?e[a]:f+c,g.push(a),c=a.charAt(0),e[o]=f+c,o++,f=a;return g.join("")}
var compressed=en("http://www.ScriptCompress.com - Simple Packer/Minify/Compress JavaScript Minify, Fixify & Prettify 75 JS Obfuscators In 1 App 25 JS Compressors (Gzip, Bzip, LZMA, etc) PHP, HTML & JS Packers In 1 App PHP Source Code Packers Text Packer HTML Packer or v2 or v3 or LZW Twitter Compress or More Words DNA & Base64 Packer (freq tool) or v2 JS JavaScript Code Golfer Encode Between Quotes Decode Almost Anything Password Protect Scripts HTML Minifier v2 or Encoder or Escaper CSS Minifier or Compressor v2 SVG Image Shrinker HTML To: SVG or SVGZ (Gzipped) HTML To: PNG or v2 2015 JS Packer v2 v3 Embedded File Generator Extreme Packer or version 2 Our Blog DemoScene JS Packer Basic JS Packer or New Version Asciify JavaScript Escape JavaScript Characters UnPacker Packed JS JavaScript Minify/Uglify Text Splitter/Chunker Twitter, Use More Characters Base64 Drag 'n Drop Redirect URL DataURI Get Words Repeated LZMA Archiver ZIP Read/Extract/Make BEAUTIFIER & CODE FIXER WHAK-A-SCRIPT JAVASCRIPT MANGLER 30 STRING ENCODERS CONVERTERS, ENCRYPTION & ENCODERS 43 Byte 1px GIF Generator Steganography PNG Generator WEB APPS VIA DATAURL OLD VERSION OF WHAK PAKr Fun Text Encrypt Our Google");
var decompressed=de(compressed);
document.writeln('<hr>'+compressed+'<hr><h1>'+compressed.length+' characters versus original '+decompressed.length+' characters.</h1><hr>'+decompressed+'<hr>');
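The minified functions above are hard to read, so here is an unminified sketch of the same LZW idea for reference. It is not Dave Brown's code: the names lzwEncode and lzwDecode are made up for illustration, it returns an array of numeric codes instead of packing them into a string, and it assumes every input character has a code below 256.

```javascript
// Compress a string into an array of LZW codes.
// Assumption: every character of `text` has a char code below 256.
function lzwEncode(text) {
  if (text === "") return [];
  const dictionary = new Map(); // phrase -> code (codes start at 256)
  let nextCode = 256;
  let phrase = text[0];
  const codes = [];
  for (let i = 1; i < text.length; i++) {
    const ch = text[i];
    if (dictionary.has(phrase + ch)) {
      // Keep growing the phrase while it is already known.
      phrase += ch;
    } else {
      // Emit the longest known phrase, then learn the new one.
      codes.push(phrase.length > 1 ? dictionary.get(phrase) : phrase.charCodeAt(0));
      dictionary.set(phrase + ch, nextCode++);
      phrase = ch;
    }
  }
  codes.push(phrase.length > 1 ? dictionary.get(phrase) : phrase.charCodeAt(0));
  return codes;
}

// Rebuild the original string from an array of LZW codes.
function lzwDecode(codes) {
  if (codes.length === 0) return "";
  const dictionary = new Map(); // code -> phrase
  let nextCode = 256;
  let previous = String.fromCharCode(codes[0]);
  const out = [previous];
  for (let i = 1; i < codes.length; i++) {
    const code = codes[i];
    let entry;
    if (code < 256) {
      entry = String.fromCharCode(code);
    } else if (dictionary.has(code)) {
      entry = dictionary.get(code);
    } else {
      // The code refers to the phrase that is being defined right now.
      entry = previous + previous[0];
    }
    out.push(entry);
    dictionary.set(nextCode++, previous + entry[0]);
    previous = entry;
  }
  return out.join("");
}

const sample = "TOBEORNOTTOBEORTOBEORNOT";
const codes = lzwEncode(sample);
console.log(codes.length, "codes for", sample.length, "characters");
console.log(lzwDecode(codes) === sample);
```

Storing the result still requires turning the codes into something compact, which is what the final loop in en does with String.fromCharCode.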
Answer by 6502
To me it doesn't seem reasonable to compress a string using UTF-8 as the destination... It looks like asking for trouble. I think it would be better to lose some compression and use plain 7-bit ASCII as the destination.
In a toy 4 KB JavaScript demo I wrote for fun, I used an encoding for the compressed result that stores four binary bytes in five characters chosen from a subset of 85 ASCII characters that is clean for embedding in a JavaScript string (85^5 is slightly more than 256^4, but still fits in the precision of JavaScript integers). This makes the compressed data safe for JSON, for example, without any escaping.
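A minimal sketch of that four-bytes-to-five-characters packing follows. The alphabet and the names encode85 and decode85 are assumptions for illustration; the demo's actual character set is not given here. The alphabet below has 85 printable ASCII characters and contains no quotes or backslashes.

```javascript
// Illustrative alphabet: 85 printable ASCII characters, none of which is a
// quote or a backslash, so the output needs no escaping in a JS or JSON string.
const ALPHABET =
  "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" +
  ".-:+=^!/*?&<>()[]{}@%$#";

// Encode a Uint8Array (length must be a multiple of 4) as base-85 text.
function encode85(bytes) {
  if (bytes.length % 4 !== 0) throw new Error("length must be a multiple of 4");
  let out = "";
  for (let i = 0; i < bytes.length; i += 4) {
    // Multiplication instead of bit shifts keeps the value unsigned.
    let value =
      ((bytes[i] * 256 + bytes[i + 1]) * 256 + bytes[i + 2]) * 256 + bytes[i + 3];
    let chunk = "";
    for (let j = 0; j < 5; j++) {
      chunk = ALPHABET[value % 85] + chunk; // least significant digit first
      value = Math.floor(value / 85);
    }
    out += chunk;
  }
  return out;
}

// Decode base-85 text (length must be a multiple of 5) back into bytes.
function decode85(text) {
  if (text.length % 5 !== 0) throw new Error("length must be a multiple of 5");
  const bytes = new Uint8Array((text.length / 5) * 4);
  for (let i = 0, k = 0; i < text.length; i += 5, k += 4) {
    let value = 0;
    for (let j = 0; j < 5; j++) {
      const digit = ALPHABET.indexOf(text[i + j]);
      if (digit < 0) throw new Error("invalid character: " + text[i + j]);
      value = value * 85 + digit;
    }
    bytes[k] = Math.floor(value / 16777216) % 256;
    bytes[k + 1] = Math.floor(value / 65536) % 256;
    bytes[k + 2] = Math.floor(value / 256) % 256;
    bytes[k + 3] = value % 256;
  }
  return bytes;
}

const packed = encode85(new Uint8Array([222, 173, 190, 239]));
console.log(packed.length); // 5 characters for 4 bytes
console.log(Array.from(decode85(packed)));
```

The output is 25% longer than the binary input, which is the compression this approach gives up in exchange for plain ASCII.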
Answer by cherouvim
Try experimenting with text files before implementing anything, because I think that the following does not necessarily hold:
so I figured it would help keep the database smaller if I compressed the things it stores.
That's because lossless compression algorithms are pretty good with repeating patterns (e.g. whitespace).
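One way to run such an experiment outside the browser is sketched below. It assumes Node.js and its built-in zlib module, and the 200 sample records are made up; it compares compressing all records together (as zipping the database file does) with compressing each record separately (as a per-string compressor would).

```javascript
const zlib = require("zlib");

// Hypothetical sample data: 200 short HTML records sharing most of their markup.
const records = [];
for (let i = 0; i < 200; i++) {
  records.push(
    '<div class="item"><h2>Title ' + i + "</h2><p>Body text number " + i + "</p></div>"
  );
}

const joined = records.join("");
const rawSize = Buffer.byteLength(joined, "utf8");

// Compress everything at once: repetition across records can be exploited.
const wholeSize = zlib.deflateSync(Buffer.from(joined, "utf8")).length;

// Compress each record on its own: only repetition inside one record helps.
let perRecordSize = 0;
for (const record of records) {
  perRecordSize += zlib.deflateSync(Buffer.from(record, "utf8")).length;
}

console.log({ rawSize, wholeSize, perRecordSize });
```

With data like this, most of the saving comes from patterns repeated across records, so the per-record total stays well above the whole-file figure. Real data may behave differently, which is why measuring first is worthwhile.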
Answer by Nils Ziehn
I think you should also look into lz-string. It's fast, compresses quite well, and has some advantages that they list on their page:
What about other libraries?
- some LZW implementations which give you back arrays of numbers (terribly inefficient to store, as tokens take 64 bits) and don't support any character above 255.
- some other LZW implementations which give you back a string (less terribly inefficient to store, but still, all tokens take 16 bits) and don't support any character above 255.
- an LZMA implementation that is asynchronous and very slow - but hey, it's LZMA, not the implementation that is slow.
- a GZip implementation not really meant for browsers but meant for node.js, which weighed 70kb (with deflate.js and crc32.js on which it depends).
The reasons why the author created lz-string:
- Working on mobile I needed something fast.
- Working with Strings gathered from outside my website, I needed something that can take any kind of string as an input, including any UTF characters above 255.
- The library not taking 70kb was a definitive plus. Something that produces strings as compact as possible to store in localStorage. So none of the libraries I could find online worked well for my needs.
There are implementations of this library in other languages. I am currently looking into the Python implementation, whose decompression seems to have issues at the moment, but if you stick to JS only, it looks really good to me.
Answer by 4esn0k
It seems there is a proposal for a compression/decompression API: https://github.com/wicg/compression/blob/master/explainer.md.
And it is implemented in Chrome 80 (right now in Beta) according to a blog post at https://blog.chromium.org/2019/12/chrome-80-content-indexing-es-modules.html.
I am not sure I am doing a good conversion between streams and strings, but here is my attempt at using the new API:
var encoding = 'deflate'; // or 'gzip'

// Resolves to an ArrayBuffer holding the compressed bytes of `text`.
function compress(text) {
  var byteArray = new TextEncoder().encode(text); // string -> UTF-8 bytes
  var cs = new CompressionStream(encoding);
  var writer = cs.writable.getWriter();
  writer.write(byteArray);
  writer.close();
  // Response collects the readable side into a single ArrayBuffer.
  return new Response(cs.readable).arrayBuffer();
}

// Resolves to the original string, given the compressed bytes.
function decompress(byteArray) {
  var cs = new DecompressionStream(encoding);
  var writer = cs.writable.getWriter();
  writer.write(byteArray);
  writer.close();
  return new Response(cs.readable).arrayBuffer().then(function (arrayBuffer) {
    return new TextDecoder().decode(arrayBuffer); // UTF-8 bytes -> string
  });
}
var test = "http://www.ScriptCompress.com - Simple Packer/Minify/Compress JavaScript Minify, Fixify & Prettify 75 JS Obfuscators In 1 App 25 JS Compressors (Gzip, Bzip, LZMA, etc) PHP, HTML & JS Packers In 1 App PHP Source Code Packers Text Packer HTML Packer or v2 or v3 or LZW Twitter Compress or More Words DNA & Base64 Packer (freq tool) or v2 JS JavaScript Code Golfer Encode Between Quotes Decode Almost Anything Password Protect Scripts HTML Minifier v2 or Encoder or Escaper CSS Minifier or Compressor v2 SVG Image Shrinker HTML To: SVG or SVGZ (Gzipped) HTML To: PNG or v2 2015 JS Packer v2 v3 Embedded File Generator Extreme Packer or version 2 Our Blog DemoScene JS Packer Basic JS Packer or New Version Asciify JavaScript Escape JavaScript Characters UnPacker Packed JS JavaScript Minify/Uglify Text Splitter/Chunker Twitter, Use More Characters Base64 Drag 'n Drop Redirect URL DataURI Get Words Repeated LZMA Archiver ZIP Read/Extract/Make BEAUTIFIER & CODE FIXER WHAK-A-SCRIPT JAVASCRIPT MANGLER 30 STRING ENCODERS CONVERTERS, ENCRYPTION & ENCODERS 43 Byte 1px GIF Generator Steganography PNG Generator WEB APPS VIA DATAURL OLD VERSION OF WHAK PAKr Fun Text Encrypt Our Google";
console.time('compress');
compress(test).then(function (x) {
  console.timeEnd('compress');
  console.log('compressed length', x.byteLength);
  console.time('decompress');
  decompress(x).then(function (y) {
    console.timeEnd('decompress');
    console.log('decompressed length', y.length);
    console.assert(test === y);
  });
});
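The compress function above resolves to an ArrayBuffer rather than a string. If the result has to be stored as a string, one possible approach is to Base64-encode the bytes, as sketched below. The helper names bytesToBase64 and base64ToBytes are made up for illustration; btoa and atob are the standard browser functions.

```javascript
// Turn an ArrayBuffer or Uint8Array into a Base64 string.
function bytesToBase64(buffer) {
  var bytes = new Uint8Array(buffer);
  var binary = '';
  for (var i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]); // one char per byte, codes 0..255
  }
  return btoa(binary);
}

// Turn a Base64 string back into a Uint8Array.
function base64ToBytes(text) {
  var binary = atob(text);
  var bytes = new Uint8Array(binary.length);
  for (var i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

var stored = bytesToBase64(new Uint8Array([72, 105, 0, 255]));
console.log(stored);
console.log(Array.from(base64ToBytes(stored)));
```

With the functions from the answer, this would be used as compress(text).then(bytesToBase64) when saving and decompress(base64ToBytes(stored)) when loading. Base64 makes the stored data about a third larger than the compressed bytes, so part of the saving is lost.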