Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/2996049/

Time: 2020-08-25 08:21:57  Source: igfitidea

How to compress/decompress a long query string in PHP?

php

Asked by jodeci

I doubt this counts as encryption, but I can't find a better word. I need to pass a long query string like this:

http://test.com/test.php?key=[some_very_loooooooooooooooooooooooong_query_string]

The query string contains NO sensitive information so I'm not really concerned about security in this case. It's just...well, too long and ugly. Is there a library function that can let me encode/encrypt/compress the query string into something similar to the result of a md5() (similar as in, always a 32 character string), but decode/decrypt/decompress-able?

Answered by Gumbo

You could try a combination of gzdeflate (raw DEFLATE format) to compress your data and base64_encode to use only those characters that are allowed without percent-encoding (additionally, exchange the characters + and / for - and _):

$output = rtrim(strtr(base64_encode(gzdeflate($input, 9)), '+/', '-_'), '=');

And the reverse:

$output = gzinflate(base64_decode(strtr($input, '-_', '+/')));

Here is an example:

$input = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

// percent-encoding on plain text
var_dump(urlencode($input));

// deflated input
$output = rtrim(strtr(base64_encode(gzdeflate($input, 9)), '+/', '-_'), '=');
var_dump($output);

The saving in this case is about 23%. But the actual efficiency of this compression procedure depends on the data you are using.
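
The two one-liners above can be wrapped into a round-trip pair. This is only a sketch: the function names are made up for illustration, the zlib extension is assumed to be available, and the error handling is one possible choice rather than part of the original answer.

```php
<?php
// Hypothetical helper names; both functions need PHP's zlib extension.

/** Deflate the input and encode it with the URL-safe Base64 alphabet. */
function compress_for_url(string $input): string
{
    $deflated = gzdeflate($input, 9);
    return rtrim(strtr(base64_encode($deflated), '+/', '-_'), '=');
}

/** Reverse compress_for_url(); throws if the token is not valid. */
function decompress_from_url(string $token): string
{
    $base64 = strtr($token, '-_', '+/');
    // Restore the '=' padding that was trimmed during encoding.
    $base64 .= str_repeat('=', (4 - strlen($base64) % 4) % 4);
    $deflated = base64_decode($base64, true);
    if ($deflated === false) {
        throw new InvalidArgumentException('Token is not valid Base64.');
    }
    $output = @gzinflate($deflated);
    if ($output === false) {
        throw new InvalidArgumentException('Token does not contain deflated data.');
    }
    return $output;
}

// Made-up query string, used only to show the round trip.
$query = 'name=Lorem&tags=ipsum,dolor,sit,amet&page=1&sort=asc';
$token = compress_for_url($query);
var_dump(decompress_from_url($token) === $query); // bool(true)
```

The token contains only letters, digits, - and _, so it should not need percent-encoding when placed in the key parameter.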

Answered by deceze

The basic premise is very difficult. Transporting any value in the URL means you're restricted to a subset of ASCII characters. Using any sort of compression like gzcompress would reduce the size of the string, but result in a binary blob. That binary blob can't be transported in the URL as-is though, since it would produce invalid characters. To transport that binary blob using a subset of ASCII, you need to encode it in some way and turn it into ASCII characters.

So, you'd turn ASCII characters into something else which you'd then turn into ASCII characters.

But actually, most of the time the ASCII characters you start out with are already the optimal length. Here's a quick test:

$str = 'Hello I am a very very very very long search string';
echo $str . "\n";
echo base64_encode(gzcompress($str, 9)) . "\n";
echo bin2hex(gzcompress($str, 9)) . "\n";
echo urlencode(gzcompress($str, 9)) . "\n";

Hello I am a very very very very long search string
eNrzSM3JyVfwVEjMVUhUKEstqkQncvLz0hWKUxOLkjMUikuKMvPSAc+AEoI=
78daf348cdc9c957f05448cc554854284b2daa442772f2f3d2158a53138b9233148a4b8a32f3d201cf801282
x%DA%F3H%CD%C9%C9W%F0TH%CCUHT%28K-%AAD%27r%F2%F3%D2%15%8AS%13%8B%923%14%8AK%8A2%F3%D2%01%CF%80%12%82

As you can see, the original string is the shortest. Among the encoded compressions, base64 is the shortest since it uses the largest alphabet to represent the binary data. It's still longer than the original though.

For some very specific combination of characters with some very specific compression algorithm that compresses to ASCII representable data it may be possible to achieve some compression, but that's rather theoretical.

Update: Actually, that sounds too negative. The thing is you need to figure out if compression makes sense for your use case. Different data compresses differently and different encoding algorithms work differently. Also, longer strings may achieve a better compression ratio. There's probably a sweet spot somewhere where some compression can be achieved. You need to figure out if you're in that sweet spot most of the time or not.
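
One rough way to find out whether a given value is in that sweet spot is to build both forms and keep the shorter one. The helper below is a hypothetical sketch (its name and the 'raw'/'deflate' labels are assumptions), and it assumes the zlib extension is loaded.

```php
<?php
// Hypothetical helper: returns [scheme, value], where scheme is 'raw' or 'deflate'.
function shortest_url_form(string $value): array
{
    // Plain percent-encoding, as the value would normally appear in the URL.
    $raw = urlencode($value);
    // Raw DEFLATE plus URL-safe Base64 without padding.
    $packed = rtrim(strtr(base64_encode(gzdeflate($value, 9)), '+/', '-_'), '=');

    return strlen($packed) < strlen($raw) ? ['deflate', $packed] : ['raw', $raw];
}

[$scheme, $encoded] = shortest_url_form('Hello I am a very very very very long search string');
echo $scheme, ' ', strlen($encoded), "\n";
```

The receiving script needs to know which form it got, so the scheme would have to travel along with the value, for example as a second parameter.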

Something like md5 is unsuitable since md5 is a hash, which means it's non-reversible. You can't get the original value back from it.

I'm afraid you can only send the parameter via POST, if it doesn't work in the URL.

Answered by Akyhne

This works great for me:

$out = urlencode(base64_encode(gzcompress($in)));

Saves a lot.

Input length -> output length:

51 ('Hello I am a very very very very long search string') -> 64
500 -> 328
1000 -> 342
1500 -> 352

So the longer the string, the better the compression. The compression level parameter doesn't seem to have any effect.
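
To check figures like these against your own data, a small measuring helper may be enough. The sketch below uses a made-up sample string and a hypothetical function name, so its numbers will differ from the ones listed above.

```php
<?php
// Hypothetical helper for measuring; requires the zlib extension.
function encoded_length(string $in, int $level = -1): int
{
    return strlen(urlencode(base64_encode(gzcompress($in, $level))));
}

// Made-up sample data: your own query strings will compress differently.
$sample = str_repeat('id=12345&filter=active&', 70);

foreach ([51, 500, 1000, 1500] as $length) {
    $in = substr($sample, 0, $length);
    printf(
        "%4d -> level 1: %d, level 9: %d\n",
        $length,
        encoded_length($in, 1),
        encoded_length($in, 9)
    );
}
```

Running it with real query strings from your application shows whether the level makes a measurable difference for your kind of data.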

Answered by Felix Kling

Update:
gzcompress() won't help you. For example, if you take Pekka's answer:

String length: 640
Compressed string length: 375
URL encoded string length: 925
(with base64_encode, it is only 500 characters ;) )

So this way (passing the data via the URL) is probably not the best way...

If you don't exceed the URL's length limits with the string, why do you care what the string looks like? I assume it gets created, sent and processed automatically anyway, doesn't it?

But if you want to use it as e.g. some kind of confirmation link in an email, you have to think about something short and easy to type for the user anyway. You could, e.g. store all the needed data in a database and create some kind of token.
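
The token idea could look roughly like this. It is a sketch under assumptions: TokenStore is a hypothetical class, and its in-memory array only stands in for a real database table (typed properties need PHP 7.4 or later).

```php
<?php
// A sketch only: TokenStore is a hypothetical class, and the array stands in
// for a real database table keyed by token.
final class TokenStore
{
    /** @var array<string, string> token => stored query string */
    private array $rows = [];

    /** Store the long query string and return a short random token for it. */
    public function put(string $queryString): string
    {
        $token = bin2hex(random_bytes(8)); // 16 hex characters
        $this->rows[$token] = $queryString;
        return $token;
    }

    /** Look the query string up again; null if the token is unknown. */
    public function get(string $token): ?string
    {
        return $this->rows[$token] ?? null;
    }
}

$store = new TokenStore();
$token = $store->put('user=42&action=confirm&newsletter=weekly&lang=en');

echo 'http://test.com/test.php?key=' . $token, "\n";
var_dump($store->get($token));
```

The link length then stays constant no matter how much data is stored behind the token.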

Maybe gzcompress() can help you. But this will result in characters that are not allowed, so you will have to use urlencode() too (which makes the string longer and ugly again ;) ).

Answered by ESL

Basically, it's like they say: compress the text, and send it encoded in a useful way. But:

1) Common compression methods can produce output heavier than the original text because they include a dictionary. If the data is always an undetermined order of determined chunks of data (like words or syllables[3], numbers and some symbols in a text), you could always use the same static dictionary and not send it (don't paste it in the URL). Then you save the space of the dictionary.

1.a) If you are already sending the language (or if it's always the same), you could generate a dictionary per language.

1.b) Take advantage of the format limitations. If you know it's a number, you can encode it directly (see 3). If you know it's a date, you could encode it as Unix time[1] (seconds since 01/01/1970), so "21/05/2013 23:45:18" becomes "519C070E" (hex); if it's a day of the year, you could encode it as days since New Year, counting 29/02 (25/08 would be 237).

1.c) You know emails have to follow certain rules, and they usually come from the same few servers (gmail, yahoo, etc.). You could take advantage of that to compress them with your own simple method:

[email protected],[email protected],[email protected] => samplemail1:1,samplemail2:5,samplemail3@idontknowyou:1
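
As a sketch of point 1.b, the date example can be reproduced with PHP's built-in functions. The helper names are hypothetical, and the timestamp is assumed to be in UTC.

```php
<?php
// Hypothetical helpers; timestamps are treated as UTC.

/** Turn a date and time into an uppercase hexadecimal Unix timestamp. */
function date_to_hex(int $year, int $month, int $day, int $hour, int $minute, int $second): string
{
    return strtoupper(dechex(gmmktime($hour, $minute, $second, $month, $day, $year)));
}

/** Turn the hexadecimal timestamp back into "d/m/Y H:i:s". */
function hex_to_date(string $hex): string
{
    return gmdate('d/m/Y H:i:s', (int) hexdec($hex));
}

echo date_to_hex(2013, 5, 21, 23, 45, 18), "\n"; // 519C070E
echo hex_to_date('519C070E'), "\n";              // 21/05/2013 23:45:18
```

The 19-character date string shrinks to 8 characters, all of which are safe in a URL.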

2) If the data follows patterns, you can use that to help compression. For example, if it always follows this pattern:

name=[TEXT 1]&phone=[PHONE]&mail=[MAIL]&desc=[TEXT 2]&create=[DATE 1]&modified=[DATE 2]&first=[NUMBER 1]&last=[NUMBER 2]

You could:

2.a) Ignore the fixed text, and compress just the variable text. Like:

[TEXT 1]|[PHONE]|[MAIL]|[TEXT 2]|[DATE 1]|[DATE 2]|[NUMBER 1]|[NUMBER 2]

2.b) Encode or compress data by type (encode numbers using base64[2] or similar), as in 1). This even allows you to suppress separators. Like:

[DATE 1][DATE 2][NUMBER 1][NUMBER 2][PHONE][MAIL]|[TEXT 1]|[TEXT 2]
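
Point 2.a could be sketched as follows, assuming both sides agree on a fixed field order. The FIELDS list mirrors the example pattern above; the function names and the choice of rawurlencode() for escaping are assumptions made for illustration.

```php
<?php
// Hypothetical field list, taken from the example pattern above.
const FIELDS = ['name', 'phone', 'mail', 'desc', 'create', 'modified', 'first', 'last'];

/** Keep only the values, in the agreed order, separated by '|'. */
function pack_fields(array $params): string
{
    $values = [];
    foreach (FIELDS as $field) {
        // rawurlencode() escapes '|' inside a value, so the separator stays unambiguous.
        $values[] = rawurlencode((string) ($params[$field] ?? ''));
    }
    return implode('|', $values);
}

/** Rebuild the key => value array from the packed string. */
function unpack_fields(string $packed): array
{
    $values = array_map('rawurldecode', explode('|', $packed));
    // Pad or cut to the expected number of fields before combining.
    $values = array_slice(array_pad($values, count(FIELDS), ''), 0, count(FIELDS));
    return array_combine(FIELDS, $values);
}

// Made-up sample values.
$params = [
    'name' => 'Jane', 'phone' => '5550100', 'mail' => 'jane', 'desc' => 'a|b',
    'create' => '519C070E', 'modified' => '519C070F', 'first' => '1', 'last' => '20',
];
$packed = pack_fields($params);
echo $packed, "\n";
var_dump(unpack_fields($packed) === $params);
```

The packed string could then be compressed or re-encoded further, as described in 2.b.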

3) Coding:

3.a) While it is true that if we encode the compressed data with characters not supported by HTTP, they will be transformed into heavier ones (like 'año' => 'a%C3%B1o'), that can still be useful. Maybe you want to compress it to store it in a Unicode or binary database, or to paste it on web sites (Facebook, Twitter, etc.).

3.b) Although Base64[2] is a good method, you can squeeze out more at the expense of speed (as you use user functions instead of compiled ones).

At least with JavaScript's encodeURI() function, you can use any of these 80 characters in a parameter value without them being modified:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.:,;+*-_/()$=!@?~'

So we can build our own "Base 80" encode/decode functions.
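
A minimal sketch of such "Base 80" functions is shown below. It is not the answerer's code: the function names are hypothetical, the alphabet is the 80-character list quoted above, and the conversion uses plain long division on digit arrays, which is slow for long inputs.

```php
<?php
// Hypothetical helpers; the alphabet is the 80-character list quoted above.
const BASE80_ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.:,;+*-_/()$=!@?~\'';

/** Convert a big number, given as a list of digits, from one base to another. */
function convert_digits(array $digits, int $from, int $to): array
{
    $out = [];
    while (count($digits) > 0) {
        $quotient = [];
        $remainder = 0;
        foreach ($digits as $digit) {
            $acc = $remainder * $from + $digit;
            $q = intdiv($acc, $to);
            $remainder = $acc % $to;
            if (count($quotient) > 0 || $q > 0) {
                $quotient[] = $q; // skip leading zeros of the quotient
            }
        }
        array_unshift($out, $remainder);
        $digits = $quotient;
    }
    return $out;
}

function base80_encode(string $bin): string
{
    $alphabet = BASE80_ALPHABET;
    // Leading zero bytes are kept as leading '0' characters.
    $zeros = strlen($bin) - strlen(ltrim($bin, "\0"));
    $bytes = [];
    for ($i = $zeros; $i < strlen($bin); $i++) {
        $bytes[] = ord($bin[$i]);
    }
    $out = str_repeat($alphabet[0], $zeros);
    foreach (convert_digits($bytes, 256, 80) as $digit) {
        $out .= $alphabet[$digit];
    }
    return $out;
}

function base80_decode(string $text): string
{
    $alphabet = BASE80_ALPHABET;
    $zeros = strlen($text) - strlen(ltrim($text, $alphabet[0]));
    $digits = [];
    for ($i = $zeros; $i < strlen($text); $i++) {
        $pos = strpos($alphabet, $text[$i]);
        if ($pos === false) {
            throw new InvalidArgumentException('Character outside the Base 80 alphabet.');
        }
        $digits[] = $pos;
    }
    $bytes = convert_digits($digits, 80, 256);
    return str_repeat("\0", $zeros) . implode('', array_map('chr', $bytes));
}

$binary = "\x00\xffbinary data";
var_dump(base80_decode(base80_encode($binary)) === $binary); // bool(true)
```

Each Base 80 character carries about 6.3 bits instead of Base64's 6, so the gain is only around 5%. Also note that when the value is read back through PHP's $_GET, a literal + is decoded as a space, so the alphabet may need adjusting for that use.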

Answered by Madhur Bhaiya

Not really an answer, but a comparison of various methods suggested here.

Used answers by @Gumbo and @deceze to get length comparison for a fairly long string I am using in a GET.

<?php
    $test_str="33036,33037,33038,38780,38772,37671,36531,38360,39173,38676,37888,36828,39176,39196,37321,36840,38519,37946,36543,39287,38989,38976,36804,38880,38922,38292,38507,38893,38993,39035,37880,38897,38378,36880,38492,38910,36868,38196,38750,37938,39268,38209,36856,36767,37936,36805,39248,36777,39027,39056,38987,38779,38919,38771,36851,38675,37887,38246,38791,38783,38661,37899,36846,36834,39263,37928,36822,37947,38992,38516,39177,38904,38896,37320,39217,37879,38293,38511,38774,37670,38185,37927,37939,38286,38298,38977,37891,38881,38197,38457,36962,39171,36760,36748,39249,39231,39191,36951,36963,36755,38769,38891,38654,38792,36863,36875,36956,36968,38978,38299,36743,36753,37896,38926,39270,38372,37948,39250,38763,38190,38678,36761,37925,36776,36844,37323,38781,38744,38321,38202,38793,38510,38288,36816,38384,37906,38184,38192,38745,39218,38673,39178,39198,39036,38504,36754,39180,37919,38768,38195,36850,38203,38672,38882,38071,39189,36795,36783,38870,38764,39028,36762,36750,38980,36958,37924,38884,37920,38877,36858,38493,36742,37895,36835,37907,36823,38762,38361,37937,38373,37949,36950,39202,38495,38291,36533,39037,36716,38925,37620,38906,37878,37322,38754,36818,39029,39264,38297,38517,36969,38905,36957,36789,36741,37908,38302,38775,39216,36812,38767,36845,36849,39181,39168,38671,39188,38490,36961,39201,36717,38382,38070,37868,38984,36770,38981,38494,36807,38885,36759,36857,38924,39038,38888,38876,36879,37897,36534,36764,37931,38254,39030,38990,37909,38982,38290,36848,37857,37923,38249,38658,38383,36813,36765,36817,37263,36769,37869,38183,36861,38206,39031,36800,36788,36972,38508,38303,39051,38491,38983,38759,36740,37958,36967,37930,39174,39182,36806,36867,36855,39222,37862,36752,38242,37965,38894,38182,37922,37918,36814,36872,38886,36860,36527,38194,38975,36718,39224,37436,39032";

    echo(strlen($test_str)); echo("<br>");

    echo(strlen(base64_encode(gzcompress($test_str,9)))); echo("<br>");

    echo(strlen(bin2hex(gzcompress($test_str, 9)))); echo("<br>");

    echo(strlen(urlencode(gzcompress($test_str, 9)))); echo("<br>");

    echo(strlen(rtrim(strtr(base64_encode(gzdeflate($test_str, 9)), '+/', '-_'), '=')));
?>

Here are the results:

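1799  (length of the original string)
928   (base64_encode + gzcompress: 51.58% of the original length)
1388  (bin2hex + gzcompress)
1712  (urlencode + gzcompress)
918   (URL-safe base64_encode + gzdeflate: 51.028% of the original length)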

Results are comparable for base64_encode with gzcompress and base64_encode with gzdeflate (plus some string translations). gzdeflate seems to give slightly better efficiency.
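
The small gap between the two is consistent with how the formats differ: gzcompress() wraps the same DEFLATE stream in the zlib format, which adds a 2-byte header and a 4-byte Adler-32 checksum. A quick sketch with made-up sample data:

```php
<?php
// Made-up sample data; requires the zlib extension.
$data = implode(',', range(33036, 33336));

$raw  = gzdeflate($data, 9);   // raw DEFLATE stream
$zlib = gzcompress($data, 9);  // same stream wrapped in the zlib format

// The zlib wrapper adds a 2-byte header and a 4-byte Adler-32 checksum.
echo strlen($zlib) - strlen($raw), "\n"; // 6
```

Six extra bytes become eight extra Base64 characters, which, together with the trimmed padding, matches the 928 versus 918 figures above.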

Answered by stubben

These functions will compress and decompress a string or an array.

Sometimes you might want to GET an array.

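// Serialize, compress and encode a string or array for use in a GET parameter.
function _encode_string_array ($stringArray) {
    // Swap '+', '/' and '=' for '-', '_' and ','.
    $s = strtr(base64_encode(gzcompress(serialize($stringArray), 9)), '+/=', '-_,');
    return $s;
}

// Reverse of _encode_string_array(). The value comes from the request,
// so unserializing objects is disabled.
function _decode_string_array ($stringArray) {
    $binary = base64_decode(strtr($stringArray, '-_,', '+/='));
    $s = unserialize(gzuncompress($binary), ['allowed_classes' => false]);
    return $s;
}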

Answered by stubben

For long or very long string values, you should use the POST method instead of GET!

For a good encoding you might wanna try urlencode()/urldecode()

or htmlentities()/html_entity_decode()

Also be careful: '%2F' is interpreted by the browser as the '/' character (directory separator). If you use only urlencode, you might wanna do a replace on it.

I don't recommend using gzcompress on GET parameters.

Answered by Pekka

base64_encode makes the string unreadable (while of course easily decodable) but blows the volume up by 33%.

urlencode() turns any characters unsuitable for URLs into their URL-encoded counterparts. If your aim is to make the string work in the URL, this may be the right way for you.

If you have a session running, you could also consider putting the query string into a session variable with a random (small) number, and put that random number into the GET string. This method won't survive longer than the current session, of course.

Note that a GET string should never exceed 1-2 kilobytes in size due to server and browser limitations.
