Base64 length calculation?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/13378815/
Asked by Royi Namir
After reading the Base64 wiki, I'm trying to figure out how the formula works:
Given a string of length n, the Base64 length will be 4 * ceil(n / 3), which in code is: 4*Math.Ceiling((double)s.Length/3)
I already know that the Base64 length must satisfy length % 4 == 0 so that the decoder can tell what the original text length was.
The padding at the end of a sequence can be at most = or ==.
Wiki: The number of output bytes per input byte is approximately 4 / 3 (33% overhead)
Question:
How does the information above fit with the output length?
Answered by Paul R
Each character is used to represent 6 bits (log2(64) = 6).
Therefore 4 chars are used to represent 4 * 6 = 24 bits = 3 bytes.
So you need 4*(n/3) chars to represent n bytes, and this needs to be rounded up to a multiple of 4.
The number of padding chars resulting from the rounding up to a multiple of 4 will be 0, 1 or 2 (for n % 3 equal to 0, 2 or 1 respectively). Three padding chars never occur, because even a single leftover byte already needs two Base64 chars.
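To make the bit-counting argument concrete, here is a minimal Java sketch that compares the block formula with what java.util.Base64 actually produces for a few input sizes; it also prints how many '=' characters the encoder appends. The class and method names (PaddedLengthCheck, paddedLength, paddingChars) are made up for this illustration and do not come from the answer.

```java
import java.util.Base64;

public class PaddedLengthCheck {
    // Padded Base64 length: one 4-character block per started 3-byte group.
    public static int paddedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    // Number of '=' characters needed to fill the last block for n input bytes.
    public static int paddingChars(int n) {
        return (3 - n % 3) % 3;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 9; n++) {
            // The content of the bytes does not matter, only their count.
            String encoded = Base64.getEncoder().encodeToString(new byte[n]);
            System.out.println(n + " bytes -> " + encoded.length()
                    + " chars (formula: " + paddedLength(n)
                    + ", padding: " + paddingChars(n) + ")");
        }
    }
}
```

Running main should show the formula and the encoder agreeing on every line.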
Answered by Ren
4 * n / 3 gives the unpadded length (truncated, when computed with integer division).
Then round up to the nearest multiple of 4 for padding; since 4 is a power of 2, this can be done with bitwise logical operations:
((4 * n / 3) + 3) & ~3
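As a quick sanity check, the following Java sketch compares the bitwise expression above with the plain "one 4-character block per started 3-byte group" calculation, assuming non-negative int inputs that are small enough for 4 * n not to overflow. The names BitwiseRoundUp, bitwise and blocks are illustrative only.

```java
public class BitwiseRoundUp {
    // The expression from the answer: truncated 4n/3, rounded up to a multiple of 4.
    public static int bitwise(int n) {
        return ((4 * n / 3) + 3) & ~3;
    }

    // Reference calculation: one 4-character block per started 3-byte group.
    public static int blocks(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 1000; n++) {
            if (bitwise(n) != blocks(n)) {
                throw new AssertionError("mismatch at n = " + n);
            }
        }
        System.out.println("both expressions agree for n = 0..1000");
    }
}
```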
Answered by David Schwartz
For reference, the Base64 encoder's length formula is as follows:
As you said, a Base64 encoder given n bytes of data will produce a string of 4n/3 Base64 characters. Put another way, every 3 bytes of data will result in 4 Base64 characters. EDIT: A comment correctly points out that my previous graphic did not account for padding; the correct formula with padding is 4 * Ceiling(n/3).
The Wikipedia article shows exactly how the ASCII string Man is encoded into the Base64 string TWFu in its example. The input string is 3 bytes, or 24 bits, in size, so the formula correctly predicts the output will be 4 bytes (or 32 bits) long: TWFu. The process encodes every 6 bits of data into one of the 64 Base64 characters, so the 24-bit input divided by 6 results in 4 Base64 characters.
You ask in a comment what the size of encoding 123456 would be. Keeping in mind that every character of that string is 1 byte, or 8 bits, in size (assuming ASCII/UTF8 encoding), we are encoding 6 bytes, or 48 bits, of data. According to the equation, we expect the output length to be (6 bytes / 3 bytes) * 4 characters = 8 characters.
Putting 123456 into a Base64 encoder creates MTIzNDU2, which is 8 characters long, just as we expected.
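The two worked examples can be reproduced with a few lines of Java. This is only a sketch: it assumes the input text is plain ASCII, and the class name WorkedExamples and the helper encode are invented for the illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class WorkedExamples {
    // Encodes ASCII text with the standard (padded) Base64 alphabet.
    public static String encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        for (String text : new String[] {"Man", "123456"}) {
            String encoded = encode(text);
            System.out.println(text + " -> " + encoded
                    + " (" + encoded.length() + " chars)");
        }
    }
}
```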
Answered by Maarten Bodewes
Integers
Generally we don't want to use doubles, because we don't want floating-point operations, rounding errors and so on. They are simply not necessary.
For this it is a good idea to remember how to perform ceiling division: ceil(x / y) in doubles can be written as (x + y - 1) / y in integer arithmetic (for non-negative x and positive y; beware of overflow).
Readable
If you go for readability you can of course also program it like this (example in Java; for C you could use macros, of course):
    public static int ceilDiv(int x, int y) {
        return (x + y - 1) / y;
    }

    public static int paddedBase64(int n) {
        int blocks = ceilDiv(n, 3);
        return blocks * 4;
    }

    public static int unpaddedBase64(int n) {
        int bits = 8 * n;
        return ceilDiv(bits, 6);
    }

    // test only
    public static void main(String[] args) {
        for (int n = 0; n < 21; n++) {
            System.out.println("Base 64 padded: " + paddedBase64(n));
            System.out.println("Base 64 unpadded: " + unpaddedBase64(n));
        }
    }
Inlined
Padded
We know that we need one 4-character block for each 3 bytes (or fewer). So the formula becomes (for x = n and y = 3):
blocks = (bytes + 3 - 1) / 3
chars = blocks * 4
or combined:
chars = ((bytes + 3 - 1) / 3) * 4
Your compiler will optimize out the 3 - 1, so just leave it like this to maintain readability.
Unpadded
Less common is the unpadded variant; for this we remember that we need one character for each 6 bits, rounded up:
bits = bytes * 8
chars = (bits + 6 - 1) / 6
or combined:
chars = (bytes * 8 + 6 - 1) / 6
We can, however, still divide numerator and denominator by two (if we want to):
chars = (bytes * 4 + 3 - 1) / 3
Unreadable
In case you don't trust your compiler to do the final optimizations for you (or if you want to confuse your colleagues):
Padded
((n + 2) / 3) << 2
Unpadded
((n << 2) | 2) / 3
So there we are: two logical ways of calculation, and we don't need any branches, bit-ops or modulo ops, unless we really want to.
Notes:
- Obviously you may need to add 1 to the calculations to include a null termination byte.
- For MIME you may need to take care of possible line termination characters and such (look for other answers for that).
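The "unreadable" one-liners can be checked against a real encoder. The sketch below uses java.util.Base64 with and without padding and compares the output lengths with the two expressions; the class name InlinedFormulaCheck and its method names are hypothetical, and the check only covers small non-negative sizes.

```java
import java.util.Base64;

public class InlinedFormulaCheck {
    // Padded length: blocks of 4 characters, written with a shift.
    public static int padded(int n) {
        return ((n + 2) / 3) << 2;
    }

    // Unpadded length: ceil(4n / 3); "| 2" adds 2 because n << 2 has its low two bits clear.
    public static int unpadded(int n) {
        return ((n << 2) | 2) / 3;
    }

    public static void main(String[] args) {
        Base64.Encoder withPad = Base64.getEncoder();
        Base64.Encoder noPad = Base64.getEncoder().withoutPadding();
        for (int n = 0; n <= 300; n++) {
            byte[] data = new byte[n];
            if (withPad.encodeToString(data).length() != padded(n)
                    || noPad.encodeToString(data).length() != unpadded(n)) {
                throw new AssertionError("mismatch at n = " + n);
            }
        }
        System.out.println("formulas match java.util.Base64 for n = 0..300");
    }
}
```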
Answered by Pedro Silva
Here is a function to calculate the original (decoded) size, in KB, of a file that was encoded as a Base64 String:
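    // StringUtils is presumably the Apache Commons Lang helper class.
    private Double calcBase64SizeInKBytes(String base64String) {
        Double result = -1.0;
        if (StringUtils.isNotEmpty(base64String)) {
            // Each trailing '=' stands for one byte that was not in the original data.
            Integer padding = 0;
            if (base64String.endsWith("==")) {
                padding = 2;
            } else if (base64String.endsWith("=")) {
                padding = 1;
            }
            // Divide by 4.0 so that Math.ceil sees a fractional value, not a truncated int.
            result = (Math.ceil(base64String.length() / 4.0) * 3) - padding;
        }
        return result / 1000;
    }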
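For comparison, here is a dependency-free sketch of the same idea that returns the decoded size in bytes and checks it against java.util.Base64. It assumes a padded Base64 string without line breaks; the names DecodedSize and decodedLength are invented for this example.

```java
import java.util.Base64;

public class DecodedSize {
    // Decoded byte count of a padded Base64 string that contains no line breaks.
    public static int decodedLength(String base64) {
        int padding = 0;
        if (base64.endsWith("==")) {
            padding = 2;
        } else if (base64.endsWith("=")) {
            padding = 1;
        }
        // Every 4 characters carry 3 bytes; each '=' removes one byte from the last block.
        return base64.length() / 4 * 3 - padding;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "TQ==", "TWE=", "TWFu", "MTIzNDU2"}) {
            int actual = Base64.getDecoder().decode(s).length;
            System.out.println("\"" + s + "\": formula " + decodedLength(s)
                    + ", decoder " + actual);
        }
    }
}
```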
Answered by Ian Nartowicz
I think the given answers miss the point of the original question, which is how much space needs to be allocated to fit the base64 encoding for a given binary string of length n bytes.
The answer is (floor(n / 3) + 1) * 4 + 1
This includes padding and a terminating null character. You may not need the floor call if you are doing integer arithmetic.
Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately.
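A small sketch may help when sizing a buffer. The expression (n / 3 + 1) * 4 + 1 is always large enough, but when n is an exact multiple of 3 it reserves one 4-character block more than strictly needed. The Java code below prints both values side by side; the exact variant uses the ceiling-division form, and the names BufferSize, upperBound and exact are illustrative.

```java
public class BufferSize {
    // Expression from the answer: padded length plus one byte for the terminating null.
    public static int upperBound(int n) {
        return (n / 3 + 1) * 4 + 1;
    }

    // Exact requirement: one 4-character block per started 3-byte group, plus the null.
    public static int exact(int n) {
        return ((n + 2) / 3) * 4 + 1;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 6; n++) {
            System.out.println(n + " bytes: upper bound " + upperBound(n)
                    + ", exact " + exact(n));
        }
    }
}
```

Either value is safe for allocation; the difference only matters if the spare 4 bytes are a concern.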
Answered by Valo
Seems to me that the right formula should be:
n64 = 4 * (n / 3) + (n % 3 != 0 ? 4 : 0)
Answered by Michael Adams
While everyone else is debating algebraic formulas, I'd rather just use BASE64 itself to tell me:
$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately."| wc -c
525
$ echo "Including padding, a base64 string requires four bytes for every three-byte chunk of the original string, including any partial chunks. One or two bytes extra at the end of the string will still get converted to four bytes in the base64 string when padding is added. Unless you have a very specific use, it is best to add the padding, usually an equals character. I added an extra byte for a null character in C, because ASCII strings without this are a little dangerous and you'd need to carry the string length separately." | base64 | wc -c
710
So the formula of 3 bytes being represented by 4 Base64 characters seems correct.
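The two numbers above can be reconciled with a short calculation. The sketch below assumes that wc counted the trailing newline that echo appends (so the input is 525 bytes), that the base64 tool wraps its output at 76 characters per line (the GNU coreutils default), and that every output line ends with a single newline. The names WrappedLength, padded and wrapped are made up for the illustration.

```java
public class WrappedLength {
    // Padded Base64 length without any line breaks.
    public static int padded(int n) {
        return ((n + 2) / 3) * 4;
    }

    // Length when the output is wrapped at lineWidth characters and every
    // line, including the last one, ends with a single newline character.
    public static int wrapped(int n, int lineWidth) {
        int chars = padded(n);
        int lines = (chars + lineWidth - 1) / lineWidth;
        return chars + lines;
    }

    public static void main(String[] args) {
        System.out.println(padded(525));       // 700
        System.out.println(wrapped(525, 76));  // 710
    }
}
```

Under those assumptions, 525 input bytes give 700 Base64 characters on 10 lines, and the 10 newlines account for the 710 reported by wc.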
Answered by igerard
I believe this one is an exact answer if n % 3 is not zero, no?
4 * (n + 3 - n % 3) / 3
Mathematica version:
SizeB64[n_] := If[Mod[n, 3] == 0, 4 n/3, 4 (n + 3 - Mod[n, 3])/3]
Have fun
GI
Answered by qoomon
Simple implementation in JavaScript, returning the decoded size in bytes of a padded Base64 string:
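    function sizeOfBase64String(base64String) {
        if (!base64String) return 0;
        // Count the trailing '=' padding characters (0, 1 or 2).
        const padding = (base64String.match(/(=*)$/) || [])[1].length;
        // Every 4 Base64 characters decode to 3 bytes; each '=' removes one byte.
        return (base64String.length / 4) * 3 - padding;
    }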