Java: Base64 vs HEX for sending binary content over the internet in an XML doc
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/3183841/
Asked by jax
What is the best way of sending binary content between systems inside an XML document?
I know of Base64 and hex; what is the real difference? I am currently using Base64 but need to include an external Commons library for it, whereas with hex I think I could just write a function.
Accepted answer by Jon Skeet
You could just write your own method for Base64 as well... but I'd generally recommend using external, well-tested libraries for both. (It's not like there's any shortage of them.)
The difference between Base64 and hex is really just how bytes are represented. Hex is another way of saying "Base16". Hex takes two characters for each byte; Base64 takes 4 characters for every 3 bytes, so it's more efficient than hex. Assuming you're using UTF-8 to encode the XML document, a 100K file will take 200K to encode in hex, or about 133K in Base64. Of course it may well be that you don't care about the space efficiency: in many cases it won't matter. If it does matter, then clearly Base64 is better on that front. (There are alternatives which are even more efficient, but they're not as common.)
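To make the 200K versus 133K comparison concrete, here is a minimal Java sketch. The class name EncodingSizeDemo and the hand-rolled toHex helper are assumptions made for illustration; the Base64 side uses java.util.Base64, which ships with the JDK from Java 8 onward (it did not exist when the question was asked).

```java
import java.util.Base64;

// Hypothetical demo class: compares the encoded size of the same payload.
final class EncodingSizeDemo {
    private static final char[] HEX_DIGITS = "0123456789abcdef".toCharArray();

    // Hand-rolled hex encoder: two output characters per input byte.
    static String toHex(byte[] data) {
        char[] out = new char[data.length * 2];
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF;                 // treat the byte as unsigned
            out[2 * i] = HEX_DIGITS[b >>> 4];       // high nibble
            out[2 * i + 1] = HEX_DIGITS[b & 0x0F];  // low nibble
        }
        return new String(out);
    }

    // Standard Base64 from the JDK: four output characters per three input bytes.
    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] payload = new byte[100_000]; // the contents do not affect the length
        System.out.println("hex:    " + toHex(payload).length() + " chars");
        System.out.println("base64: " + toBase64(payload).length() + " chars");
    }
}
```

For a 100,000-byte payload this reports 200,000 characters for hex and 133,336 for Base64, since the Base64 length is rounded up to a whole 4-character group.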
Answer by sharptooth
base64 has less overhead (base64 produces 4 characters for every 3 bytes of original data while hex produces 2 characters for every byte of original data). Hex is more readable: you just look at the two characters and immediately know which byte is behind them, but with base64 you have to work to decode each 4-character group, so debugging will be easier with hex.
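The readability point is easy to see by encoding the same four bytes both ways. This is only an illustrative sketch; the class ReadabilityDemo and its helper names are made up for the example.

```java
import java.util.Base64;

// Hypothetical demo class: the same bytes shown as hex and as Base64.
final class ReadabilityDemo {
    // Each byte becomes exactly two hex digits, so byte boundaries stay visible.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    // In Base64 every character carries 6 bits, so bytes straddle characters.
    static String base64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] magic = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println(hex(magic));    // cafebabe
        System.out.println(base64(magic)); // yv66vg==
    }
}
```

In the hex output each pair of digits maps straight back to one byte, while the Base64 output has to be decoded group by group before the original bytes are recognizable.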
Answer by user207421
There are only two 'real differences':
1. The radix. Base64 is base-64, surprise, and hex is base-16.
2. The encoding: base-64 encodes 3 source bytes into 4 base-64 characters (http://en.wikipedia.org/wiki/Base64#Examples); hex encodes 1 byte into 2 hex characters.
So base64 is more compact than hex.
Answer by sheldonh
Other answers made clear the efficiency difference between base16 and base64.
There is more to base selection than efficiency.
Base64 uses more than just letters and numbers. Different implementations use different punctuation characters for indicating padding and for making up the last two characters of the set of 64. These can include plus "+" and equals "=", both of which are problematic in HTTP query strings.
So one reason to favour base16 over base64 is that base16 values can be composed directly into HTTP query strings without requiring additional encoding. Is that important to you?
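As a rough illustration of the query-string concern, the sketch below encodes two bytes whose standard Base64 form contains "+", "/" and "=". The class QueryStringDemo and its method names are hypothetical; Base64.getUrlEncoder() is the JDK's URL-safe variant (Java 8+), which swaps in "-" and "_" and is one example of the differing punctuation mentioned above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Base64;

// Hypothetical demo class: Base64 output inside an HTTP query string.
final class QueryStringDemo {
    // Standard alphabet: may contain '+', '/' and '=' padding.
    static String standard(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // URL-safe alphabet ('-' and '_'), with the '=' padding dropped.
    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(data);
    }

    // The "additional encoding" a standard Base64 value needs in a query string.
    static String percentEncoded(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFB, (byte) 0xFF};
        System.out.println(standard(data));                 // +/8=
        System.out.println(percentEncoded(standard(data))); // %2B%2F8%3D
        System.out.println(urlSafe(data));                  // -_8
    }
}
```

The hex form of the same bytes, "fbff", could be dropped into a query string as-is, which is the convenience this answer is pointing at.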
Notice that this is an additional concern, over and above efficiency. Neither base is inherently better or worse; they're just two different points on a scale, at which you'll find different properties that will be more or less attractive in different situations.
For example, consider base32. It's 20% less efficient than base64 (8 characters for every 5 bytes instead of 4 for every 3), but is still suitable for use in HTTP query strings. Most of its inefficiency comes from being case-insensitive and avoiding zero "0" and one "1", to reduce mistakes in reproduction by humans.
So base32 introduces a new concern: ease of reproduction for humans. Is that a concern for you? If it's not, you could go for something like base62, which is still convenient in HTTP query strings, but is case sensitive and includes zero "0" and one "1".
Hopefully, I've clarified that the selection of your encoding base is a matter of sliding along a scale until you get the best efficiency you can have before sacrificing what's important to you.
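The "scale" can be put into numbers with a small sketch. The class RadixScale and the method charsPerByte are names invented for this example, and the figures are the theoretical ideal for each radix; real base62 or base85 codecs work in blocks and may differ slightly.

```java
// Hypothetical demo class: output characters needed per input byte, by radix.
final class RadixScale {
    // Each character of a radix-N alphabet carries log2(N) bits,
    // so one 8-bit byte needs 8 / log2(N) characters in the ideal case.
    static double charsPerByte(int radix) {
        double bitsPerChar = Math.log(radix) / Math.log(2);
        return 8.0 / bitsPerChar;
    }

    public static void main(String[] args) {
        int[] radixes = {16, 32, 62, 64, 85};
        for (int radix : radixes) {
            System.out.printf("base%d: %.3f chars per byte%n", radix, charsPerByte(radix));
        }
    }
}
```

Under these assumptions base16 needs 2.0 characters per byte, base32 needs 1.6 and base64 about 1.333, so base32 output is 1.6 / 1.333 = 1.2 times the size of base64 output, which matches the 20% figure above.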
Wikipedia has a fun list of numeral systems.
Answer by hfossli
Is size important to you?
Base64 is more space efficient: it uses 4 characters to represent 3 bytes, whereas hex uses 2 characters for each byte. In other words, hex increases the size of the string by 100%. For small strings that fit as params in url requests I wouldn't mind the extra cost/size.
Is ease of use important to you?
Hex is easier to use than Base64 because you don't need to escape it when using the string as a GET parameter in url requests (Base64 may contain +, = and /).
Is widespread use important to you?
I don't have the numbers, but Base64 might be better known to the general developer than hex, depending on several factors. I knew about base64 long before hex (base16).
Answer by Mitch McMabers
I was curious how on EARTH base64 can convert 3 input bytes into 4 output bytes for just 33% space growth (whereas hex converts 1 input byte into 2 output bytes for 100% space growth). Why specifically 3 input bytes?
The answer is:
3 bytes = 3 x 8 bits = 24 bits.
Why that magic "24 bits" number? Well, base 64 represents the numbers 0 to 63. How are those represented in binary? With 000000 (0) to 111111 (63).
Bingo! Each base64 character represents 6 bits of input data using a single output byte (a single character such as "Z", etc).
So 24 bits (3 full 8-bit bytes of input) / 6 bits (base64 alphabet) = 4 bytes of base64. That's it!
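The 24-bit arithmetic above can be sketched directly in Java. This is a teaching sketch only, with a hypothetical class Base64GroupDemo that handles exactly one full 3-byte group and no padding; for real use the JDK's java.util.Base64 is the safer choice.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one 3-byte group becomes four Base64 characters.
final class Base64GroupDemo {
    // The standard Base64 alphabet: index 0..63 maps to one character.
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static String encodeGroup(byte b0, byte b1, byte b2) {
        // Pack the three bytes into one 24-bit value.
        int bits = ((b0 & 0xFF) << 16) | ((b1 & 0xFF) << 8) | (b2 & 0xFF);
        char[] out = new char[4];
        // Slice the 24 bits into four 6-bit indices, most significant first.
        out[0] = ALPHABET.charAt((bits >>> 18) & 0x3F);
        out[1] = ALPHABET.charAt((bits >>> 12) & 0x3F);
        out[2] = ALPHABET.charAt((bits >>> 6) & 0x3F);
        out[3] = ALPHABET.charAt(bits & 0x3F);
        return new String(out);
    }

    public static void main(String[] args) {
        byte[] in = "Man".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encodeGroup(in[0], in[1], in[2])); // TWFu
    }
}
```

Decoding is the same shuffle in reverse: look up four characters to get four 6-bit values, pack them into 24 bits, and split that into three bytes.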
Or, described another way, every Base64 character (which is 1 byte (8 bits)) encodes 6 bits of real data. And if we divide 8 bits / 6 bits we see where the 33% growth comes from, as mentioned at the top of this post... So yes, Base64 always increases data size by 33% (plus some potential padding by the = characters that are sometimes added at the end of the base64 output).
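The padding remark can be checked with a short sketch. The class PaddingDemo and the encodedLength helper are names chosen for this example; the formula 4 * ceil(n / 3) assumes the standard padded encoding without line breaks.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class: how '=' padding rounds output up to 4-character groups.
final class PaddingDemo {
    // Padded Base64 length for n input bytes: 4 * ceil(n / 3), in integer math.
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] man = {'M', 'a', 'n'};
        for (int n = 1; n <= man.length; n++) {
            String encoded = Base64.getEncoder().encodeToString(Arrays.copyOf(man, n));
            // Prints TQ==, TWE= and TWFu, each 4 characters long.
            System.out.println(n + " byte(s) -> " + encoded + " (" + encodedLength(n) + " chars)");
        }
    }
}
```

So the growth is exactly one third only when the input length is a multiple of 3; otherwise one or two "=" characters fill out the last group.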
You may think "Why not base128 (7 bits of input = 8 bits of output), at just 14% size growth when encoding?". The answer is that the lower 128 ASCII characters aren't all printable. Many are control characters such as NULL etc. Only 95 of them are printable, so 64 is the largest power of two that fits.
There are obviously ways to create other systems such as perhaps "base81" etc, since you can do anything you want if you create a custom encoding algorithm. But the beauty of base64 is how it encodes data so cleanly in chunks of 6 bits, and how you simply have to "read 3 bytes and output 4" to encode, and "read 4 bytes and output 3" to decode. So that encoding scheme became popular.
Now you are hopefully wiser after having read this.
Fun Update: Speaking of other encoding styles with more characters... It's come to my attention that Ascii85 aka Base85 exists and is slightly more efficient (25% data size growth when encoding as Base85 instead of 33% for Base64): https://en.wikipedia.org/wiki/Ascii85