Java Strings storing byte arrays

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1295934/

Date: 2020-08-12 07:55:54  Source: igfitidea

java string bytearray

Asked by Jon

I want to store a byte array wrapped in a String object. Here's the scenario:

  1. The user enters a password.
  2. The bytes of that password are obtained using the String method getBytes().
  3. The bytes are encrypted using Java's crypto package.
  4. Those bytes are then converted into a String using the constructor new String(byte[]).
  5. That String is stored, or otherwise passed around (NOT changed).
  6. The bytes of that String are obtained, and they are different from the encrypted bytes.

Here's a snippet of code that describes what I'm talking about.

String s = "test123";
byte[] a = s.getBytes();
byte[] b = env.encrypt(a);
String t = new String(b);
byte[] c = t.getBytes();
byte[] d = env.decrypt(c);

Here env.encrypt() and env.decrypt() do the encryption and decryption. The problem I'm having is that the b array is of length 8 and the c array is of length 16. I would have expected them to be equal. What's going on here? I tried modifying the code as follows:

String s = "test123";
Charset charset = Charset.defaultCharset();
byte[] a = s.getBytes(charset);
byte[] b = env.encrypt(a);
String t = new String(b, charset);
byte[] c = t.getBytes(charset);
byte[] d = env.decrypt(c);

but that didn't help.

Any ideas?

Accepted answer by Jonathan

It's not a good idea to store binary data in a String object. You'd be better off using something like Base64 encoding, which is intended to make binary data into a printable string, and is completely reversible.

In fact, I just found a public domain base64 encoder for Java: http://iharder.sourceforge.net/current/java/base64/
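
On Java 8 and later, the standard library's java.util.Base64 class can do the same job without an external encoder. The following is a minimal sketch of the round trip; the class name Base64RoundTrip and the sample bytes are made up for illustration and merely stand in for the output of env.encrypt().

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64RoundTrip {
    // Turn arbitrary bytes into printable ASCII text.
    static String toText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Recover exactly the bytes that were passed to toText.
    static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for encrypted output: bytes that are not valid text.
        byte[] b = {(byte) 0x9C, 0x00, (byte) 0xFF, 0x41, (byte) 0x80, 0x7F, (byte) 0xC3, 0x28};
        String t = toText(b);      // safe to store or pass around as a String
        byte[] c = fromText(t);    // identical to b
        System.out.println(t + " round trip ok: " + Arrays.equals(b, c));
    }
}
```

Unlike new String(b), the Base64 text contains only printable ASCII characters, so it survives storage and transport regardless of the platform's default charset.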

Answer by Chris

I don't have a definitive answer for you, but if I were working on this, I'd print out the string or byte at each step and compare them to see what's happening. Also, b holds a return value from env.encrypt, but c is a return value from .getBytes, so in a way you're comparing apples to oranges in that case.

Answer by Licky Lindsay

This is somewhat of an abuse of the String(byte[]) constructor and related methods.

This would work with certain encodings, and fail with others. Presumably your platform's default encoding is one of the ones where it fails.

You should use something like Commons Codec to convert these bytes to hex or base64.

Also why are you encrypting passwords instead of hashing them with salt anyway?
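
For readers unfamiliar with the idea, here is a rough sketch of salted password hashing using the JDK's built-in PBKDF2 support. The class name PasswordHashSketch, the iteration count, and the key length are illustrative assumptions, not vetted recommendations, and nothing here comes from the original question's env object.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordHashSketch {
    // Illustrative parameters only; pick values suited to your environment.
    static final int ITERATIONS = 100_000;
    static final int KEY_BITS = 256;

    // A fresh random salt is generated per password and stored next to the hash.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derive a fixed-length hash from the password and salt.
    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        } finally {
            spec.clearPassword();
        }
    }

    // Re-hash the candidate with the stored salt and compare the results.
    static boolean matches(char[] candidate, byte[] salt, byte[] expected) {
        return MessageDigest.isEqual(hash(candidate, salt), expected);
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        byte[] stored = hash("test123".toCharArray(), salt);
        System.out.println("correct password matches: " + matches("test123".toCharArray(), salt, stored));
    }
}
```

The salt and the hash are both raw bytes, so the same advice applies to them: store them as byte[] or encode them as hex or Base64, not with new String(byte[]).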

Answer by Pavel Minaev

In both cases, you are using the OS default non-Unicode charset (which depends on locale). If you're passing the string from one system to another, they may have different locales, and thus different default charsets. You need to use one well-defined charset to do what you're trying to do, e.g. ISO-8859-1.
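
ISO-8859-1 works for this purpose because it maps each of the 256 byte values to exactly one character and back. A small sketch that checks this for every byte value (the class name Latin1RoundTrip is made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {
    // Decode bytes into a String and encode them back with the same fixed charset.
    static byte[] roundTrip(byte[] data) {
        String t = new String(data, StandardCharsets.ISO_8859_1);
        return t.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Every possible byte value, 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        System.out.println("ISO-8859-1 preserves all bytes: " + Arrays.equals(all, roundTrip(all)));
    }
}
```

The resulting String may still contain control characters and other unprintable content, so this only helps when the String is passed around in memory, not when it has to be displayed or written to a text format.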

Better yet, don't do the conversion, and pass the byte[] array directly.

Answer by Nick

This isn't going to work properly. Storing bytes as a string is only going to work right for the ASCII set (and a few others). If you NEED to store the encrypted result as a String, then what about converting the bytes to hex and then putting that in a String? That would work.

I recommend you just keep the password as bytes. There's no real reason to store it as a String (unless you want to see what people's passwords are).

Answer by u7867

Several people have pointed out that this is not a proper use of the String(byte[]) constructor. It is important to remember that in Java a String is made up of characters, which happen to be 16 bits wide, not 8 bits as a byte is. You are also forgetting about character encoding. Remember, a character is often not a byte.

Let's break it down bit by bit:

String s = "test123";
byte[] a = s.getBytes();

At this point your byte array most likely contains 7 bytes (one per character of "test123") if your system's default character encoding is Windows-1252, ISO-8859-1, or UTF-8.

byte[] b = env.encrypt(a);

Now b contains some seemingly random data depending on your encryption, and isn't even guaranteed to be a certain length. Many encryption engines pad the input data so that the output matches a certain block size.

String t = new String(b);

This is taking your random bytes and asking Java to interpret them as character data. These characters may appear as gibberish and some sequences of bits are not valid characters for every encoding. Java dutifully does its best and creates a sequence of 16-bit chars.

byte[] c = t.getBytes();

This may or may not give you the same byte array as b, depending on the encoding. You state in the problem description that you are seeing c as 16 bytes long; this is probably because the garbage in t doesn't convert well in the default character encoding.
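
The growth in length can be reproduced with UTF-8, where byte sequences that are not valid UTF-8 are decoded to the replacement character U+FFFD, which in turn takes several bytes when encoded again. This is a hedged sketch with made-up sample bytes and a made-up class name (LossyDecoding); the asker's actual default charset is not known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LossyDecoding {
    // Mimics new String(b) followed by t.getBytes(), with an explicit charset.
    static byte[] throughString(byte[] data, Charset charset) {
        String t = new String(data, charset);
        return t.getBytes(charset);
    }

    public static void main(String[] args) {
        // Two bytes that cannot start a valid UTF-8 sequence.
        byte[] b = {(byte) 0x80, (byte) 0xFF};
        byte[] c = throughString(b, StandardCharsets.UTF_8);
        System.out.println("b = " + Arrays.toString(b) + " (length " + b.length + ")");
        System.out.println("c = " + Arrays.toString(c) + " (length " + c.length + ")");
    }
}
```

Once a byte has been replaced during decoding, the original value is gone, so no later call to getBytes() can bring it back.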

byte[] d = env.decrypt(c);

This won't work because c is not the data you expect it to be; it is corrupt.

Solutions:

  1. Just store the byte array directly in the database or wherever. However, you are still forgetting about the character encoding problem; more on that in a sec.
  2. Take the byte array data and encode it using Base64 or as hexadecimal digits and store that string:

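    // getBytes(plainText) is a helper not shown here; it should use an explicit encoding.
    byte[] cypherBytes = env.encrypt(getBytes(plainText));
    // Each byte becomes two hex digits, so the capacity is known up front.
    StringBuilder cypherText = new StringBuilder(cypherBytes.length * 2);
    for (byte b : cypherBytes) {
      // Two uppercase hex digits per byte, zero-padded (0x0A becomes "0A").
      cypherText.append(String.format("%02X", b));
    }
    return cypherText.toString();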
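
To get the bytes back for decryption, the hex string has to be decoded again. The following is a minimal sketch of both directions; the class name HexCodec and its method names are assumptions made for illustration and are not part of the original answer.

```java
import java.util.Arrays;

final class HexCodec {
    // Encode each byte as two uppercase hex digits.
    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Decode a string produced by encode back into the original bytes.
    static byte[] decode(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Parse two hex digits (0..255) and narrow to a signed byte.
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = {0x00, 0x0A, (byte) 0xFF};
        String hex = encode(original);
        System.out.println(hex + " round trip ok: " + Arrays.equals(original, decode(hex)));
    }
}
```

Hex doubles the size of the data, while Base64 grows it by about a third, so Base64 is the more compact choice when the stored strings are large.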
    
Character encoding:

A user's password may not be ASCII and thus your system is susceptible to problems because you don't specify the encoding.

Compare:

String s = "tést123";
byte[] a = s.getBytes();
byte[] b = env.encrypt(a);

with

String s = "tést123";
byte[] a = s.getBytes("UTF-8");
byte[] b = env.encrypt(a);

The byte array a won't have the same value with the UTF-8 encoding as with the system default (unless your system default is UTF-8). It doesn't matter what encoding you use as long as A) you're consistent and B) your encoding can represent all the allowable characters for your data. You probably can't store Chinese text in the system default encoding. If your application is ever deployed on more than one computer, and one of those has a different system-default encoding, passwords encrypted on one system will become gibberish on the other system.
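
The difference is easy to see by encoding the same string with two explicit charsets. A small sketch (the class name EncodingMatters is made up; the é is written as \u00e9 so the result does not depend on the source file's encoding):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingMatters {
    // Number of bytes the string occupies in the given charset.
    static int encodedLength(String s, Charset charset) {
        return s.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String s = "t\u00e9st123"; // "tést123"
        // é is one byte in ISO-8859-1 but two bytes in UTF-8.
        System.out.println("ISO-8859-1: " + encodedLength(s, StandardCharsets.ISO_8859_1) + " bytes");
        System.out.println("UTF-8:      " + encodedLength(s, StandardCharsets.UTF_8) + " bytes");
    }
}
```

Passing a StandardCharsets constant instead of the name "UTF-8" also avoids the checked UnsupportedEncodingException that getBytes(String) declares.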

Moral of the story: Characters are not bytes and bytes are not characters. You have to remember which you are dealing with and how to convert back and forth between them.

Answer by b1nary.atr0phy

Implement a StringWrapper class whose constructor takes a String arg and converts it to a byte[]. Use "ISO-8859-1" encoding to ensure each char will be just 8 bits instead of 16. You can then use encoding/decoding methods to manipulate those bytes.
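
One possible reading of this suggestion is sketched below. Only the class name StringWrapper comes from the answer; the method names are assumptions. Note the limitation: characters above U+00FF cannot be represented in ISO-8859-1 and are replaced with '?' when the wrapper is constructed.

```java
import java.nio.charset.StandardCharsets;

final class StringWrapper {
    private final byte[] bytes;

    // Convert the String to bytes once, using a fixed single-byte charset.
    StringWrapper(String s) {
        this.bytes = s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Return a copy so callers cannot modify the wrapped bytes.
    byte[] toBytes() {
        return bytes.clone();
    }

    // Decode with the same charset that was used for encoding.
    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        StringWrapper w = new StringWrapper("t\u00e9st123");
        System.out.println(w.toBytes().length + " bytes, text: " + w);
    }
}
```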