Java: Convert ASCII byte[] to String

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2201930/

Time: 2020-08-13 04:41:11  Source: igfitidea

Convert ASCII byte[] to String

Tags: java, log4j, ascii, bytearray

Asked by jwoolard

I am trying to pass a byte[] containing ASCII characters to log4j, to be logged into a file using the obvious representation. When I simply pass in the byte[], it is of course treated as an object and the logs are pretty useless. When I try to convert the arrays to strings using new String(byte[] data), the performance of my application is halved.

How can I pass them in efficiently, without incurring the approximately 30us time penalty of converting them to strings?

Also, why does it take so long to convert them?

Thanks.

Edit

I should add that I am optimising for latency here - and yes, 30us does make a difference! Also, these arrays vary from ~100 bytes all the way up to a few thousand bytes.

Accepted answer by Steven Schlansker

What you want to do is delay processing of the byte[] array until log4j decides that it actually wants to log the message. This way you could log it at DEBUG level, for example, while testing and then disable it during production. For example, you could:

final byte[] myArray = ...;
Logger.getLogger(MyClass.class).debug(new Object() {
    @Override public String toString() {
        return new String(myArray);
    }
});

Now you don't pay the speed penalty unless you actually log the data, because the toString method isn't called until log4j decides it'll actually log the message!
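
The deferred-conversion idea above can also be packaged as a small reusable wrapper instead of an anonymous class. The following is only a sketch: LazyAscii, decodeCount and the debug(boolean, Object) method are hypothetical names invented for illustration, and debug stands in for a real logger that checks its level before formatting the message. It assumes the bytes really are ASCII.

```java
import java.nio.charset.StandardCharsets;

public class LazyAsciiDemo {

    /** Wraps a byte[] and decodes it only when toString() is called. */
    static final class LazyAscii {
        static int decodeCount = 0; // demo-only counter, not needed in real code
        private final byte[] data;

        LazyAscii(byte[] data) {
            this.data = data;
        }

        @Override
        public String toString() {
            decodeCount++;
            return new String(data, StandardCharsets.US_ASCII);
        }
    }

    /** Hypothetical stand-in for a logger: formats the message only if enabled. */
    static String debug(boolean debugEnabled, Object message) {
        return debugEnabled ? String.valueOf(message) : null;
    }

    public static void main(String[] args) {
        byte[] payload = "hello".getBytes(StandardCharsets.US_ASCII);

        debug(false, new LazyAscii(payload));       // level disabled: nothing decoded
        System.out.println(LazyAscii.decodeCount);  // 0

        System.out.println(debug(true, new LazyAscii(payload))); // hello
        System.out.println(LazyAscii.decodeCount);  // 1
    }
}
```

One caveat with this approach: the wrapper keeps a reference to the array rather than a copy, so if the array is modified before the logger gets around to calling toString() (for example with an asynchronous appender), the logged text reflects the modified contents.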

I'm not sure what you mean by "the obvious representation", so I've assumed that you mean converting to a String by interpreting the bytes using the platform's default character encoding. If you are dealing with binary data, that is obviously worthless. In that case I'd suggest using Arrays.toString(byte[]) to create a formatted string along the lines of

[54, 23, 65, ...]
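
As a small illustration of the binary-data case, the sketch below shows Arrays.toString(byte[]) next to a hex rendering. The class name BinaryDump and the hex helper are assumptions added for comparison only and are not part of the original answer. Note that Java bytes are signed, so values of 0x80 and above print as negative numbers in the decimal form.

```java
import java.util.Arrays;

public class BinaryDump {

    /** Decimal list form produced by the JDK, e.g. [54, 23, 65]. */
    static String decimal(byte[] data) {
        return Arrays.toString(data);
    }

    /** Hex form, two lowercase digits per byte (illustrative helper). */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            // Mask to 0..255 so negative bytes still print as two digits.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {54, 23, 65, (byte) 0xff};
        System.out.println(decimal(data)); // [54, 23, 65, -1]
        System.out.println(hex(data));     // 361741ff
    }
}
```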

Answer by finnw

ASCII is one of the few encodings that can be converted to and from UTF-16 with no arithmetic or table lookups, so it's possible to convert manually:

String convert(byte[] data) {
    StringBuilder sb = new StringBuilder(data.length);
    for (int i = 0; i < data.length; ++ i) {
        if (data[i] < 0) throw new IllegalArgumentException();
        sb.append((char) data[i]);
    }
    return sb.toString();
}

But make sure it really is ASCII, or you'll end up with garbage.
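
To make the "really is ASCII" check concrete, here is a hedged usage sketch of the same loop. The wrapper class AsciiConvert and the exception message are assumptions added for the demo. It compares the manual result with new String(data, StandardCharsets.US_ASCII) and shows a byte above 0x7F being rejected.

```java
import java.nio.charset.StandardCharsets;

public class AsciiConvert {

    /** Manual ASCII decode: each byte 0..127 maps directly to the same char value. */
    static String convert(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            // Bytes 0x80..0xFF are negative in Java and are not ASCII.
            if (b < 0) {
                throw new IllegalArgumentException("non-ASCII byte: " + (b & 0xff));
            }
            sb.append((char) b);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] ok = {72, 105, 33};
        System.out.println(convert(ok)); // Hi!
        System.out.println(convert(ok).equals(new String(ok, StandardCharsets.US_ASCII))); // true

        try {
            convert(new byte[] {72, (byte) 0xE9});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: non-ASCII byte: 233
        }
    }
}
```

The difference in failure mode is a design choice: the manual loop throws on non-ASCII input, whereas the String constructor silently substitutes a replacement character for bytes it cannot decode.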

Answer by Joachim Sauer

If your data is in fact ASCII (i.e. 7-bit data), then you should be using new String(data, "US-ASCII") instead of depending on the platform default encoding. This may be faster than trying to interpret it as your platform default encoding (which could be UTF-8, which requires more introspection).

You could also speed this up by avoiding the Charset lookup hit each time: cache the Charset instance and call new String(data, charset) instead.
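
A minimal sketch of the cached-Charset variant follows. The class name AsciiDecoder and the decode method are assumptions for illustration. On Java 7 and later, StandardCharsets.US_ASCII provides a ready-made constant, so no lookup by name is needed; on older versions Charset.forName("US-ASCII") stored in a static final field would play the same role.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class AsciiDecoder {

    // Resolved once and reused, so no per-call lookup by charset name.
    private static final Charset ASCII = StandardCharsets.US_ASCII;

    /** Decodes the bytes as US-ASCII using the cached Charset instance. */
    static String decode(byte[] data) {
        return new String(data, ASCII);
    }

    public static void main(String[] args) {
        byte[] data = {108, 111, 103};
        System.out.println(decode(data)); // log
    }
}
```

A side benefit of the Charset overload is that it does not declare the checked UnsupportedEncodingException that new String(data, "US-ASCII") does. Whether it is measurably faster in a given application is something to confirm by measurement.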

Having said that: it's been a very, very long time since I've seen real ASCII data in a production environment.

Answer by BalusC

Halved performance? How large is this byte array? If it's, for example, 1MB, then there are certainly more factors to take into account than just "converting" from bytes to chars (which is supposed to be fast enough, though). Writing 1MB of data instead of "just" 100 bytes (which byte[].toString() may generate) to a log file is obviously going to take some time. The disk file system is not as fast as RAM.

You'll need to change the string representation of the byte array. Maybe log some more meaningful information instead, e.g. the name associated with it (filename?), its length and so on. After all, what does that byte array actually represent?
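
One way to read this suggestion is to log a short summary rather than the full contents. The sketch below is an assumption about what such a summary could look like: the class ByteArraySummary, the describe method and its output format are all hypothetical, and the preview assumes the leading bytes are ASCII.

```java
import java.nio.charset.StandardCharsets;

public class ByteArraySummary {

    /**
     * Builds a short description: a caller-supplied name, the array length,
     * and at most maxPreview leading bytes decoded as US-ASCII.
     */
    static String describe(String name, byte[] data, int maxPreview) {
        int previewLength = Math.min(maxPreview, data.length);
        String preview = new String(data, 0, previewLength, StandardCharsets.US_ASCII);
        return name + " (" + data.length + " bytes), starts with \"" + preview + "\"";
    }

    public static void main(String[] args) {
        byte[] data = "hello world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(describe("greeting", data, 5));
        // greeting (11 bytes), starts with "hello"
    }
}
```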

Edit: I don't remember seeing the "approximately 30us" phrase in your question; maybe you edited it in within 5 minutes of asking. In any case, this is micro-optimization, and it should certainly not cause "halved performance" in general, unless you write them a million times per second (and even then, why would you want to do that? Aren't you overusing logging?).
