Java String.getBytes( charsetName ) vs String.getBytes( Charset object )

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/23316755/

Time: 2020-08-13 21:50:08  Source: igfitidea

Tags: java, string, character-encoding

Asked by Loc

I need to encode a String to a byte array using UTF-8. I am using Google Guava, whose Charsets class already defines a Charset instance for UTF-8. I have two ways to do it:

  1. String.getBytes( charsetName )

    try {
        byte[] bytes = my_input.getBytes("UTF-8");
    } catch (UnsupportedEncodingException ex) {
        // left empty in the question: "UTF-8" is not expected to be rejected
    }

  2. String.getBytes( Charset object )

    // Charsets.UTF_8 is an instance of Charset
    byte[] bytes = my_input.getBytes(Charsets.UTF_8);
    
My question is: which one should I use? They return the same result, and with way 2 I don't need a try/catch. I looked at the Java source code and saw that way 1 and way 2 are implemented differently.

Does anyone have any ideas?

Answered by merlin2011

Since they return the same result, you should use method 2: it is generally safer and more efficient to avoid asking the library to parse, and possibly fail on, a user-supplied string. Avoiding the try-catch also makes your own code cleaner.

Charsets.UTF_8 can be checked at compile time, which is most likely the reason you do not need a try-catch.
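
To make the point above concrete, here is a minimal sketch that compares the two overloads. It assumes the JDK's StandardCharsets.UTF_8 in place of Guava's Charsets.UTF_8 so that it runs without extra dependencies; the class name, the method names, and the misspelled charset name are hypothetical and chosen only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class GetBytesComparison {

    // Returns true when both overloads produce the same UTF-8 bytes for the text.
    static boolean overloadsAgree(String text) {
        try {
            byte[] byName = text.getBytes("UTF-8");                   // name is resolved at run time
            byte[] byCharset = text.getBytes(StandardCharsets.UTF_8); // a misspelled constant would not compile
            return Arrays.equals(byName, byCharset);
        } catch (UnsupportedEncodingException ex) {
            // Every Java platform is required to support UTF-8, so this is not expected.
            throw new AssertionError(ex);
        }
    }

    // Returns false when the name-based overload rejects the charset name.
    static boolean nameIsAccepted(String charsetName) {
        try {
            "probe".getBytes(charsetName);
            return true;
        } catch (UnsupportedEncodingException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(overloadsAgree("h\u00e9llo"));
        System.out.println(nameIsAccepted("UTF-8"));
        // Hypothetical typo: it compiles fine and only fails when the line runs.
        System.out.println(nameIsAccepted("UTF-8-typo"));
    }
}
```

The typo in the string literal survives compilation, which is the practical difference the answer is pointing at.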

Answered by dasblinkenlight

The first API is for situations when you do not know the charset at compile time; the second one is for situations when you do. Since it appears that your code needs UTF-8 specifically, you should prefer the second API:

    byte[] bytes = my_input.getBytes(Charsets.UTF_8); // <<== UTF-8 is known at compile time

The first API is for situations when the charset comes from outside your program: for example, from a configuration file, from user input, or as part of a client request to the server. That is why it throws a checked exception: the charset specified in the configuration or through some other means may not be available.
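
A rough sketch of that "charset comes from outside" case might look like the following. The class name, the `app.charset` system property, and the choice to fall back to UTF-8 are all assumptions made for illustration; a real program might prefer to reject the input instead of falling back.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class ConfiguredEncoding {

    // charsetName is assumed to come from outside the program (config file, user input, request).
    static byte[] encode(String text, String charsetName) {
        try {
            return text.getBytes(charsetName);
        } catch (UnsupportedEncodingException ex) {
            // Fallback policy is an assumption of this sketch, not a recommendation.
            return text.getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        // Hypothetical system property standing in for a configuration entry.
        String configured = System.getProperty("app.charset", "ISO-8859-1");
        byte[] bytes = encode("caf\u00e9", configured);
        System.out.println(configured + " -> " + bytes.length + " bytes");
    }
}
```

Here the checked exception is useful: it forces the caller to decide what happens when the externally supplied name is not available.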

Answered by Andres

If you already have the Charset, then use the 2nd version, as it's less error-prone.

Answered by Brian Roach

If you are going to use a string literal (e.g. "UTF-8") ... you shouldn't. Instead, use the second version and supply the constant from StandardCharsets (specifically StandardCharsets.UTF_8 in this case).
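
As a small illustration of that advice, the sketch below encodes and decodes with StandardCharsets.UTF_8, which ships with the JDK (java.nio.charset, Java 7 and later), so no Guava dependency is needed. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Utf8RoundTrip {

    // No checked exception to handle: the Charset object already exists.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // The String constructor has a matching Charset-based overload for decoding.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "na\u00efve";
        byte[] encoded = toUtf8(original);
        System.out.println(encoded.length);
        System.out.println(fromUtf8(encoded).equals(original));
    }
}
```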

The first version is used when the charset is dynamic. This is going to be the case when you don't know what the charset is at compile time; it's being supplied by an end user, read from a config file or system property, etc.

Internally, both methods call a version of StringCoding.encode(). The first version of encode() simply looks up the Charset by the supplied name first and throws an exception if that charset is unknown or unavailable.
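
The lookup-then-encode behavior described above can be approximated with public APIs. This is only a sketch of the observable behavior, not the JDK's internal code: `getBytesByName` and `agreesWithBuiltIn` are hypothetical helpers, and the internals differ between JDK versions.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Arrays;

final class LookupThenEncode {

    // Hypothetical helper: resolve the name first, then delegate to the Charset overload.
    static byte[] getBytesByName(String text, String charsetName)
            throws UnsupportedEncodingException {
        Charset charset;
        try {
            charset = Charset.forName(charsetName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException ex) {
            // Charset.forName reports failures with unchecked exceptions;
            // translate them into the checked one that getBytes(String) declares.
            throw new UnsupportedEncodingException(charsetName);
        }
        return text.getBytes(charset);
    }

    // True when the sketch and the built-in overload return equal bytes or both reject the name.
    static boolean agreesWithBuiltIn(String text, String charsetName) {
        byte[] expected = null;
        byte[] actual = null;
        try {
            expected = text.getBytes(charsetName);
        } catch (UnsupportedEncodingException ex) {
            // expected stays null to record the rejection
        }
        try {
            actual = getBytesByName(text, charsetName);
        } catch (UnsupportedEncodingException ex) {
            // actual stays null to record the rejection
        }
        return Arrays.equals(expected, actual);
    }

    public static void main(String[] args) {
        System.out.println(agreesWithBuiltIn("abc", "UTF-8"));
        System.out.println(agreesWithBuiltIn("abc", "no-such-charset"));
    }
}
```

Seen this way, the name-based overload does strictly more work than the Charset-based one, and that extra lookup step is the only place the exception can come from.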
