Convert Latin-1 content of InputStream into UTF-8 String

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/11854794/

Time: 2020-10-31 06:40:36  Source: igfitidea

Tags: java, string, character-encoding, inputstream

Asked by cyroxx

I need to convert the content of an InputStream into a String. The difficulty here is the input encoding, namely Latin-1. I tried several approaches and code snippets with String, getBytes, char[], etc. in order to get the encoding straight, but nothing seemed to work.

Finally, I came up with the working solution below. However, this code seems a little verbose to me, even for Java. So the question here is:

Is there a simpler and more elegant approach to achieve what is done here?

private String convertStreamToStringLatin1(java.io.InputStream is)
        throws IOException {

    String text = "";

    // setup readers with Latin-1 (ISO 8859-1) encoding
    BufferedReader i = new BufferedReader(new InputStreamReader(is, "8859_1"));

    int numBytes;
    CharBuffer buf = CharBuffer.allocate(512);
    while ((numBytes = i.read(buf)) != -1) {
        text += String.copyValueOf(buf.array(), 0, numBytes);
        buf.clear();
    }

    return text;
}

Answered by oldrinb

Firstly, a few criticisms of the approach you've taken already. You shouldn't use an NIO CharBuffer when all you want is a char[512]. You don't need to clear the buffer on each iteration, either.

int numBytes;
final char[] buf = new char[512];
while ((numBytes = i.read(buf)) != -1) {
    text += String.copyValueOf(buf, 0, numBytes);
}

You should also know that just constructing a String with those arguments has the same effect, as the constructor also copies the data:

The contents of the subarray are copied; subsequent modification of the character array does not affect the newly created string.
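
A minimal sketch of the copying behaviour quoted above. The class name CopyDemo and the sample values are illustrative assumptions, not part of the original answer; it only shows that the String built by the constructor is unaffected when the source array is overwritten afterwards.

```java
import java.util.Arrays;

final class CopyDemo {
    // Builds a String from the first `count` chars of `buf`, then overwrites the buffer.
    static String copyThenOverwrite(char[] buf, int count) {
        String viaConstructor = new String(buf, 0, count);
        Arrays.fill(buf, 'x');  // mutate the source array afterwards
        return viaConstructor;  // unaffected, because the constructor copied the chars
    }

    public static void main(String[] args) {
        char[] buf = {'a', 'b', 'c', 'd'};
        // copyValueOf and the constructor produce equal strings for the same arguments
        String viaCopyValueOf = String.copyValueOf(buf, 0, 3);
        String viaConstructor = copyThenOverwrite(buf, 3);
        System.out.println(viaCopyValueOf.equals(viaConstructor));
        System.out.println(viaConstructor);
    }
}
```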

You can use a dynamic ByteArrayOutputStream, which grows an internal buffer to accommodate all the data. You can then decode the entire byte[] returned by toByteArray into a String.

The advantage is that deferring decoding until the end avoids decoding fragments individually. Decoding fragment by fragment may work for single-byte charsets like ASCII or ISO-8859-1, but it will not work for multi-byte schemes like UTF-8 and UTF-16, where the bytes of one character can be split across two fragments. This also makes it easier to change the character encoding in the future, since the code requires no modification.

// requires java.io.ByteArrayOutputStream, java.io.IOException, java.io.InputStream
private static final String DEFAULT_ENCODING = "ISO-8859-1";

public static final String convert(final InputStream in) throws IOException {
  return convert(in, DEFAULT_ENCODING);
}

public static final String convert(final InputStream in, final String encoding) throws IOException {
  // collect all raw bytes first; decoding happens once, at the end
  final ByteArrayOutputStream out = new ByteArrayOutputStream();
  final byte[] buf = new byte[2048];
  int rd;
  while ((rd = in.read(buf, 0, buf.length)) >= 0) {
    out.write(buf, 0, rd);
  }
  // an unknown charset name raises UnsupportedEncodingException, a subclass of IOException
  return new String(out.toByteArray(), encoding);
}
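
To illustrate why decoding fragments individually is fragile, here is a small sketch. The class FragmentDecodingDemo, the helper decodePerChunk, and the sample text are hypothetical names introduced for illustration only. It decodes a byte array chunk by chunk and compares the result with the original text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class FragmentDecodingDemo {
    // Decodes each chunk of at most `chunkSize` bytes on its own and concatenates the results.
    static String decodePerChunk(byte[] data, int chunkSize, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, cs));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // 'é' is 1 byte in ISO-8859-1, 2 bytes in UTF-8
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // Single-byte charset: a chunk boundary can never split a character.
        System.out.println(original.equals(decodePerChunk(latin1, 1, StandardCharsets.ISO_8859_1)));
        // Multi-byte charset: a boundary after 4 bytes falls inside 'é' and corrupts it.
        System.out.println(original.equals(decodePerChunk(utf8, 4, StandardCharsets.UTF_8)));
        // Decoding all bytes at once is safe in both cases.
        System.out.println(original.equals(new String(utf8, StandardCharsets.UTF_8)));
    }
}
```

With a chunk boundary inside the two-byte sequence for 'é', each half is malformed on its own, so the per-chunk result differs from the original text.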

Answered by Blacklight

I don't see how it could be much simpler. I did this a little differently once. If you already have a String, you can do this:

new String(originalString.getBytes(), "ISO-8859-1");

So something like this could also work:

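// no charset is given here, so the reader decodes with the platform default
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
StringBuilder sb = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
  sb.append(line).append('\n'); // readLine() strips the line terminator
}
reader.close(); // also closes the underlying stream
// getBytes() likewise encodes with the platform default charset
return new String(sb.toString().getBytes(), "ISO-8859-1");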

EDIT: I should add, this is really just an alternative to your already working solution. When it comes to converting Streams in Java it won't be much simpler, so go for it. :)
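
One caveat worth sketching: String.getBytes() and InputStreamReader without a charset argument both use the platform default charset, so the round trip above depends on that default. A variant that names the charset when the reader is created might look like the following. The class ReaderDecoding and its method names are assumptions made for this illustration, and StandardCharsets requires Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReaderDecoding {
    // Reads the whole stream, decoding with the given charset; closes the stream when done.
    static String readAll(InputStream is, Charset cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(is, cs)) {
            char[] buf = new char[512];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n); // append only the chars actually read
            }
        }
        return sb.toString();
    }

    // Convenience wrapper for Latin-1 (ISO-8859-1) input.
    static String readLatin1(InputStream is) throws IOException {
        return readAll(is, StandardCharsets.ISO_8859_1);
    }
}
```

Unlike the readLine() loop, this keeps line terminators exactly as they appear in the input, and the StringBuilder avoids repeated String concatenation.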

Answered by Fredrik LS

If you don't want to plumb it yourself, have a look at the Apache Commons IO project: IOUtils.toString(InputStream input, String encoding) seems to do what you want. I haven't tried that method myself, but the Javadoc states: "Get the contents of an InputStream as a String using the specified character encoding."

Answered by Mike Samuel

Guava's IO package is really nice this way.

Files.toString(yourFile, Charsets.ISO_8859_1)

or from a stream:

new String(ByteStreams.toByteArray(stream), Charsets.ISO_8859_1)

Answered by cyroxx

I just found out that this answer to the question "Read/convert an InputStream to a String" can be applied to my problem; please see the code below. Anyway, I very much appreciate the answers you've given so far.

private String convertStreamToString(InputStream is, String charsetName) {
    try {
        // "\\A" is the regex anchor for the start of input, so the whole stream is one token
        return new java.util.Scanner(is, charsetName).useDelimiter("\\A").next();
    } catch (java.util.NoSuchElementException e) {
        return "";
    }
}

So in order to decode Latin-1 input, call it like this:

String message = convertStreamToString(is, "8859_1");
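
A Java String holds decoded text and has no byte encoding of its own, so getting UTF-8 out of Latin-1 input takes two steps: decode the bytes as ISO-8859-1, then encode the resulting String as UTF-8 when bytes are needed. The sketch below assumes Java 9 or later for InputStream.readAllBytes(); the class Latin1ToUtf8 and its method names are hypothetical and not part of any answer above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class Latin1ToUtf8 {
    // Decodes the stream as ISO-8859-1. InputStream.readAllBytes() requires Java 9 or later.
    static String decodeLatin1(InputStream is) throws IOException {
        return new String(is.readAllBytes(), StandardCharsets.ISO_8859_1);
    }

    // Re-encodes the decoded text as UTF-8 bytes, e.g. for writing to a file or socket.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream(new byte[] {(byte) 0xE9}); // 'é' in Latin-1
        byte[] utf8 = toUtf8(decodeLatin1(is));
        for (byte b : utf8) {
            System.out.printf("%02X ", b); // the single Latin-1 byte becomes two UTF-8 bytes
        }
    }
}
```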