Java: convert the Latin-1 content of an InputStream into a UTF-8 String
Notice: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/11854794/
Convert Latin-1 content of InputStream into UTF-8 String
Asked by cyroxx
I need to convert the content of an InputStream into a String. The difficulty here is the input encoding, namely Latin-1. I tried several approaches and code snippets with String, getBytes, char[], etc. in order to get the encoding straight, but nothing seemed to work.
Finally, I came up with the working solution below. However, this code seems a little verbose to me, even for Java. So the question here is:
Is there a simpler and more elegant approach to achieve what is done here?
private String convertStreamToStringLatin1(java.io.InputStream is)
        throws IOException {
    String text = "";
    // setup readers with Latin-1 (ISO 8859-1) encoding
    BufferedReader i = new BufferedReader(new InputStreamReader(is, "8859_1"));
    int numBytes;
    CharBuffer buf = CharBuffer.allocate(512);
    while ((numBytes = i.read(buf)) != -1) {
        text += String.copyValueOf(buf.array(), 0, numBytes);
        buf.clear();
    }
    return text;
}
Answered by oldrinb
Firstly, a few criticisms of the approach you've taken already. You shouldn't unnecessarily use an NIO CharBuffer when you merely want a char[512]. With a plain array you don't need to clear the buffer each iteration, either.
int numBytes;
final char[] buf = new char[512];
while ((numBytes = i.read(buf)) != -1) {
    text += String.copyValueOf(buf, 0, numBytes);
}
You should also know that just constructing a String with those arguments will have the same effect, as the constructor also copies the data.
The contents of the subarray are copied; subsequent modification of the character array does not affect the newly created string.
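The copying behaviour quoted above can be checked with a small sketch. The class and method names here (StringCopyDemo, fromChars) are made up for illustration and are not part of the original answer.

```java
final class StringCopyDemo {
    // Builds a String from the first `count` chars of `buf`.
    // The String(char[], int, int) constructor copies the characters.
    static String fromChars(char[] buf, int count) {
        return new String(buf, 0, count);
    }

    public static void main(String[] args) {
        char[] buf = {'a', 'b', 'c'};
        String s = fromChars(buf, buf.length);
        buf[0] = 'X'; // modify the array after the String was created
        // The String is unaffected by the later change to the array.
        System.out.println(s.equals("abc"));
    }
}
```

This is why reusing one char[] buffer across loop iterations is safe: each String built from it keeps its own copy.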
You can use a dynamic ByteArrayOutputStream, which grows an internal buffer to accommodate all the data. You can then use the entire byte[] from toByteArray to decode into a String.
The advantage is that deferring decoding until the end avoids decoding fragments individually; while that may work for simple single-byte charsets like ASCII or ISO-8859-1, it will not work on multi-byte schemes like UTF-8 and UTF-16, where a character can be split across two fragments. This means it is easier to change the character encoding in the future, since the code requires no modification.
// Requires java.io.ByteArrayOutputStream, java.io.IOException, java.io.InputStream
private static final String DEFAULT_ENCODING = "ISO-8859-1";

public static final String convert(final InputStream in) throws IOException {
    return convert(in, DEFAULT_ENCODING);
}

public static final String convert(final InputStream in, final String encoding) throws IOException {
    final ByteArrayOutputStream out = new ByteArrayOutputStream();
    final byte[] buf = new byte[2048];
    int rd;
    // Collect the raw bytes first; decode only once, at the end.
    while ((rd = in.read(buf, 0, buf.length)) >= 0) {
        out.write(buf, 0, rd);
    }
    return new String(out.toByteArray(), encoding);
}
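To illustrate why decoding fragments individually is fragile, here is a minimal sketch that splits a byte array at an arbitrary position and decodes each piece on its own. The class and method names (FragmentDecodingDemo, decodeInPieces) and the sample text are assumptions chosen for the demonstration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FragmentDecodingDemo {
    // Decodes data[0..splitAt) and data[splitAt..end) separately, then concatenates.
    static String decodeInPieces(byte[] data, int splitAt, Charset cs) {
        String first = new String(Arrays.copyOfRange(data, 0, splitAt), cs);
        String second = new String(Arrays.copyOfRange(data, splitAt, data.length), cs);
        return first + second;
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"; the last character is 2 bytes in UTF-8

        // Latin-1: one byte per character, so any split point is harmless.
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decodeInPieces(latin1, 2, StandardCharsets.ISO_8859_1).equals(original));

        // UTF-8: splitting at index 4 cuts the 2-byte sequence in half,
        // so the pieces no longer decode to the original text.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeInPieces(utf8, 4, StandardCharsets.UTF_8).equals(original));
    }
}
```

Buffering all bytes and decoding once sidesteps the problem regardless of where read() happens to split the input.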
Answered by Blacklight
I don't see how it could be much simpler. I did this a little differently once: if you already have a String, you can do this:
new String(originalString.getBytes(), "ISO-8859-1");
So something like this could also work:
// Caution: both new InputStreamReader(is) and getBytes() use the platform default charset here.
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
    sb.append(line + "\n");
}
is.close();
return new String(sb.toString().getBytes(), "ISO-8859-1");
EDIT: I should add, this is really just an alternative to your already working solution. When it comes to converting Streams in Java it won't be much simpler, so go for it. :)
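A variation on the line-by-line approach is to name the charset at the reader, so that no step depends on the platform default. This is only a sketch under that assumption, not the answerer's code; the class and method names (ReaderCharsetDemo, readLatin1Lines) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class ReaderCharsetDemo {
    // Reads the stream line by line, decoding bytes as ISO-8859-1 in the reader.
    static String readLatin1Lines(InputStream is) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the line terminator, so add one back.
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "café\n" encoded as Latin-1: 0xE9 is the single byte for 'é'.
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9, '\n'};
        String text = readLatin1Lines(new ByteArrayInputStream(latin1));
        System.out.println(text.equals("caf\u00e9\n"));
    }
}
```

Note that readLine() normalises line endings, so this form suits text where the exact terminators do not matter.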
Answered by Fredrik LS
If you don't want to plumb it yourself, you could have a look at the Apache Commons IO project: IOUtils.toString(InputStream input, String encoding) seems to do what you want. I haven't tried that method myself, but the Javadoc states "Get the contents of an InputStream as a String using the specified character encoding."
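If adding a dependency is not an option, newer JDKs offer a comparably short route. The following is a minimal sketch assuming Java 9 or later, where InputStream.readAllBytes() is available; it is not taken from any answer here, and the names ReadAllBytesDemo and readFully are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReadAllBytesDemo {
    // Reads every remaining byte from the stream, then decodes once.
    // Assumes Java 9+ for InputStream.readAllBytes().
    static String readFully(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9}; // "café" in Latin-1
        String text = readFully(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(text.equals("caf\u00e9"));
    }
}
```

Like the ByteArrayOutputStream approach, this holds the whole content in memory, so it suits inputs of modest size.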
Answered by Mike Samuel
Answered by cyroxx
I just found out that this answer to the question "Read/convert an InputStream to a String" can be applied to my problem; please see the code below. Anyway, I very much appreciate the answers you've given so far.
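private String convertStreamToString(InputStream is, String charsetName) {
    try {
        // "\\A" matches the beginning of input, so the whole stream becomes one token
        return new java.util.Scanner(is, charsetName).useDelimiter("\\A").next();
    } catch (java.util.NoSuchElementException e) {
        // empty stream: there is no token to return
        return "";
    }
}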
So in order to decode from Latin-1, call it like this:
String message = convertStreamToString(is, "8859_1");
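For a self-contained check of the Scanner approach, the sketch below feeds it Latin-1 bytes from a ByteArrayInputStream and then encodes the resulting String as UTF-8. The wrapper class ScannerLatin1Demo and the sample bytes are assumptions for illustration, and the Scanner is closed with try-with-resources, which the original snippet does not do.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.NoSuchElementException;
import java.util.Scanner;

final class ScannerLatin1Demo {
    // Reads the whole stream as one token, decoding with the named charset.
    static String convertStreamToString(InputStream is, String charsetName) {
        try (Scanner scanner = new Scanner(is, charsetName)) {
            return scanner.useDelimiter("\\A").next();
        } catch (NoSuchElementException e) {
            return ""; // empty stream
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9}; // "café" in Latin-1 (4 bytes)
        String message = convertStreamToString(new ByteArrayInputStream(latin1), "ISO-8859-1");

        // A Java String has no byte encoding of its own; UTF-8 only appears
        // when the String is encoded back to bytes.
        byte[] utf8 = message.getBytes(StandardCharsets.UTF_8);
        System.out.println(message.length() + " chars, " + utf8.length + " UTF-8 bytes");
    }
}
```

The "UTF-8 String" in the title is therefore produced only at the output step, for example via getBytes(StandardCharsets.UTF_8) or a Writer configured with that charset.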