Original question: http://stackoverflow.com/questions/26995472/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
java convert String windows-1251 to utf8
Asked by halem
Scanner sc = new Scanner(System.in);
System.out.println("Enter text: ");
String text = sc.nextLine();
try {
    String result = new String(text.getBytes("windows-1251"), Charset.forName("UTF-8"));
    System.out.println(result);
} catch (UnsupportedEncodingException e) {
    System.out.println(e);
}
I'm trying to change the keyboard layout: input typed on a Cyrillic keyboard, output in Latin. Example: qwerty => йцукен
It doesn't work. Can anyone tell me what I'm doing wrong?
Answered by Joop Eggen
First, Java text (String/char/Reader/Writer) is internally Unicode, so it can combine all scripts. This is a major difference from, for instance, C/C++, where there is no such standard.
Now, System.in is an InputStream for historical reasons, so it needs an indication of the encoding used.
Scanner sc = new Scanner(System.in, "Windows-1251");
The above explicitly sets the decoding of System.in to Cyrillic (Windows-1251). Without this optional parameter, the default encoding is used. Unless the software changed it, that is the platform encoding, so the original new Scanner(System.in) might have been correct too.
Now text is correct, containing the Cyrillic from System.in as Unicode.
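To make the decoding step concrete, here is a minimal sketch. It assumes a byte stream standing in for System.in (so it runs without a console); the class name DecodeInput and the helper readLine are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class DecodeInput {
    // Reads one line from the stream, decoding its bytes with the named charset.
    static String readLine(InputStream in, String charsetName) {
        Scanner sc = new Scanner(in, charsetName);
        return sc.nextLine();
    }

    public static void main(String[] args) {
        // Stand-in for System.in: the bytes a Windows-1251 console would deliver.
        // The escapes spell the Cyrillic text "йцукен".
        String expected = "\u0439\u0446\u0443\u043a\u0435\u043d";
        byte[] raw = (expected + "\n").getBytes(Charset.forName("windows-1251"));

        String text = readLine(new ByteArrayInputStream(raw), "windows-1251");
        System.out.println(text.equals(expected));                          // true
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);   // 12
    }
}
```

Once the bytes are decoded with the right charset, the String no longer "has" an encoding; an encoding only comes back into play when it is turned into bytes again.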
You would get the UTF-8 bytes as:
byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
The old "recoding" of text was wrong; drop that line. In fact, not all Windows-1251 byte sequences are valid UTF-8 multi-byte sequences, so decoding them as UTF-8 corrupts the text.
String result = text;
System.out.println(result);
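As a rough illustration of why the recoding line fails, the sketch below repeats it on a fixed string. The class and method names (RecodeDemo, wrongRecode, roundTrip) are invented for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RecodeDemo {
    static final Charset CP1251 = Charset.forName("windows-1251");

    // The question's approach: encode as Windows-1251, then misread those bytes as UTF-8.
    static String wrongRecode(String text) {
        return new String(text.getBytes(CP1251), StandardCharsets.UTF_8);
    }

    // Encoding and decoding with the same charset gives the original text back.
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String text = "\u0439\u0446\u0443\u043a\u0435\u043d"; // "йцукен"

        // The Cyrillic bytes are not valid UTF-8, so they decode to U+FFFD.
        System.out.println(wrongRecode(text).equals(text));                        // false
        System.out.println(wrongRecode(text).indexOf('\uFFFD') >= 0);              // true

        // Pure ASCII survives, which can hide the bug during testing.
        System.out.println(wrongRecode("qwerty").equals("qwerty"));                // true
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));  // true
    }
}
```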
System.out is a PrintStream, a rather rarely used historic class. It prints using the default platform encoding, so you more or less have to rely on that default encoding being correct.
System.out.println(result);
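If relying on the default is not good enough, a PrintStream can also be created with an explicit encoding. The sketch below is only an illustration: it prints into an in-memory buffer rather than the console so the resulting bytes can be inspected, and the names PrintDemo and printed are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class PrintDemo {
    // Prints text through a PrintStream with the given encoding and returns the bytes produced.
    static byte[] printed(String text, String encoding) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(buffer, true, encoding)) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u0439\u0446\u0443\u043a\u0435\u043d"; // "йцукен"
        System.out.println(printed(text, "UTF-8").length);        // 12: two bytes per letter
        System.out.println(printed(text, "windows-1251").length); // 6: one byte per letter
    }
}
```

Wrapping System.out the same way, for example new PrintStream(System.out, true, "UTF-8"), only helps if the console itself expects that encoding.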
For printing to a UTF-8 encoded file:
byte[] bytes = ("\uFEFF" + text).getBytes(StandardCharsets.UTF_8);
Path path = Paths.get("C:/Temp/test.txt");
Files.write(path, bytes);
Here I have added a Unicode BOM character in front, so Windows Notepad may recognize the encoding as UTF-8. In general one should avoid using a BOM. It is a zero-width no-break space (invisible) and plays havoc with all kinds of formats: CSV, XML, file concatenation, cut-copy-paste.
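A small sketch of what the BOM looks like on disk, and one way to drop it again when reading. The helper names (encodeWithBom, decodeStrippingBom) and the temp file are assumptions made for the example, not part of any library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BomDemo {
    static final String BOM = "\uFEFF";

    // Encodes text as UTF-8 with a leading BOM (bytes EF BB BF).
    static byte[] encodeWithBom(String text) {
        return (BOM + text).getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes and drops a leading BOM if one is present.
    static String decodeStrippingBom(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.startsWith(BOM) ? s.substring(1) : s;
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("bom-demo", ".txt"); // throwaway file for the demo
        try {
            Files.write(path, encodeWithBom("\u0439\u0446\u0443\u043a\u0435\u043d"));
            byte[] raw = Files.readAllBytes(path);
            // prints: EF BB BF
            System.out.printf("%02X %02X %02X%n", raw[0] & 0xFF, raw[1] & 0xFF, raw[2] & 0xFF);
            System.out.println(decodeStrippingBom(raw).length()); // 6
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Java's UTF-8 decoder keeps the BOM as an ordinary character, which is why the explicit strip is needed when reading such a file back.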
Answered by v010dya
The reason you got an answer to a different question, and nobody answered yours, is that your title doesn't fit the question. You were not attempting to convert between charsets, but rather between keyboard layouts.
Here you shouldn't worry about character encoding at all: simply read the line, convert it to an array of characters, go through them, and convert each one using a predefined map.
The code will be something like this:
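import java.util.Map;
import java.util.TreeMap;

// Generic type arguments must be reference types, so use Character, not char.
Map<Character, Character> table = new TreeMap<>();
table.put('q', 'й');
table.put('Q', 'Й');
table.put('w', 'ц');
// .... etc
String text = sc.nextLine();
char[] cArr = text.toCharArray();
for (int i = 0; i < cArr.length; ++i) {
    // Replace only characters that have a mapping; leave the rest unchanged.
    if (table.containsKey(cArr[i])) {
        cArr[i] = table.get(cArr[i]);
    }
}
text = new String(cArr);
System.out.println(text);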
Now, I don't have time to test that code, but you should get the idea of how to do your task.
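For reference, a fuller sketch of the same idea. It assumes the US QWERTY and standard Russian ЙЦУКЕН layouts, covers letter keys only (punctuation keys such as [ and ; are left out), and assumes the source file is saved as UTF-8. The class name LayoutConverter and method toCyrillic are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

class LayoutConverter {
    // Letter keys of the US QWERTY layout and the characters found at the
    // same key positions on the Russian layout, row by row.
    private static final String LATIN    = "qwertyuiop" + "asdfghjkl" + "zxcvbnm";
    private static final String CYRILLIC = "йцукенгшщз" + "фывапролд" + "ячсмить";

    private static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        for (int i = 0; i < LATIN.length(); i++) {
            char from = LATIN.charAt(i);
            char to = CYRILLIC.charAt(i);
            TABLE.put(from, to);
            // Shifted keys map to the corresponding uppercase letters.
            TABLE.put(Character.toUpperCase(from), Character.toUpperCase(to));
        }
    }

    // Replaces each mapped Latin letter; all other characters pass through unchanged.
    static String toCyrillic(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            char mapped = TABLE.getOrDefault(c, c);
            sb.append(mapped);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCyrillic("Qwerty 123")); // Йцукен 123
    }
}
```

Building the table from two parallel strings keeps the mapping easy to check against a picture of the keyboard; the reverse direction would just swap the roles of the two strings.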