How do I get Eclipse to print strange Unicode characters?

Question

Asked by wynnch

I'm trying to make my program output a text file containing a list of names. Some of the names contain non-ASCII characters, such as Åström.

I grabbed this list of names from a web page that is encoded in UTF-8, or at least I'm pretty sure it is, because the page source says

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

This is what I've tried so far:

public static void write(List<String> list) throws IOException {
    Writer out = new OutputStreamWriter(new FileOutputStream("test.txt"), "UTF-8");
    try {
        for (int i = 0; i < list.size(); i++) {
            try {
                byte[] utf8Bytes = list.get(i).getBytes("UTF-8");
                out.write(new String(utf8Bytes, "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }

            out.write(System.getProperty("line.separator"));
        }
    } finally {
        out.close();
    }
}

and I'm a little confused as to why it's not working. The output I get is "Ã…strÃ¶m", which is very weird.

Can someone please point me in the right direction? Thanks!

And on another unrelated note, is there an easier way to write a new line to a text file besides the clunky

out.write(System.getProperty("line.separator"));

that I have? I saw that online somewhere and it works, but I was just wondering if there was a cleaner way.

Answer 1

Answered by trashgod

Set Eclipse > Preferences > General > Workspace > Text file encoding to UTF-8.

Answer 2

Answered by Javier C

The content is indeed in UTF-8, and it appears correct when printed to the console. What may be causing the problem is the unnecessary decoding and encoding of the string. Instead of an OutputStreamWriter, try using a java.io.PrintWriter. It has println methods that print the string followed by the system line separator. It would look something like:

printWriter.println(list.get(i));
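As a rough illustration of that suggestion, the sketch below wraps a UTF-8 writer in a PrintWriter and writes one name per line. The class name NameWriter, the temporary file, and the sample names are assumptions made for the example, not part of the original question.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, named here only for illustration.
class NameWriter {
    // Writes one name per line, encoding the characters as UTF-8.
    static void write(List<String> names, Path target) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(target, StandardCharsets.UTF_8))) {
            for (String name : names) {
                // println appends the system line separator.
                out.println(name);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("names", ".txt");
        // Unicode escapes keep this source file independent of its own encoding.
        write(Arrays.asList("\u00C5str\u00F6m", "Smith"), target);
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

Note that PrintWriter does not throw IOException from println; if write failures matter, checkError() can be called before closing.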

Also, when opening the file to inspect it, try using a browser. Browsers let you choose the encoding after opening a file, so you can quickly try several encodings to see which one is really being used.
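The effect of viewing a file with the wrong encoding can also be reproduced in code. The sketch below encodes a name as UTF-8 and then decodes the same bytes with a different charset, which is roughly what a viewer does when it guesses wrong. The class name MojibakeDemo and the choice of windows-1252 as the wrong charset are assumptions; windows-1252 is not one of Java's guaranteed charsets, although common JDKs ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named here only for illustration.
class MojibakeDemo {
    // Encodes text as UTF-8, then decodes those bytes with another charset.
    static String misread(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is available on this JVM.
        Charset cp1252 = Charset.forName("windows-1252");
        String garbled = misread("\u00C5str\u00F6m", cp1252);
        // Each non-ASCII letter becomes two characters when misread.
        System.out.println(garbled.length());
    }
}
```

If the garbled text produced this way matches what the editor shows, the file itself is probably valid UTF-8 and only the viewer's encoding choice is wrong.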

Answer 3

Answered by McDowell

Notepad is not a particularly feature-rich editor. It will attempt to guess the document encoding, sometimes with unexpected results. "Plain text" documents don't carry any metadata about their encoding, which gives them certain limitations. Windows apps (Notepad included) often rely on the byte order mark (U+FEFF, or "\uFEFF" in Java strings) to determine whether the encoding is a Unicode format. That might help out Notepad; it's going to be useless for your web page problem.
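For the Notepad case, a minimal sketch of writing the byte order mark looks like the following. The class name BomWriter and the temporary file are assumptions for illustration. Writing the character U+FEFF through a UTF-8 writer produces the bytes EF BB BF at the start of the file.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, named here only for illustration.
class BomWriter {
    // Writes a byte order mark followed by the text, encoded as UTF-8.
    static void writeWithBom(Path target, String text) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write('\uFEFF'); // encoded as the bytes EF BB BF
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("bom", ".txt");
        writeWithBom(target, "\u00C5str\u00F6m");
        byte[] bytes = Files.readAllBytes(target);
        System.out.printf("%02X %02X %02X%n", bytes[0], bytes[1], bytes[2]);
    }
}
```

Java's own UTF-8 decoder does not strip the mark, so code that reads the file back sees U+FEFF as the first character. Some other tools also treat the mark as content, so this is a trade-off rather than a general fix.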

The HTML 4 spec defines how the output encoding should be set. You should set the Content-Type HTTP header in addition to specifying the meta encoding.

You don't mention what you're using in your web app. A servlet should set the content type with setContentType("text/html; charset=UTF-8"); a JSP should use the page directive to do the same. Other view technologies provide similar mechanisms.

byte[] utf8Bytes = list.get(i).getBytes("UTF-8");
out.write(new String(utf8Bytes, "UTF-8"));

This code performs some useless operations: it transcodes character data from UTF-16 to UTF-8, then back from UTF-8 to UTF-16, then writes the data to a Writer (which will transcode the UTF-16 to UTF-8 again). This code is equivalent:

String str = list.get(i);
out.write(str);
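The equivalence can be checked with a small sketch like the one below, where the class name RoundTripCheck is an assumption for illustration. It holds for well-formed strings; a string containing an unpaired surrogate would not survive the round trip, because the encoder substitutes a replacement byte for it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, named here only for illustration.
class RoundTripCheck {
    // Returns true when encoding to UTF-8 and decoding again yields the same string.
    static boolean roundTripIsIdentity(String s) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        String decoded = new String(utf8Bytes, StandardCharsets.UTF_8);
        return decoded.equals(s);
    }

    public static void main(String[] args) {
        System.out.println(roundTripIsIdentity("\u00C5str\u00F6m")); // prints "true"
    }
}
```

Using the StandardCharsets constants instead of the "UTF-8" string also removes the need to catch UnsupportedEncodingException.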

Use a PrintWriter to get newline support.

You can read more about character encoding in Java here, here and here.
