Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1402877/

Time: 2020-08-12 11:41:33  Source: igfitidea

Print string literal unicode as the actual character

java, unicode

Asked by digiarnie

In my Java application I have been passed a string that looks like this:

"\u00a5123"

When printing that string into the console, I get the same string as the output (as expected).

However, I want the unicode escape converted into the actual yen symbol when it is printed (\u00a5 -> yen symbol). How would I go about doing this?

i.e. so it looks like this: "[yen symbol]123"

Accepted answer by aperkins

I wrote a little program:

public static void main(String[] args) {
    System.out.println("\u00a5123");
}

Its output:

¥123

i.e. it prints exactly what you said you wanted in your post. I suspect something else is going on. What version of Java are you using?

edit:

In response to your clarification, there are a couple of different techniques. The most straightforward is to look for a "\u" followed by 4 hex digits, extract that piece, and replace it with the character that has that hex code (using the Character class). This of course assumes the string will not have a \u in front of it.
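
A minimal sketch of that technique, assuming the input contains a literal backslash followed by u and exactly four hex digits. The class name UnicodeEscapes and the method unescape are made up for this illustration; they are not part of the JDK, and the regex is only one possible way to find the escapes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not a JDK class.
class UnicodeEscapes {
    // Matches a literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        // StringBuffer is used because appendReplacement accepts it on every Java version.
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codeUnit = Integer.parseInt(m.group(1), 16);
            String replacement = new String(Character.toChars(codeUnit));
            // quoteReplacement stops '$' and backslash in the result from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash makes this the escape text itself, not a yen sign.
        System.out.println(unescape("\\u00a5123"));
    }
}
```

Whether the symbol then shows up correctly still depends on the console's encoding. The sketch makes no attempt to recognise an escaped backslash in front of the u.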

I am not aware of any particular facility that parses a String as though it were an escaped Java string literal.

Answer by Licky Lindsay

You're probably going to have to write a parser for these, unless you can find one in a third-party library. There is nothing in the JDK to parse these for you. I know because I fairly recently had an idea to use this kind of escape as a way to smuggle unicode through a Latin-1-only database. (I ended up doing something else, btw.)

I will tell you that java.util.Properties escapes and unescapes Unicode characters in this manner when reading and writing files (since the files have to be ASCII). The methods it uses for this are private, so you can't call them, but you could use the JDK source code to inspire your solution.
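
Although those conversion methods are private, the public Properties.load(Reader) method applies the same unescaping to whatever it reads, so one possible workaround is to feed it a one-line properties document. This is only a sketch: the class name PropertiesUnescape and the key "value" are arbitrary choices for the illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not a JDK class.
class PropertiesUnescape {
    static String unescape(String input) {
        Properties props = new Properties();
        try {
            // "value" is an arbitrary key; load() unescapes the text after the '='.
            props.load(new StringReader("value=" + input));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return props.getProperty("value");
    }

    public static void main(String[] args) {
        // The doubled backslash makes this the escape text itself, not a yen sign.
        System.out.println(unescape("\\u00a5123"));
    }
}
```

Be aware that this applies all of the properties-file rules, not just the unicode escapes: leading whitespace in the value is dropped, other backslash sequences are interpreted, a line break ends the value, and a malformed escape makes load throw an IllegalArgumentException. It is probably only suitable for input you know to be simple.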

Answer by Abhinav Maheshwari

As has been mentioned before, these strings will have to be parsed to get the desired result.

  1. Tokenize the string by using \u as separator. For example: \u63A5\u53D7 => { "63A5", "53D7" }

  2. Process these strings as follows:

    String hex = "63A5";
    int intValue = Integer.parseInt(hex, 16);
    System.out.println((char)intValue);
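
Put together, the two steps above might look something like the sketch below. The class name SplitUnescape is made up for the example. It assumes every \u in the input is followed by at least four hex digits, and it treats anything after those four digits as ordinary text, so that an input such as the question's \u00a5123 keeps its trailing 123.

```java
// Hypothetical helper for illustration; not a JDK class.
class SplitUnescape {
    static String unescape(String input) {
        // The separator regex matches a literal backslash followed by the letter u.
        String[] parts = input.split("\\\\u", -1);
        // Everything before the first separator is ordinary text.
        StringBuilder out = new StringBuilder(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            String part = parts[i];
            // Assumption: the first four characters of each token are hex digits.
            int intValue = Integer.parseInt(part.substring(0, 4), 16);
            out.append((char) intValue);
            // Whatever follows the four digits is copied through unchanged.
            out.append(part.substring(4));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes make these the escape text itself.
        System.out.println(unescape("\\u63A5\\u53D7"));
        System.out.println(unescape("\\u00a5123"));
    }
}
```

If a token is shorter than four characters or does not start with hex digits, this version simply throws (StringIndexOutOfBoundsException or NumberFormatException), so real input would need some validation first.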
    

Answer by Joel Swanson

You could replace the above with this:

System.out.println((char)0x63A5);

Here is code that prints all of the box-drawing unicode characters.

public static void printBox()
{
    for (int i=0x2500;i<=0x257F;i++)
    {
        System.out.printf("0x%x : %c\n",i,(char)i);
    }
}