Java - Converting from Unicode to ANSI

Question

Asked by Abdullah Md. Zubair

I have a string \u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF. I need to convert it to Avwg wKsew?—i K_v ejwQ, which is in ANSI format. How can I convert this Unicode string to ANSI characters in Java?

Edit:

resultView.setTypeface(typeFace);
String str=new String("\u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF");               
resultView.setText(str);

Answer 1

Answered by bobince

I need to convert it in Avwg wKsew?—i K_v ejwQ which is in ANSI format.

That's not ANSI format. The (misleadingly-named) "ANSI" code pages in Windows are all based around ASCII, with different characters added in the high bytes. Byte 0x41 (A) as a leading letter in an ANSI code page always means Latin A and not Bengali আ.
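
To make this concrete, here is a minimal sketch that encodes Latin A and Bengali U+0986 in a few 8-bit charsets. The class name `AnsiByteDemo`, the helper `firstByte`, and the list of code page names are assumptions for illustration; a JVM may not ship every Windows code page, so unavailable ones are skipped.

```java
import java.nio.charset.Charset;

final class AnsiByteDemo {
    /** Returns the first encoded byte of s in the given charset, as an unsigned value. */
    static int firstByte(String s, Charset cs) {
        return s.getBytes(cs)[0] & 0xFF;
    }

    public static void main(String[] args) {
        // The code page names below are assumptions; skip any this JVM does not provide.
        String[] names = {"US-ASCII", "ISO-8859-1", "windows-1252", "windows-1251", "windows-1256"};
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not available here");
                continue;
            }
            Charset cs = Charset.forName(name);
            // 'A' should come out as 0x41 everywhere; U+0986 has no slot in these
            // charsets, so getBytes substitutes the charset's replacement byte.
            System.out.printf("%s: 'A' -> 0x%02X, U+0986 -> 0x%02X%n",
                    name, firstByte("A", cs), firstByte("\u0986", cs));
        }
    }
}
```

In ISO-8859-1, for example, the Bengali letter comes back as 0x3F (a question mark), which is the usual sign that the target charset simply has no code for the character.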

What I think you have is a custom symbol font, that maps arbitrary symbols to completely unrelated codepoints. Every such font has its own visual encoding; to convert between Unicode and the custom visual encoding you'd have to build up your own translation table by looking at the glyphs for each character and matching them to the Unicode character that represents the same letter.
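
A translation table of that kind could be sketched roughly as follows. The table here is hypothetical and partial: it is inferred only from the first, third and fourth words of the example in the question (the second word is garbled on this page), and the real font may well map things differently. The class name `VisualEncodingSketch` and the method `toVisual` are likewise made up for illustration. Note the reordering step: the vowel sign I is drawn before its consonant, so a visual encoding stores it before the consonant too.

```java
import java.util.HashMap;
import java.util.Map;

final class VisualEncodingSketch {
    // Hypothetical, partial table inferred from the example string in the question.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u0986', "Av"); // letter AA
        TABLE.put('\u09AE', "g");  // letter MA
        TABLE.put('\u0995', "K");  // letter KA
        TABLE.put('\u09A5', "_");  // letter THA
        TABLE.put('\u09AC', "e");  // letter BA
        TABLE.put('\u09B2', "j");  // letter LA
        TABLE.put('\u099B', "Q");  // letter CHA
        TABLE.put('\u09BE', "v");  // vowel sign AA
        TABLE.put('\u09BF', "w");  // vowel sign I (drawn before its consonant)
    }

    // Rough range check for Bengali dependent vowel signs (U+09BE to U+09CC).
    private static boolean isVowelSign(char c) {
        return c >= '\u09BE' && c <= '\u09CC';
    }

    static String toVisual(String unicode) {
        StringBuilder out = new StringBuilder();
        int lastBaseStart = 0; // index in out where the most recent base letter begins
        for (char c : unicode.toCharArray()) {
            if (c == ' ') {
                out.append(c);
                continue;
            }
            String glyphs = TABLE.get(c);
            if (glyphs == null) {
                throw new IllegalArgumentException(String.format("No mapping for U+%04X", (int) c));
            }
            if (c == '\u09BF') {
                // Logical order is consonant then sign; visual order is the reverse.
                out.insert(lastBaseStart, glyphs);
            } else {
                if (!isVowelSign(c)) {
                    lastBaseStart = out.length();
                }
                out.append(glyphs);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: Avwg K_v ejwQ
        System.out.println(toVisual("\u0986\u09AE\u09BF \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF"));
    }
}
```

Conjuncts and the remaining letters would each need their own entries, which is exactly why this approach gets laborious for a whole script.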

I would strongly advise getting a proper Unicode-aware font that supports Bengali instead. Content stuck in an arbitrary font-specific encoding is difficult to deal with, because semantically you really are dealing with a string that means "AvwgwKsew?—i K_v ejwQ", with all the editing and case-changing gotchas that implies.
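
One such case-changing gotcha can be shown in a few lines. This is only an illustrative sketch (the class name `CaseGotchaDemo` and helper `upper` are made up): real Bengali text has no letter case, so upper-casing leaves it alone, while the font-encoded form is just Latin letters and gets rewritten into different codes, which the font would then draw as different glyphs.

```java
import java.util.Locale;

final class CaseGotchaDemo {
    static String upper(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String unicode = "\u0986\u09AE\u09BF"; // Bengali word in real Unicode
        String fontEncoded = "Avwg";            // the same word as the question's font expects it

        System.out.println(upper(unicode).equals(unicode)); // true: Bengali has no case
        System.out.println(upper(fontEncoded));              // AVWG: no longer the same "letters"
    }
}
```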

Visual-encoded fonts are an unhappy relic of the time before Windows had good Unicode (or even ISCII) support. They should not be used for anything today.

Answer 2

Answered by laher

I'm not sure exactly what you're asking, but I'll assume you're asking how to convert some characters from Unicode into an 8-bit character set (e.g. ISO-8859-1, the character set for 'Western European' languages such as English).

I don't know of any way to automatically detect the relevant 8-bit charset, so I looked up one of your characters (at http://unicode.org/charts/), and I can see that these characters are Bengali.

I think the equivalent 8-bit character set for Bengali is known as x-iscii-be. I don't have this installed on my system, so I couldn't do the conversion successfully.

EDIT: Java does not support the charset x-iscii-be, but I'll leave the remainder of this answer for illustration purposes. See http://download.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html for a list of supported Charsets.

EDIT2: Android certainly doesn't guarantee support for this charset (the only 8-bit character set it guarantees is ISO-8859-1). See: http://developer.android.com/reference/java/nio/charset/Charset.html.

*So, I think you should run some Charset-detecting code on a Bengali Android device; perhaps it supports this charset. Everything you need is in my code sample.*

In order for Java to convert your data into a different charset, all you need to do is check that the desired Charset is installed, and then specify that Charset when you convert the String into bytes.

The conversion itself would be extremely simple:

    str.getBytes("x-iscii-be");

So, you see, the String itself is stored in a kind of 'normalised' form (Java Strings are UTF-16 internally, regardless of the defaultCharset), and you can treat getBytes(charsetName) as a kind of 'alternative output format' for the String. Sorry - poor explanation!
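
As a small illustration of that "alternative output format" idea, the sketch below encodes one and the same String with three charsets that every Java platform is required to provide. The class name `GetBytesDemo` and the helper `toHex` are made up for this example.

```java
import java.nio.charset.StandardCharsets;

final class GetBytesDemo {
    /** Formats each byte as two lowercase hex digits. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00c2"; // capital A with a circumflex accent

        // The String never changes; only the byte representation does.
        System.out.println(toHex(s.getBytes(StandardCharsets.ISO_8859_1))); // c2
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_8)));      // c382
        System.out.println(toHex(s.getBytes(StandardCharsets.UTF_16BE)));   // 00c2
    }
}
```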

In your situation, perhaps you just need to assign a Charset to the resultView, and the framework will work its magic for you ...

Here's some test code I put together to illustrate the point, and to check whether a given charset is supported on a system.

The code outputs the byte arrays as hex strings, so that you can see that the data is different after conversion.

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class UnicodeTest {
    public static void main(String[] args) throws UnsupportedEncodingException {
        testWestern();
        testBengali(); // throws if x-iscii-be is not installed on this system
    }

    public static void testWestern() throws UnsupportedEncodingException {
        String unicodeStr = "\u00c2"; // Capital A with a circumflex accent.
        String charsetName = "ISO-8859-1";
        System.out.println("Input (printed using the default charset): " + unicodeStr);
        attempt8bitCharsetConversion(unicodeStr, charsetName);
    }

    public static void testBengali() throws UnsupportedEncodingException {
        String unicodeStr = "\u0986\u09AE\u09BF \u0995\u09BF\u0982\u09AC\u09A6\u09A8\u09CD\u09A4\u09BF\u09B0 \u0995\u09A5\u09BE \u09AC\u09B2\u099B\u09BF";
        String charsetName = "x-iscii-be";
        System.out.println(unicodeStr);
        attempt8bitCharsetConversion(unicodeStr, charsetName);
    }

    public static void attempt8bitCharsetConversion(String input, String charsetName) throws UnsupportedEncodingException {
        // Throw only when the requested charset is not installed on this system.
        if (!Charset.isSupported(charsetName)) {
            throw new UnsupportedEncodingException(charsetName + " is not supported on this system");
        }
        System.out.println("HEXED input : " + toHex(input.getBytes(Charset.defaultCharset())));
        System.out.println("HEXED output: " + toHex(input.getBytes(charsetName)));
    }

    public static String toHex(byte[] input) {
        // Format byte by byte so that leading zero bytes and high bytes (0x80-0xFF) are preserved.
        StringBuilder hex = new StringBuilder();
        for (byte b : input) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }
}

See also here for more information on charset conversion: http://download.oracle.com/javase/tutorial/i18n/text/string.html

Character sets are a tricky business, so please forgive my convoluted answer.

HTH

Answer 3

Answered by HeLLBoY

I've written a class that solves the problem with U+09CB ো, U+09CC ৌ, U+09C7 ে, U+09C8 ৈ and U+09BF ি in UTF-8. I reshape them by editing the font glyphs, so you don't need to change the text to extended ASCII. :( However, I still couldn't solve your Bengali conjuncts. For proper rendering it requires Android 3.5 or higher; it works smoothly on Android 4.0 (Ice Cream Sandwich).
