Java: How to use the ICU4J library

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/11730271/

Time: 2020-10-31 06:10:15  Source: igfitidea

How to use ICU4J library

Tags: java, unicode, icu

Asked by Adham

While searching this site for how to display RTL text correctly, I found this post about the ICU library. I have no previous experience with it, and there are almost no clear online resources.

Does anyone here have experience using it? Or can you at least tell me what I should search for to get what I want?

Answered by Ibraheem Osama

Hi Adham, I have a little experience with ICU4J. I was trying to read LTR Arabic text and convert it to RTL text: I changed the digits from English (ASCII) digits to Arabic digits and set the alignment to RTL. Here is some simple code that does the job; I hope my limited experience helps. See also the demos on the ICU4J site.

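        // PdfReader and PdfTextExtractor appear to be iText classes (an assumption);
        // INPUTFILE is the path to the PDF and is defined elsewhere.
        PdfReader reader = new PdfReader(INPUTFILE);
        // Extract the text of the first page.
        String txt = PdfTextExtractor.getTextFromPage(reader, 1);

        BiDiClass bidiClass = new BiDiClass();
        // Replace ASCII digits with Arabic digits.
        String arabicNumber = bidiClass.englishToArabicNumber(txt);
        // Reorder the line, treating RTL as the dominant direction.
        String out = bidiClass.makeLineLogicalOrder(arabicNumber, true);

        System.out.println(out);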

And this is the BiDiClass:

import com.ibm.icu.text.Bidi;
import com.ibm.icu.text.Normalizer;


//Editor : Ibraheem Osama Mohamed



/**
 * This class is an implementation of the ICU4J class. TextNormalize
 * will call this only if the ICU4J library exists in the classpath.
 * @author <a href="mailto:[email protected]">Brian Carrier</a>
 * @version $Revision: 1.0 $
 */
public class BiDiClass {


    private static final String REPLACE_CHARS = "0123456789.";
    private Bidi bidi;




    private StringBuilder sb = new StringBuilder();

    /**
     * Constructor.
     */
    public BiDiClass()
    {
        bidi = new Bidi();

        /* We do not use bidi.setInverse() because that uses
         * Bidi.REORDER_INVERSE_NUMBERS_AS_L, which caused problems
         * in some test files. For example, a file had a line of:
         * 0 1 / ARABIC
         * and the 0 and 1 were reversed in the end result. 
         * REORDER_INVERSE_LIKE_DIRECT is the inverse Bidi mode
         * that more closely reflects the Unicode spec.
         */
        bidi.setReorderingMode(Bidi.REORDER_INVERSE_LIKE_DIRECT);
    }

    /**
     * Takes a line of text in presentation order and converts it to logical order.
     * @see TextNormalize.makeLineLogicalOrder(String, boolean)
     *
     * @param str String to convert
     * @param isRtlDominant RTL (right-to-left) will be the dominant text direction
     * @return The converted string
     */
    public String makeLineLogicalOrder(String str, boolean isRtlDominant)
    {   
        bidi.setPara(str, isRtlDominant?Bidi.RTL:Bidi.LTR, null);

        /* Set the mirror flag so that parentheses and other mirror symbols
         * are properly reversed, when needed.  With this removed, lines
         * such as (CBA) in the PDF file will come out like )ABC( in logical
         * order.
         */
        return bidi.writeReordered(Bidi.DO_MIRRORING);
    }

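    /**
     * Replaces ASCII digits with Arabic-Indic digits (U+0660..U+0669).
     * A local builder is used so repeated calls do not accumulate output.
     * '.' is left unchanged: shifting it by the same offset would yield
     * U+065E, a combining mark, not a decimal separator.
     */
    public String englishToArabicNumber(String string) {
        StringBuilder result = new StringBuilder(string.length());
        for (char c : string.toCharArray()) {
            if (c >= '0' && c <= '9') {
                c = (char) ('\u0660' - '0' + c);
            }
            result.append(c);
        }
        return result.toString();
    }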


    /**
     * Normalize presentation forms of characters to the separate parts.
     * @see TextNormalize.normalizePres(String)
     *
     * @param str String to normalize
     * @return Normalized form
     */
    public String normalizePres(String str)
    {
        StringBuilder builder = null;
        int p = 0;
        int q = 0;
        int strLength = str.length();
        for (; q < strLength; q++)
        {
            // We only normalize if the codepoint is in a given range.
            // Otherwise, NFKC converts too many things that would cause
            // confusion. For example, it converts the micro symbol in
            // extended Latin to the value in the Greek script. We normalize
            // the Unicode Alphabetic and Arabic A&B Presentation forms.
            char c = str.charAt(q);
            if ((0xFB00 <= c && c <= 0xFDFF) || (0xFE70 <= c && c <= 0xFEFF))
            {
                if (builder == null) {
                    builder = new StringBuilder(strLength * 2);
                }
                builder.append(str.substring(p, q));
                // Some fonts map U+FDF2 differently than the Unicode spec.
                // They add an extra U+0627 character to compensate.
                // This removes the extra character for those fonts.
                if (c == 0xFDF2 && q > 0 && (str.charAt(q - 1) == 0x0627 || str.charAt(q - 1) == 0xFE8D))
                {
                    builder.append("\u0644\u0644\u0647");
                }
                else
                {
                    // Trim because some decompositions have an extra space,
                    // such as U+FC5E
                    builder.append(
                            Normalizer.normalize(c, Normalizer.NFKC).trim());
                }
                p = q + 1;
            }
        }
        if (builder == null) {
            return str;
        } else {
            builder.append(str.substring(p, q));
            return builder.toString();
        }
    }



    /**
     * Decomposes Diacritic characters to their combining forms.
     *
     * @param str String to be Normalized
     * @return A Normalized String
     */     
    public String normalizeDiac(String str)
    {
        StringBuilder retStr = new StringBuilder();
        int strLength = str.length();
        for (int i = 0; i < strLength; i++)
        {
            char c = str.charAt(i);
            if(Character.getType(c) == Character.NON_SPACING_MARK
                    || Character.getType(c) == Character.MODIFIER_SYMBOL
                    || Character.getType(c) == Character.MODIFIER_LETTER)
            {
                /*
                 * Trim because some decompositions have an extra space, such as
                 * U+00B4
                 */
                retStr.append(Normalizer.normalize(c, Normalizer.NFKC).trim());
            }
            else
            {
                retStr.append(str.charAt(i));
            }
        }
        return retStr.toString();
    }

}
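To experiment with the reordering idea before adding ICU4J to the classpath, here is a minimal sketch that uses the JDK's own java.text.Bidi instead of com.ibm.icu.text.Bidi. It is a swapped-in technique, not the code from the answer: it goes from logical to visual order, run by run, and does no bracket mirroring and no special handling of combining marks. ICU4J's writeReordered, as used in makeLineLogicalOrder, covers those cases in one call. The class name JdkBidiSketch and the method toVisualOrder are made up for illustration.

```java
import java.text.Bidi;

final class JdkBidiSketch {

    /**
     * Reorders one line from logical to visual order, run by run.
     * Sketch only: no bracket mirroring, no combining-mark handling.
     */
    static String toVisualOrder(String logical, boolean rtlDominant) {
        if (logical.isEmpty()) {
            return logical;
        }
        int flags = rtlDominant ? Bidi.DIRECTION_RIGHT_TO_LEFT
                                : Bidi.DIRECTION_LEFT_TO_RIGHT;
        Bidi bidi = new Bidi(logical, flags);

        int runCount = bidi.getRunCount();
        if (runCount == 0) {
            return logical;
        }
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // An odd embedding level means the run reads right to left.
            runs[i] = (levels[i] % 2 == 1)
                    ? new StringBuilder(run).reverse().toString()
                    : run;
        }
        // Put the runs themselves into visual (display) order.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        // Latin text, an Arabic word, and digits on one line.
        String mixed = "abc \u0645\u0631\u062D\u0628\u0627 123";
        System.out.println(toVisualOrder(mixed, false)); // LTR as dominant direction
        System.out.println(toVisualOrder(mixed, true));  // RTL as dominant direction
    }
}
```

The boolean parameter plays the same role as isRtlDominant in BiDiClass: it fixes the paragraph direction instead of letting the first strong character decide.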

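The comments in normalizePres explain why NFKC is applied only to the presentation-form ranges. The sketch below shows that effect using the JDK's java.text.Normalizer as a stand-in for com.ibm.icu.text.Normalizer (an assumption made so it runs without ICU4J). The class and method names are illustrative, and the U+FDF2 special case from the original is left out.

```java
import java.text.Normalizer;

final class PresentationFormSketch {

    /** Same code-unit ranges that normalizePres treats as presentation forms. */
    static boolean isPresentationForm(char c) {
        return (0xFB00 <= c && c <= 0xFDFF) || (0xFE70 <= c && c <= 0xFEFF);
    }

    /** Applies NFKC only to characters inside the presentation-form ranges. */
    static String normalizePresentationForms(String s) {
        StringBuilder out = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isPresentationForm(c)) {
                // Trim because some decompositions carry an extra space.
                out.append(Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKC).trim());
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+FEFB (lam-alef ligature, isolated form) decomposes to U+0644 U+0627.
        System.out.println(normalizePresentationForms("\uFEFB").equals("\u0644\u0627"));
        // The micro sign U+00B5 is outside the ranges, so it is left alone,
        // whereas unrestricted NFKC would turn it into Greek mu (U+03BC).
        System.out.println(normalizePresentationForms("\u00B5").equals("\u00B5"));
        System.out.println(Normalizer.normalize("\u00B5", Normalizer.Form.NFKC).equals("\u03BC"));
    }
}
```

Restricting the range keeps ordinary text untouched while still splitting ligatures that PDF fonts often emit as single presentation-form characters.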
Answered by Dhaval Jivani

Android N now offers ICU4J Android Framework APIs

Android N exposes a subset of the ICU4J APIs via the android.icu package, rather than com.ibm.icu. The Android framework may choose not to expose some ICU4J APIs for various reasons.

Here are a few important things to note:

  1. The ICU4J Android framework APIs do not include all the ICU4J APIs.
  2. NDK developers should know that Android ICU4C is not supported.
  3. The APIs in the Android framework do not replace Android's support for localizing with resources.
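Because the framework exposes only a subset under a different package name, code shared between Android and plain Java may want to check at runtime which flavor is available. The following is a hedged sketch that uses only Class.forName. The class IcuAvailability and its methods are hypothetical. Whether a particular class (Bidi is used here as the example) is actually exposed under android.icu on a given Android version should be checked against the platform documentation.

```java
final class IcuAvailability {

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, IcuAvailability.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Picks the first package that offers a Bidi class, falling back to the JDK. */
    static String bidiFlavor() {
        if (isPresent("android.icu.text.Bidi")) {
            return "android.icu"; // framework copy, if this API level exposes it
        }
        if (isPresent("com.ibm.icu.text.Bidi")) {
            return "com.ibm.icu"; // ICU4J jar bundled with the application
        }
        return "java.text";       // JDK fallback with a smaller feature set
    }

    public static void main(String[] args) {
        System.out.println("Bidi implementation available: " + bidiFlavor());
    }
}
```

This mirrors the note in the BiDiClass Javadoc that the ICU4J-based class is used only if the library exists on the classpath.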