Java: string concatenation containing Arabic and Western characters

Disclaimer: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/6177294/

Time: 2020-10-30 14:42:22  Source: igfitidea

String concatenation containing Arabic and Western characters

Tags: java, string, internationalization, arabic

Asked by Carlos Ferreira

I'm trying to concatenate several strings that each contain both Arabic and Western characters (mixed in the same string). The problem is that the result is a String that is, most likely, semantically correct, but different from what I want to obtain, because the order of the characters is altered by the Unicode Bidirectional Algorithm. Basically, I just want to concatenate them as if they were all LTR, ignoring the fact that some are RTL: a sort of "agnostic" concatenation.

I'm not sure if I was clear in my explanation, but I don't think I can do it any better.

Hope someone can help me.

Kind regards,

Carlos Ferreira

BTW, the strings are being obtained from the database.

EDIT

[Image: the two input strings and the concatenated result]

The first 2 Strings are the strings I want to concatenate and the third is the result.

EDIT 2

Actually, the concatenated String is a little different from the one in the image: it got altered during the copy and paste. The 1 comes after the first A, not immediately before the second A.

Answered by Mike Samuel

You can embed bidi regions using Unicode format control code points:

  • Left-to-right embedding (U+202A)
  • Right-to-left embedding (U+202B)
  • Pop directional formatting (U+202C)

So in Java, to embed an RTL language like Arabic in an LTR language like English, you would do:

myEnglishString + "\u202B" + myArabicString + "\u202C" + moreEnglish

and to do the reverse

myArabicString + "\u202A" + myEnglishString + "\u202C" + moreArabic
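
The two expressions above can be wrapped in small helpers. This is only a minimal sketch: the class name BidiEmbedding, the helper names embedRtl and embedLtr, and the sample strings are hypothetical and not part of the original answer. Only the three code points come from the list above.

```java
/** Sketch: wrap a run of text in Unicode directional embedding controls. */
final class BidiEmbedding {
    // The three format control code points listed in the answer.
    static final char LRE = '\u202A'; // left-to-right embedding
    static final char RLE = '\u202B'; // right-to-left embedding
    static final char PDF = '\u202C'; // pop directional formatting

    /** Marks text as a right-to-left run inside left-to-right text. */
    static String embedRtl(String text) {
        return RLE + text + PDF;
    }

    /** Marks text as a left-to-right run inside right-to-left text. */
    static String embedLtr(String text) {
        return LRE + text + PDF;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: an English label and an Arabic word.
        String english = "Name: ";
        String arabic = "\u0645\u0631\u062D\u0628\u0627";
        String combined = english + embedRtl(arabic) + " (greeting)";
        // The controls are invisible when rendered, but each one
        // counts as one char in the stored string.
        System.out.println(combined.length());
    }
}
```

The helpers only add control characters around the text; how the result is drawn still depends on whether the rendering component honours these controls.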

See Bidirectional General Formatting for more details, or the Unicode specification chapter on "Directional Formatting Codes" for the source material.

Answered by MicSim

It's very likely that you need to insert Unicode directional formatting codes into your string to get it to display correctly. For details, see the Directional Formatting Codes section of the Unicode Bidirectional Algorithm specification.

Maybe the Bidi class can help you determine the correct sequence, as it implements the Unicode Bidirectional Algorithm.
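
As a rough illustration of what the Bidi class reports, the sketch below asks java.text.Bidi for the directional runs in a mixed string. The class name BidiRunInspector, the helper names, and the sample text are hypothetical. The Bidi calls themselves (isMixed, getRunCount, getRunStart, getRunLimit, getRunLevel) are from the standard java.text.Bidi API.

```java
import java.text.Bidi;

/** Sketch: inspect the directional runs java.text.Bidi finds in a string. */
final class BidiRunInspector {

    /** True if the text contains both left-to-right and right-to-left runs. */
    static boolean isMixed(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isMixed();
    }

    /** Lists each run as "start-limit:level"; odd levels are right-to-left. */
    static String describeRuns(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < bidi.getRunCount(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(bidi.getRunStart(i)).append('-')
               .append(bidi.getRunLimit(i)).append(':')
               .append(bidi.getRunLevel(i));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: Latin text, two Arabic letters, Latin text.
        String sample = "abc \u0645\u0631 def";
        System.out.println(isMixed(sample));
        System.out.println(describeRuns(sample));
    }
}
```

The run boundaries show where embedding controls would go if a particular run needs to be isolated from its neighbours.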

Answered by MRAB

It's not changing the order of the code points. What's happening is that when the string is displayed, the renderer sees that the string starts with a right-to-left script, so it displays it right-to-left.
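
Both points can be checked from Java with a small sketch, under the assumption that java.text.Bidi with DIRECTION_DEFAULT_LEFT_TO_RIGHT approximates what the display component does. The class name LogicalOrderDemo, the helper baseIsLtr, and the sample strings are hypothetical.

```java
import java.text.Bidi;

/** Sketch: stored order is untouched; only the detected base direction differs. */
final class LogicalOrderDemo {

    /** Base direction that Bidi derives from the first strongly directional character. */
    static boolean baseIsLtr(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).baseIsLeftToRight();
    }

    public static void main(String[] args) {
        // Hypothetical inputs: two Arabic letters, and a Latin letter with a digit.
        String arabic = "\u0645\u0631";
        String western = "A1";
        String joined = arabic + western;

        // Concatenation keeps the chars in memory in the order they were appended.
        System.out.println(joined.startsWith(arabic) && joined.endsWith(western));

        // The base direction follows whichever script comes first.
        System.out.println(baseIsLtr(arabic + western));
        System.out.println(baseIsLtr(western + arabic));
    }
}
```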
