Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/12006095/
JavaScript: how to check if character is RTL?
Asked by Kryzhovnik
How can I programmatically check if the browser treats some character as RTL in JavaScript?
Maybe creating some transparent DIV and looking at where text is placed?
A bit of context: Unicode 5.2 added support for the Avestan alphabet. So, if the browser supports Unicode 5.2, it treats characters like U+10B00 as RTL (currently only Firefox does). Otherwise, it treats these characters as LTR, because that is the default.
How do I check this programmatically? I'm writing an Avestan input script, and I want to override the bidi direction if the browser is too dumb. But if the browser does support Unicode 5.2, the bidi settings shouldn't be overridden (since leaving them alone allows mixing Avestan and Cyrillic).
I currently do this:
var ua = navigator.userAgent.toLowerCase();
if (ua.match('webkit') || ua.match('presto') || ua.match('trident')) {
    var input = document.getElementById('orig');
    if (input) {
        input.style.direction = 'rtl';
        input.style.unicodeBidi = 'bidi-override';
    }
}
But, obviously, this would make the script less usable once Chrome and Opera start supporting Unicode 5.2.
Accepted answer by Kryzhovnik
Thanks for your comments, but it seems I've done this myself:
function is_script_rtl(t) {
    var d, s1, s2, body, r1, r2;
    // If the browser doesn't support this, it probably doesn't support Unicode 5.2
    if (!("getBoundingClientRect" in document.documentElement))
        return false;
    // Set up a hidden testing DIV
    d = document.createElement('div');
    d.style.position = 'absolute';
    d.style.visibility = 'hidden';
    d.style.width = 'auto';
    d.style.height = 'auto';
    d.style.fontSize = '10px';
    d.style.fontFamily = "'Ahuramzda'";
    // Content is: text, span 1, span 2, text (all the same sample string)
    d.appendChild(document.createTextNode(t));
    s1 = document.createElement("span");
    s1.appendChild(document.createTextNode(t));
    d.appendChild(s1);
    s2 = document.createElement("span");
    s2.appendChild(document.createTextNode(t));
    d.appendChild(s2);
    d.appendChild(document.createTextNode(t));
    body = document.getElementsByTagName('body')[0];
    if (body) {
        body.appendChild(d);
        r1 = s1.getBoundingClientRect();
        r2 = s2.getBoundingClientRect();
        body.removeChild(d);
        // In an RTL run the first span is drawn to the right of the second
        return r1.left > r2.left;
    }
    return false;
}
Example usage:
Avestan in <script>document.write(is_script_rtl('') ? "RTL" : "LTR")</script>,
Arabic is <script>document.write(is_script_rtl('???????') ? "RTL" : "LTR")</script>,
English is <script>document.write(is_script_rtl('English') ? "RTL" : "LTR")</script>.
It seems to work. :)
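To tie this back to the question, the probe result could drive the override instead of the user-agent check. This is only a minimal sketch: applyRtlFallback is a hypothetical helper name (not from the original answer), and it takes the probe result as a plain boolean so the logic can be tried without a browser.

```javascript
// Hypothetical helper: force RTL only when the browser did NOT treat the
// sample text as RTL on its own.
function applyRtlFallback(input, browserTreatsAsRtl) {
    if (!input || browserTreatsAsRtl) {
        return false; // nothing to override
    }
    input.style.direction = 'rtl';
    input.style.unicodeBidi = 'bidi-override';
    return true;
}

// In a page (assumes is_script_rtl from above and an element with id "orig"):
// applyRtlFallback(document.getElementById('orig'),
//                  is_script_rtl(String.fromCodePoint(0x10B00)));
```

Because the decision depends on what the browser actually renders, the override should switch itself off once the browser gains Avestan support.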
Answer by vsync
function isRTL(s){
    var ltrChars = 'A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF'+'\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF',
        rtlChars = '\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC',
        rtlDirCheck = new RegExp('^[^'+ltrChars+']*['+rtlChars+']');
    return rtlDirCheck.test(s);
};
Answer by mcarthurart
I realize this comes quite a while after the original question was asked and answered, but I found vsync's update rather useful and just wanted to add some observations. I would add this as a comment on his answer, but my reputation is not high enough yet.
Instead of a regular expression that matches, from the start of the line, zero or more non-LTR characters and then one RTL character, wouldn't it make more sense to match zero or more weak/neutral characters and then one RTL character? Otherwise you have the potential for matching many RTL characters unnecessarily. I would welcome a more thorough examination of my weak/neutral character group, as I merely used the negation of the combined LTR and RTL character groups.
Additionally, shouldn't characters such as LTR/RTL marks, embeds, overrides be included in the appropriate character groupings?
I would think then that the final code should look something like:
function isRTL(s){
    var weakChars = '\u0000-\u0040\u005B-\u0060\u007B-\u00BF\u00D7\u00F7\u02B9-\u02FF\u2000-\u2BFF\u2010-\u2029\u202C\u202F-\u2BFF',
        rtlChars = '\u0591-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC',
        rtlDirCheck = new RegExp('^['+weakChars+']*['+rtlChars+']');
    return rtlDirCheck.test(s);
};
Update
There may be some ways to speed up the above regular expression. Using a negated character class with a lazy quantifier seems to help improve speed (tested on http://regexhero.net/tester/?id=6dab761c-2517-4d20-9652-6d801623eeec, site requires Silverlight 5)
Additionally, if the directionality of the string is unknown, my guess is that in most cases the string will be LTR rather than RTL, so an isLTR function would return results faster. But since the OP is asking for isRTL, here is an isRTL function:
function isRTL(s){
    var rtlChars = '\u0591-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC',
        rtlDirCheck = new RegExp('^[^'+rtlChars+']*?['+rtlChars+']');
    return rtlDirCheck.test(s);
};
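The "skip weak/neutral characters, then look at the first strong one" idea discussed above can also be written as a single search for the first strong character. This is a rough sketch, not a full bidi implementation: firstStrongDirection is a hypothetical name, the ranges are copied from vsync's answer and cover only the BMP, and astral characters such as Avestan land in the LTR group because their surrogate halves fall inside the LTR ranges.

```javascript
// Character ranges copied from vsync's answer (approximate, BMP only).
var LTR_CHARS = 'A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02B8\u0300-\u0590\u0800-\u1FFF' +
                '\u2C00-\uFB1C\uFDFE-\uFE6F\uFEFD-\uFFFF';
var RTL_CHARS = '\u0591-\u07FF\uFB1D-\uFDFD\uFE70-\uFEFC';

// Matches the first strong character; group 1 is set only if it is RTL.
var FIRST_STRONG = new RegExp('[' + LTR_CHARS + ']|([' + RTL_CHARS + '])');

// Hypothetical helper: returns 'rtl', 'ltr', or null when the string
// contains only weak/neutral characters (digits, spaces, punctuation).
function firstStrongDirection(s) {
    var m = FIRST_STRONG.exec(s);
    if (!m) {
        return null;
    }
    return m[1] ? 'rtl' : 'ltr';
}
```

Returning null for strings with no strong character lets the caller choose a default rather than silently treating such strings as LTR.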
Answer by jimmont
Testing for both Hebrew and Arabic (the only modern RTL languages/character sets I know of, apart from anything Persian-related, which I have not researched):
/[\u0590-\u06FF]/.test(textarea.value)
More research suggests something along the lines of:
/[\u0590-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC]/.test(textarea.value)
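None of the BMP-only ranges above covers the Avestan characters from the question, which start at U+10B00. As a hedged sketch, the same character class can be extended with the `u` flag so that code points beyond U+FFFF are matched as whole characters. The Avestan block range U+10B00 to U+10B3F is the only addition here; containsRtl is a hypothetical name, and other astral RTL scripts are deliberately left out. Note that this classifies characters by a fixed list and says nothing about what a given browser actually renders.

```javascript
// BMP ranges are taken from the regex above; U+10B00-U+10B3F is the Avestan block.
// The "u" flag is required for \u{...} escapes and for astral ranges.
var RTL_WITH_AVESTAN =
    /[\u0590-\u07FF\u200F\u202B\u202E\uFB1D-\uFDFD\uFE70-\uFEFC\u{10B00}-\u{10B3F}]/u;

// Hypothetical helper: true if the string contains at least one listed RTL character.
function containsRtl(s) {
    return RTL_WITH_AVESTAN.test(s);
}
```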
Answer by Jukka K. Korpela
First addressing the question in the heading:
There are no tools in JavaScript as such for accessing Unicode properties of characters. You would need to find a library or service for the purpose (I'm afraid that might be difficult, if you need something reliable) or to extract the relevant information from the Unicode character “database” (a collection of text files in specific formats) and to write your own code to use it.
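As a rough illustration of the "write your own code" route, the sketch below parses lines in the UnicodeData.txt format, where fields are separated by semicolons and the fifth field is the bidirectional class. Only three sample lines are embedded; parseBidiClasses and isStrongRtl are hypothetical names, and the sketch ignores the First/Last range entries that the real file uses for some large blocks.

```javascript
// Three lines in UnicodeData.txt format; field 0 is the code point (hex),
// field 4 is the Bidi_Class (L, R, AL, ...).
var SAMPLE_DATA = [
    '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
    '05D0;HEBREW LETTER ALEF;Lo;0;R;;;;;N;;;;;',
    '10B00;AVESTAN LETTER A;Lo;0;R;;;;;N;;;;;'
].join('\n');

// Builds a lookup table from code point to Bidi_Class.
function parseBidiClasses(text) {
    var table = {};
    text.split('\n').forEach(function (line) {
        var fields = line.split(';');
        if (fields.length > 4) {
            table[parseInt(fields[0], 16)] = fields[4];
        }
    });
    return table;
}

// True if the first character of ch has a strong RTL class (R or AL).
function isStrongRtl(ch, table) {
    var cls = table[ch.codePointAt(0)];
    return cls === 'R' || cls === 'AL';
}
```

This answers what Unicode says about a character, not what a particular browser does with it, which is the harder question raised in the message body.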
Then the question in the message body:
This seems even more desperate. But as this would probably be something for a limited number of users who are knowledgeable and know Avestan, maybe it would not be too bad to display a string of Avestan characters along with an image of them in the proper direction, and ask the user to click a button if the order is wrong. You could save this selection in a cookie, so that the user needs to do this only once per browser (though it should be a relatively short-lived cookie, as the browser may get updated).
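Remembering the user's answer could look something like the sketch below. The cookie name avestanForceRtl and both helper names are made up for illustration, and the 30-day lifetime in the usage comment is just one possible reading of "relatively short-lived". The helpers build and parse cookie strings only, so they can be tried outside a browser.

```javascript
// Hypothetical cookie name; the value records whether the user asked for the override.
var COOKIE_NAME = 'avestanForceRtl';

// Builds the string to assign to document.cookie.
function buildChoiceCookie(forceRtl, maxAgeDays) {
    var maxAgeSeconds = Math.round(maxAgeDays * 24 * 60 * 60);
    return COOKIE_NAME + '=' + (forceRtl ? '1' : '0') +
        '; max-age=' + maxAgeSeconds + '; path=/';
}

// Reads the stored choice: true, false, or null if the user has not answered yet.
function readChoice(cookieString) {
    var m = new RegExp('(?:^|;\\s*)' + COOKIE_NAME + '=([01])').exec(cookieString);
    return m ? m[1] === '1' : null;
}

// In a page:
// document.cookie = buildChoiceCookie(true, 30);
// var choice = readChoice(document.cookie);
```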