Original question: http://stackoverflow.com/questions/2397804/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Java string searching ignoring accents
Asked by DaveJohnston
I am trying to write a filter function for my application that will take an input string and filter out all objects that don't match the given input in some way. The easiest way to do this would be to use String's contains method, i.e. just check if the object (the String variable in the object) contains the string specified in the filter, but this won't account for accents.
The objects in question are basically Persons, and the strings I am trying to match are names. So, for example, if someone searches for Joao I would expect Joáo to be included in the result set. I have already used the Collator class in my application to sort by name, and it works well because it can compare strings, i.e. using the UK locale á comes before b but after a. But obviously it doesn't return 0 if you compare a and á, because they are not equal.
So does anyone have any idea how I might be able to do this?
Answered by BalusC
Make use of java.text.Normalizer and a shot of regex to get rid of the diacritics.
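import java.text.Normalizer;
import java.text.Normalizer.Form;

// Decompose to NFD, then strip the combining marks that decomposition exposes.
public static String removeDiacriticalMarks(String string) {
    return Normalizer.normalize(string, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}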
Which you can use as follows:
String value = "Joáo";
String comparisonMaterial = removeDiacriticalMarks(value); // Joao
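Since the question asks for a contains-style filter, here is a minimal sketch of how the normalization above might be applied to both the stored name and the search term before calling contains. The class name AccentInsensitiveFilter, the helpers fold and filterNames, the sample names, and the extra lower-casing step are all assumptions made for illustration; they are not part of the original answer.

```java
import java.text.Normalizer;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class AccentInsensitiveFilter {

    // Same idea as removeDiacriticalMarks: decompose to NFD, then drop combining marks.
    static String fold(String s) {
        String stripped = Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Lower-casing is an extra assumption so that the filter also ignores case.
        return stripped.toLowerCase(Locale.ROOT);
    }

    // Keeps only the names that contain the query once both sides are folded.
    static List<String> filterNames(List<String> names, String query) {
        String foldedQuery = fold(query);
        return names.stream()
                .filter(name -> fold(name).contains(foldedQuery))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data; \u00e1 is a-acute and \u00c3 is A-tilde.
        List<String> names = List.of("Jo\u00e1o Silva", "John Smith", "JO\u00c3O Costa");
        // Prints the first and third names, since both fold to a string containing "joao".
        System.out.println(filterNames(names, "joao"));
    }
}
```

Folding every name on every search repeats work, so for a larger list it may be worth storing the folded form next to each Person once and comparing against that.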
Answered by Benny Bottema
Collator does return 0 for a and á, if you configure it to ignore diacritics:
import java.text.Collator;

public boolean isSame(String a, String b) {
    Collator insensitiveStringComparator = Collator.getInstance();
    // PRIMARY strength compares base letters only, so both accent and
    // case differences are ignored (the exact rules depend on the locale).
    insensitiveStringComparator.setStrength(Collator.PRIMARY);
    return insensitiveStringComparator.compare(a, b) == 0;
}
isSame("a", "á") now yields true.
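To make the effect of the strength setting more concrete, the sketch below compares the same pairs at each of the three levels documented for java.text.Collator. The class name CollatorStrengthDemo and the helper same are hypothetical, and Locale.UK is fixed only to keep the run reproducible (the question mentions the UK locale); other locales may assign strengths differently.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthDemo {

    // Returns true when the two strings compare as equal at the given strength.
    static boolean same(String a, String b, int strength) {
        // Locale.UK is an assumption made to keep the demo reproducible.
        Collator collator = Collator.getInstance(Locale.UK);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String accented = "\u00e1"; // "a" with an acute accent

        // PRIMARY: only base letters count.
        System.out.println("PRIMARY   a vs accented a: " + same("a", accented, Collator.PRIMARY));
        System.out.println("PRIMARY   a vs A:          " + same("a", "A", Collator.PRIMARY));

        // SECONDARY: accents count as well, case still does not.
        System.out.println("SECONDARY a vs accented a: " + same("a", accented, Collator.SECONDARY));
        System.out.println("SECONDARY a vs A:          " + same("a", "A", Collator.SECONDARY));

        // TERTIARY: case counts too.
        System.out.println("TERTIARY  a vs A:          " + same("a", "A", Collator.TERTIARY));
    }
}
```

Note that Collator.compare works on whole strings, so this gives accent-insensitive equality rather than a contains check; a substring filter would still need something like the normalization approach from the earlier answer.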
Answered by mehdok
I have written a class for searching through Arabic text while ignoring diacritics (not removing them). Maybe you can get the idea or use it in some way.

