Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/2397804/

Time: 2020-10-29 20:57:24  Source: igfitidea

Java string searching ignoring accents

Tags: java, string, localization, filter, diacritics

Asked by DaveJohnston

I am trying to write a filter function for my application that will take an input string and filter out all objects that don't match the given input in some way. The easiest way to do this would be to use String's contains method, i.e. just check if the object (the String variable in the object) contains the string specified in the filter, but this won't account for accents.

The objects in question are basically Persons, and the strings I am trying to match are names. So, for example, if someone searches for Joao I would expect Joáo to be included in the result set. I have already used the Collator class in my application to sort by name, and it works well because it can compare strings, i.e. using the UK Locale, á comes before b but after a. But obviously it doesn't return 0 if you compare a and á, because they are not equal.

So does anyone have any idea how I might be able to do this?

Answered by BalusC

Make use of java.text.Normalizer and a shot of regex to get rid of the diacritics.

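import java.text.Normalizer;
import java.text.Normalizer.Form;

// Decompose accented characters (NFD), then strip the combining marks.
public static String removeDiacriticalMarks(String string) {
    return Normalizer.normalize(string, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}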

Which you can use as follows:

String value = "Joáo";
String comparisonMaterial = removeDiacriticalMarks(value); // Joao
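
To connect this back to the filter in the question, here is a minimal sketch that strips diacritics from both the query and each name before calling contains. The class name, the fold helper, and the lower-casing step are assumptions added for illustration; they are not part of the original answer. Non-ASCII characters are written as Unicode escapes so the source compiles regardless of file encoding.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class AccentInsensitiveFilter {

    // Decompose to NFD, then drop the combining marks left by the decomposition.
    static String removeDiacriticalMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Hypothetical helper: build a comparison key without accents, in lower case.
    static String fold(String s) {
        return removeDiacriticalMarks(s).toLowerCase(Locale.ROOT);
    }

    // Keep only the names whose folded form contains the folded query.
    static List<String> filter(List<String> names, String query) {
        String key = fold(query);
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (fold(name).contains(key)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // \u00e1 is a with acute, \u00c3 is capital A with tilde.
        List<String> names = Arrays.asList("Jo\u00e1o Silva", "John Smith", "JO\u00c3O Costa");
        // Keeps the first and third names; "John Smith" is filtered out.
        System.out.println(filter(names, "joao"));
    }
}
```

The original strings are returned untouched; only the comparison keys are folded, so the names can still be displayed with their accents.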

Answered by Benny Bottema

Collator does return 0 for a and á, if you configure it to ignore diacritics:

import java.text.Collator;

public boolean isSame(String a, String b) {
    Collator insensitiveStringComparator = Collator.getInstance();
    // PRIMARY strength compares base letters only, so accents and case are ignored
    insensitiveStringComparator.setStrength(Collator.PRIMARY);
    return insensitiveStringComparator.compare(a, b) == 0;
}

isSame("a", "á") yields true now
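
To show what the strength setting changes, the following sketch compares the same pairs at each of the three standard strengths. The class name, the equalAt helper, and the choice of Locale.UK (taken from the question) are assumptions for illustration. Note that this tests whole-string equality, not containment, so it does not replace a contains-style filter on its own.

```java
import java.text.Collator;
import java.util.Locale;

class CollatorStrengthDemo {

    // Hypothetical helper: true if the collator treats a and b as equal at the given strength.
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.UK);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String plain = "a";
        String accented = "\u00e1"; // a with acute
        String upper = "A";

        // PRIMARY: only base letters matter.
        System.out.println(equalAt(Collator.PRIMARY, plain, accented));   // true
        System.out.println(equalAt(Collator.PRIMARY, plain, upper));      // true

        // SECONDARY: accents matter, case does not.
        System.out.println(equalAt(Collator.SECONDARY, plain, accented)); // false
        System.out.println(equalAt(Collator.SECONDARY, plain, upper));    // true

        // TERTIARY: accents and case both matter.
        System.out.println(equalAt(Collator.TERTIARY, plain, accented));  // false
        System.out.println(equalAt(Collator.TERTIARY, plain, upper));     // false
    }
}
```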

Answered by mehdok

I have written a class for searching through Arabic texts by ignoring diacritics (NOT removing them). Maybe you can get the idea or use it in some way.

DiacriticInsensitiveSearch.java
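
The class itself is not reproduced here. As a rough illustration of the general idea only (this is not mehdok's implementation), the sketch below searches the original text and skips non-spacing marks while comparing, so the returned index points into the unmodified string. The class and method names are hypothetical, and the sketch assumes the query contains no marks and that accents in the text are stored as separate combining characters, as Arabic diacritics are; precomposed Latin letters would need NFD normalization first.

```java
class MarkSkippingSearch {

    // Combining accents and Arabic diacritics are Unicode non-spacing marks (category Mn).
    static boolean isMark(char c) {
        return Character.getType(c) == Character.NON_SPACING_MARK;
    }

    // Index in text of the first match of query when marks in text are skipped, or -1.
    static int indexOfIgnoringMarks(String text, String query) {
        if (query.isEmpty()) {
            return 0;
        }
        for (int start = 0; start < text.length(); start++) {
            if (isMark(text.charAt(start))) {
                continue; // a match never starts on a mark
            }
            int t = start;
            int q = 0;
            while (t < text.length() && q < query.length()) {
                char c = text.charAt(t);
                if (isMark(c)) {
                    t++; // skip the mark without consuming a query character
                    continue;
                }
                if (c != query.charAt(q)) {
                    break;
                }
                t++;
                q++;
            }
            if (q == query.length()) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Three Arabic letters, each followed by the fatha mark (\u064E).
        String text = "\u0643\u064E\u062A\u064E\u0628\u064E";
        String query = "\u0643\u062A\u0628"; // the same letters without marks
        System.out.println(indexOfIgnoringMarks(text, query)); // 0
    }
}
```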
