Java string search ignoring accents

Question

Asked by DaveJohnston

I am trying to write a filter function for my application that will take an input string and filter out all objects that don't match the given input in some way. The easiest way to do this would be to use String's contains method, i.e. just check if the object (the String variable in the object) contains the string specified in the filter, but this won't account for accents.

The objects in question are basically Persons, and the strings I am trying to match are names. So, for example, if someone searches for Joao I would expect Joáo to be included in the result set. I have already used the Collator class in my application to sort by name, and it works well because it can compare strings, i.e. using the UK Locale, á comes before b but after a. But obviously it doesn't return 0 if you compare a and á, because they are not equal.

So does anyone have any idea how I might be able to do this?

Answer 1

Answered by BalusC

Make use of java.text.Normalizer and a shot of regex to get rid of the diacritics.

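import java.text.Normalizer;
import java.text.Normalizer.Form;

// Decompose to NFD so each accent becomes a separate combining mark, then strip the marks.
public static String removeDiacriticalMarks(String string) {
    return Normalizer.normalize(string, Form.NFD)
        .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}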

Which you can use as follows:

String value = "Joáo";
String comparisonMaterial = removeDiacriticalMarks(value); // Joao
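
To tie this back to the filter in the question, here is a minimal sketch of a contains-based name filter built on the same normalization step. The class name AccentFilterSketch and the methods fold and filterNames are hypothetical, and the lowercasing is an added assumption (the question only asks about accents); drop it if the filter should stay case sensitive.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the original answer.
final class AccentFilterSketch {

    // Strip combining diacritical marks, then lowercase (lowercasing is an assumption).
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")
            .toLowerCase(Locale.ROOT);
    }

    // Keep only the names whose folded form contains the folded query.
    static List<String> filterNames(List<String> names, String query) {
        String needle = fold(query);
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (fold(name).contains(needle)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // "Jo\u00e1o" is Joáo, written with an escape to avoid source-encoding issues.
        List<String> names = Arrays.asList("Jo\u00e1o Silva", "Joanna", "Maria");
        System.out.println(filterNames(names, "joao"));
    }
}
```

With the sample list above, only the Joáo entry matches a search for joao. For large lists it would probably be worth folding each name once and caching the result rather than folding on every keystroke.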

Answer 2

Answered by Benny Bottema

Collator does return 0 for a and á, if you configure it to ignore diacritics:

import java.text.Collator;

public boolean isSame(String a, String b) {
    Collator insensitiveStringComparator = Collator.getInstance();
    // PRIMARY strength typically compares base letters only, so it ignores
    // both diacritics and case. SECONDARY would treat a and á as different.
    insensitiveStringComparator.setStrength(Collator.PRIMARY);
    return insensitiveStringComparator.compare(a, b) == 0;
}

isSame("a", "á") yields true now
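
To make the effect of the strength setting concrete, here is a small sketch that compares the same pairs at each level. The class name CollatorStrengthSketch and the helper equalAt are hypothetical, and Locale.UK is assumed because the question mentions it. Note that Collator compares whole strings, so on its own this gives equality rather than a contains-style match.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical demo class, not part of the original answer.
final class CollatorStrengthSketch {

    // True when the two strings collate as equal at the given strength.
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.UK); // locale is an assumption
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String accented = "\u00e1"; // á, escaped to avoid source-encoding issues

        // PRIMARY: only base letters matter.
        System.out.println("PRIMARY   a vs \u00e1: " + equalAt(Collator.PRIMARY, "a", accented));
        System.out.println("PRIMARY   a vs A: " + equalAt(Collator.PRIMARY, "a", "A"));

        // SECONDARY: accents matter, case does not.
        System.out.println("SECONDARY a vs \u00e1: " + equalAt(Collator.SECONDARY, "a", accented));
        System.out.println("SECONDARY a vs A: " + equalAt(Collator.SECONDARY, "a", "A"));

        // TERTIARY: case matters as well.
        System.out.println("TERTIARY  a vs A: " + equalAt(Collator.TERTIARY, "a", "A"));
    }
}
```

So PRIMARY is the level to use when accents should be ignored, with the side effect that case is ignored too.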

Answer 3

Answered by mehdok

I have written a class for searching through Arabic texts by ignoring diacritics (NOT removing them). Maybe you can get the idea or use it in some way.

DiacriticInsensitiveSearch.java
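
The linked class is not reproduced here, so the following is only a rough sketch of one way to search while ignoring diacritics and still report positions in the original, unmodified text (useful for highlighting a match). It is not mehdok's implementation; the class name DiacriticIgnoringSearchSketch and both methods are hypothetical. It assumes that "diacritics" means Unicode non-spacing marks, which covers Latin accents after NFD decomposition as well as the Arabic harakat.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the class referenced in the answer.
final class DiacriticIgnoringSearchSketch {

    // Index in the ORIGINAL text where query first matches when non-spacing
    // marks are ignored on both sides, or -1 if there is no match.
    static int indexOfIgnoringDiacritics(String text, String query) {
        List<Integer> originalIndex = new ArrayList<>();
        String strippedText = strip(text, originalIndex);
        String strippedQuery = strip(query, new ArrayList<>());
        if (strippedQuery.isEmpty()) {
            return 0; // mirrors String.indexOf("") returning 0
        }
        int pos = strippedText.indexOf(strippedQuery);
        return pos < 0 ? -1 : originalIndex.get(pos);
    }

    // Builds the mark-free form of s and records, for every kept char,
    // the index of the original code point it came from.
    private static String strip(String s, List<Integer> originalIndex) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            String decomposed = Normalizer.normalize(
                new String(Character.toChars(codePoint)), Normalizer.Form.NFD);
            for (int j = 0; j < decomposed.length(); j++) {
                char c = decomposed.charAt(j);
                if (Character.getType(c) != Character.NON_SPACING_MARK) {
                    out.append(c);
                    originalIndex.add(i);
                }
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "S\u00e3o Jo\u00e3o" is São João; the match for Joao starts at the J.
        System.out.println(indexOfIgnoringDiacritics("S\u00e3o Jo\u00e3o", "Joao"));
    }
}
```

The original text is never modified; only a temporary mark-free copy is searched, and the index map translates the hit back to the original string.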
