Original URL: http://stackoverflow.com/questions/2373213/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Java. Ignore accents when comparing strings
Asked by framara
The problem is simple. Is there any function in Java that compares two Strings and returns true while ignoring accented characters?
i.e.
String x = "Joao";
String y = "João";
should be reported as equal.
Thanks
Accepted answer by DaveJohnston
I think you should be using the Collator class. It lets you set a strength and a locale, and it will compare characters appropriately.
From the Java 1.6 API:
You can set a Collator's strength property to determine the level of difference considered significant in comparisons. Four strengths are provided: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. The exact assignment of strengths to language features is locale dependant. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ě" are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical.
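To make the strength levels concrete, here is a minimal sketch. It assumes Locale.ENGLISH (my choice, not something from the question), and StrengthDemo/equalAt are illustrative names. Under the usual English rules an accent is a secondary difference and case is a tertiary one, so PRIMARY ignores both and SECONDARY ignores only case.

```java
import java.text.Collator;
import java.util.Locale;

// Shows how the strength setting decides which differences are significant.
class StrengthDemo {
    // Returns true when the collator treats a and b as equal at the given strength.
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH); // locale is an assumption
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // e with acute accent
        // PRIMARY: only base letters matter, so accent and case differences are ignored.
        System.out.println(equalAt(Collator.PRIMARY, "e", eAcute));   // true
        System.out.println(equalAt(Collator.PRIMARY, "e", "E"));      // true
        // SECONDARY: accents count, case still does not.
        System.out.println(equalAt(Collator.SECONDARY, "e", eAcute)); // false
        System.out.println(equalAt(Collator.SECONDARY, "e", "E"));    // true
        // TERTIARY (the default): case differences count as well.
        System.out.println(equalAt(Collator.TERTIARY, "e", "E"));     // false
    }
}
```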
I think the important point here (which people are trying to make) is that "Joao" and "João" should never be considered equal. But if you are sorting, you don't want them compared by their raw character values, because then you would get something like Joao, John, João, which is not good. Using the Collator class definitely handles this correctly.
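A small sketch of the sorting problem described above, contrasting String.compareTo (which compares raw UTF-16 char values) with a Collator at its default TERTIARY strength. Locale.ENGLISH and the names SortDemo, sortRaw and sortCollated are assumptions for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Raw String order versus locale-aware Collator order for the same names.
class SortDemo {
    // String.compareTo: 'a-tilde' (U+00E3) sorts after every ASCII letter.
    static List<String> sortRaw(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Collator: 'a-tilde' sorts next to 'a'; the accent only breaks ties.
    static List<String> sortCollated(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Collator.getInstance(Locale.ENGLISH)); // locale is an assumption
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Jo\u00e3o", "John", "Joao");
        System.out.println(sortRaw(names));      // Joao, John, then the a-tilde spelling
        System.out.println(sortCollated(names)); // Joao, the a-tilde spelling, then John
    }
}
```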
Answer by Uri
The problem with these sorts of conversions is that there isn't always a clear-cut mapping from accented to non-accented characters. It depends on code pages, localization, etc. For example, is this "a" with an accent equivalent to a plain "a"? That is not a problem for a human, but it is trickier for a computer.
AFAIK Java does not have a built-in conversion that can look up the current localization options and make these sorts of conversions. You may need an external library that handles Unicode better, like ICU (http://site.icu-project.org/)
Answer by Chris Jester-Young
You didn't hear this from me (because I disagree with the premise of the question), but you can use java.text.Normalizer and normalize with NFD: this splits the accent off from the letter it's attached to. You can then filter out the accent characters and compare.
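A minimal sketch of the Normalizer approach, assuming that "accent characters" means anything in Unicode category M (combining marks) after NFD decomposition. NormalizerCompare, stripAccents and equalsIgnoreAccents are hypothetical names. Letters with no canonical decomposition, such as ø (U+00F8), come through unchanged, which is the kind of unclear mapping Uri's answer warns about.

```java
import java.text.Normalizer;

class NormalizerCompare {
    // Hypothetical helper: decompose with NFD, then delete every combining mark.
    static String stripAccents(String s) {
        // NFD turns a precomposed letter such as U+00E3 (a with tilde)
        // into 'a' followed by U+0303 (combining tilde).
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches any Unicode mark, so only the base letters remain.
        return decomposed.replaceAll("\\p{M}", "");
    }

    // True when the strings are equal once accents are removed (still case-sensitive).
    static boolean equalsIgnoreAccents(String a, String b) {
        return stripAccents(a).equals(stripAccents(b));
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoreAccents("Joao", "Jo\u00e3o")); // true
        System.out.println(equalsIgnoreAccents("Joao", "JOAO"));      // false: case still matters
        System.out.println(stripAccents("\u00f8").equals("\u00f8"));  // true: o-with-stroke has no decomposition
    }
}
```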
Answer by Benny Bottema
Collator returns 0 for "a" and "á" if you configure it to ignore diacritics:
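import java.text.Collator;

// PRIMARY strength ignores accent (and case) differences; uses the default locale.
public boolean isSame(String a, String b) {
    Collator insensitiveStringComparator = Collator.getInstance();
    insensitiveStringComparator.setStrength(Collator.PRIMARY);
    return insensitiveStringComparator.compare(a, b) == 0;
}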
isSame("a", "á") yields true
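Because Collator implements Comparator<Object>, the same PRIMARY-strength collator can also drive a sorted collection. A minimal sketch, assuming Locale.ENGLISH; CollatorSetDemo and accentInsensitiveSet are illustrative names.

```java
import java.text.Collator;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// A TreeSet ordered by a PRIMARY-strength Collator treats strings that differ
// only in accents (or case) as the same element.
class CollatorSetDemo {
    static Set<String> accentInsensitiveSet() {
        Collator collator = Collator.getInstance(Locale.ENGLISH); // locale is an assumption
        collator.setStrength(Collator.PRIMARY);
        return new TreeSet<String>(collator);
    }

    public static void main(String[] args) {
        Set<String> names = accentInsensitiveSet();
        System.out.println(names.add("Joao"));      // true
        System.out.println(names.add("Jo\u00e3o")); // false: equal to "Joao" at PRIMARY strength
        System.out.println(names.add("JOAO"));      // false: case is ignored at PRIMARY too
        System.out.println(names.size());           // 1
    }
}
```

Keep in mind that such a set decides membership with the collator rather than String.equals, so it deliberately disagrees with equals; that is usually what you want here, but it is worth knowing.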
Answer by Daniel
Or use stripAccents from the Apache Commons Lang StringUtils class if you want to compare or sort while ignoring accents:
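import org.apache.commons.lang3.StringUtils;

// Strips accents from both strings, then compares them with String.compareTo.
public int compareStripAccent(String a, String b) {
    return StringUtils.stripAccents(a).compareTo(StringUtils.stripAccents(b));
}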
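If Commons Lang is not on the classpath, a JDK-only variant of the same idea can key a Comparator on an accent-stripped copy of each string. This swaps in java.text.Normalizer for StringUtils.stripAccents, so results may differ for some characters; AccentSortDemo, stripMarks and sortIgnoringAccents are hypothetical names. Ties are broken with natural String order so the result is deterministic.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class AccentSortDemo {
    // Hypothetical stand-in for StringUtils.stripAccents: NFD, then drop combining marks.
    static String stripMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Order by the accent-stripped text; break ties with plain String order.
    static final Comparator<String> IGNORE_ACCENTS =
            Comparator.comparing(AccentSortDemo::stripMarks)
                      .thenComparing(Comparator.naturalOrder());

    static List<String> sortIgnoringAccents(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(IGNORE_ACCENTS);
        return copy;
    }

    public static void main(String[] args) {
        // Prints Joao, then the a-tilde spelling, then John.
        System.out.println(sortIgnoringAccents(Arrays.asList("Jo\u00e3o", "John", "Joao")));
    }
}
```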
Answer by Carlos Federico Lopez Spindola
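// Returns true if a and b differ only in accents or case (default locale).
public boolean insensitiveStringComparator(String a, String b) {
    java.text.Collator collate = java.text.Collator.getInstance();
    // PRIMARY: only differences in base letters are significant.
    collate.setStrength(java.text.Collator.PRIMARY);
    // Canonically decompose accented characters before comparing.
    collate.setDecomposition(java.text.Collator.CANONICAL_DECOMPOSITION);
    return collate.equals(a, b);
}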
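The setDecomposition call matters when the same accented letter can be spelled two ways: as one precomposed code point, or as a base letter followed by a combining mark. A minimal sketch, assuming Locale.ENGLISH and using TERTIARY strength to show that the two spellings match even when accents and case count; DecompositionDemo and sameCanonical are illustrative names.

```java
import java.text.Collator;
import java.util.Locale;

class DecompositionDemo {
    // Compares at TERTIARY strength after canonical decomposition.
    static boolean sameCanonical(String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH); // locale is an assumption
        collator.setStrength(Collator.TERTIARY);                  // accents and case still count
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.equals(a, b);
    }

    public static void main(String[] args) {
        String precomposed = "\u00e9"; // e-acute as a single code point
        String decomposed = "e\u0301"; // 'e' followed by a combining acute accent
        System.out.println(precomposed.equals(decomposed));         // false: different chars
        System.out.println(sameCanonical(precomposed, decomposed)); // true
        System.out.println(sameCanonical(precomposed, "e"));        // false: the accent still counts
    }
}
```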