原文地址: http://stackoverflow.com/questions/359827/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Ignoring accented letters in string comparison
Asked by Jon Tackabury
I need to compare 2 strings in C# and treat accented letters the same as non-accented letters. For example:
string s1 = "hello";
string s2 = "héllo";
s1.Equals(s2, StringComparison.InvariantCultureIgnoreCase);
s1.Equals(s2, StringComparison.OrdinalIgnoreCase);
These 2 strings need to be the same (as far as my application is concerned), but both of these statements evaluate to false. Is there a way in C# to do this?
Accepted answer by Serge Wautier
EDIT 2012-01-20: Oh boy! The solution was so much simpler and has been in the framework nearly forever. As pointed out by knightpfhor:
string.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace);
Here's a function that strips diacritics from a string:
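// Requires: using System.Globalization; using System.Text;
static string RemoveDiacritics(string text)
{
    // Decompose each accented character into base letter + combining mark(s).
    string formD = text.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();
    foreach (char ch in formD)
    {
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
        // Keep everything except the combining (non-spacing) marks.
        if (uc != UnicodeCategory.NonSpacingMark)
        {
            sb.Append(ch);
        }
    }
    return sb.ToString().Normalize(NormalizationForm.FormC);
}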
More details on MichKap's blog (RIP...).
The principle is that normalizing to FormD turns 'é' into 2 successive chars: 'e' followed by a combining acute accent. The function then iterates through the chars and skips the diacritics.
"héllo" becomes "he<acute>llo", which in turn becomes "hello".
Debug.Assert("hello"==RemoveDiacritics("héllo"));
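To make the decomposition step visible, here is a minimal sketch that lists the chars of the FormD string together with their Unicode categories. The class and method names (NormalizationDemo, DescribeDecomposed) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

static class NormalizationDemo
{
    // Hypothetical helper: decomposes the text (FormD) and describes each
    // resulting UTF-16 char as "U+XXXX(Category)", separated by spaces.
    public static string DescribeDecomposed(string text)
    {
        string formD = text.Normalize(NormalizationForm.FormD);
        return string.Join(" ", formD.Select(ch =>
            string.Format("U+{0:X4}({1})", (int)ch, CharUnicodeInfo.GetUnicodeCategory(ch))));
    }
}
```

For "é" (U+00E9) this yields the base letter U+0065 followed by U+0301, which is categorized as NonSpacingMark and is therefore what RemoveDiacritics drops. Letters that have no canonical decomposition, such as 'ø' (U+00F8), stay a single char and pass through RemoveDiacritics unchanged.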
Note: Here's a more compact .NET4+ friendly version of the same function:
static string RemoveDiacritics(string text)
{
    return string.Concat(
        text.Normalize(NormalizationForm.FormD)
            .Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch) !=
                         UnicodeCategory.NonSpacingMark)
    ).Normalize(NormalizationForm.FormC);
}
Answer by Jon Tackabury
Try this overload of the String.Compare method:
String.Compare Method (String, String, Boolean, CultureInfo)
It produces an int value based on a comparison that takes the CultureInfo into account. The example on that page compares "change" and "dollar" in en-US and cs-CZ. In cs-CZ, "ch" is a single letter that sorts after "h", which is why the two cultures order the strings differently.
Example from the link:
using System;
using System.Globalization;

class Sample {
    public static void Main() {
        String str1 = "change";
        String str2 = "dollar";
        String relation = null;

        relation = symbol( String.Compare(str1, str2, false, new CultureInfo("en-US")) );
        Console.WriteLine("For en-US: {0} {1} {2}", str1, relation, str2);

        relation = symbol( String.Compare(str1, str2, false, new CultureInfo("cs-CZ")) );
        Console.WriteLine("For cs-CZ: {0} {1} {2}", str1, relation, str2);
    }

    private static String symbol(int r) {
        String s = "=";
        if (r < 0) s = "<";
        else if (r > 0) s = ">";
        return s;
    }
}
/*
This example produces the following results.
For en-US: change < dollar
For cs-CZ: change > dollar
*/
Therefore, for accented languages you will need to get the culture and then compare the strings based on it.
Answer by Ryan Cook
The following method, CompareIgnoreAccents(...), works on your example data. Here is the article where I got my background information: http://www.codeproject.com/KB/cs/EncodingAccents.aspx
private static bool CompareIgnoreAccents(string s1, string s2)
{
    return string.Compare(
        RemoveAccents(s1), RemoveAccents(s2), StringComparison.InvariantCultureIgnoreCase) == 0;
}

private static string RemoveAccents(string s)
{
    Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");
    return destEncoding.GetString(
        Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
}
I think an extension method would be better:
public static string RemoveAccents(this string s)
{
    Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");
    return destEncoding.GetString(
        Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
}
Then the usage would be:
if(string.Compare(s1.RemoveAccents(), s2.RemoveAccents(), true) == 0) {
...
Answer by knightpfhor
If you don't need to convert the string and you just want to check for equality, you can use:
string s1 = "hello";
string s2 = "héllo";
if (String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace) == 0)
{
    // both strings are equal
}
Or, if you want the comparison to be case insensitive as well:
string s1 = "HEllO";
string s2 = "héLLo";
if (String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0)
{
    // both strings are equal
}
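If the result should not depend on the current user's culture, the same options can be passed to a specific CompareInfo instead. The following is only a sketch under that assumption: the class and method names are made up for illustration, and it uses CultureInfo.InvariantCulture together with CompareInfo.Compare and CompareInfo.IndexOf. Whether the invariant culture is the right choice depends on your data.

```csharp
using System;
using System.Globalization;

static class AccentInsensitive
{
    // Ignore combining marks (accents) and letter case.
    const CompareOptions Options = CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;

    // Assumption: the invariant culture is acceptable for these comparisons.
    static readonly CompareInfo Invariant = CultureInfo.InvariantCulture.CompareInfo;

    // True when the two strings are equal apart from accents and case.
    public static bool EqualsIgnoringAccents(string a, string b)
    {
        return Invariant.Compare(a, b, Options) == 0;
    }

    // True when 'value' occurs in 'source', ignoring accents and case.
    // Note: CompareInfo.IndexOf throws if either argument is null.
    public static bool ContainsIgnoringAccents(string source, string value)
    {
        return Invariant.IndexOf(source, value, Options) >= 0;
    }
}
```

With this sketch, AccentInsensitive.EqualsIgnoringAccents("HEllO", "héLLo") reports equality without building stripped copies of the strings, and ContainsIgnoringAccents applies the same rules to a substring search.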
Answer by Guish
I had to do something similar, but with a StartsWith method. Here is a simple solution derived from the answer by @Serge - appTranslator.
Here is an extension method:
public static bool StartsWith(this string str, string value, CultureInfo culture, CompareOptions options)
{
    if (str.Length >= value.Length)
        return string.Compare(str.Substring(0, value.Length), value, culture, options) == 0;
    else
        return false;
}
And for one-liner freaks ;)
public static bool StartsWith(this string str, string value, CultureInfo culture, CompareOptions options)
{
    return str.Length >= value.Length && string.Compare(str.Substring(0, value.Length), value, culture, options) == 0;
}
An accent-insensitive and case-insensitive StartsWith can then be called like this:
value.ToString().StartsWith(str, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase)
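One caveat with the Substring approach is that it assumes the prefix and the matching part of the string have the same number of chars, which does not hold when one side uses a precomposed 'é' and the other uses 'e' plus a combining accent. As a possible alternative, the framework's CompareInfo.IsPrefix accepts the same CompareOptions. The sketch below assumes the invariant culture; the class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

static class PrefixExtensions
{
    // Hypothetical extension method: accent- and case-insensitive prefix test
    // built on CompareInfo.IsPrefix rather than Substring + Compare.
    public static bool StartsWithIgnoringAccents(this string str, string value)
    {
        return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(
            str, value, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase);
    }
}
```

For example, "Héllo world".StartsWithIgnoringAccents("hel") is expected to match, as is a precomposed "hél" tested against the decomposed prefix "he\u0301l", which the length check in the Substring version would reject.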
Answer by Newton Carlos Dantas
A simpler way to remove accents (VB.NET):
Dim source As String = "áéíóú?"
Dim result As String
Dim bytes As Byte() = Encoding.GetEncoding("Cyrillic").GetBytes(source)
result = Encoding.ASCII.GetString(bytes)