Notice: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/359827/

Time: 2020-08-04 00:29:36  Source: igfitidea

Ignoring accented letters in string comparison

Tags: c#, string, localization

Asked by Jon Tackabury

I need to compare 2 strings in C# and treat accented letters the same as non-accented letters. For example:

string s1 = "hello";
string s2 = "héllo";

s1.Equals(s2, StringComparison.InvariantCultureIgnoreCase);
s1.Equals(s2, StringComparison.OrdinalIgnoreCase);

These 2 strings need to be the same (as far as my application is concerned), but both of these statements evaluate to false. Is there a way in C# to do this?

Accepted answer by Serge Wautier

EDIT 2012-01-20: Oh boy! The solution was so much simpler and has been in the framework nearly forever. As pointed out by knightpfhor:

string.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace);


Here's a function that strips diacritics from a string:

static string RemoveDiacritics(string text)
{
  string formD = text.Normalize(NormalizationForm.FormD);
  StringBuilder sb = new StringBuilder();

  foreach (char ch in formD)
  {
    UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
    if (uc != UnicodeCategory.NonSpacingMark)
    {
      sb.Append(ch);
    }
  }

  return sb.ToString().Normalize(NormalizationForm.FormC);
}

More details on MichKap's blog (RIP...).

The principle is that it turns 'é' into 2 successive chars: 'e' followed by a combining acute accent. It then iterates through the chars and skips the diacritics.

"héllo" becomes "he<acute>llo", which in turn becomes "hello".

Debug.Assert("hello"==RemoveDiacritics("héllo"));
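
To make the decomposition step visible, here is a minimal sketch that lists each char of the FormD-normalized string with its Unicode category. The class name DecompositionDemo and the method Describe are hypothetical names used only for this illustration; the framework calls are the same ones RemoveDiacritics relies on.

```csharp
using System;
using System.Globalization;
using System.Text;

static class DecompositionDemo
{
    // Hypothetical helper: returns one line per UTF-16 char of the
    // FormD-normalized text, formatted as "U+XXXX Category".
    public static string Describe(string text)
    {
        var sb = new StringBuilder();
        foreach (char ch in text.Normalize(NormalizationForm.FormD))
        {
            sb.AppendFormat(CultureInfo.InvariantCulture, "U+{0:X4} {1}\n",
                (int)ch, CharUnicodeInfo.GetUnicodeCategory(ch));
        }
        return sb.ToString();
    }

    static void Main()
    {
        // "h\u00E9llo" is "héllo" written with an escape for the precomposed é.
        Console.Write(Describe("h\u00E9llo"));
    }
}
```

The 'e' (U+0065) should be reported as LowercaseLetter and the combining acute (U+0301) right after it as NonSpacingMark, which is exactly the category that RemoveDiacritics filters out.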


Note: Here's a more compact .NET4+ friendly version of the same function:

static string RemoveDiacritics(string text)
{
  return string.Concat( 
      text.Normalize(NormalizationForm.FormD)
      .Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch)!=
                                    UnicodeCategory.NonSpacingMark)
    ).Normalize(NormalizationForm.FormC);
}

Answer by Jon Tackabury

Try this overload of the String.Compare method:

String.Compare Method (String, String, Boolean, CultureInfo)

It returns an int based on a comparison that takes the CultureInfo into account. The example on the linked page compares "change" and "dollar" in en-US and cs-CZ. In cs-CZ, "ch" is a single "letter" that sorts after "h", so "change" sorts after "dollar".

Example from the link:

using System;
using System.Globalization;

class Sample {
    public static void Main() {
        String str1 = "change";
        String str2 = "dollar";
        String relation = null;

        relation = symbol( String.Compare(str1, str2, false, new CultureInfo("en-US")) );
        Console.WriteLine("For en-US: {0} {1} {2}", str1, relation, str2);

        relation = symbol( String.Compare(str1, str2, false, new CultureInfo("cs-CZ")) );
        Console.WriteLine("For cs-CZ: {0} {1} {2}", str1, relation, str2);
    }

    private static String symbol(int r) {
        String s = "=";
        if      (r < 0) s = "<";
        else if (r > 0) s = ">";
        return s;
    }
}
/*
This example produces the following results.
For en-US: change < dollar
For cs-CZ: change > dollar
*/

Therefore, for accented languages you will need to get the culture and then compare the strings based on it.

http://msdn.microsoft.com/en-us/library/hyxc48dt.aspx

Answer by Ryan Cook

The following method, CompareIgnoreAccents(...), works on your example data. Here is the article where I got my background information: http://www.codeproject.com/KB/cs/EncodingAccents.aspx

private static bool CompareIgnoreAccents(string s1, string s2)
{
    return string.Compare(
        RemoveAccents(s1), RemoveAccents(s2), StringComparison.InvariantCultureIgnoreCase) == 0;
}

private static string RemoveAccents(string s)
{
    Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");

    return destEncoding.GetString(
        Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
}

I think an extension method would be better:

public static string RemoveAccents(this string s)
{
    Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");

    return destEncoding.GetString(
        Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
}

Then the use would be this:

if(string.Compare(s1.RemoveAccents(), s2.RemoveAccents(), true) == 0) {
   ...

Answer by knightpfhor

If you don't need to convert the string and just want to check for equality, you can use:

string s1 = "hello";
string s2 = "héllo";

if (String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace) == 0)
{
    // both strings are equal
}

Or, if you want the comparison to be case-insensitive as well:

string s1 = "HEllO";
string s2 = "héLLo";

if (String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0)
{
    // both strings are equal
}

Answer by Guish

I had to do something similar but with a StartsWith method. Here is a simple solution derived from @Serge - appTranslator.

Here is an extension method:

    public static bool StartsWith(this string str, string value, CultureInfo culture, CompareOptions options)
    {
        if (str.Length >= value.Length)
            return string.Compare(str.Substring(0, value.Length), value, culture, options) == 0;
        else
            return false;            
    }

And for one-liner freaks ;)

    public static bool StartsWith(this string str, string value, CultureInfo culture, CompareOptions options)
    {
        return str.Length >= value.Length && string.Compare(str.Substring(0, value.Length), value, culture, options) == 0;
    }

An accent-insensitive and case-insensitive StartsWith can be called like this:

value.ToString().StartsWith(str, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase)
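
As a possible alternative, the framework's CompareInfo.IsPrefix does the prefix matching itself, so there is no need to cut a substring of value.Length chars. That may matter when str holds a decomposed accent, for example "He\u0301llo", where the accented text is one char longer than "hello". This is only a sketch: PrefixDemo and StartsWithIgnoringAccents are hypothetical names, not part of the answer above.

```csharp
using System;
using System.Globalization;

static class PrefixDemo
{
    // Hypothetical helper: delegates to CompareInfo.IsPrefix so the
    // culture-aware comparer decides where the prefix ends.
    public static bool StartsWithIgnoringAccents(string str, string value, CultureInfo culture)
    {
        return culture.CompareInfo.IsPrefix(
            str, value, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase);
    }

    static void Main()
    {
        // Precomposed é (U+00E9) and decomposed e + U+0301 are both tried.
        Console.WriteLine(StartsWithIgnoringAccents("H\u00E9llo world", "hello", CultureInfo.InvariantCulture));
        Console.WriteLine(StartsWithIgnoringAccents("He\u0301llo world", "hello", CultureInfo.InvariantCulture));
        Console.WriteLine(StartsWithIgnoringAccents("world", "hello", CultureInfo.InvariantCulture));
    }
}
```

The Substring-based extension method is still fine when both strings are known to be in the same normalization form.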

Answer by Newton Carlos Dantas

A simpler way to remove accents:

    Dim source As String = "áéíóú?"
    Dim result As String

    Dim bytes As Byte() = Encoding.GetEncoding("Cyrillic").GetBytes(source)
    result = Encoding.ASCII.GetString(bytes)