Ignoring accented letters in string comparison

Question

Asked by Jon Tackabury

I need to compare 2 strings in C# and treat accented letters the same as non-accented letters. For example:

string s1 = "hello";
string s2 = "héllo";

s1.Equals(s2, StringComparison.InvariantCultureIgnoreCase);
s1.Equals(s2, StringComparison.OrdinalIgnoreCase);

These 2 strings need to be the same (as far as my application is concerned), but both of these statements evaluate to false. Is there a way in C# to do this?

Answer 1

Accepted answer by Serge Wautier

EDIT 2012-01-20: Oh boy! The solution was so much simpler and has been in the framework nearly forever. As pointed out by knightpfhor:

string.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace);

Here's a function that strips diacritics from a string:

static string RemoveDiacritics(string text)
{
  string formD = text.Normalize(NormalizationForm.FormD);
  StringBuilder sb = new StringBuilder();

  foreach (char ch in formD)
  {
    UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
    if (uc != UnicodeCategory.NonSpacingMark)
    {
      sb.Append(ch);
    }
  }

  return sb.ToString().Normalize(NormalizationForm.FormC);
}

More details on MichKap's blog (RIP...).

The principle is that Unicode normalization form D turns 'é' into 2 successive chars: 'e' followed by a combining acute accent. The function then iterates through the chars and skips the diacritics (the non-spacing marks).

"héllo" becomes "he<acute>llo", which in turn becomes "hello".

Debug.Assert("hello"==RemoveDiacritics("héllo"));
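To make the intermediate step visible, here is a minimal sketch that lists the chars of the decomposed string together with their Unicode categories. The class and method names (DecompositionDemo, Describe) are made up for illustration; only Normalize and CharUnicodeInfo.GetUnicodeCategory come from the answer above.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DecompositionDemo
{
    // Hypothetical helper: returns one "U+XXXX Category" entry per UTF-16 char
    // of the FormD-normalized (decomposed) text.
    public static string[] Describe(string text)
    {
        string formD = text.Normalize(NormalizationForm.FormD);
        var result = new string[formD.Length];
        for (int i = 0; i < formD.Length; i++)
        {
            char ch = formD[i];
            result[i] = $"U+{(int)ch:X4} {CharUnicodeInfo.GetUnicodeCategory(ch)}";
        }
        return result;
    }

    public static void Main()
    {
        // "h\u00E9llo" is "héllo" written with a precomposed é (U+00E9).
        foreach (string line in Describe("h\u00E9llo"))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // U+0068 LowercaseLetter
        // U+0065 LowercaseLetter
        // U+0301 NonSpacingMark
        // U+006C LowercaseLetter
        // U+006C LowercaseLetter
        // U+006F LowercaseLetter
    }
}
```

The U+0301 entry is the combining acute accent; it is the only char whose category is NonSpacingMark, so it is the one RemoveDiacritics drops.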

Note: Here's a more compact .NET4+ friendly version of the same function:

static string RemoveDiacritics(string text)
{
  return string.Concat( 
      text.Normalize(NormalizationForm.FormD)
      .Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch)!=
                                    UnicodeCategory.NonSpacingMark)
    ).Normalize(NormalizationForm.FormC);
}

Answer 2

Answered by Jon Tackabury

Try this overload of the String.Compare method:

String.Compare Method (String, String, Boolean, CultureInfo)

It produces an int value based on a comparison that takes the CultureInfo into account. The example on the page compares "change" and "dollar" in en-US and cs-CZ. In Czech, "ch" is a single letter that sorts after "h", so the two cultures order the strings differently.

Example from the link:

using System;
using System.Globalization;

class Sample {
    public static void Main() {
    String str1 = "change";
    String str2 = "dollar";
    String relation = null;

    relation = symbol( String.Compare(str1, str2, false, new CultureInfo("en-US")) );
    Console.WriteLine("For en-US: {0} {1} {2}", str1, relation, str2);

    relation = symbol( String.Compare(str1, str2, false, new CultureInfo("cs-CZ")) );
    Console.WriteLine("For cs-CZ: {0} {1} {2}", str1, relation, str2);
    }

    private static String symbol(int r) {
    String s = "=";
    if      (r < 0) s = "<";
    else if (r > 0) s = ">";
    return s;
    }
}
/*
This example produces the following results.
For en-US: change < dollar
For cs-CZ: change > dollar
*/

Therefore, for accented languages you will need to get the culture and then compare the strings based on that.

http://msdn.microsoft.com/en-us/library/hyxc48dt.aspx

Answer 3

Answered by Ryan Cook

The following method, CompareIgnoreAccents(...), works on your example data. Here is the article where I got my background information: http://www.codeproject.com/KB/cs/EncodingAccents.aspx

private static bool CompareIgnoreAccents(string s1, string s2)
{
    return string.Compare(
        RemoveAccents(s1), RemoveAccents(s2), StringComparison.InvariantCultureIgnoreCase) == 0;
}

private static string RemoveAccents(string s)
{
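    // ISO-8859-8 has no accented Latin letters, so the conversion depends on the
    // encoder's best-fit fallback to turn 'é' into 'e'. That fallback is an
    // implementation detail and may not behave the same on every runtime.
    Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");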

    return destEncoding.GetString(
        Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
}

I think an extension method would be better:

public static string RemoveAccents(this string s)
{
    Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");

    return destEncoding.GetString(
        Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
}

Then the usage would be:

if (string.Compare(s1.RemoveAccents(), s2.RemoveAccents(), true) == 0)
{
    ...
}

Answer 4

Answered by knightpfhor

If you don't need to convert the string and just want to check for equality, you can use:

string s1 = "hello";
string s2 = "héllo";

if (String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace) == 0)
{
    // both strings are equal
}

Or, if you want the comparison to be case-insensitive as well:

string s1 = "HEllO";
string s2 = "héLLo";

if (String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0)
{
    // both strings are equal
}
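If the check is needed in several places, it could be wrapped in a small helper. This is only a sketch: the class and method names are hypothetical, and defaulting to CultureInfo.InvariantCulture is an assumption made here for repeatable results (the snippets above use CultureInfo.CurrentCulture, whose rules can differ between machines).

```csharp
using System;
using System.Globalization;

public static class AccentInsensitive
{
    // Hypothetical helper: true when the strings are equal once accents
    // (non-spacing marks) and case differences are ignored.
    public static bool EqualsIgnoringAccentsAndCase(string a, string b, CultureInfo culture = null)
    {
        // Assumption: fall back to the invariant culture when none is given.
        CompareInfo compareInfo = (culture ?? CultureInfo.InvariantCulture).CompareInfo;
        const CompareOptions options = CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;
        return compareInfo.Compare(a, b, options) == 0;
    }

    public static void Main()
    {
        // "h\u00E9LLo" is "héLLo" written with an escape for é.
        Console.WriteLine(EqualsIgnoringAccentsAndCase("HEllO", "h\u00E9LLo")); // True
        Console.WriteLine(EqualsIgnoringAccentsAndCase("hello", "help"));       // False
    }
}
```

CompareInfo.Compare is the instance method that the String.Compare overload with CompareOptions delegates to for the given culture, so behavior should match the snippets above when the same culture is passed in.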

Answer 5

Answered by Guish

I had to do something similar but with a StartsWith method. Here is a simple solution derived from @Serge - appTranslator.

Here is an extension method:

    public static bool StartsWith(this string str, string value, CultureInfo culture, CompareOptions options)
    {
        if (str.Length >= value.Length)
            return string.Compare(str.Substring(0, value.Length), value, culture, options) == 0;
        else
            return false;            
    }

And for one-liner freaks ;)

    public static bool StartsWith(this string str, string value, CultureInfo culture, CompareOptions options)
    {
        return str.Length >= value.Length && string.Compare(str.Substring(0, value.Length), value, culture, options) == 0;
    }

An accent-insensitive and case-insensitive StartsWith can then be called like this:

value.ToString().StartsWith(str, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase)
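As an alternative sketch, the framework's CompareInfo.IsPrefix accepts the same CompareOptions and avoids the Substring step. That can matter because Substring counts UTF-16 chars, and a decomposed 'e' followed by U+0301 occupies two of them, so the slice may be cut in the wrong place. The class and method names below are hypothetical, and using the invariant culture is an assumption.

```csharp
using System;
using System.Globalization;

public static class PrefixDemo
{
    // Hypothetical helper: accent- and case-insensitive prefix test
    // built on CompareInfo.IsPrefix.
    public static bool StartsWithIgnoringAccentsAndCase(string text, string prefix)
    {
        return CultureInfo.InvariantCulture.CompareInfo.IsPrefix(
            text, prefix, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase);
    }

    public static void Main()
    {
        // "H\u00E9llo world" is "Héllo world" with a precomposed é.
        Console.WriteLine(StartsWithIgnoringAccentsAndCase("H\u00E9llo world", "hello")); // True
        Console.WriteLine(StartsWithIgnoringAccentsAndCase("Goodbye", "hello"));          // False
    }
}
```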

Answer 6

Answered by Newton Carlos Dantas

A simpler way to remove accents:

    Dim source As String = "áéíóú?"
    Dim result As String

    Dim bytes As Byte() = Encoding.GetEncoding("Cyrillic").GetBytes(source)
    result = Encoding.ASCII.GetString(bytes)
