Ruby method to remove accents from UTF-8 international characters

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/15686752/

Time: 2020-09-02 21:41:53  Source: igfitidea


Tags: ruby-on-rails, utf-8, internationalization

Question by Gus Shortz

I am trying to create a 'normalized' copy of a string to help reduce duplicate names in a database. The names contain many international characters (i.e. accented letters), and I want to create a copy with the accents removed.


I did come across the method below, but cannot get it to work. I can't seem to find what the Unicode Hacks plugin is.


  # Utility method that returns an ASCIIfied, downcased, and sanitized string.
  # It relies on the Unicode Hacks plugin by means of String#chars. We assume
  # $KCODE is 'u' in environment.rb. By now we support a wide range of Latin
  # accented letters, based on the Unicode Character Palette bundled in Macs.
  def self.normalize(str)
     n = str.chars.downcase.strip.to_s
     n.gsub!(/[àáâãäåāă]/u,         'a')
     n.gsub!(/æ/u,                  'ae')
     n.gsub!(/[ďđ]/u,               'd')
     n.gsub!(/[çćčĉċ]/u,            'c')
     n.gsub!(/[èéêëēęěĕė]/u,        'e')
     n.gsub!(/ƒ/u,                  'f')
     n.gsub!(/[ĝğġģ]/u,             'g')
     n.gsub!(/[ĥħ]/,                'h')
     n.gsub!(/[ììíîïīĩĭ]/u,         'i')
     n.gsub!(/[įıĳĵ]/u,             'j')
     n.gsub!(/[ķĸ]/u,               'k')
     n.gsub!(/[łľĺļŀ]/u,            'l')
     n.gsub!(/[ñńňņŉŋ]/u,           'n')
     n.gsub!(/[òóôõöøōőŏŏ]/u,       'o')
     n.gsub!(/œ/u,                  'oe')
     n.gsub!(/ą/u,                  'q')
     n.gsub!(/[ŕřŗ]/u,              'r')
     n.gsub!(/[śšşŝș]/u,            's')
     n.gsub!(/[ťţŧț]/u,             't')
     n.gsub!(/[ùúûüūůűŭũų]/u,       'u')
     n.gsub!(/ŵ/u,                  'w')
     n.gsub!(/[ýÿŷ]/u,              'y')
     n.gsub!(/[žżź]/u,              'z')
     n.gsub!(/\s+/,                 ' ')
     n.gsub!(/[^\sa-z0-9_-]/,       '')
     n
  end

Do I need to 'require' a particular library/gem? Or maybe someone could recommend another way to go about this.


I am not using Rails, nor do I plan on doing so.

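The quoted method was written for Ruby 1.8 plus the Rails-era Unicode Hacks plugin. Since Ruby 1.9, $KCODE no longer has any effect. String#chars returns an Enumerator in 1.9 and an Array from Ruby 2.0 on, and neither of those has downcase. This explains why it fails without the plugin. Below is a minimal sketch of the same table-driven idea with no plugin. It assumes Ruby 2.4+, where String#downcase also lower-cases non-ASCII letters. The ACCENT_TABLE constant and its contents are hypothetical: they cover only a subset of Latin letters and do not reproduce the quoted table exactly.

```ruby
# Hypothetical table (a subset of Latin letters, not the quoted method's exact
# classes): each lower-case accented letter maps to its ASCII approximation.
ACCENT_TABLE = {
  "àáâãäåāăą" => "a", "æ" => "ae", "çćĉċč" => "c", "ďđ" => "d",
  "èéêëēĕėęě" => "e", "ĝğġģ" => "g", "ĥħ" => "h", "ìíîïĩīĭįı" => "i",
  "ĵ" => "j", "ķ" => "k", "ĺļľŀł" => "l", "ñńņňŉŋ" => "n",
  "òóôõöøōŏő" => "o", "œ" => "oe", "ŕŗř" => "r", "śŝşšș" => "s",
  "ţťŧț" => "t", "ùúûüũūŭůűų" => "u", "ŵ" => "w", "ýÿŷ" => "y",
  "źżž" => "z"
}.each_with_object({}) { |(letters, ascii), map|
  # Expand each group of letters into one hash entry per letter.
  letters.each_char { |letter| map[letter] = ascii }
}.freeze

def normalize(str)
  # Ruby 2.4+ String#downcase handles "Å" -> "å", so no plugin is needed.
  n = str.downcase.strip
  # Replace each non-ASCII character via the table; unknown ones are kept here...
  n = n.gsub(/[^\x00-\x7F]/) { |c| ACCENT_TABLE.fetch(c, c) }
  n = n.gsub(/\s+/, " ")
  # ...and then dropped, together with punctuation, as in the original method.
  n.gsub(/[^\sa-z0-9_-]/, "")
end

puts normalize("  Ångström Café ") # prints "angstrom cafe"
```

Keeping the table as data rather than one gsub! per letter makes it easier to extend, but the table still has to be maintained by hand.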

Answer by user2398029

I generally use I18n to handle this:


1.9.3p392 :001 > require "i18n"
 => true
1.9.3p392 :002 > I18n.transliterate("Hé les mecs!")
 => "He les mecs!"

Answer by Gus Shortz

So far the following is the only way I've been able to accomplish what I need:


str.tr(
"ÀÁÂÃÄÅàáâãäåĀāĂăĄąÇçĆćĈĉĊċČčÐðĎďĐđÈÉÊËèéêëĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħÌÍÎÏìíîïĨĩĪīĬĭĮįİıĴĵĶķĸĹĺĻļĽľĿŀŁłÑñŃńŅņŇňŉŊŋÒÓÔÕÖØòóôõöøŌōŎŏŐőŔŕŖŗŘřŚśŜŝŞşŠšſŢţŤťŦŧÙÚÛÜùúûüŨũŪūŬŭŮůŰűŲųŴŵÝýÿŶŷŸŹźŻżŽž",
"AAAAAAaaaaaaAaAaAaCcCcCcCcCcDdDdDdEEEEeeeeEeEeEeEeEeGgGgGgGgHhHhIIIIiiiiIiIiIiIiIiJjKkkLlLlLlLlLlNnNnNnNnnNnOOOOOOooooooOoOoOoRrRrRrSsSsSsSssTtTtTtUUUUuuuuUuUuUuUuUuUuWwYyyYyYZzZzZz")

But using this feels very 'hackish', and I would love to find a better way.

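One alternative, not used in any of the answers here, needs no hand-written table. Since Ruby 2.2, the standard library's String#unicode_normalize can decompose each accented letter into a base letter followed by combining marks (NFD). Those marks (Unicode category Mn) can then be deleted with a regex. Below is a minimal sketch; the method name remove_accents is hypothetical. Letters that have no canonical decomposition, such as ø, ł, đ, æ or ß, pass through unchanged, so a small tr table like the one above may still be needed for them.

```ruby
# Hypothetical helper: strip diacritics using only the standard library (Ruby 2.2+).
def remove_accents(str)
  # NFD splits "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT;
  # \p{Mn} matches such non-spacing combining marks, which are then removed.
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

puts remove_accents("Françoise Isaïe") # prints "Francoise Isaie"
puts remove_accents("Łódź")            # prints "Łodz" (Ł has no decomposition)
```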

Answer by AlexGuti

The parameterize method could be a nice, simple way to remove special characters so that the string can be used as a human-readable identifier:


> "Françoise Isaïe".parameterize
=> "francoise-isaie"

Answer by Naved Khan

If you are using Rails:


my_string = "L'Oréal"
my_string.parameterize(separator: ' ')   # => "l oreal" (Rails 5+: separator is a keyword argument)