How to ignore accents in the highlight function

Time: 2020-03-06 14:48:26  Source: igfitidea

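I have a micro-mini search engine that highlights the search terms in my Rails app. The search ignores accents, and the highlighting is case-insensitive. Almost perfect.
But, for instance, if I have a record with the text "pão de queijo" and search for "pao de queijo", the record is returned but the text is not highlighted. Similarly, if I search for "pÃo de queijo", the record is returned but not highlighted properly.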

My code is as simple as:

<%= highlight(result_pessoa.observacoes, search_string, '<span style="background-color: yellow;">\1</span>') %>

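Solution

It sounds like you are using two different methods to decide whether a match has occurred: one for the search and one for the highlight. Use the same method for highlighting that you use for searching and it should pick the text up, no?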
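One way to read this advice, as a minimal sketch in plain Ruby 2.4 or later (an assumption, since the question predates it): derive a single comparison key and use it for every match decision. The names `fold` and `found?` are hypothetical, and stripping only the combining marks in U+0300 to U+036F is an assumption that covers common Latin accents.

```ruby
# Hypothetical helper: reduce a string to an accent-free, lowercase key.
def fold(str)
  str.unicode_normalize(:nfd)       # "ã" becomes "a" + U+0303
     .gsub(/[\u0300-\u036f]/, '')   # drop the combining marks
     .downcase
end

# Every match decision goes through the same key, so the search and the
# highlight cannot disagree about what counts as a hit.
def found?(text, query)
  fold(text).include?(fold(query))
end

records = ["p\u00e3o de queijo", "queijo minas"]
hits = records.select { |record| found?(record, "PAO") }
puts hits.inspect
```

If the search runs in the database instead, the same idea applies in reverse: the highlighter has to fold text the way the database collation does.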

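Maybe you are searching UTF-8 strings directly against a MySQL database?

A properly configured MySQL server (and probably any other mainstream database server) performs case-insensitive and accent-insensitive comparisons correctly.

That is not the case with Ruby, though. As of version 1.8, Ruby does not support Unicode strings. So you get correct results from your database server, but Rails' highlight function, which uses gsub, fails to find your search string. You need to reimplement highlight using a Unicode-aware string library such as ICU4R.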
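The mismatch can be reproduced outside Rails. The sketch below assumes Ruby 2.4 or later, where `String#unicode_normalize` and `match?` are available without extra libraries; the answer above was written for Ruby 1.8, so this illustrates the problem rather than the original environment.

```ruby
text  = "p\u00e3o de queijo"   # "pão de queijo", with ã stored as one code point
query = "pao de queijo"

# Roughly what a gsub-based highlighter does: a literal, case-insensitive match.
literal_hit = text.match?(/#{Regexp.escape(query)}/i)

# Canonical decomposition splits ã into "a" plus a combining tilde.
decomposed  = text.unicode_normalize(:nfd)
first_three = decomposed.codepoints.first(3).map { |cp| format('U+%04X', cp) }

puts literal_hit          # false
puts first_three.inspect  # ["U+0070", "U+0061", "U+0303"]
```

The `/i` flag only folds case. Nothing in a plain regular expression treats "ã" and "a" as equal, which is why the record comes back from the database but is never wrapped by the highlighter.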

I have just submitted a patch to Rails that solves this.

http://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/3593-patch-support-for-highlighting-with-ignoring-special-chars

  # Highlights one or more +phrases+ everywhere in +text+ by inserting it into
  # a <tt>:highlighter</tt> string. The highlighter can be specialized by passing <tt>:highlighter</tt>
  # as a single-quoted string with \1 where the phrase is to be inserted (defaults to
  # '<strong class="highlight">\1</strong>')
  #
  # ==== Examples
  #   highlight('You searched for: rails', 'rails')
  #   # => You searched for: <strong class="highlight">rails</strong>
  #
  #   highlight('You searched for: ruby, rails, dhh', 'actionpack')
  #   # => You searched for: ruby, rails, dhh
  #
  #   highlight('You searched for: rails', ['for', 'rails'], :highlighter => '<em>\1</em>')
  #   # => You searched <em>for</em>: <em>rails</em>
  #
  #   highlight('You searched for: rails', 'rails', :highlighter => '<a href="search?q=\1">\1</a>')
  #   # => You searched for: <a href="search?q=rails">rails</a>
  #
  #   highlight('šumné dievčatá', ['šumňe', 'dievca'], :ignore_special_chars => true)
  #   # => <strong class="highlight">šumné</strong> <strong class="highlight">dievča</strong>tá
  #
  # You can still use <tt>highlight</tt> with the old API that accepts the
  # +highlighter+ as its optional third parameter:
  #   highlight('You searched for: rails', 'rails', '<a href="search?q=\1">\1</a>')     # => You searched for: <a href="search?q=rails">rails</a>
  def highlight(text, phrases, *args)
    options = args.extract_options!
    unless args.empty?
      options[:highlighter] = args[0] || '<strong class="highlight">\1</strong>'
    end
    options.reverse_merge!(:highlighter => '<strong class="highlight">\1</strong>')

    if text.blank? || phrases.blank?
      text
    else
      haystack = text.clone
      match = Array(phrases).map { |p| Regexp.escape(p) }.join('|')
      if options[:ignore_special_chars]
        haystack = haystack.mb_chars.normalize(:kd)
        match = match.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]+/n, '').gsub(/\w/, '\0[^\x00-\x7F]*')
      end
      highlighted = haystack.gsub(/(#{match})(?!(?:[^<]*?)(?:["'])[^<>]*>)/i, options[:highlighter])
      highlighted = highlighted.mb_chars.normalize(:kc) if options[:ignore_special_chars]
      highlighted
    end
  end

Here is my article, which explains an elegant solution that requires neither Rails nor ActiveSupport.
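The article itself is not reproduced here. As a rough sketch of what a Rails-free approach could look like (not the article's code), the version below uses only `String#unicode_normalize` from the Ruby 2.2+ standard library. The helper name `highlight_ignoring_accents`, the `<mark>` tags, and the U+0300 to U+036F mark range are all assumptions made for illustration.

```ruby
# Combining Diacritical Marks block; assumed to cover common Latin accents.
MARKS = "[\u0300-\u036f]"

# Hypothetical helper, not a Rails API.
def highlight_ignoring_accents(text, phrases, open_tag: '<mark>', close_tag: '</mark>')
  alternatives = Array(phrases).map do |phrase|
    # Strip the query's own accents, then allow any marks after each character.
    bare = phrase.unicode_normalize(:nfd).gsub(/#{MARKS}/, '')
    bare.each_char.map { |ch| Regexp.escape(ch) + MARKS + '*' }.join
  end.reject(&:empty?)
  return text if text.nil? || text.empty? || alternatives.empty?

  pattern = Regexp.new(alternatives.join('|'), Regexp::IGNORECASE)

  # Match on the decomposed text, wrap each hit, then recompose to NFC so the
  # original accented characters come back as single code points.
  text.unicode_normalize(:nfd)
      .gsub(pattern) { |hit| "#{open_tag}#{hit}#{close_tag}" }
      .unicode_normalize(:nfc)
end

puts highlight_ignoring_accents("p\u00e3o de queijo", "pao de queijo")
puts highlight_ignoring_accents("P\u00c3O de queijo", ["p\u00e3o", "queijo"])
```

This sketch does not escape HTML in the text and does not skip matches inside tag attributes the way the patched Rails helper tries to. Both would be needed before using it in a view.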