JavaScript 模糊搜索

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/9206013/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-26 05:55:48  来源:igfitidea点击:

JavaScript fuzzy search

javascriptfuzzy-search

提问by Ionu? Staicu

I'm working on this filtering thing where I have about 50-100 list items. And each items have markup like this:

我正在处理这个过滤的事情,我有大约 50-100 个列表项。每个项目都有这样的标记:

<li>
  <input type="checkbox" name="services[]" value="service_id" />
  <span class="name">Restaurant in NY</span>
  <span class="filters"><!-- hidden area -->
    <span class="city">@city: new york</span>
    <span class="region">@reg: ny</span>
    <span class="date">@start: 02/05/2012</span>
    <span class="price">@price: 100</span>
  </span>
</li>

I created markup like this because I initally used List.js

我创建这样的标记是因为我最初使用的是List.js

So, probably you guessed already, what I want is to do searches like this:@region: LA @price: 124and so on. The problem is that i also want to display more than one item, in order to select more than... one :)

所以,可能你已经猜到了,我想要的是做这样的搜索:@region: LA @price: 124等等。问题是我还想显示多个项目,以便选择多个......一个:)

I assume this needs fuzzy search, but the problem is that i didn't find anything functional.

我认为这需要模糊搜索,但问题是我没有找到任何功能。

Any idea or starting point?

任何想法或起点?

// edit: because I have a fairly small amount of items, I would like a client side solution.

// 编辑:因为我的项目数量很少,我想要一个客户端解决方案。

采纳答案by Ionu? Staicu

One year later, List.js got a nice plugin for fuzzy searchthat works pretty great.

一年后,List.js 得到了一个很好的模糊搜索插件,效果非常好。

回答by Dziad Borowy

I was looking for "fuzzy search" in javascript but haven't found a solution here, so I wrote my own function that does what I need.

我一直在 javascript 中寻找“模糊搜索”,但在这里没有找到解决方案,所以我编写了自己的函数来满足我的需要。

The algorithm is very simple: loop through needle letters and check if they occur in the same order in the haystack:

算法非常简单:遍历针头字母并检查它们是否以相同的顺序出现在大海捞针中:

String.prototype.fuzzy = function (s) {
    var hay = this.toLowerCase(), i = 0, n = -1, l;
    s = s.toLowerCase();
    for (; l = s[i++] ;) if (!~(n = hay.indexOf(l, n + 1))) return false;
    return true;
};

e.g.:

例如:

('a haystack with a needle').fuzzy('hay sucks');    // false
('a haystack with a needle').fuzzy('sack hand');    // true

回答by gnj

Another (simple) solution. Non case-sensitive and ignores order of the letters.

另一个(简单的)解决方案。不区分大小写并忽略字母顺序。

It performs a check for every letter of the search-term. If the original string contains that letter, it will count up (or down if it doesn't). Based on the ratio of matches / string-length it will return true or false.

它对搜索词的每个字母进行检查。如果原始字符串包含该字母,它将向上计数(如果不包含则向下计数)。根据匹配/字符串长度的比率,它将返回 true 或 false。

String.prototype.fuzzy = function(term, ratio) {
    var string = this.toLowerCase();
    var compare = term.toLowerCase();
    var matches = 0;
    if (string.indexOf(compare) > -1) return true; // covers basic partial matches
    for (var i = 0; i < compare.length; i++) {
        string.indexOf(compare[i]) > -1 ? matches += 1 : matches -=1;
    }
    return (matches/this.length >= ratio || term == "")
};

Examples:

例子:

("Test").fuzzy("st", 0.5) // returns true
("Test").fuzzy("tes", 0.8) // returns false cause ratio is too low (0.75)
("Test").fuzzy("stet", 1) // returns true
("Test").fuzzy("zzzzzest", 0.75) // returns false cause too many alien characters ("z")
("Test").fuzzy("es", 1) // returns true cause partial match (despite ratio being only 0.5)

回答by Roland Hentschel

I have a little function, searching a string in an array ( at least for me it produces better results than levenshtein ):

我有一个小函数,在数组中搜索一个字符串(至少对我来说它比 levenshtein 产生更好的结果):

function fuzzy(item,arr) {
  function oc(a) {
    var o = {}; for (var i=0; i<a.length; i++) o[a[i]] = ""; return o;
  }
  var test = [];
  for (var n=1; n<=item.length; n++)
    test.push(item.substr(0,n) + "*" + item.substr(n+1,item.length-n));
  var result = [];
  for (var r=0; r<test.length; r++) for (var i=0; i<arr.length; i++) {
    if (arr[i].toLowerCase().indexOf(test[r].toLowerCase().split("*")[0]) != -1)
    if (arr[i].toLowerCase().indexOf(test[r].toLowerCase().split("*")[1]) != -1)
    if (0 < arr[i].toLowerCase().indexOf(test[r].toLowerCase().split("*")[1]) 
          - arr[i].toLowerCase().indexOf(test[r].toLowerCase().split("*")[0] < 2 ) )
    if (!(arr[i] in oc(result)))  result.push(arr[i]);
  }
  return result;
}

回答by Unamata Sanatarai

And I made my own. It uses regexand serves more like a proof-of-concept as it is completely not stress-tested.

我自己做的。它使用正则表达式,更像是一个概念验证,因为它完全没有经过压力测试。

enjoy the javascript fuzzy search/fuzzy matchhttp://unamatasanatarai.github.io/FuzzyMatch/test/index.html

享受javascript 模糊搜索/模糊匹配http://unamatasanatarai.github.io/FuzzyMatch/test/index.html

回答by Marc Lundgren

I wasn't satisfied with list.js, so I created my own. This is probably not exactly fuzzy search, but I don't know what to call it. I simply wanted it to match a query without regard to the order of my words in the query.

我对 list.js 不满意,所以我创建了自己的。这可能不是完全模糊的搜索,但我不知道该怎么称呼它。我只是想让它匹配查询而不考虑我在查询中的单词顺序。

Consider the following scenario:

考虑以下场景:

  • a collection of articles in memory exists
  • order of query words appearance doesn't matter (e.g. "hello world" vs "world hello")
  • The code should be easily readable
  • 内存中的文章集合存在
  • 查询词出现的顺序无关紧要(例如“hello world”与“world hello”)
  • 代码应该易于阅读

Here is an example:

下面是一个例子:

var articles = [{
  title: '2014 Javascript MVC Frameworks Comparison',
  author: 'Guybrush Treepwood'
}, {
  title: 'Javascript in the year 2014',
  author: 'Herman Toothrot'
},
{
  title: 'Javascript in the year 2013',
  author: 'Rapp Scallion'
}];

var fuzzy = function(items, key) {
  // Returns a method that you can use to create your own reusable fuzzy search.

  return function(query) {
    var words  = query.toLowerCase().split(' ');

    return items.filter(function(item) {
      var normalizedTerm = item[key].toLowerCase();

      return words.every(function(word) {
        return (normalizedTerm.indexOf(word) > -1);
      });
    });
  };
};


var searchByTitle = fuzzy(articles, 'title');

searchByTitle('javascript 2014') // returns the 1st and 2nd items

Well, I hope this helps someone out there.

好吧,我希望这可以帮助那里的人。

回答by pie6k

Solutions provided here returns true/falseand no information about what part was matched and what part was not.

此处提供的解决方案返回true/false并且没有关于哪个部分匹配以及哪个部分不匹配的信息。

In some cases, you might need to know it eg. to make parts of your input bold in the search results

在某些情况下,您可能需要知道它,例如。在搜索结果中将您的部分输入加粗

I've created my own solution in typescript (if you want to use it - I've published it here - https://github.com/pie6k/fuzzystring) and demo here https://pie6k.github.io/fuzzystring/

我在打字稿中创建了自己的解决方案(如果你想使用它 - 我已经在这里发布 - https://github.com/pie6k/fuzzystring)和演示在这里https://pie6k.github.io/fuzzystring /

It works like:

它的工作原理如下:

fuzzyString('liolor', 'lorem ipsum dolor sit');

// returns
{
  parts: [
    { content: 'l', type: 'input' },
    { content: 'orem ', type: 'fuzzy' },
    { content: 'i', type: 'input' },
    { content: 'psum d', type: 'fuzzy' },
    { content: 'olor', type: 'input' },
    { content: ' sit', type: 'suggestion' },
  ],
  score: 0.87,
}

Here is full implementation (Typescript)

这是完整的实现(Typescript)

type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';

interface FuzzyMatchPart {
  content: string;
  type: MatchRoleType;
}

interface FuzzyMatchData {
  parts: FuzzyMatchPart[];
  score: number;
}

interface FuzzyMatchOptions {
  truncateTooLongInput?: boolean;
  isCaseSesitive?: boolean;
}

function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart[]) {
  const getRoleLength = (role: MatchRoleType) =>
    fuzzyMatchParts
      .filter((part) => part.type === role)
      .map((part) => part.content)
      .join('').length;

  const fullLength = fuzzyMatchParts.map((part) => part.content).join('')
    .length;
  const fuzzyLength = getRoleLength('fuzzy');
  const inputLength = getRoleLength('input');
  const suggestionLength = getRoleLength('suggestion');

  return (
    (inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength
  );
}

function compareLetters(a: string, b: string, isCaseSensitive = false) {
  if (isCaseSensitive) {
    return a === b;
  }
  return a.toLowerCase() === b.toLowerCase();
}

function fuzzyString(
  input: string,
  stringToBeFound: string,
  { truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},
): FuzzyMatchData | false {
  // make some validation first

  // if input is longer than string to find, and we dont truncate it - it's incorrect
  if (input.length > stringToBeFound.length && !truncateTooLongInput) {
    return false;
  }

  // if truncate is enabled - do it
  if (input.length > stringToBeFound.length && truncateTooLongInput) {
    input = input.substr(0, stringToBeFound.length);
  }

  // if input is the same as string to be found - we dont need to look for fuzzy match - return it as match
  if (input === stringToBeFound) {
    return {
      parts: [{ content: input, type: 'input' }],
      score: 1,
    };
  }

  const matchParts: FuzzyMatchPart[] = [];

  const remainingInputLetters = input.split('');

  // let's create letters buffers
  // it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row
  // we want to add them together as part of match
  let ommitedLettersBuffer: string[] = [];
  let matchedLettersBuffer: string[] = [];

  // helper functions to clear the buffers and add them to match
  function addOmmitedLettersAsFuzzy() {
    if (ommitedLettersBuffer.length > 0) {
      matchParts.push({
        content: ommitedLettersBuffer.join(''),
        type: 'fuzzy',
      });
      ommitedLettersBuffer = [];
    }
  }

  function addMatchedLettersAsInput() {
    if (matchedLettersBuffer.length > 0) {
      matchParts.push({
        content: matchedLettersBuffer.join(''),
        type: 'input',
      });
      matchedLettersBuffer = [];
    }
  }

  for (let anotherStringToBeFoundLetter of stringToBeFound) {
    const inputLetterToMatch = remainingInputLetters[0];

    // no more input - finish fuzzy matching
    if (!inputLetterToMatch) {
      break;
    }

    const isMatching = compareLetters(
      anotherStringToBeFoundLetter,
      inputLetterToMatch,
      isCaseSesitive,
    );

    // if input letter doesnt match - we'll go to the next letter to try again
    if (!isMatching) {
      // add this letter to buffer of ommited letters
      ommitedLettersBuffer.push(anotherStringToBeFoundLetter);
      // in case we had something in matched letters buffer - clear it as matching letters run ended
      addMatchedLettersAsInput();
      // go to the next input letter
      continue;
    }

    // we have input letter matching!

    // remove it from remaining input letters
    remainingInputLetters.shift();

    // add it to matched letters buffer
    matchedLettersBuffer.push(anotherStringToBeFoundLetter);
    // in case we had something in ommited letters buffer - add it to the match now
    addOmmitedLettersAsFuzzy();

    // if there is no more letters in input - add this matched letter to match too
    if (!remainingInputLetters.length) {
      addMatchedLettersAsInput();
    }
  }

  // if we still have letters left in input - means not all input was included in string to find - input was incorrect
  if (remainingInputLetters.length > 0) {
    return false;
  }

  // lets get entire matched part (from start to last letter of input)
  const matchedPart = matchParts.map((match) => match.content).join('');

  // get remaining part of string to be found
  const suggestionPart = stringToBeFound.replace(matchedPart, '');

  // if we have remaining part - add it as suggestion
  if (suggestionPart) {
    matchParts.push({ content: suggestionPart, type: 'suggestion' });
  }
  const score = calculateFuzzyMatchPartsScore(matchParts);

  return {
    score,
    parts: matchParts,
  };
}