JavaScript/List.js: Implementing a fuzzy search

Keywords: implement, fuzzy search, js, List, JavaScript      Updated: 2023-09-26

I'm working on a filtering feature for a list of about 50-100 items. Each item has markup like this:

<li>
  <input type="checkbox" name="services[]" value="service_id" />
  <span class="name">Restaurant in NY</span>
  <span class="filters"><!-- hidden area -->
    <span class="city">@city: new york</span>
    <span class="region">@reg: ny</span>
    <span class="date">@start: 02/05/2012</span>
    <span class="price">@price: 100</span>
  </span>
</li>

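I built the markup this way because I initially used List.js.

What I want to do is search like this: @region: LA @price: 124 and so on. The catch is that I also want to display multiple matching items, so that I can select more than... one :)

I assume this calls for fuzzy search, but I couldn't find anything that works.

Since I have a fairly small number of items, I'd like a client-side solution.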
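For the `@key: value` syntax described in the question, a small parser plus a substring filter may already go a long way. The sketch below is only an assumption about how that could look: `parseQuery`, `filterItems` and the `services` array are hypothetical names, the items are plain objects rather than DOM nodes, and the keys (`city`, `reg`, `start`, `price`) are taken from the hidden spans in the markup above.

```javascript
// Hypothetical helper: turns "@reg: ny @price: 100" into { reg: "ny", price: "100" }.
function parseQuery(query) {
  const filters = {};
  // Each filter is "@key:" followed by everything up to the next "@" or the end.
  const pattern = /@(\w+):\s*([^@]*)/g;
  let match;
  while ((match = pattern.exec(query)) !== null) {
    filters[match[1].toLowerCase()] = match[2].trim().toLowerCase();
  }
  return filters;
}

// Keeps every item whose fields contain all requested values (substring match).
function filterItems(items, query) {
  const filters = parseQuery(query);
  return items.filter(function (item) {
    return Object.keys(filters).every(function (key) {
      const field = String(item[key] === undefined ? '' : item[key]).toLowerCase();
      return field.indexOf(filters[key]) > -1;
    });
  });
}

// Sample data (made up for illustration).
const services = [
  { name: 'Restaurant in NY', city: 'new york', reg: 'ny', start: '02/05/2012', price: '100' },
  { name: 'Cafe in LA', city: 'los angeles', reg: 'la', start: '03/05/2012', price: '124' }
];

const found = filterItems(services, '@reg: LA @price: 124');
console.log(found.map(function (s) { return s.name; })); // [ 'Cafe in LA' ]
```

Because the filter returns an array, several items can match at once, which is what the question asks for. The substring test could be swapped for any of the fuzzy functions in the answers below.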

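I was looking for "fuzzy search" in JavaScript but didn't find a solution here, so I wrote my own function that does what I need.

The algorithm is very simple: loop through the letters of the needle and check whether they occur in the same order in the haystack: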

String.prototype.fuzzy = function (s) {
    var hay = this.toLowerCase(), i = 0, n = -1, l;
    s = s.toLowerCase();
    for (; l = s[i++] ;) if (!~(n = hay.indexOf(l, n + 1))) return false;
    return true;
};

For example:

('a haystack with a needle').fuzzy('hay sucks');    // false
('a haystack with a needle').fuzzy('sack hand');    // true
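If extending `String.prototype` is not desirable, the same in-order check can be written as a standalone function. This is just a sketch of that idea; the name `fuzzyMatch` and the `names` array are made up for illustration.

```javascript
// Returns true when every character of `needle` appears in `haystack`
// in the same order (not necessarily adjacent). Case-insensitive.
function fuzzyMatch(haystack, needle) {
  const hay = haystack.toLowerCase();
  const chars = needle.toLowerCase();
  let position = -1;
  for (let i = 0; i < chars.length; i++) {
    // Search only to the right of the previously matched character.
    position = hay.indexOf(chars[i], position + 1);
    if (position === -1) return false;
  }
  return true;
}

// Sample data (made up for illustration).
const names = ['Restaurant in NY', 'Cafe in LA', 'Bar in Boston'];
console.log(names.filter(function (name) { return fuzzyMatch(name, 'rny'); }));
// [ 'Restaurant in NY' ]
```

Used with `Array.prototype.filter`, it returns every matching item rather than a single one.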

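Another (simple) solution. It is case-insensitive and ignores the order of the letters.

It checks every letter of the search term. If the original string contains that letter, a counter goes up; if not, it goes down. The function returns true or false depending on the ratio of that count to the length of the original string.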

String.prototype.fuzzy = function(term, ratio) {
    var string = this.toLowerCase();
    var compare = term.toLowerCase();
    var matches = 0;
    if (string.indexOf(compare) > -1) return true; // covers basic partial matches
    for (var i = 0; i < compare.length; i++) {
        string.indexOf(compare[i]) > -1 ? matches += 1 : matches -=1;
    }
    return (matches/this.length >= ratio || term == "")
};

Examples:

("Test").fuzzy("st", 0.5) // returns true
("Test").fuzzy("set", 0.8) // returns false because the ratio is too low (0.75)
("Test").fuzzy("stet", 1) // returns true
("Test").fuzzy("zzzzzest", 0.75) // returns false because of too many alien characters ("z")
("Test").fuzzy("es", 1) // returns true because of the partial match (despite the ratio being only 0.5)
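To see where those ratios come from, it can help to compute the number on its own, without the substring shortcut and without the threshold. The sketch below does only that; `letterRatio` is a hypothetical name, not part of the answer's code.

```javascript
// Returns (letters found - letters missing) / length of the searched string.
function letterRatio(text, term) {
  const haystack = text.toLowerCase();
  const letters = term.toLowerCase();
  let matches = 0;
  for (let i = 0; i < letters.length; i++) {
    // +1 for a letter that occurs anywhere in the text, -1 otherwise.
    matches += haystack.indexOf(letters[i]) > -1 ? 1 : -1;
  }
  return matches / text.length;
}

console.log(letterRatio('Test', 'stet'));     // 1     (4 hits / 4)
console.log(letterRatio('Test', 'tes'));      // 0.75  (3 hits / 4)
console.log(letterRatio('Test', 'zzzzzest')); // -0.5  ((3 hits - 5 misses) / 4)
```

Returning the ratio instead of a boolean would also make it possible to sort results by how well they match.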

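A year later, List.js got a nice fuzzy search plugin that works very well.

I wasn't satisfied with list.js, so I created my own. It is probably not exactly fuzzy search, but I don't know what else to call it. I simply wanted it to match a query regardless of the order of the words in the query.

Consider the following scenario:

  • A collection of articles exists in memory
  • The order in which the query words appear does not matter (e.g. "hello world" vs. "world hello")
  • The code should be easy to read

Here is an example: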

var articles = [{
  title: '2014 Javascript MVC Frameworks Comparison',
  author: 'Guybrush Treepwood'
}, {
  title: 'Javascript in the year 2014',
  author: 'Herman Toothrot'
},
{
  title: 'Javascript in the year 2013',
  author: 'Rapp Scallion'
}];
var fuzzy = function(items, key) {
  // Returns a method that you can use to create your own reusable fuzzy search.
  return function(query) {
    var words  = query.toLowerCase().split(' ');
    return items.filter(function(item) {
      var normalizedTerm = item[key].toLowerCase();
      return words.every(function(word) {
        return (normalizedTerm.indexOf(word) > -1);
      });
    });
  };
};

var searchByTitle = fuzzy(articles, 'title');
searchByTitle('javascript 2014') // returns the 1st and 2nd items

I hope this helps someone else.

I have a small function that searches for a string in an array (at least for me, it produces better results than Levenshtein):
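The same word-order-independent idea can be stretched to look at more than one key per item. The following is a sketch under that assumption; `searchAnyField` and `posts` are hypothetical names, and the query is split on any whitespace so that double spaces do not produce empty words.

```javascript
// Hypothetical variant: every query word must occur in at least one of the given keys.
function searchAnyField(items, keys) {
  return function (query) {
    const words = query.toLowerCase().split(/\s+/).filter(Boolean);
    return items.filter(function (item) {
      // Join the selected fields into one lowercase string to search in.
      const text = keys
        .map(function (key) { return String(item[key]); })
        .join(' ')
        .toLowerCase();
      return words.every(function (word) { return text.indexOf(word) > -1; });
    });
  };
}

// Sample data (made up for illustration).
const posts = [
  { title: 'Javascript in the year 2014', author: 'Herman Toothrot' },
  { title: 'Javascript in the year 2013', author: 'Rapp Scallion' }
];

const searchPosts = searchAnyField(posts, ['title', 'author']);
console.log(searchPosts('herman 2014').length); // 1
```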

function fuzzy(item,arr) {
  function oc(a) {
    var o = {}; for (var i=0; i<a.length; i++) o[a[i]] = ""; return o;
  }
  var test = [];
  for (var n=1; n<=item.length; n++)
    test.push(item.substr(0,n) + "*" + item.substr(n+1,item.length-n));
  var result = [];
  for (var r=0; r<test.length; r++) for (var i=0; i<arr.length; i++) {
    if (arr[i].toLowerCase().indexOf(test[r].toLowerCase().split("*")[0]) != -1)
    if (arr[i].toLowerCase().indexOf(test[r].toLowerCase().split("*")[1]) != -1)
    if (0 < arr[i].toLowerCase().indexOf(test[r].toLowerCase().split("*")[1]) 
          - arr[i].toLowerCase().indexOf(test[r].toLowerCase().split("*")[0] < 2 ) )
    if (!(arr[i] in oc(result)))  result.push(arr[i]);
  }
  return result;
}

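I made my own. It uses a regex and is more of a proof of concept, since it has not been stress-tested at all.

Enjoy the JavaScript fuzzy search/fuzzy match: http://unamatasanatarai.github.io/FuzzyMatch/test/index.html

The solutions provided here return true/false, with no information about which parts matched and which did not.

In some cases you may need to know that, e.g. to make the parts you typed bold in the search results.

I've created my own solution in TypeScript (if you want to use it, I've published it here: https://github.com/pie6k/fuzzystring), with a demo here: https://pie6k.github.io/fuzzystring/

It works like this: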
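For readers who only want the gist of a regex-based approach, here is one possible way to do it. This is not the code behind the link above, just a sketch: each character of the query is escaped and the characters are joined with a lazy "anything in between" pattern. `escapeRegExp` and `toFuzzyRegExp` are hypothetical names.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Builds /h.*?n.*?d/i from "hnd": the characters must appear in order,
// with anything (or nothing) between them.
function toFuzzyRegExp(query) {
  const pattern = query.split('').map(escapeRegExp).join('.*?');
  return new RegExp(pattern, 'i');
}

console.log(toFuzzyRegExp('hnd').test('a haystack with a needle')); // true
console.log(toFuzzyRegExp('xyz').test('a haystack with a needle')); // false
```

Note that patterns like this can become slow on long inputs, so it is probably best kept to short strings such as list labels.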

fuzzyString('liolor', 'lorem ipsum dolor sit');
// returns
{
  parts: [
    { content: 'l', type: 'input' },
    { content: 'orem ', type: 'fuzzy' },
    { content: 'i', type: 'input' },
    { content: 'psum d', type: 'fuzzy' },
    { content: 'olor', type: 'input' },
    { content: ' sit', type: 'suggestion' },
  ],
  score: 0.82, // rounded: (6 + 11 * 0.7 + 4 * 0.9) / 21
}

Here is the full implementation (TypeScript):

type MatchRoleType = 'input' | 'fuzzy' | 'suggestion';
interface FuzzyMatchPart {
  content: string;
  type: MatchRoleType;
}
interface FuzzyMatchData {
  parts: FuzzyMatchPart[];
  score: number;
}
interface FuzzyMatchOptions {
  truncateTooLongInput?: boolean;
  isCaseSesitive?: boolean;
}
function calculateFuzzyMatchPartsScore(fuzzyMatchParts: FuzzyMatchPart[]) {
  const getRoleLength = (role: MatchRoleType) =>
    fuzzyMatchParts
      .filter((part) => part.type === role)
      .map((part) => part.content)
      .join('').length;
  const fullLength = fuzzyMatchParts.map((part) => part.content).join('')
    .length;
  const fuzzyLength = getRoleLength('fuzzy');
  const inputLength = getRoleLength('input');
  const suggestionLength = getRoleLength('suggestion');
  return (
    (inputLength + fuzzyLength * 0.7 + suggestionLength * 0.9) / fullLength
  );
}
function compareLetters(a: string, b: string, isCaseSensitive = false) {
  if (isCaseSensitive) {
    return a === b;
  }
  return a.toLowerCase() === b.toLowerCase();
}
function fuzzyString(
  input: string,
  stringToBeFound: string,
  { truncateTooLongInput, isCaseSesitive }: FuzzyMatchOptions = {},
): FuzzyMatchData | false {
  // make some validation first
  // if input is longer than string to find, and we dont truncate it - it's incorrect
  if (input.length > stringToBeFound.length && !truncateTooLongInput) {
    return false;
  }
  // if truncate is enabled - do it
  if (input.length > stringToBeFound.length && truncateTooLongInput) {
    input = input.substr(0, stringToBeFound.length);
  }
  // if input is the same as string to be found - we dont need to look for fuzzy match - return it as match
  if (input === stringToBeFound) {
    return {
      parts: [{ content: input, type: 'input' }],
      score: 1,
    };
  }
  const matchParts: FuzzyMatchPart[] = [];
  const remainingInputLetters = input.split('');
  // let's create letters buffers
  // it's because we'll perform matching letter by letter, but if we have few letters matching or not matching in the row
  // we want to add them together as part of match
  let ommitedLettersBuffer: string[] = [];
  let matchedLettersBuffer: string[] = [];
  // helper functions to clear the buffers and add them to match
  function addOmmitedLettersAsFuzzy() {
    if (ommitedLettersBuffer.length > 0) {
      matchParts.push({
        content: ommitedLettersBuffer.join(''),
        type: 'fuzzy',
      });
      ommitedLettersBuffer = [];
    }
  }
  function addMatchedLettersAsInput() {
    if (matchedLettersBuffer.length > 0) {
      matchParts.push({
        content: matchedLettersBuffer.join(''),
        type: 'input',
      });
      matchedLettersBuffer = [];
    }
  }
  for (let anotherStringToBeFoundLetter of stringToBeFound) {
    const inputLetterToMatch = remainingInputLetters[0];
    // no more input - finish fuzzy matching
    if (!inputLetterToMatch) {
      break;
    }
    const isMatching = compareLetters(
      anotherStringToBeFoundLetter,
      inputLetterToMatch,
      isCaseSesitive,
    );
    // if input letter doesnt match - we'll go to the next letter to try again
    if (!isMatching) {
      // add this letter to buffer of ommited letters
      ommitedLettersBuffer.push(anotherStringToBeFoundLetter);
      // in case we had something in matched letters buffer - clear it as matching letters run ended
      addMatchedLettersAsInput();
      // go to the next letter of the string to be found
      continue;
    }
    // we have input letter matching!
    // remove it from remaining input letters
    remainingInputLetters.shift();
    // add it to matched letters buffer
    matchedLettersBuffer.push(anotherStringToBeFoundLetter);
    // in case we had something in ommited letters buffer - add it to the match now
    addOmmitedLettersAsFuzzy();
    // if there is no more letters in input - add this matched letter to match too
    if (!remainingInputLetters.length) {
      addMatchedLettersAsInput();
    }
  }
  // if we still have letters left in input - means not all input was included in string to find - input was incorrect
  if (remainingInputLetters.length > 0) {
    return false;
  }
  // lets get entire matched part (from start to last letter of input)
  const matchedPart = matchParts.map((match) => match.content).join('');
  // get remaining part of string to be found
  const suggestionPart = stringToBeFound.replace(matchedPart, '');
  // if we have remaining part - add it as suggestion
  if (suggestionPart) {
    matchParts.push({ content: suggestionPart, type: 'suggestion' });
  }
  const score = calculateFuzzyMatchPartsScore(matchParts);
  return {
    score,
    parts: matchParts,
  };
}
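As for the bold-in-results use case mentioned earlier, the `parts` array can be turned into HTML with a few lines. The sketch below assumes the `parts` shape shown in the example output; `escapeHtml` and `renderMatch` are hypothetical helpers and not part of the fuzzystring package.

```javascript
// Minimal HTML escaping so that matched text cannot inject markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Wraps the parts typed by the user ('input') in <b>, leaves the rest plain.
function renderMatch(parts) {
  return parts.map(function (part) {
    const content = escapeHtml(part.content);
    return part.type === 'input' ? '<b>' + content + '</b>' : content;
  }).join('');
}

// The parts from the 'liolor' example above.
const sampleParts = [
  { content: 'l', type: 'input' },
  { content: 'orem ', type: 'fuzzy' },
  { content: 'i', type: 'input' },
  { content: 'psum d', type: 'fuzzy' },
  { content: 'olor', type: 'input' },
  { content: ' sit', type: 'suggestion' }
];

console.log(renderMatch(sampleParts));
// <b>l</b>orem <b>i</b>psum d<b>olor</b> sit
```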