Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/83777/

Time: 2020-08-03 11:23:02  Source: igfitidea

Are there any Fuzzy Search or String Similarity Functions libraries written for C#?

Asked by Luca Molteni

There are similar questions, but none about C# libraries I can use in my source code.

Thank you all for your help.

I've already looked at Lucene, but I need something simpler for finding similar strings, without the overhead of the indexing part.

The answer I accepted has two very simple algorithms, and one of them uses LINQ too, so it's perfect.

Accepted answer by George Mauer

Levenshtein distance implementation:

I have a .NET 1.1 project in which I use the latter. It's simplistic, but works perfectly for what I need. From what I remember it needed a bit of tweaking, but nothing that wasn't obvious.
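
The implementations this answer refers to are not reproduced on this page, so the following is only a minimal sketch of the textbook Levenshtein distance without LINQ, not the code the author used. The class name `LevenshteinSketch` and the method name `Distance` are made up for illustration.

```csharp
using System;

// Hypothetical helper for illustration; not taken from the linked implementations.
public static class LevenshteinSketch
{
    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn source into target.
    public static int Distance(string source, string target)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (target == null) throw new ArgumentNullException(nameof(target));

        // d[i, j] holds the distance between the first i characters
        // of source and the first j characters of target.
        var d = new int[source.Length + 1, target.Length + 1];

        for (int i = 0; i <= source.Length; i++) d[i, 0] = i; // delete i characters
        for (int j = 0; j <= target.Length; j++) d[0, j] = j; // insert j characters

        for (int i = 1; i <= source.Length; i++)
        {
            for (int j = 1; j <= target.Length; j++)
            {
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1,   // deletion
                             d[i, j - 1] + 1),  // insertion
                    d[i - 1, j - 1] + cost);    // substitution or match
            }
        }

        return d[source.Length, target.Length];
    }
}
```

For example, `LevenshteinSketch.Distance("kitten", "sitting")` returns 3 (two substitutions and one insertion). The comparison is case-sensitive; lower-casing both inputs first is a common tweak.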

Answered by Isak Savo

The Beagle Project for Linux is written in C# (Mono) and is a Google Desktop-like search tool. It may contain some code for this kind of string matching.

If I recall correctly, it uses the Lucene library for searching and retrieving data. Maybe that can be useful for your project too.

Answered by Jason Jackson

Have you taken a look at Lucene.net? It is a port of the Java Lucene search engine API to the .NET platform. That library offers a lot of search functionality. I played around with it a year or so ago, so don't take my suggestion as based on tons of experience. I saw it in the book Windows Developer Power Tools and took it for a test drive. You might look through their API documentation to see if it offers something like the fuzzy search you are looking for.

Answered by Ed Schwehm

This Code Project paper has a string similarity function using the Levenshtein distance.

Answered by benefactual

The following Levenshtein distance algorithm assigns a value to the similarity of two strings (well, the difference, actually) and could be used as a base to build on: http://www.merriampark.com/ldcsharp.htm
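
As one way of building on the distance, here is a hedged sketch that turns it into a 0 to 1 similarity score and ranks candidate strings with LINQ. It is not the code from the linked page: `FuzzySearchSketch`, `Similarity`, `Search`, and the `minSimilarity` threshold are names and choices assumed for illustration, and dividing by the longer string's length is just one common normalization.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; names and normalization are assumptions.
public static class FuzzySearchSketch
{
    // Levenshtein distance that keeps only two rows of the edit matrix in memory.
    public static int Distance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1),
                    previous[j - 1] + cost);
            }
            // The row just computed becomes the previous row for the next pass.
            var tmp = previous;
            previous = current;
            current = tmp;
        }

        return previous[b.Length];
    }

    // 1.0 means identical; lower values mean more edits relative to the longer string.
    public static double Similarity(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        return longest == 0 ? 1.0 : 1.0 - (double)Distance(a, b) / longest;
    }

    // Returns the candidates whose case-insensitive similarity to the query
    // is at least minSimilarity, best match first.
    public static List<string> Search(string query, IEnumerable<string> candidates, double minSimilarity)
    {
        string normalizedQuery = query.ToLowerInvariant();
        return candidates
            .Select(c => new { Text = c, Score = Similarity(normalizedQuery, c.ToLowerInvariant()) })
            .Where(x => x.Score >= minSimilarity)
            .OrderByDescending(x => x.Score)
            .Select(x => x.Text)
            .ToList();
    }
}
```

Calling `FuzzySearchSketch.Search("levenstein", new[] { "Levenshtein", "Hamming", "Jaro" }, 0.8)` keeps only "Levenshtein", since a single missing letter costs one edit out of eleven characters. This scans every candidate, so it avoids any indexing step but scales linearly with the size of the list.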

Answered by George Mauer

I have used "Ternary Search Tree Dictionary in C#" (http://www.codeproject.com/KB/recipes/tst.aspx) to search for similar strings.

Regards, Patricio

Answered by Zaffiro

You can also look at the very impressive library titled Sam's String Metrics (https://github.com/StefH/SimMetrics.Net). It includes a host of algorithms:

  • Hamming distance
  • Levenshtein distance
  • Needleman-Wunsch distance or Sellers Algorithm
  • Smith-Waterman distance
  • Gotoh Distance or Smith-Waterman-Gotoh distance
  • Block distance or L1 distance or City block distance
  • Monge Elkan distance
  • Jaro distance metric
  • Jaro Winkler
  • SoundEx distance metric
  • Matching Coefficient
  • Dice's Coefficient
  • Jaccard Similarity or Jaccard Coefficient or Tanimoto coefficient
  • Overlap Coefficient
  • Euclidean distance or L2 distance
  • Cosine similarity
  • Variational distance
  • Hellinger distance or Bhattacharyya distance
  • Information Radius (Jensen-Shannon divergence)
  • Harmonic Mean
  • Skew divergence
  • Confusion Probability
  • Tau
  • Fellegi and Sunter (SFS) metric
  • TFIDF or TF/IDF
  • FastA
  • BlastP
  • Maximal matches
  • q-gram
  • Ukkonen Algorithms
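
To give a feel for how two of the listed metrics fit together, here is a minimal sketch that combines q-grams with the Jaccard coefficient. It does not use the SimMetrics.Net API; `QGramJaccardSketch` and its methods are hypothetical names, and treating each string as a set of distinct q-grams (ignoring repeats) is a simplifying assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; this is not the SimMetrics.Net API.
public static class QGramJaccardSketch
{
    // Splits text into its set of distinct overlapping substrings of length q.
    public static HashSet<string> QGrams(string text, int q)
    {
        if (q < 1) throw new ArgumentOutOfRangeException(nameof(q));

        var grams = new HashSet<string>();
        for (int i = 0; i + q <= text.Length; i++)
        {
            grams.Add(text.Substring(i, q));
        }
        return grams;
    }

    // Jaccard coefficient: size of the intersection divided by the size
    // of the union of the two q-gram sets.
    public static double Similarity(string a, string b, int q = 2)
    {
        var gramsA = QGrams(a, q);
        var gramsB = QGrams(b, q);

        // Both strings are shorter than q, so fall back to exact comparison.
        if (gramsA.Count == 0 && gramsB.Count == 0)
        {
            return a == b ? 1.0 : 0.0;
        }

        var intersection = new HashSet<string>(gramsA);
        intersection.IntersectWith(gramsB);

        int unionCount = gramsA.Count + gramsB.Count - intersection.Count;
        return (double)intersection.Count / unionCount;
    }
}
```

With the default `q = 2`, "night" and "nacht" share only the bigram "ht" out of seven distinct bigrams, so the score is 1/7. Unlike edit distance, this kind of set-based metric is insensitive to where in the string the shared fragments occur.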

Answered by Tyler Jensen

They are not my own invention, but they are my favorites and I've just blogged about them and published my own tweaked versions of Dice Coefficient, Levenshtein Distance, Longest Common Subsequence and Double Metaphone in a blog post called Four Functions for Finding Fuzzy String Matches in C# Extensions.
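
The tweaked versions from the blog post are not reproduced on this page. As a rough illustration of one of the four functions, here is a textbook dynamic-programming sketch of Longest Common Subsequence; the class name `LcsSketch` and its methods are assumptions for this example, not the blog's code.

```csharp
using System;

// Hypothetical helper for illustration; not the version from the blog post.
public static class LcsSketch
{
    // table[i, j] is the LCS length of the first i characters of a
    // and the first j characters of b.
    private static int[,] BuildTable(string a, string b)
    {
        var table = new int[a.Length + 1, b.Length + 1];
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                table[i, j] = a[i - 1] == b[j - 1]
                    ? table[i - 1, j - 1] + 1
                    : Math.Max(table[i - 1, j], table[i, j - 1]);
            }
        }
        return table;
    }

    // Length of the longest sequence of characters that appears in both
    // strings in the same order, not necessarily contiguously.
    public static int Length(string a, string b)
    {
        return BuildTable(a, b)[a.Length, b.Length];
    }

    // Recovers one longest common subsequence by walking the table backwards.
    public static string Find(string a, string b)
    {
        var table = BuildTable(a, b);
        var chars = new char[table[a.Length, b.Length]];
        int i = a.Length, j = b.Length, k = chars.Length;

        while (i > 0 && j > 0)
        {
            if (a[i - 1] == b[j - 1])
            {
                chars[--k] = a[i - 1]; // this character is part of the subsequence
                i--;
                j--;
            }
            else if (table[i - 1, j] >= table[i, j - 1])
            {
                i--;
            }
            else
            {
                j--;
            }
        }

        return new string(chars);
    }
}
```

For example, `LcsSketch.Find("AGGTAB", "GXTXAYB")` returns "GTAB". Dividing the length by the longer input's length is one way to turn it into a similarity score.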
