How to compare almost similar strings in Java? (String distance measure)

Question

Asked by hsmit

I would like to compare two strings and get a score for how much they look alike. For example, "The sentence is almost similar" and "The sentence is similar".

I'm not familiar with existing methods in Java, but for PHP I know the levenshtein function.

Are there better methods in Java?

Answer 1

Accepted answer by Joey

The Levenshtein distance is a measure of how similar two strings are. Or, more precisely, of how many single-character alterations (insertions, deletions, substitutions) have to be made so that they are the same.

The algorithm is available in pseudo-code on Wikipedia. Converting that to Java shouldn't be much of a problem, but it's not built into the base class library.
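For illustration, here is a minimal sketch of what such a conversion could look like. It is not taken from any library: the class name `LevenshteinSketch` and the `similarity` helper (which rescales the distance to a 0..1 score by dividing by the longer string's length, one common convention among several) are assumptions made for this example.

```java
// Hypothetical helper class, written for this example only.
final class LevenshteinSketch {

    // Dynamic-programming edit distance that keeps only two rows of the table.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev; // the row just filled becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 means identical, 0.0 means nothing in common.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // two empty strings are treated as identical
        }
        return 1.0 - (double) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        String a = "The sentence is almost similar";
        String b = "The sentence is similar";
        System.out.println(distance(a, b));   // 7: the seven characters of "almost " are deleted
        System.out.println(similarity(a, b)); // about 0.77 (1 - 7/30)
    }
}
```

The two-row layout keeps memory proportional to the length of the second string; a full table is only needed if you also want to recover the actual edits.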

Wikipedia has some more algorithms that measure the similarity of strings.

Answer 2

Answer by jspcal

Yes, that's a good metric. You could use StringUtils.getLevenshteinDistance() from Apache Commons Lang.

Answer 3

Answer by FiveO

The following Java libraries offer multiple comparison algorithms (Levenshtein, Jaro-Winkler, ...):

Apache Commons Lang 3: https://commons.apache.org/proper/commons-lang/
Simmetrics: http://sourceforge.net/projects/simmetrics/

Both libraries have Java documentation (Apache Commons Lang Javadoc, Simmetrics Javadoc).

// Usage of Apache Commons Lang 3
// (getJaroWinklerDistance is deprecated since Commons Lang 3.6;
// Apache Commons Text provides the replacement)
import org.apache.commons.lang3.StringUtils;

public double compareStrings(String stringA, String stringB) {
    return StringUtils.getJaroWinklerDistance(stringA, stringB);
}

// Usage of Simmetrics
import uk.ac.shef.wit.simmetrics.similaritymetrics.JaroWinkler;

public double compareStrings(String stringA, String stringB) {
    JaroWinkler algorithm = new JaroWinkler();
    return algorithm.getSimilarity(stringA, stringB);
}
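To make it clearer what a Jaro-Winkler score measures, here is a rough, dependency-free sketch of the textbook formulation. It is not the code of either library above, and their results may differ in details. The class name `JaroWinklerSketch`, the prefix scaling factor 0.1, and the prefix cap of 4 are the conventional choices, stated here as assumptions.

```java
// Hypothetical helper class, written for this example only.
final class JaroWinklerSketch {

    // Jaro similarity: 1.0 for identical strings, 0.0 when no characters match.
    static double jaro(String s1, String s2) {
        int len1 = s1.length();
        int len2 = s2.length();
        if (len1 == 0 && len2 == 0) return 1.0;
        if (len1 == 0 || len2 == 0) return 0.0;

        // Characters only count as matching if they are at most this far apart.
        int window = Math.max(0, Math.max(len1, len2) / 2 - 1);
        boolean[] matched1 = new boolean[len1];
        boolean[] matched2 = new boolean[len2];
        int matches = 0;
        for (int i = 0; i < len1; i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(len2 - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;

        // Walk both strings' matched characters in order and count disagreements.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < len1; i++) {
            if (!matched1[i]) continue;
            while (!matched2[k]) k++;
            if (s1.charAt(i) != s2.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        double t = outOfOrder / 2; // transpositions: half the out-of-order matches (integer division)
        return (m / len1 + m / len2 + (m - t) / m) / 3.0;
    }

    // Winkler's adjustment: boost the score for a shared prefix of up to 4 characters.
    static double jaroWinkler(String s1, String s2) {
        double jaro = jaro(s1, s2);
        int limit = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }

    public static void main(String[] args) {
        // Classic textbook pair: 6 matches, 1 transposition, shared prefix "MAR".
        System.out.println(jaroWinkler("MARTHA", "MARHTA")); // about 0.961
    }
}
```

Unlike Levenshtein, the result is already a similarity between 0 and 1, and strings that share a prefix are rewarded, which tends to suit short strings such as names.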

Answer 4

Answer by Thibault Debatty

You can find implementations of Levenshtein and other string similarity/distance measures on https://github.com/tdebatty/java-string-similarity

If your project uses Maven, installation is as simple as adding this dependency:

<dependency>
  <groupId>info.debatty</groupId>
  <artifactId>java-string-similarity</artifactId>
  <version>RELEASE</version>
</dependency>

Then, to use Levenshtein, for example:

import info.debatty.java.stringsimilarity.*;

public class MyApp {

  public static void main(String[] args) {
    Levenshtein l = new Levenshtein();

    // One substitution ('s' -> '$') separates the two strings,
    // so the edit distance is 1.
    System.out.println(l.distance("My string", "My $tring"));
  }
}

Answer 5

Answer by Vaibhav Kumar

Shameless plug, but I wrote a library also:

https://github.com/vickumar1981/stringdistance

It has all these functions, plus a few for phonetic similarity, which check whether one word "sounds like" another. Those return either true or false, unlike the other fuzzy similarity measures, which return numbers between 0 and 1.

It also includes DNA sequencing algorithms like Smith-Waterman and Needleman-Wunsch, which are generalized versions of Levenshtein.
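To illustrate the "generalized Levenshtein" remark, here is a small sketch of Needleman-Wunsch global alignment scoring. It does not use the stringdistance library's API: the class name `NeedlemanWunschSketch` and the parameter names `match`, `mismatch`, and `gap` are made up for this example. With a match score of 0 and a penalty of -1 for both mismatches and gaps, the best alignment score is simply the negated Levenshtein distance.

```java
// Hypothetical helper class, written for this example only.
final class NeedlemanWunschSketch {

    // Best global alignment score of a and b under the given scoring scheme.
    static int score(String a, String b, int match, int mismatch, int gap) {
        int[][] table = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            table[i][0] = i * gap; // aligning a prefix of a against nothing: all gaps
        }
        for (int j = 1; j <= b.length(); j++) {
            table[0][j] = j * gap; // aligning a prefix of b against nothing: all gaps
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                boolean same = a.charAt(i - 1) == b.charAt(j - 1);
                int diagonal = table[i - 1][j - 1] + (same ? match : mismatch);
                int up = table[i - 1][j] + gap;   // gap inserted in b
                int left = table[i][j - 1] + gap; // gap inserted in a
                table[i][j] = Math.max(diagonal, Math.max(up, left));
            }
        }
        return table[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // Levenshtein as a special case: the distance for this pair is 3.
        System.out.println(score("kitten", "sitting", 0, -1, -1)); // -3
        // A scheme that rewards matches: identical strings score one point per character.
        System.out.println(score("GATTACA", "GATTACA", 1, -1, -1)); // 7
    }
}
```

Smith-Waterman follows the same recurrence but scores local alignments instead: cell values are clamped at zero and the result is the maximum over the whole table rather than the bottom-right cell.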

I plan, in the near future, on making this work with any array and not just strings (an array of characters).
