Wordnet Similarity in Java: JAWS, JWNL or Java WN::Similarity?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/5976537/

Time: 2020-10-30 13:48:08  Source: igfitidea

Tags: java, similarity, wordnet, jaws-wordnet

Asked by Mulone

I need to use Wordnet in a java-based app. I want to:

  • search synsets

  • find similarity/relatedness between synsets

My app uses RDF graphs and I know there are SPARQL endpoints with Wordnet, but I guess it's better to have a local copy of the dataset, as it's not too big.

I've found the following jars: JAWS, JWNL and Java WN::Similarity.

What would you recommend for my app?

Is it possible to use a Perl library from a java app via some bindings?

Thanks! Mulone

Answered by Nate Glenn

I use JAWS for normal WordNet work because it's easy to use. For similarity metrics, though, I use the library located here. For it to work, you'll also need to download this folder, which contains pre-processed WordNet and corpus data. Assuming you placed that folder inside another folder called "lib" in your project folder, the code can be used like this:

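// Needs java.util.TreeMap and java.util.Map.Entry, plus the similarity library's classes.
// word1, word2 and partOfSpeech are Strings, e.g. "hobby", "gardening" and "n".
JWS ws = new JWS("./lib", "3.0"); // "./lib" holds the downloaded data folder
Resnik res = ws.getResnik();
// One score per pair of senses of the two words
TreeMap<String, Double> scores1 = res.res(word1, word2, partOfSpeech);
for (Entry<String, Double> e : scores1.entrySet()) {
    System.out.println(e.getKey() + "\t" + e.getValue());
}
// Highest score over all sense pairs
System.out.println("\nhighest score\t=\t" + res.max(word1, word2, partOfSpeech) + "\n\n\n");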

This will print something like the following, showing the similarity score between each possible combination of synsets represented by the words to be compared:

hobby#n#1,gardening#n#1 2.6043996588901104
hobby#n#2,gardening#n#1 -0.0
hobby#n#3,gardening#n#1 -0.0
highest score   =   2.6043996588901104
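
To give a feel for what a Resnik score means, here is a minimal, self-contained sketch that does not use the similarity library at all. It assumes the usual textbook definition (the information content of the lowest common ancestor of two concepts) and works on a tiny made-up hierarchy; the class name ResnikSketch, the concepts and the counts are all hypothetical, and real WordNet is not a simple tree, so treat this as an illustration rather than the library's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy illustration of Resnik similarity; every concept and count here is invented. */
final class ResnikSketch {
    // child -> parent links of a hypothetical is-a tree (the root has no entry)
    private final Map<String, String> parent = new HashMap<>();
    // hypothetical corpus counts, assumed to be already summed over descendants
    private final Map<String, Double> count = new HashMap<>();
    private final String root;

    ResnikSketch(String root, double rootCount) {
        this.root = root;
        count.put(root, rootCount);
    }

    void add(String concept, String parentConcept, double cumulativeCount) {
        parent.put(concept, parentConcept);
        count.put(concept, cumulativeCount);
    }

    /** IC(c) = -ln(count(c) / count(root)). */
    double informationContent(String concept) {
        return -Math.log(count.get(concept) / count.get(root));
    }

    /** Resnik score: information content of the lowest ancestor shared by a and b. */
    double similarity(String a, String b) {
        Set<String> ancestorsOfA = new HashSet<>();
        for (String c = a; c != null; c = parent.get(c)) {
            ancestorsOfA.add(c);
        }
        // Walking up from b, the first shared node is the lowest common ancestor in a tree.
        for (String c = b; c != null; c = parent.get(c)) {
            if (ancestorsOfA.contains(c)) {
                return informationContent(c);
            }
        }
        throw new IllegalArgumentException("no common ancestor");
    }

    public static void main(String[] args) {
        ResnikSketch t = new ResnikSketch("entity", 1000.0);
        t.add("activity", "entity", 200.0);
        t.add("hobby", "activity", 20.0);
        t.add("gardening", "activity", 10.0);
        t.add("artifact", "entity", 300.0);
        System.out.println(t.similarity("hobby", "gardening")); // IC of "activity"
        System.out.println(t.similarity("hobby", "artifact"));  // IC of the root
    }
}
```

In this sketch, two concepts that share only the root get -Math.log(1.0), which Java prints as -0.0. That is presumably also where the -0.0 scores in the output above come from, although I have not checked the library source to confirm it.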

There are also methods that let you specify the sense of either or both words: res(String word1, int senseNum1, String word2, partOfSpeech), etc. Unfortunately, the source documentation is not JavaDoc, so you'll need to inspect it manually. The source can be downloaded here.

The available algorithms are:

JWSRandom random = new JWSRandom(ws.getDictionary(), true, 16.0); // random number for baseline
Resnik res = ws.getResnik();
LeacockAndChodorow lch = ws.getLeacockAndChodorow();
AdaptedLesk adLesk = ws.getAdaptedLesk();
AdaptedLeskTanimoto alt = ws.getAdaptedLeskTanimoto();
AdaptedLeskTanimotoNoHyponyms altnh = ws.getAdaptedLeskTanimotoNoHyponyms();
HirstAndStOnge hso = ws.getHirstAndStOnge();
JiangAndConrath jcn = ws.getJiangAndConrath();
Lin lin = ws.getLin();
WuAndPalmer wup = ws.getWuAndPalmer();
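
Roughly speaking, Resnik, Lin and Jiang & Conrath are based on information content, while Wu & Palmer and Leacock & Chodorow are based on paths and depths in the hierarchy. As an illustration of the second kind, here is a small sketch of the textbook Wu & Palmer formula, 2 * depth(lcs) / (depth(a) + depth(b)), on a made-up tree. The class WuPalmerSketch and its concepts are hypothetical, depth is counted in nodes so the root has depth 1 (conventions differ), and the library's own implementation may well differ in details.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration of Wu & Palmer similarity on an invented is-a tree. */
final class WuPalmerSketch {
    // child -> parent links (the root has no entry)
    private final Map<String, String> parent = new HashMap<>();

    void add(String concept, String parentConcept) {
        parent.put(concept, parentConcept);
    }

    /** Path from the concept up to the root, starting with the concept itself. */
    List<String> pathToRoot(String concept) {
        List<String> path = new ArrayList<>();
        for (String c = concept; c != null; c = parent.get(c)) {
            path.add(c);
        }
        return path;
    }

    /** Depth counted in nodes, so the root has depth 1 (an assumed convention). */
    int depth(String concept) {
        return pathToRoot(concept).size();
    }

    /** 2 * depth(lcs) / (depth(a) + depth(b)), where lcs is the lowest common ancestor. */
    double similarity(String a, String b) {
        List<String> pathA = pathToRoot(a);
        for (String c : pathToRoot(b)) {
            if (pathA.contains(c)) {
                return 2.0 * depth(c) / (depth(a) + depth(b));
            }
        }
        throw new IllegalArgumentException("no common ancestor");
    }

    public static void main(String[] args) {
        WuPalmerSketch t = new WuPalmerSketch();
        t.add("activity", "entity");
        t.add("hobby", "activity");
        t.add("gardening", "activity");
        t.add("artifact", "entity");
        System.out.println(t.similarity("hobby", "gardening")); // share "activity"
        System.out.println(t.similarity("hobby", "artifact"));  // share only the root
    }
}
```

Unlike Resnik, this score needs no corpus counts and always falls between 0 and 1, with 1 for identical concepts.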

Also, it requires the jar file for MIT's JWI (Java WordNet Interface).

Answered by yashodhan katte

JAWS has a function for finding similar word forms. Here are the details:

The method is public AdjectiveSynset[] getSimilar() throws WordNetException. Its documentation, which contains details that you can use, is at http://lyle.smu.edu/~tspell/jaws/doc/edu/smu/tspell/wordnet/AdjectiveSynset.html

Answered by MrDrews

I am not sure if either JAWS or JWNL provide methods to calculate similarity between synsets, but I have tried both for searching synsets and I've found JAWS easier to use. Specifically, the simple:

    // Specifying the Database Directory
    System.setProperty("wordnet.database.dir", "C:/WordNet/2.1/dict/");

was easier for me to understand than JWNL's file_properties.xml requirement.
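
Since the directory is just a JVM system property, it can also be passed on the command line with -Dwordnet.database.dir=... instead of being hard-coded. Below is a small sketch of a check you could run before touching WordNet, so that a wrong path fails with a readable message. The class WordNetDirCheck and its resolve() method are my own hypothetical helper, not part of JAWS; only the property name comes from the snippet above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that validates the WordNet directory property up front. */
final class WordNetDirCheck {
    static final String PROPERTY = "wordnet.database.dir";

    /** Returns the configured directory, or throws with a readable message. */
    static Path resolve() {
        String value = System.getProperty(PROPERTY);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("System property " + PROPERTY + " is not set");
        }
        Path dir = Paths.get(value);
        if (!Files.isDirectory(dir)) {
            throw new IllegalStateException(PROPERTY + " does not point to a directory: " + dir);
        }
        return dir;
    }

    public static void main(String[] args) {
        // For demonstration only: fall back to the temp directory as a placeholder
        // when nothing was passed with -Dwordnet.database.dir=...
        if (System.getProperty(PROPERTY) == null) {
            System.setProperty(PROPERTY, System.getProperty("java.io.tmpdir"));
        }
        System.out.println("WordNet dict directory: " + resolve());
    }
}
```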
