Java text analysis libraries

Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3778388/

Time: 2020-10-30 03:20:46  Source: igfitidea

Tags: java, text, analysis, text-analysis

Asked by jaseFace

I'm looking for a Java-based solution for analysing sentences to log whether a keyword was used positively or negatively.

I.e. the keyword might be 'cabbages' and the sentence:

'I like cabbages but not peas'

And I'd like a Java text analyser of some kind to log this as positive. Can the Lucene (Hibernate Search) libraries be used for this?

Any thoughts?

Answer by ishnid

You're looking for "sentiment analysis". One possibility is LingPipe, who kindly also link to their competitors. Jeff Dalton also has a great list of natural language processing tools on his blog.

Answer by Michael Borgwardt

I doubt there's anything like that. Lucene definitely can't do it out of the box.

How do you even define "whether a keyword was used positively or negatively" in a way that can be evaluated programmatically? To do it properly, you'd have to analyse the text for its actual meaning, which is an AI problem that is not even remotely solved.

I suppose you could solve it approximately by just doing a statistical analysis of whether the keyword appears more often close to positive (like, good, great, wonderful) or negative (bad, hate, crappy, damn) keywords, but even there, negations, sarcasm and complex sentence structures will be problematic.
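
A minimal sketch of the proximity heuristic described above, assuming plain Java with no external libraries (it does not use Lucene or LingPipe). The class and method names, the tiny positive/negative/negator word lists, the window size, and the rule that a negator flips only the word directly after it are all hypothetical choices for illustration, not part of any existing library:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: score a keyword by the positive/negative words found
 * within a fixed token window around it. The word lists are illustrative
 * assumptions, not a real sentiment lexicon.
 */
public class KeywordPolarity {

    public enum Polarity { POSITIVE, NEGATIVE, NEUTRAL }

    // Tiny assumed lexicons; a usable system would need far larger ones.
    private static final Set<String> POSITIVE = Set.of("like", "love", "good", "great", "wonderful");
    private static final Set<String> NEGATIVE = Set.of("bad", "hate", "crappy", "damn");
    // A negator flips the sign of the sentiment word immediately after it.
    private static final Set<String> NEGATORS = Set.of("not", "don't", "never", "no");

    /** Lower-cases the text and splits on anything that is not a letter or apostrophe. */
    static List<String> tokenize(String sentence) {
        return Arrays.stream(sentence.toLowerCase(Locale.ROOT).split("[^a-z']+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    /**
     * For each occurrence of the keyword, adds +1 or -1 for every lexicon word
     * within `window` tokens on either side, flipping the sign when that word
     * directly follows a negator. The sign of the total decides the result.
     */
    public static Polarity classify(String sentence, String keyword, int window) {
        List<String> tokens = tokenize(sentence);
        String key = keyword.toLowerCase(Locale.ROOT);
        int score = 0;
        for (int k = 0; k < tokens.size(); k++) {
            if (!tokens.get(k).equals(key)) continue;
            int from = Math.max(0, k - window);
            int to = Math.min(tokens.size() - 1, k + window);
            for (int i = from; i <= to; i++) {
                if (i == k) continue;
                String t = tokens.get(i);
                int s = POSITIVE.contains(t) ? 1 : NEGATIVE.contains(t) ? -1 : 0;
                if (s != 0 && i > 0 && NEGATORS.contains(tokens.get(i - 1))) s = -s;
                score += s;
            }
        }
        if (score > 0) return Polarity.POSITIVE;
        if (score < 0) return Polarity.NEGATIVE;
        return Polarity.NEUTRAL;
    }

    public static void main(String[] args) {
        System.out.println(classify("I like cabbages but not peas", "cabbages", 3));         // POSITIVE
        System.out.println(classify("I don't like cabbages", "cabbages", 3));                // NEGATIVE
        System.out.println(classify("Cabbages are great, said nobody ever", "cabbages", 3)); // POSITIVE (sarcasm missed)
    }
}
```

The first two calls come out as intended, but the third is scored POSITIVE even though it is sarcastic, which is exactly the kind of failure mentioned above: a word-window count has no notion of sarcasm, and the one-token negation rule will also miss longer constructions such as "not at all fond of".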

Answer by Barend

Take a look at Mahout Taste, which builds on Lucene but adds a lot of what you need out of the box. (Edit) I should add that Mahout Taste is merely related to what you're looking for and not a 100% match.