Python's NLTK vs. related Java Libraries?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). Original StackOverflow address: http://stackoverflow.com/questions/5589593/

Date: 2020-10-30 11:48:14  Source: igfitidea

Tags: java, python, information-retrieval, nltk, wordnet

Asked by wnewport

I've used LingPipe, Stanford's NER, RiTa and various sentence similarity libraries for my previous Java projects that focused on text (pre)processing (indexing, XML tagging, topic detection, etc.) of large amounts of English text (around 10,000 documents totalling more than 1 GB of text). Maybe I'm a bad Java programmer, but I find myself typing a lot of code and using a lot of libraries when I switch to a different corpus. Overall, I feel like there might be a better tool for the job.

I guess my question is, will I benefit from switching to Python and NLTK for information retrieval / language processing? Or are there enough pros and cons to make it very subjective? Is NLTK intuitive enough to be learned quickly?

I'd get my hands dirty, but I won't have access to a personal machine for the next few days.

Answered by lamwaiman1988

NLTK is good for natural language processing. I've used it for my data-mining project. You can train your own analyzer. The learning curve is not steep.

NLTK comes with a large collection of corpora for training your analyzer. You can also provide your own data set, for example, a journal that has been part-of-speech tagged.
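To make "training an analyzer on your own tagged data" more concrete, here is a minimal pure-Python sketch of the idea behind a unigram tagger: it learns the most frequent tag for each word and falls back to a default tag for unknown words. This is not NLTK's API. The function names `train_unigram_tagger` and `tag`, the default tag, and the tiny training set are all hypothetical, and the code assumes a current Python 3 interpreter. NLTK provides ready-made tagger classes that do this kind of training for you.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Learn the most frequent tag for each word in the training data."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag_label in sentence:
            counts[word.lower()][tag_label] += 1
    # Keep only the single most common tag per word.
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def tag(model, words, default_tag="NN"):
    """Tag each word with its learned tag, falling back to default_tag."""
    return [(word, model.get(word.lower(), default_tag)) for word in words]


# Hypothetical hand-tagged training data (Penn Treebank-style tags).
training_data = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]

model = train_unigram_tagger(training_data)
tagged = tag(model, ["The", "cat", "barks", "loudly"])
print(tagged)
```

The word "loudly" never appears in the training data, so it receives the default tag. Handling such unseen words is where a larger tagged corpus or a smarter fallback starts to matter.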

Because Python is very good for text processing, you may want to give it a try. Plus, it has an online tutorial.
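As a rough illustration of how little code basic text processing takes in Python, the sketch below tokenizes a few documents and counts word frequencies using only the standard library. The regular expression, the helper names `tokenize` and `word_frequencies`, and the sample documents are assumptions made for this example, and the code assumes a current Python 3 interpreter.

```python
import re
from collections import Counter

# Simple token pattern: runs of lowercase letters, digits, or apostrophes.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def word_frequencies(documents):
    """Count how often each token occurs across all documents."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    return counts


# Hypothetical sample documents standing in for a real corpus.
documents = [
    "NLTK is a Python toolkit.",
    "Python is good for text processing.",
]

frequencies = word_frequencies(documents)
print(frequencies.most_common(3))
```

Switching to a different corpus only means changing where `documents` comes from; the tokenizing and counting code stays the same.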

Please remember to use a Python 2.x version; try Python 2.6. NLTK may not work well with Python 3.x.

Answered by Jacob

If you already understand the basics of NLP, I think NLTK should be pretty easy to pick up. It has a lot of documentation and two books, and I've written a number of articles and tutorials on streamhacker.com. And if there's anything from the Java packages you don't want to lose, you could theoretically combine it with NLTK using Jython (and perhaps execnet).

You may also want to take a look at the Pattern library.
