Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/22904025/

Java or Python for Natural Language Processing

Tags: java, python, nlp

Asked by Jin Ling

I would like to know which programming language is better for natural language processing: Java or Python? I have found lots of questions and answers about this, but I am still unsure which one to choose.

I would also like to know which NLP library to use for Java, since there are many of them (LingPipe, GATE, OpenNLP, Stanford NLP). For Python, most programmers recommend NLTK.

But if I want to do some text processing or information extraction on unstructured data (just free-form plain English text) to get useful information out of it, what is the best option: Java or Python? And which library is suitable?

Updated

What I want to do is extract useful product information from unstructured data (e.g. advertisements for mobile phones or laptops that users write in many different forms and in not very standard English).
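
To make the task a bit more concrete, here is a minimal sketch of rule-based extraction from one free-form ad, using only Python's standard `re` module. The brand list, the patterns, the function name `extract_product_info`, and the sample ad are all made up for illustration; real ads would need far more rules, or a trained model from one of the libraries discussed in the answers.

```python
import re

# Hypothetical patterns for a few product attributes (assumptions, not a standard).
BRANDS = ("samsung", "apple", "dell", "lenovo", "nokia")
PRICE_RE = re.compile(r"(?:\$|usd\s*)(\d+(?:\.\d+)?)", re.IGNORECASE)
STORAGE_RE = re.compile(r"(\d+)\s*(gb|tb)\b", re.IGNORECASE)


def extract_product_info(ad_text):
    """Return a dict of the attributes found in one free-form advertisement."""
    lowered = ad_text.lower()
    info = {}

    # Brand: the first entry of BRANDS that appears as a whole word.
    for brand in BRANDS:
        if re.search(r"\b" + re.escape(brand) + r"\b", lowered):
            info["brand"] = brand
            break

    # Price: a number preceded by "$" or "usd".
    price = PRICE_RE.search(ad_text)
    if price:
        info["price"] = float(price.group(1))

    # Storage: a number followed by GB or TB, with optional whitespace.
    storage = STORAGE_RE.search(ad_text)
    if storage:
        info["storage"] = storage.group(1) + storage.group(2).upper()

    return info


ad = "selling my dell laptop 512 gb ssd, gud condition, only $450 call me"
print(extract_product_info(ad))
# prints {'brand': 'dell', 'price': 450.0, 'storage': '512GB'}
```

Hand-written rules like these break quickly on non-standard spelling, which is the main reason to look at the NLP and machine learning libraries mentioned in the answers.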

Answered by Nathaniel Payne

The question is very open-ended. That said, rather than choosing one, below is a rundown for each language you might want to use, since there are good libraries available in both.

Python

In terms of Python, the first place to look is the Python Natural Language Toolkit (NLTK). As its description notes, NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
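
For readers new to the terms, here is a rough standard-library sketch of what tokenization and stemming mean. This is deliberately not NLTK's API: the regex tokenizer and the `crude_stem` suffix stripper are simplified stand-ins invented for this example, and NLTK's own tokenizers and stemmers are far more careful.

```python
import re
from collections import Counter

# Words, or runs of punctuation (a simplification of real tokenization rules).
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def crude_stem(word):
    """Strip a few common English suffixes; much cruder than a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


text = "Selling two used laptops, barely scratched."
tokens = tokenize(text)
# Keep only alphanumeric tokens, then reduce each to its crude stem.
stems = [crude_stem(t) for t in tokens if t.isalnum()]
counts = Counter(stems)

print(tokens)
print(stems)
```

Note that the crude stemmer leaves "used" untouched because of the length guard, which is the kind of edge case a proper stemmer handles with real rules.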

There is also some excellent Python-based code that you can look up, which originated out of Google's Natural Language Toolkit project. You can find a link to that code here on GitHub.

Java

The first place to look would be Stanford's Natural Language Processing Group. All of the software distributed there is written in Java. All recent distributions require Oracle Java 6+ or OpenJDK 7+. Distribution packages include components for command-line invocation, jar files, a Java API, and source code.

Another great, more general-purpose option that you see in a lot of machine learning environments is Weka. Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

Answered by alvas

Java vs Python for NLP is very much a matter of preference or necessity. Depending on the company or project, you'll need to use one or the other, and often there isn't much of a choice unless you're heading the project.

Other than NLTK (www.nltk.org), there are actually other libraries for text processing in Python:

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=natural+language+processing&submit=search)

For Java, there are tonnes of others, but here's another list:

This is a nice comparison for basic string processing, see http://nltk.googlecode.com/svn/trunk/doc/howto/nlp-python.html

For a useful comparison of GATE vs UIMA vs OpenNLP, see https://www.assembla.com/spaces/extraction-of-cost-data/wiki/Gate-vs-UIMA-vs-OpenNLP?version=4

If you're uncertain which language to go for in NLP, personally I say "any language that will give you the desired analysis/output"; see "Which language or tools to learn for natural language processing?"

Here's a pretty recent (2017) list of NLP tools: https://github.com/alvations/awesome-community-curated-nlp

An older list of NLP tools (2013): http://web.archive.org/web/20130703190201/http://yauhenklimovich.wordpress.com/2013/05/20/tools-nlp



Other than language processing tools, you will very much need machine learning tools to incorporate into NLP pipelines.

There's a whole range in Python and Java, and once again it comes down to preference and whether the libraries are user-friendly enough:

Machine learning libraries in Python:

(for more, see https://pypi.python.org/pypi?%3Aaction=search&term=machine+learning&submit=search)
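
To show what "incorporating machine learning into an NLP pipeline" can look like at its simplest, here is a small bag-of-words multinomial naive Bayes classifier written with the standard library only. It is a teaching sketch, not the API of any library above: the class name `NaiveBayesTextClassifier`, the labels, and the four training ads are invented for the example, and a real project would use one of the established libraries instead.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over bag-of-words counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()

    def train(self, labelled_texts):
        for text, label in labelled_texts:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = tokenize(text)
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training ads, purely for illustration.
training_data = [
    ("selling samsung phone 64gb good battery", "mobile"),
    ("iphone for sale unlocked new screen", "mobile"),
    ("dell laptop 15 inch ssd keyboard", "laptop"),
    ("lenovo thinkpad laptop 8gb ram", "laptop"),
]

classifier = NaiveBayesTextClassifier()
classifier.train(training_data)
print(classifier.predict("used laptop with ssd and 8gb ram"))
```

With so little training data the result only demonstrates the mechanics; the point is that tokenization feeds word counts, and the word counts feed a statistical model.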



With the recent (2015) deep learning tsunami in NLP, possibly you could consider: https://en.wikipedia.org/wiki/Comparison_of_deep_learning_software

I'll avoid listing deep learning tools out of non-favoritism / neutrality.



Other Stack Overflow questions that also asked for NLP/ML tools: