Named Entity Recognition Libraries for Java

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/188176/

Date: 2020-08-11 11:05:12  Source: igfitidea

Named Entity Recognition Libraries for Java

Tags: java, nlp, named-entity-recognition

Asked by webclimber

I am looking for a simple but "good enough" Named Entity Recognition library (and dictionary) for Java. I want to process emails and documents and extract some "basic information" such as names, places, addresses, and dates.

I've been looking around, and most of what I found seems to be on the heavy side: full-blown NLP projects.

Any recommendations?

Accepted answer by webclimber

BTW, I recently ran across OpenCalais, which seems to have the functionality I was looking for.

Answer by Aleksandar Dimitrov

You might want to have a look at one of my earlier answers to a similar problem.

Other than that, most lighter NER systems depend a lot on the domain used. You will find a whole lot of tools and papers about biomedical NER systems, for example. In addition to my previous post (which already contains my main recommendation if you want to do NER), here are some more tools you might want to look into:

  • The Stanford CRF-NER
  • The Postech Biomedical NER System, if you are interested in this particular domain
  • OpenCalais seems to be a commercial system. There are UIMA wrappers for OpenCalais, but they seem dated. There is also a dictionary-based Context-Mapper annotator for UIMA that may help you out. Be aware that UIMA comes with a significant learning curve ;-)
  • OpenNLP also has an NER tool.
  • Balie does NER, too, among other things.
  • ABNER does NER, but again it's focused on the biomedical domain.
  • The JULIE Lab Tools from the University of Jena, Germany, also do NER. They have standalone versions and UIMA analysis engines.
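
To give a rough idea of what a dictionary-based approach (as mentioned for the UIMA annotator above) boils down to, here is a minimal sketch in plain Java. It is not the API of any of the tools listed; the class name `GazetteerSketch`, the toy dictionary, and the entity labels are all made up for illustration, and a real system would load a much larger dictionary and handle ambiguity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class GazetteerSketch {
    // Hypothetical toy dictionary: lowercase surface form -> entity type.
    static final Map<String, String> DICTIONARY = new LinkedHashMap<>();
    static {
        DICTIONARY.put("jena", "PLACE");
        DICTIONARY.put("new york", "PLACE");
        DICTIONARY.put("john smith", "NAME");
    }

    // Longest dictionary entry, counted in tokens (assumption for this toy dictionary).
    static final int MAX_PHRASE_TOKENS = 2;

    // Greedy longest-match lookup over input that has already been tokenized.
    static List<String> tag(List<String> tokens) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            // Try the longest candidate phrase first, then shorter ones.
            for (int len = Math.min(MAX_PHRASE_TOKENS, tokens.size() - i); len >= 1; len--) {
                String phrase = String.join(" ", tokens.subList(i, i + len));
                String type = DICTIONARY.get(phrase.toLowerCase(Locale.ROOT));
                if (type != null) {
                    found.add(type + ":" + phrase);
                    matched = len;
                    break;
                }
            }
            // Skip past the match, or advance by one token if nothing matched.
            i += Math.max(matched, 1);
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "John", "Smith", "moved", "from", "Jena", "to", "New", "York");
        System.out.println(tag(tokens)); // [NAME:John Smith, PLACE:Jena, PLACE:New York]
    }
}
```

A lookup like this only finds what is in the dictionary and knows nothing about context, which is one reason the lighter systems tend to be tied to a specific domain.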

One additional remark: you won't get away without tokenizing the input. Tokenization of natural language is slightly non-trivial, which is why I suggest you use a toolbox that does both tokenization and NER for you.
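
As a small illustration of why tokenization is not just splitting on spaces, here is a sketch that compares a naive whitespace split with the JDK's `java.text.BreakIterator`. The class and method names are hypothetical, and `BreakIterator` only stands in for a proper NLP tokenizer here; the toolkits listed above ship their own.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TokenizeSketch {
    // Naive approach: split on whitespace, so punctuation stays glued to words.
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Word boundaries from the JDK's BreakIterator; whitespace-only segments are dropped.
    static List<String> wordTokens(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String tok = text.substring(start, end);
            if (!tok.trim().isEmpty()) {
                out.add(tok);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "Meet Dr. Smith in Jena, Germany.";
        System.out.println(whitespaceTokens(s)); // [Meet, Dr., Smith, in, Jena,, Germany.]
        System.out.println(wordTokens(s));
    }
}
```

With the whitespace split, a dictionary lookup for "Jena" would miss the token "Jena," because of the trailing comma. The boundary-based version separates the punctuation, but it still knows nothing about abbreviations, multi-word names, or sentence boundaries.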

Answer by Arun R

You might want to try Alchemy API as well. It's similar to OpenCalais.

Answer by yura