Named Entity Recognition Libraries for Java
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/188176/
Question by webclimber
I am looking for a simple but "good enough" Named Entity Recognition library (and dictionary) for Java. I want to process emails and documents and extract some "basic information" such as names, places, addresses and dates.
I've been looking around, and most options seem to be on the heavy side: full-blown NLP projects.
Any recommendations?
Accepted answer by webclimber
BTW, I recently ran across OpenCalais, which seems to have the functionality I was looking for.
Answer by Aleksandar Dimitrov
You might want to have a look at one of my earlier answers to a similar problem.
Other than that, most lighter NER systems depend heavily on the domain they were built for. For example, you will find a whole lot of tools and papers on biomedical NER systems. In addition to my previous post (which already contains my main recommendation if you want to do NER), here are some more tools you might want to look into:
- The Stanford CRF-NER
- The Postech Biomedical NER System, if you are interested in this particular domain
- OpenCalais seems to be a commercial system. There are UIMA wrappers for OpenCalais, but they seem dated. There is also a dictionary-based Context-Mapper annotator for UIMA that may help you out. Be aware that UIMA comes with a steep learning curve ;-)
- OpenNLP also has an NER tool.
- Balie does NER, too, among other things.
- ABNER does NER, but again it is focused on the biomedical domain.
- The JULIE Lab Tools from the University of Jena, Germany, also do NER. They have standalone versions and UIMA analysis engines.
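To illustrate the lighter, domain-dependent end of the spectrum (and the "library plus dictionary" idea from the question), here is a minimal sketch in plain Java with no external dependencies. It is not taken from any of the tools listed above: the class name `GazetteerNer`, the tiny hard-coded dictionary and the two numeric date formats are all hypothetical assumptions. It looks up whole-word dictionary matches, adds regex-matched dates, and sorts the hits by position.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical minimal extractor: a tiny gazetteer (dictionary) plus a date regex. */
public class GazetteerNer {

    /** One recognized entity: its label, surface text and character offsets. */
    public static final class Entity {
        public final String label;
        public final String text;
        public final int start;
        public final int end;

        Entity(String label, String text, int start, int end) {
            this.label = label;
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Assumed date formats only: 12/03/2008 or 2008-03-12.
    private static final Pattern DATE =
            Pattern.compile("\\b(\\d{1,2}/\\d{1,2}/\\d{4}|\\d{4}-\\d{2}-\\d{2})\\b");

    // Surface string -> label; a real dictionary would be loaded from a domain-specific file.
    private final Map<String, String> gazetteer;

    public GazetteerNer(Map<String, String> gazetteer) {
        this.gazetteer = gazetteer;
    }

    public List<Entity> extract(String text) {
        List<Entity> found = new ArrayList<>();
        // Dictionary lookup: every whole-word occurrence of each known entry.
        for (Map.Entry<String, String> e : gazetteer.entrySet()) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(e.getKey()) + "\\b");
            Matcher m = p.matcher(text);
            while (m.find()) {
                found.add(new Entity(e.getValue(), m.group(), m.start(), m.end()));
            }
        }
        // Pattern-based entities: dates in the assumed numeric formats.
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            found.add(new Entity("DATE", m.group(), m.start(), m.end()));
        }
        // Report entities in document order.
        found.sort(Comparator.comparingInt((Entity e) -> e.start));
        return found;
    }

    public static void main(String[] args) {
        GazetteerNer ner = new GazetteerNer(Map.of(
                "John Smith", "PERSON",
                "Berlin", "LOCATION"));
        for (Entity e : ner.extract("John Smith arrives in Berlin on 12/03/2008.")) {
            System.out.println(e.label + "\t" + e.text);
        }
    }
}
```

Running `main` prints `PERSON John Smith`, `LOCATION Berlin` and `DATE 12/03/2008` (tab-separated), one per line. The weak spots show why the toolkits above exist: the dictionary only finds names it already contains, it cannot resolve ambiguity (e.g. "Paris" as a person or a city), and every new domain needs a new dictionary.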
One additional remark: you won't get away without tokenizing the input. Tokenizing natural language is less trivial than it looks, which is why I suggest you use a toolbox that does both tokenization and NER for you.
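The remark can be made concrete with nothing but the JDK. The sketch below (the class and method names are hypothetical, not from any toolkit mentioned here) compares naive whitespace splitting with `java.text.BreakIterator.getWordInstance`. Whitespace splitting leaves punctuation glued to words ("Berlin."), so a dictionary lookup for "Berlin" would miss it. The word iterator separates the trailing period, but it only applies generic word-boundary rules and has no notion of abbreviations such as "Dr.", which is one reason to prefer a toolbox that ships its own tokenizer.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Compares naive whitespace splitting with the JDK's word BreakIterator. */
public class TokenizeDemo {

    /** Naive approach: punctuation stays attached to the neighbouring word. */
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    /** Word boundaries from java.text.BreakIterator, dropping whitespace-only segments. */
    static List<String> breakIteratorTokens(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundary pairs [start, end).
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (!segment.trim().isEmpty()) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Please call Dr. Smith in Berlin.";
        System.out.println(whitespaceTokens(text));
        System.out.println(breakIteratorTokens(text, Locale.ENGLISH));
    }
}
```

The first line printed is `[Please, call, Dr., Smith, in, Berlin.]`; the second shows "Berlin" and "." as separate tokens.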
Answer by Arun R
You might want to try Alchemy API as well. It's similar to OpenCalais.
Answer by yura
For NLP grammars, you can check http://code.google.com/p/graph-expression/ and http://gate.ac.uk/