Is there a good indexing / search engine for Node.js?
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/16625104/
Asked by Venemo
I'm looking for a good open source (LGPL or permissively licensed) indexing engine for a Node.js application, something like Lucene. I want in-process indexing and search, and am not interested in indexing servers like Sphinx or Solr.
I am not afraid to create bindings for a C/C++ library either, so I'm open to those kinds of suggestions as well.
So far I've found:
- node-clucene, which doesn't seem to be actively maintained anymore (and has several open issues)
- I could create my own binding for CLucene, but it seems to be quite sparsely maintained and its current version is also quite far behind the Java Lucene
- Apache Lucy, which seems to be designed for the purpose of creating bindings for dynamic languages, but so far it has no Node bindings (nor a C API), and I haven't found any docs about creating bindings. I also didn't find any benchmarks of its performance.
- node-search, which seems to be abandoned
- jsii, which seems to be still a prototype and is also abandoned
- fullproof, which is only intended to run in a web browser
- lunr.js, which seems to only allow serializing the whole index, so it isn't scalable
I could "roll my own", but I'd prefer to use an already existing solution.
EDIT: Why I'm not interested in a standalone index server: I use a fast in-process key-value store database, so it would be quite a waste to have to go out of process for querying.
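For readers unfamiliar with the term, here is a minimal sketch (not part of the original question) of what "in-process indexing and search" can look like: an inverted index held in the memory of the same Node.js process as the application, so a query is just a function call. The names `InvertedIndex` and `tokenize` are hypothetical, and the tokenizer is deliberately naive (ASCII only, no stemming or ranking), so this is an illustration rather than a replacement for a real engine.

```javascript
// Minimal in-process inverted index: term -> Set of document ids.
// All names here (InvertedIndex, tokenize) are hypothetical.
function tokenize(text) {
  // Lowercase and split on anything that is not an ASCII letter or digit.
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

class InvertedIndex {
  constructor() {
    this.postings = new Map(); // term -> Set<docId>
  }

  // Record that every term of `text` occurs in document `docId`.
  add(docId, text) {
    for (const term of tokenize(text)) {
      if (!this.postings.has(term)) this.postings.set(term, new Set());
      this.postings.get(term).add(docId);
    }
  }

  // Return ids of documents containing every query term (AND semantics).
  search(query) {
    const terms = tokenize(query);
    if (terms.length === 0) return [];
    let result = null;
    for (const term of terms) {
      const ids = this.postings.get(term) || new Set();
      result = result === null
        ? new Set(ids)
        : new Set([...result].filter((id) => ids.has(id)));
    }
    return [...result].sort();
  }
}

const index = new InvertedIndex();
index.add('a', 'Lucene is a search library written in Java');
index.add('b', 'An in-process search index for Node.js');
index.add('c', 'LevelDB is a key-value store');

console.log(index.search('search'));      // docs 'a' and 'b'
console.log(index.search('search node')); // only doc 'b'
```

Everything here lives in memory, so persistence and scaling beyond RAM are exactly the parts a real engine would have to add.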
Answered by Fergie
Yes, check out the newly released Norch.
Norch is based on the search-index module for Node.js, which is in turn based on Google's powerful LevelDB.
EDIT: Use the search-index module for fast "in-process" search capability.
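As a rough illustration of why a key-value store is a natural home for a search index, the sketch below stores one key per term, so adding a document only rewrites the keys for that document's terms instead of re-serializing the whole index. This is not the actual API or key layout of search-index or LevelDB: a plain `Map` stands in for the store, and the `term:` key scheme and the function names are assumptions made up for this example.

```javascript
// Sketch: an inverted index laid out on a key-value store.
// `store` is a plain Map standing in for an in-process key-value database;
// the key scheme and function names are hypothetical.
const store = new Map();

function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Index one document. Only the keys for this document's terms are written,
// so the cost of an update does not grow with the size of the whole index.
function indexDocument(docId, text) {
  let keysWritten = 0;
  for (const term of new Set(tokenize(text))) {
    const key = `term:${term}`;
    const ids = JSON.parse(store.get(key) || '[]');
    if (!ids.includes(docId)) {
      ids.push(docId);
      store.set(key, JSON.stringify(ids));
      keysWritten += 1;
    }
  }
  return keysWritten;
}

// Look up a single term by reading exactly one key.
function lookup(term) {
  return JSON.parse(store.get(`term:${term.toLowerCase()}`) || '[]');
}

indexDocument(1, 'fast in-process search');
indexDocument(2, 'search server');
const written = indexDocument(3, 'fast index');

console.log(lookup('search')); // doc ids 1 and 2
console.log(lookup('fast'));   // doc ids 1 and 3
console.log(written);          // 2 keys touched: "fast" and "index"
```

With a real on-disk store the `get` and `set` calls would typically be asynchronous, but the key layout idea stays the same.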
Answered by Matt Sergeant
Can you explain why you're not interested in using an external index? For full-text search I always fall back on PostgreSQL's full-text indexing capabilities: it's very fast, indexing doesn't require a full index update (as Solr does), and results are returned faster than with Lucene-based solutions (such as Elasticsearch).
But if you really want to do it in-process, you probably want to look at Lunr: http://lunrjs.com/ - it does work in Node, not just in the browser.
Edit: Here's where I got my stats on Postgres being faster than Lucene: http://fr.slideshare.net/billkarwin/full-text-search-in-postgresql - see Slide 49.
Edit: I'm not sure what kind of speed you're looking for in-process versus out-of-process, but our PostgreSQL database can do 100k queries per second without breaking a sweat, and it's not even on SSDs. Perhaps you're overthinking your performance needs. After all, once you need to go to multiple nodes (or use cluster to take advantage of all CPUs), you will have to give up on in-process anyway.
Answered by Frank Roth
Full Text Search Light is a Node module written in pure JS for doing full-text searches. You can find the current git repository here: https://github.com/frankred/node-full-text-search-light

