Is there a good indexing / search engine for Node.js?

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow original: http://stackoverflow.com/questions/16625104/

Date: 2020-09-02 14:32:00  Source: igfitidea

Tags: javascript, node.js, lucene, indexing, search-engine

Asked by Venemo

I'm looking for a good open-source (LGPL or permissively licensed) indexing engine for a Node.js application, something like Lucene. I want in-process indexing and search, and I'm not interested in indexing servers like Sphinx or Solr.

I am not afraid to create bindings for a C/C++ library either so I'm open to those kind of suggestions as well.

So far I've found:

  • node-clucene, which doesn't seem to be actively maintained anymore (and has several open issues)
  • I could create my own binding for CLucene, but it seems to be only sparsely maintained, and its current version lags well behind Java Lucene
  • Apache Lucy, which seems to be designed for creating bindings for dynamic languages, but so far it has no Node bindings (nor a C API), and I haven't found any docs about creating bindings. I also didn't find any benchmarks of its performance.
  • node-search, which seems to be abandoned
  • jsii, which seems to be still a prototype and is also abandoned
  • fullproof, which is only intended to run in a web browser
  • lunr.js, which seems to only allow serializing the whole index, so it isn't scalable
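
To make the scalability concern about whole-index serialization concrete, here is a rough model. It does not reproduce lunr.js's actual serialization format; the synthetic corpus and the one-key-per-(term, document) layout are assumptions chosen for illustration. It compares how many bytes must be written to persist one new document when (1) the whole index is re-serialized to JSON and (2) only one small key per term of the new document is written.

```javascript
// Rough model (NOT lunr.js's real format): bytes written to persist ONE new
// document when (1) the whole index is re-serialized as JSON, versus
// (2) only one small key per (term, document) pair of the new doc is written.

function tokenize(text) {
  // Lowercase and split on anything that is not a letter or digit.
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// In-memory inverted index: term -> array of document ids.
function addToIndex(index, id, text) {
  for (const term of new Set(tokenize(text))) {
    if (!index.has(term)) index.set(term, []);
    index.get(term).push(id);
  }
}

// Strategy 1: rewrite everything, so the cost grows with the whole index.
function wholeIndexBytes(index) {
  return Buffer.byteLength(JSON.stringify([...index]));
}

// Strategy 2 (hypothetical layout): write only "term\0id" keys for the new doc.
function perKeyBytes(id, text) {
  let bytes = 0;
  for (const term of new Set(tokenize(text))) {
    bytes += Buffer.byteLength(`${term}\x00${id}`);
  }
  return bytes;
}

for (const size of [100, 1000, 10000]) {
  const index = new Map();
  // Synthetic corpus: every document shares a few common words.
  for (let i = 0; i < size; i++) addToIndex(index, i, `doc ${i} about topic ${i % 50}`);
  const newDoc = "doc new about topic 7";
  addToIndex(index, size, newDoc);
  console.log(
    `${size} docs: whole index ${wholeIndexBytes(index)} bytes,`,
    `per-key ${perKeyBytes(size, newDoc)} bytes`
  );
}
```

The first number grows with the size of the corpus while the second stays roughly constant, which is why a format that can only be saved as a whole becomes costly as the index grows.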

I could "roll my own", but I'd prefer to use an already existing solution.

EDIT: Why I'm not interested in a standalone index server: I use a fast in-process key-value store, so it would be quite wasteful to go out of process for queries.
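
For reference, the core of what an in-process engine like Lucene provides is an inverted index: a map from each term to the documents containing it, queried inside the same process that holds the data. A minimal sketch, assuming simple lowercase tokenization and boolean AND queries with no ranking, stemming, or persistence (the `InvertedIndex` class is hypothetical, not the API of any library mentioned on this page):

```javascript
// Minimal in-process inverted index: term -> Set of document ids.
// Hypothetical illustration; no ranking, stemming, or persistence.

function tokenize(text) {
  // Lowercase and split on anything that is not a letter or digit.
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

class InvertedIndex {
  constructor() {
    this.postings = new Map(); // term -> Set<docId>
  }

  add(id, text) {
    for (const term of tokenize(text)) {
      if (!this.postings.has(term)) this.postings.set(term, new Set());
      this.postings.get(term).add(id);
    }
  }

  // Return ids of documents containing every query term (boolean AND).
  search(query) {
    const terms = tokenize(query);
    if (terms.length === 0) return [];
    // Start from the smallest posting list so the intersection stays small.
    const lists = terms
      .map((t) => this.postings.get(t) ?? new Set())
      .sort((a, b) => a.size - b.size);
    return [...lists[0]].filter((id) => lists.every((s) => s.has(id)));
  }
}

const index = new InvertedIndex();
index.add(1, "Lucene is a search engine library");
index.add(2, "LevelDB is a key-value store");
index.add(3, "A search index can live in a key-value store");
console.log(index.search("search engine")); // [ 1 ]
console.log(index.search("key-value store")); // [ 2, 3 ]
```

Because lookups are plain `Map` and `Set` operations in the same process, there is no network or IPC round trip per query, which is the property the question is after.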

Answer by Fergie

Just an update to my earlier answer - since there was so much discussion I didn't want this update to get lost.

You can download it here:

Answer by Fergie

Yes, check out the newly released Norch.

Norch is based on the search-index module for Node.js, which is in turn built on Google's LevelDB key-value store.

EDIT: Use the search-index module for fast "in-process" search capability.
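
The answer's point is that search-index keeps its index in LevelDB, an ordered key-value store. As a rough sketch of why that suits in-process search, the code below assumes a simple one-key-per-(term, document) layout, which is not necessarily search-index's actual schema, and uses a hypothetical in-memory `SortedStore` class as a stand-in for LevelDB: keys stay sorted, so every document containing a term can be found with a single prefix range scan.

```javascript
// Sketch of an inverted index laid out in an ordered key-value store.
// `SortedStore` is a hypothetical in-memory stand-in for LevelDB, and the
// key layout is an assumption, not search-index's real schema.
const SEP = "\x00"; // separator; assumes ids and terms never contain it

function tokenize(text) {
  return text.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// Index of the first element in sorted `keys` that is >= `target`.
function lowerBound(keys, target) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

class SortedStore {
  constructor() {
    this.keys = []; // always kept in ascending order
    this.values = new Map(); // key -> value
  }
  put(key, value) {
    if (!this.values.has(key)) this.keys.splice(lowerBound(this.keys, key), 0, key);
    this.values.set(key, value);
  }
  // Yield [key, value] pairs whose key starts with `prefix`, in key order.
  *scan(prefix) {
    for (let i = lowerBound(this.keys, prefix); i < this.keys.length; i++) {
      if (!this.keys[i].startsWith(prefix)) break;
      yield [this.keys[i], this.values.get(this.keys[i])];
    }
  }
}

function indexDocument(store, id, text) {
  store.put(`doc${SEP}${id}`, text);
  // One small key per (term, document): adding a document only writes
  // its own keys instead of rewriting the whole index.
  for (const term of new Set(tokenize(text))) {
    store.put(`term${SEP}${term}${SEP}${id}`, "");
  }
}

function docsForTerm(store, term) {
  const prefix = `term${SEP}${term}${SEP}`;
  const ids = new Set();
  for (const [key] of store.scan(prefix)) ids.add(key.slice(prefix.length));
  return ids;
}

// Boolean AND over all query terms.
function search(store, query) {
  const sets = tokenize(query).map((t) => docsForTerm(store, t));
  if (sets.length === 0) return [];
  return [...sets[0]].filter((id) => sets.every((s) => s.has(id)));
}

const store = new SortedStore();
indexDocument(store, "a", "Norch is a search server");
indexDocument(store, "b", "search-index runs in process");
indexDocument(store, "c", "LevelDB is an ordered key-value store");
console.log(search(store, "search")); // [ 'a', 'b' ]
console.log(search(store, "search process")); // [ 'b' ]
```

The trailing separator in the scan prefix keeps a lookup for `search` from also matching longer terms such as `searching`.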

Answer by Matt Sergeant

Can you explain why you're not interested in using an external index? For full-text search I always turn to PostgreSQL's full-text indexing capabilities: it's very fast, indexing doesn't require a full index update (as Solr does), and results are returned faster than with Lucene-based solutions (such as Elasticsearch).

But if you really want to do it in-process, you probably want to look at Lunr: http://lunrjs.com/ (it does work in Node, not just in the browser).

Edit: Here's where I got my stats on Postgres being faster than Lucene: http://fr.slideshare.net/billkarwin/full-text-search-in-postgresql (see slide 49).

Edit: I'm not sure what speed difference you expect between in-process and out-of-process queries, but our PostgreSQL database can do 100k queries per second without breaking a sweat, and it's not even on SSDs. Perhaps you're overthinking your performance needs: after all, once you need to scale to multiple nodes (or use cluster to take advantage of all CPUs), you will have to abandon in-process indexing anyway.

Answer by Frank Roth

Full Text Search Light is a Node module written in pure JavaScript for doing full-text searches. You can find the current Git repository here: https://github.com/frankred/node-full-text-search-light