Recommend a fast & scalable persistent Map - Java
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original address and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/1536953/
Asked by Joel
I need a disk-backed Map structure to use in a Java app. It must meet the following criteria:
- Capable of storing millions of records (even billions)
- Fast lookup - the majority of operations on the Map will simply be to see if a key already exists. This and the first criterion above are the most important. There should be an effective in-memory caching mechanism for frequently used keys.
- Persistent, but does not need to be transactional; it can live with some failure, i.e. I am happy to sync with disk periodically.
- Capable of storing simple primitive types - but I don't need to store serialised objects.
- It does not need to be distributed, i.e. will run all on one machine.
- Simple to set up & free to use.
- No relational queries required
Record keys will be strings or longs. As described above, reads will be much more frequent than writes, and the majority of reads will simply be to check if a key exists (i.e. will not need to read the key's associated data). Each record will be updated once only and records are not deleted.
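To make the access pattern above concrete, here is a minimal sketch of an in-memory LRU cache sitting in front of a slower disk lookup. The class name `CachedKeyLookup` and the `LongPredicate` stand-in for the disk store are assumptions for illustration, not part of any library mentioned here. Because records are never deleted, the sketch caches only positive answers, which can never go stale.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongPredicate;

/** Hypothetical sketch: LRU cache of known-present keys in front of a disk-backed lookup. */
class CachedKeyLookup {
    private final LongPredicate diskLookup;   // stand-in for the real disk-backed store
    private final Map<Long, Boolean> cache;   // holds only keys known to exist

    CachedKeyLookup(LongPredicate diskLookup, final int maxEntries) {
        this.diskLookup = diskLookup;
        // accessOrder=true: the eldest entry is the least recently used one
        this.cache = new LinkedHashMap<Long, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns true if the key exists, consulting the cache before the disk store. */
    synchronized boolean exists(long key) {
        if (cache.get(key) != null) {
            return true;                      // cache hit, no disk access
        }
        boolean found = diskLookup.test(key);
        if (found) {
            cache.put(key, Boolean.TRUE);     // records are never deleted, so this stays valid
        }
        return found;
    }

    /** Number of keys currently held in memory. */
    synchronized int cachedEntries() {
        return cache.size();
    }
}
```

Misses are deliberately not cached: a key that is absent now may be written later, so a cached "false" could become wrong.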
I currently use Bdb JE but am seeking other options.
Update
I have since improved query performance on my existing BDB setup by reducing the dependency on secondary keys. Some queries required a join on two secondary keys; by combining them into a single composite key I removed a level of indirection in the lookup, which speeds things up nicely.
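The composite-key idea from the update can be sketched roughly as follows. This is not BDB code: the `CompositeKey` helper and its length-prefix encoding are assumptions for illustration, and any map keyed by the resulting string would stand in for the real store.

```java
/** Hypothetical sketch: fold two former secondary keys into one lookup key. */
final class CompositeKey {
    private CompositeKey() {}

    /**
     * Builds a single key from two attributes. The length prefix keeps
     * pairs such as ("ab", "c") and ("a", "bc") from colliding.
     */
    static String of(String first, String second) {
        return first.length() + ":" + first + second;
    }
}
```

With a key like `CompositeKey.of(author, title)`, a query that previously looked up each secondary key and joined the results becomes one direct lookup on the combined key.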
Accepted answer by Michael Lloyd Lee mlk
I'd likely use a local database, such as Bdb JE or HSQLDB. May I ask what is wrong with this approach? You must have some reason to be looking for alternatives.
In response to comments: As the problem is performance, and I guess you are already using JDBC to handle this, it might be worth trying HSQLDB and reading the chapter on Memory and Disk Use.
Answer by Boris Pavlović
I think Hibernate Shards may easily fulfill all your requirements.
Answer by David Crawshaw
SQLite does this. I wrote a wrapper for using it from Java: http://zentus.com/sqlitejdbc
As I mentioned in a comment, I have successfully used SQLite with gigabytes of data and tables of hundreds of millions of rows. If you think out the indexing properly, it's very fast.
The only pain is the JDBC interface. Compared to a simple HashMap, it is clunky. I often end up writing a JDBC-wrapper for the specific project, which can add up to a lot of boilerplate code.
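For a sense of what such a project-specific JDBC wrapper might look like, here is a minimal sketch. The class name `JdbcKeyStore`, the table name `records`, and its columns are assumptions for illustration; only standard `java.sql` interfaces are used, and the SQL relies on SQLite's `INSERT OR REPLACE` syntax.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical sketch: a Map-like facade over a single SQLite table via JDBC. */
class JdbcKeyStore implements AutoCloseable {
    static final String CREATE_SQL =
        "CREATE TABLE IF NOT EXISTS records (k INTEGER PRIMARY KEY, v TEXT)";
    static final String EXISTS_SQL = "SELECT 1 FROM records WHERE k = ? LIMIT 1";
    static final String PUT_SQL = "INSERT OR REPLACE INTO records (k, v) VALUES (?, ?)";

    private final Connection conn;
    private final PreparedStatement exists;
    private final PreparedStatement put;

    JdbcKeyStore(Connection conn) throws SQLException {
        this.conn = conn;
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(CREATE_SQL);     // the primary key doubles as the lookup index
        }
        // Prepared once and reused, so each call only binds parameters
        this.exists = conn.prepareStatement(EXISTS_SQL);
        this.put = conn.prepareStatement(PUT_SQL);
    }

    /** Key-existence check that never reads the value column. */
    boolean containsKey(long key) throws SQLException {
        exists.setLong(1, key);
        try (ResultSet rs = exists.executeQuery()) {
            return rs.next();
        }
    }

    /** Inserts the record, or overwrites it if the key is already present. */
    void put(long key, String value) throws SQLException {
        put.setLong(1, key);
        put.setString(2, value);
        put.executeUpdate();
    }

    @Override
    public void close() throws SQLException {
        exists.close();
        put.close();
        conn.close();
    }
}
```

Assuming a SQLite JDBC driver is on the classpath, it could be opened with `new JdbcKeyStore(DriverManager.getConnection("jdbc:sqlite:records.db"))`, where the file name `records.db` is again just a placeholder.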
Answer by james
JBoss (tree) Cache is a great option. You can use it standalone, outside of JBoss. Very robust, performant, and flexible.
Answer by Joel
I've found Tokyo Cabinet to be a simple persistent Hash/Map, and fast to set up and use.
This abbreviated example, taken from the docs, shows how simple it is to save and retrieve data from a persistent map:
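import tokyocabinet.HDB;

// create the object
HDB hdb = new HDB();
// open the database for writing, creating the file if it does not exist
hdb.open("casket.tch", HDB.OWRITER | HDB.OCREAT);
// add an item
hdb.put("foo", "hop");
// retrieve it again (null if the key is absent)
String value = hdb.get("foo");
// close the database
hdb.close();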
Answer by Andrejs
JDBM3 does exactly what you are looking for. It is a library of disk-backed maps with a really simple API and high performance.
UPDATE
This project has now evolved into MapDB http://www.mapdb.org
Answer by Harvinder Singh
You can try Java Chronicles from http://openhft.net/products/chronicle-map/. Chronicle Map is a high-performance, off-heap, in-memory key-value data store with persistence. It works like a standard Java map.
Answer by KIC
As of today I would use either MapDB (file based/backed, sync or async) or Hazelcast. With the latter you will have to implement your own persistence, e.g. backed by an RDBMS, by implementing a Java interface. OpenHFT Chronicle might be another option. I am not sure how persistence works there since I have never used it, but they claim to have it. OpenHFT is completely off-heap and allows partial updates of objects (of primitives) without (de-)serialization, which might be a performance benefit.
NOTE: If you need your map to be disk-based because of memory issues, the easiest option is MapDB. Hazelcast could be used as a cache (distributed or not), which allows you to evict elements from the heap based on time or size. OpenHFT is off-heap and could be considered if you only need persistence across JVM restarts.