Original URL: http://stackoverflow.com/questions/6639080/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
key-value store suggestion
Question by Kevin
I need a very basic key-value store for Java. I started with a HashMap, but it seems that HashMap is somewhat space-inefficient (I'm storing ~20 million records, and it seems to require ~6 GB of RAM).
The map is Map<Integer,String>, and so I'm considering using GNU Trove TIntObjectHashMap<byte[]>, and storing the map value as an ASCII byte array rather than a String.
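To make the Trove idea concrete, here is a minimal sketch of an int-keyed open-addressing map with byte[] values. IntBytesHashMap is a hypothetical class written for illustration, not Trove's TIntObjectHashMap API; it only shows why primitive keys help: keys live in a plain int[], so no Integer object is allocated per entry. Encoding values with US-ASCII is only safe if the strings really are ASCII.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical minimal int -> byte[] hash map (not Trove's API).
 * Keys are stored in a primitive int[] so no Integer objects are allocated;
 * collisions are resolved by linear probing (open addressing).
 */
final class IntBytesHashMap {
    private int[] keys;
    private byte[][] values;
    private boolean[] used;
    private int size;

    IntBytesHashMap(int expectedSize) {
        // Keep the load factor at or below 0.5; capacity is a power of two.
        int capacity = 4;
        while (capacity < expectedSize * 2) capacity <<= 1;
        keys = new int[capacity];
        values = new byte[capacity][];
        used = new boolean[capacity];
    }

    private static int mix(int key) {
        // Spread the bits so consecutive keys do not cluster in the table.
        int h = key * 0x9E3779B9;
        return h ^ (h >>> 16);
    }

    private int slot(int key) {
        int mask = keys.length - 1;
        int i = mix(key) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask; // linear probing
        }
        return i;
    }

    void put(int key, byte[] value) {
        int i = slot(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
        if (size * 2 > keys.length) resize();
    }

    byte[] get(int key) {
        int i = slot(key);
        return used[i] ? values[i] : null;
    }

    int size() {
        return size;
    }

    private void resize() {
        int[] oldKeys = keys;
        byte[][] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new byte[keys.length][];
        used = new boolean[keys.length];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) put(oldKeys[i], oldValues[i]);
        }
    }
}

class IntBytesHashMapDemo {
    public static void main(String[] args) {
        IntBytesHashMap map = new IntBytesHashMap(16);
        // ASCII: 1 byte per character instead of 2 in a String's char[].
        map.put(42, "hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(map.get(42), StandardCharsets.US_ASCII));
    }
}
```

A real library adds removal, iteration and tuned load factors on top of this; the point here is only the memory layout.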
As an alternative to that, is there a key-value store that only requires adding jar files, does not hold the entire map in RAM at once, and is still reasonably fast?
Accepted answer by ghayes
Use Berkeley DB.
Berkeley DB stores object graphs, objects in collections, or simple binary key/value data directly in a B-tree on disk. This simple, highly efficient approach removes all the unnecessary overhead of ORM solutions. Using the Direct Persistence Layer (DPL), Java developers annotate classes with storage information, much like JPA. This approach is familiar, efficient, and fast. The DPL reduces the complexity of data storage without sacrificing speed.
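For the "simple binary key/value data" route (rather than the DPL), you have to turn int keys and String values into byte arrays yourself. The sketch below assumes the B-tree orders keys by unsigned byte-wise comparison, which, as far as I know, is Berkeley DB's default; KeyCodec is a hypothetical helper, not part of Berkeley DB.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers for turning an int key / String value into raw bytes. */
final class KeyCodec {
    private KeyCodec() {}

    /** Encodes an int so that unsigned byte-wise order equals signed numeric order. */
    static byte[] encodeKey(int key) {
        // Flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto 0x00000000..0xFFFFFFFF;
        // ByteBuffer writes big-endian by default, so the most significant byte comes first.
        return ByteBuffer.allocate(4).putInt(key ^ Integer.MIN_VALUE).array();
    }

    static int decodeKey(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt() ^ Integer.MIN_VALUE;
    }

    static byte[] encodeValue(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeValue(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

With Berkeley DB Java Edition the resulting arrays would be wrapped in DatabaseEntry objects; check the JE documentation for the exact put/get calls. Flipping the sign bit is what makes negative keys sort before positive ones in a byte-ordered tree.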
This should definitely give you huge gains in memory and speed, while not increasing the complexity of your application. Enjoy!
Answer by mxro
BabuDB is an embedded non-relational database system. Its lean and simple design allows it to persistently store large amounts of key-value pairs without the overhead and complexity of similar approaches such as BerkeleyDB.
License: New BSD license, Language: Java
JDBM2 provides HashMap and TreeMap which are backed by disk storage.
License: Apache License 2.0, Language: Java
Banana DB is a self-contained key/value pair database implemented in Java.
License: Apache License 2.0, Language: Java
I've tried BabuDB and JDBM2 and they work fine. BabuDB is a little bit more difficult to set up, but potentially delivers higher performance than JDBM2.
These are all databases that persist data on disk. There are also solutions for holding a large map in memory (Ehcache, Hazelcast, ...).
Answer by thmarx
MapDB (http://www.mapdb.org/) is what you are looking for. It's a rocking fast persistent implementation of java.util.Map.
Features
Concurrent
MapDB has record-level locking and a state-of-the-art concurrent engine. Its performance scales nearly linearly with the number of cores. Data can be written by multiple parallel threads.
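Record-level locking lets writers touching different records proceed in parallel. The sketch below shows lock striping, a coarser relative of that idea: keys are spread over independently locked segments, so threads only contend when their keys land in the same segment. StripedMap is hypothetical and is not how MapDB's engine is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical map that splits its keys over independently locked segments. */
final class StripedMap<K, V> {
    private final Map<K, V>[] segments;
    private final ReadWriteLock[] locks;

    @SuppressWarnings("unchecked")
    StripedMap(int stripes) {
        segments = (Map<K, V>[]) new Map[stripes];
        locks = new ReadWriteLock[stripes];
        for (int i = 0; i < stripes; i++) {
            segments[i] = new HashMap<>();
            locks[i] = new ReentrantReadWriteLock();
        }
    }

    private int stripe(Object key) {
        // floorMod keeps the index non-negative for negative hash codes.
        return Math.floorMod(key.hashCode(), segments.length);
    }

    V put(K key, V value) {
        int s = stripe(key);
        locks[s].writeLock().lock(); // only this segment is blocked
        try {
            return segments[s].put(key, value);
        } finally {
            locks[s].writeLock().unlock();
        }
    }

    V get(K key) {
        int s = stripe(key);
        locks[s].readLock().lock(); // readers of the same segment can share
        try {
            return segments[s].get(key);
        } finally {
            locks[s].readLock().unlock();
        }
    }

    int size() {
        int total = 0;
        for (int s = 0; s < segments.length; s++) {
            locks[s].readLock().lock();
            try {
                total += segments[s].size();
            } finally {
                locks[s].readLock().unlock();
            }
        }
        return total;
    }
}
```

In plain Java, java.util.concurrent.ConcurrentHashMap already provides fine-grained concurrency, so a hand-rolled version like this is only useful for illustration.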
Fast
MapDB has outstanding performance, rivaled only by native DBs. It is the result of more than a decade of optimizations and rewrites.
ACID
MapDB optionally supports ACID transactions with full MVCC isolation. MapDB uses a write-ahead log or an append-only store for great write durability.
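An append-only store never overwrites data in place: every write goes to the end of a file, so a crash can at worst leave a truncated last record. The following is a minimal sketch of that idea, assuming int keys and String values; AppendOnlyStore and its record layout are hypothetical, not MapDB's file format.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical append-only int -> String store (not MapDB's format).
 * Each put appends [key:int][length:int][UTF-8 bytes] to the end of the file;
 * only key -> file offset is kept in memory, so values stay on disk.
 */
final class AppendOnlyStore implements AutoCloseable {
    private final RandomAccessFile file;
    private final Map<Integer, Long> offsets = new HashMap<>();

    AppendOnlyStore(File path) throws IOException {
        file = new RandomAccessFile(path, "rw");
        // Rebuild the index by scanning existing records; later records win.
        long pos = 0;
        while (pos + 8 <= file.length()) {
            file.seek(pos);
            int key = file.readInt();
            int len = file.readInt();
            if (len < 0 || pos + 8 + len > file.length()) break; // torn last record
            offsets.put(key, pos);
            pos += 8 + len;
        }
        file.setLength(pos); // drop any partial trailing bytes
    }

    void put(int key, String value) throws IOException {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        long pos = file.length();
        file.seek(pos);
        file.writeInt(key);
        file.writeInt(bytes.length);
        file.write(bytes);
        offsets.put(key, pos);
    }

    String get(int key) throws IOException {
        Long pos = offsets.get(key);
        if (pos == null) return null;
        file.seek(pos + 4); // skip the key
        byte[] bytes = new byte[file.readInt()];
        file.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Three simplifications: the key-to-offset index is still a HashMap in RAM (a real store keeps its index on disk, e.g. in a B-tree); nothing is forced to disk, so true durability would additionally need something like file.getChannel().force(true) after writes; and overwritten values are never reclaimed, whereas real stores compact the log periodically.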
Flexible
MapDB can be used everywhere from an in-memory cache to a multi-terabyte database. It also has a number of options to trade durability for write performance. This makes it very easy to configure MapDB to fit your needs exactly.
Hackable
MapDB is component-based; most features (instance cache, async writes, compression) are just class wrappers. It is very easy to introduce new functionality or components into MapDB.
SQL-like
MapDB was developed as a faster alternative to SQL engines. It has a number of features that make the transition from a relational database easier: secondary indexes/collections, auto-incremental sequential IDs, joins, triggers, composite keys…
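A secondary index is simply a second map, from some attribute of the value back to the primary keys, kept in sync on every write. The sketch below is a hypothetical in-memory IndexedMap that illustrates that bookkeeping; it is not MapDB's API, and it assumes values are non-null.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Hypothetical primary map plus a secondary index kept in sync on every put/remove. */
final class IndexedMap<K extends Comparable<K>, V, S> {
    private final Map<K, V> primary = new HashMap<>();
    private final Map<S, Set<K>> secondary = new HashMap<>();
    private final Function<V, S> indexKey; // derives the secondary key from a value

    IndexedMap(Function<V, S> indexKey) {
        this.indexKey = indexKey;
    }

    void put(K key, V value) {
        V old = primary.put(key, value);
        if (old != null) unindex(key, old); // the old value's index entry is now stale
        secondary.computeIfAbsent(indexKey.apply(value), s -> new TreeSet<>()).add(key);
    }

    V remove(K key) {
        V old = primary.remove(key);
        if (old != null) unindex(key, old);
        return old;
    }

    V get(K key) {
        return primary.get(key);
    }

    /** All primary keys whose value maps to the given secondary key. */
    Set<K> lookup(S secondaryKey) {
        return Collections.unmodifiableSet(secondary.getOrDefault(secondaryKey, Collections.emptySet()));
    }

    private void unindex(K key, V oldValue) {
        S s = indexKey.apply(oldValue);
        Set<K> keys = secondary.get(s);
        keys.remove(key);
        if (keys.isEmpty()) secondary.remove(s);
    }
}
```

For example, new IndexedMap<Integer, String, Integer>(String::length) lets you look up all keys whose value has a given length.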
Low disk-space usage
MapDB has a number of features (serialization, delta key packing…) to minimize the disk space used by its store. It also has very fast compression and custom serializers. We take disk usage seriously and do not waste a single byte.
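Delta key packing exploits the fact that sorted keys are close together: store the gap to the previous key instead of the key itself, and write each gap as a variable-length integer. DeltaPacking is a hypothetical illustration of the technique, assuming ascending non-negative int keys; it is not MapDB's on-disk format.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical delta + varint packing of sorted int keys (illustrative, not MapDB's format). */
final class DeltaPacking {
    private DeltaPacking() {}

    /** Packs ascending non-negative keys as varint-encoded gaps between neighbours. */
    static byte[] pack(int[] sortedKeys) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int key : sortedKeys) {
            if (key < previous) throw new IllegalArgumentException("keys must be ascending and non-negative");
            int delta = key - previous;
            previous = key;
            // Varint: 7 payload bits per byte; high bit set means "more bytes follow".
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Reverses pack(): decodes count varint gaps and accumulates them into keys. */
    static int[] unpack(byte[] packed, int count) {
        int[] keys = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            int b;
            do {
                b = packed[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            keys[i] = previous;
        }
        return keys;
    }
}
```

For the keys 3, 10, 11, 200, 100000, 100001 the packed form is 9 bytes instead of the 24 bytes of six plain ints.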
Answer by leventov
Consider Koloboke Collections, which, according to various tests, is up to 2 times faster than Trove if configured to consume the same memory as Trove (or, alternatively, consumes considerably less memory if configured to be as fast as Trove):
- Time - memory tradeoff with the example of Java Maps
- Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove
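The time-memory trade-off mostly comes down to the load factor of an open-addressing table: a lower load factor means shorter probe sequences but more empty slots. The back-of-the-envelope helper below assumes 4-byte int keys and 4-byte references (32-bit JVM or compressed oops) and ignores array headers and the value objects themselves; TableSizing is hypothetical.

```java
/** Hypothetical back-of-the-envelope sizing for an open-addressing int -> reference table. */
final class TableSizing {
    private TableSizing() {}

    /**
     * Bytes used by the parallel key and value arrays, assuming 4-byte int keys,
     * 4-byte references and no array headers. Integer arithmetic avoids rounding noise.
     */
    static long tableBytes(long entries, int loadFactorPercent) {
        long capacity = (entries * 100 + loadFactorPercent - 1) / loadFactorPercent; // ceil
        return capacity * (4 + 4);
    }
}
```

For 20 million entries this gives 320,000,000 bytes at a 50% load factor versus 200,000,000 bytes at 80%; the denser table saves memory but makes each lookup probe more slots on average.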
If you want to persist the map between JVM runs with a very quick bootstrap, you might also be interested in Chronicle-Map, which stores Strings in UTF-8 by default (so you don't need to bother with String <-> byte[] conversions as with Koloboke/Trove). Chronicle-Map is ultra fast for a persisted key-value store, but a bit slower than Koloboke and even Trove.
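If you do store values as byte[] in Koloboke or Trove, the String <-> byte[] conversion is your job. UTF-8 is the safer choice: it is identical to ASCII for ASCII text, whereas encoding with US-ASCII silently replaces every non-ASCII character with '?'. Utf8Values is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: lossless String <-> byte[] conversion for byte[]-valued maps. */
final class Utf8Values {
    private Utf8Values() {}

    /** ASCII characters take 1 byte each; other characters take 2 to 4 bytes. */
    static byte[] toBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String fromBytes(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```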
Answer by Dieter
Just wanted to reference some more open source options that became available over time since this question was first asked.
Apache 2, BTree, Apache Directory Project JDBM replacement effort:
http://directory.apache.org/mavibot/
MPL2/EPL1, RTree, MVStore, H2 Storage Engine:
http://www.h2database.com/html/mvstore.html
Apache 2, Xodus Environments, JetBrains YouTrack and Hub storage engine:
Answer by Stephen C
The map is Map<Integer,String>, and so I'm considering using GNU Trove TIntObjectHashMap<byte[]>, and storing the map value as an ASCII byte array rather than a String.
This doesn't entirely make sense because a TIntObjectHashMap is not a Map. However, the approach is sound.
Do you know what kind of space savings I can expect from Trove over HashMap?
The best answer is to try it out.
But here are some rough estimates (assuming a 32-bit JVM):
- HashMap keys would need to be Integer instances. They will occupy ~18 bytes per instance + 4 bytes per reference. Total 24 bytes.
- Trove keys would be 4-byte int values.
- String values would be 20 bytes + 12 bytes + 2 * number of "characters".
- Byte array values would be 12 bytes + 1 * number of "characters".
I haven't examined the details of the respective hash table internal data structures.
That probably amounts to around 50% memory saving, though it depends critically on the average length of the value "strings". (The longer they are, the more they will dominate the space usage.)
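Plugging the rough per-entry figures above into code makes the dependence on value length visible. EntryEstimate is a hypothetical helper; it uses only the numbers listed in this answer (32-bit JVM, hash table internals excluded), and those numbers predate the compact strings introduced in Java 9, which store Latin-1 text in 1 byte per character.

```java
/** Hypothetical helper applying the per-entry estimates above (hash table internals excluded). */
final class EntryEstimate {
    private EntryEstimate() {}

    /** Integer key (~24 bytes incl. reference) + String value (20 + 12 + 2 per char). */
    static long hashMapBytes(int chars) {
        return 24 + (20 + 12 + 2L * chars);
    }

    /** Primitive int key (4 bytes) + byte[] value (12 + 1 per char). */
    static long troveBytes(int chars) {
        return 4 + (12 + (long) chars);
    }

    /** Percentage of the HashMap figure saved by the Trove layout. */
    static double savingPercent(int chars) {
        return 100.0 * (hashMapBytes(chars) - troveBytes(chars)) / hashMapBytes(chars);
    }
}
```

With 10-character values this gives 76 versus 26 bytes per entry (about 66% saved); with 100-character values, 256 versus 116 bytes (about 55%). The longer the values, the closer the saving gets to 50%, because the 2-bytes-versus-1-byte-per-character term dominates.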
FWIW, Trove publish their own benchmarks here. They don't look very convincing, but you should be able to dig out their benchmark code and modify it to better match your use-case.