MySQL B 树与哈希表
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/7306316/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
B-Tree vs Hash Table
提问by JohnJohnGa
In MySQL, an index type is a b-tree, and access an element in a b-tree is in logarithmic amortized time O(log(n))
.
在 MySQL 中,索引类型是 b-tree,访问 b-tree 中的元素是在对数分摊时间O(log(n))
。
On the other hand, accessing an element in a hash table is in O(1)
.
另一方面,访问哈希表中的元素在O(1)
.
Why is a hash table not used instead of a b-tree in order to access data inside a database?
为什么不使用哈希表而不是 b 树来访问数据库中的数据?
回答by The Surrican
You can only access elements by their primary key in a hashtable.
This is faster than with a tree algorithm (O(1)
instead of log(n)
), but you cannot select ranges (everything in between x
and y
).
Tree algorithms support this in Log(n)
whereas hash indexes can result in a full table scan O(n)
.
Also the constant overhead of hash indexes is usually bigger (which is no factor in theta notation, but it still exists).
Also tree algorithms are usually easier to maintain, grow with data, scale, etc.
您只能通过哈希表中的主键访问元素。这比使用树算法(O(1)
而不是log(n)
)快,但您不能选择范围(介于x
和之间的所有内容y
)。树算法支持这一点,Log(n)
而散列索引可以导致全表扫描O(n)
。此外,哈希索引的恒定开销通常更大(这不是 theta 符号中的因素,但它仍然存在)。此外,树算法通常更容易维护、随数据增长、规模等。
Hash indexes work with pre-defined hash sizes, so you end up with some "buckets" where the objects are stored in. These objects are looped over again to really find the right one inside this partition.
哈希索引使用预定义的哈希大小,因此您最终会得到一些存储对象的“桶”。这些对象会再次循环以真正在该分区中找到正确的对象。
So if you have small sizes you have a lot of overhead for small elements, big sizes result in further scanning.
因此,如果您有小尺寸,则小元素的开销很大,大尺寸会导致进一步扫描。
Todays hash tables algorithms usually scale, but scaling can be inefficient.
今天的哈希表算法通常可扩展,但扩展效率低下。
There are indeed scalable hashing algorithms. Don't ask me how that works - its a mystery to me too. AFAIK they evolved from scalable replication where re-hashing is not easy.
Its called RUSH- Replication Under Scalable Hashing, and those algorithms are thus called RUSH algorithms.
确实有可扩展的散列算法。不要问我这是如何工作的——这对我来说也是个谜。AFAIK 它们是从可扩展复制演变而来的,在这种复制中,重新散列并不容易。
其称为RUSH- - [Replication ü的nDer小号calable ħ因而灰化,并且这些算法被称为算法RUSH。
However there may be a point where your index exceeds a tolerable size compared to your hash sizes and your entire index needs to be re-built. Usually this is not a problem, but for huge-huge-huge databases, this can take days.
但是,与哈希大小相比,您的索引可能会超过可容忍的大小,并且您的整个索引需要重新构建。通常这不是问题,但对于巨大的数据库来说,这可能需要几天时间。
The trade off for tree algorithms is small and they are suitable for almost every use case and thus are default.
树算法的权衡很小,它们几乎适用于所有用例,因此是默认值。
However if you have a very precise use case and you know exactly what and only what is going to be needed, you can take advantage of hashing indexes.
但是,如果您有一个非常精确的用例,并且您确切地知道需要什么以及只需要什么,则可以利用散列索引。
回答by lmiguelvargasf
Actually, it seems that MySQL uses both kind of indexes either a hash table or a b-tree according to the following link.
实际上,根据以下链接,MySQL 似乎使用了哈希表或 b 树这两种索引。
The difference between using a b-tree and a hash table is that the former allows you to use column comparisonsin expressions that use the =, >, >=, <, <=, or BETWEEN operators, while the latter is used only for equality comparisonsthat use the = or <=> operators.
使用 b 树和哈希表的区别在于,前者允许您在使用=、>、>=、<、<= 或 BETWEEN 运算符的表达式中使用列比较,而后者仅用于使用 = 或 <=> 运算符的相等比较。
回答by Emil Vikstr?m
The time complexity of hashtables is constant only for sufficiently sized hashtables (there need to be enough buckets to hold the data). The size of a database table is not known in advance so the table must be rehashed now and then to get optimal performance out of an hashtable. The rehashing is also expensive.
哈希表的时间复杂度仅对于足够大的哈希表是恒定的(需要有足够的桶来保存数据)。预先不知道数据库表的大小,因此必须不时重新散列该表,以便从散列表中获得最佳性能。重新散列也很昂贵。
回答by Jonathan Weatherhead
I think Hashmaps don't scale as well, and can be expensive when the entire map needs to be rehashed.
我认为 Hashmaps 也不能缩放,并且当整个地图需要重新散列时可能会很昂贵。
回答by RONALD LOUI
Pick DB/OS was based on hashing and worked well. With more memory these days to support efficient sparse hash tables, and redundant hashing to support modest range queries, I would say hashing may yet have its place (some would rather have other forms of non-range similarity-matching, such as wildcards and regexps). We also recommend copying to keep collision chains contiguous when memory hierarchies have big speed differences.
Pick DB/OS 基于散列并且运行良好。现在有更多的内存来支持高效的稀疏哈希表,以及冗余哈希来支持适度的范围查询,我想说哈希可能还有它的位置(有些人宁愿有其他形式的非范围相似性匹配,例如通配符和正则表达式)。当内存层次结构具有很大的速度差异时,我们还建议复制以保持碰撞链连续。