Python 有什么比 dict() 更快的吗？

Question

提问by alec_djinn

I need a faster way to store and access around 3GB of k:vpairs. Where kis a stringor an integerand vis an np.array()that can be of different shapes. Is there any object, that is faster than the standard python dict in storing and accessing such a table? For example, a pandas.DataFrame?

我需要一种更快的方式来存储和访问大约 3GB 的数据k:v对。其中kastring或 aninteger和vis annp.array()可以是不同的形状。是否有任何对象在存储和访问此类表时比标准 python dict 更快？例如，a pandas.DataFrame?

As far I have understood python dict is a quite fast implementation of a hashtable, is there anything better than that for my specific case?

据我所知，python dict 是哈希表的一个非常快的实现，对于我的特定情况，还有什么比这更好的实现吗？

Answer 1

回答by Kasramvd

No, there is nothing faster than a dictionary for this task and that's because the complexity of its indexing and even membership checking is approximately O(1).

不，对于这项任务，没有什么比字典更快的了，这是因为它的索引甚至成员资格检查的复杂性大约是 O(1)。

Once you save your items in a dictionary, you can access them in constant time which means that it's unlikely that your performance problem has anything to do with dictionary indexing. However, you might be able to make this process slightly faster by making some changes in your objects and their types that may result in some optimizations to under the hood operations. For example, if your strings (keys) are not very large you can intern them, which caches them in memory rather than creating them as separate object. If the keys in a dictionary are interned and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. That reduces the access time to the object.

一旦您将项目保存在字典中，您就可以在恒定时间内访问它们，这意味着您的性能问题不太可能与字典索引有关。但是，您可以通过对对象及其类型进行一些更改来稍微加快此过程，这可能会导致对底层操作进行一些优化。例如，如果您的字符串（键）不是很大，您可以对它们进行实习，这会将它们缓存在内存中，而不是将它们创建为单独的对象。如果字典中的键是实习的并且查找键是实习的，则可以通过指针比较而不是字符串比较来完成键比较（散列后）。这减少了对对象的访问时间。

Python has provided an intern()function within the sysmodule that you can use for this.

Pythonintern()在sys模块中提供了一个函数，您可以为此使用它。

Enter string in the table of “interned” strings and return the interned string – which is string itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup...

在“interned”字符串表中输入 string 并返回 interned 字符串——它是字符串本身或副本。实习字符串对于在字典查找中获得一点性能很有用......

Here is an example:

下面是一个例子：

In [49]: d = {'mystr{}'.format(i): i for i in range(30)}

In [50]: %timeit d['mystr25']
10000000 loops, best of 3: 46.9 ns per loop

In [51]: d = {sys.intern('mystr{}'.format(i)): i for i in range(30)}

In [52]: %timeit d['mystr25']
10000000 loops, best of 3: 38.8 ns per loop

Answer 2

回答by akash karothiya

No, I don't think there is anything faster than dict. The time complexity of its index checking is O(1).

不，我认为没有比dict. 其索引检查的时间复杂度为O(1).

-------------------------------------------------------
Operation    |  Average Case  | Amortized Worst Case  |
-------------------------------------------------------
Copy[2]      |    O(n)        |       O(n)            | 
Get Item     |    O(1)        |       O(n)            | 
Set Item[1]  |    O(1)        |       O(n)            | 
Delete Item  |    O(1)        |       O(n)            | 
Iteration[2] |    O(n)        |       O(n)            | 
-------------------------------------------------------

PS https://wiki.python.org/moin/TimeComplexity

Answer 3

回答by sonus21

You can think of storing them in Data structure like Trie given your key is string. Even to store and retrieve from Trie you need O(N) where N is maximum length of key. Same happen to hash calculation which computes hash for key. Hash is used to find and store in Hash Table. We often don't consider the hashing time or computation.

鉴于您的密钥是字符串，您可以考虑将它们存储在像 Trie 这样的数据结构中。即使要从 Trie 存储和检索，您也需要 O(N)，其中 N 是密钥的最大长度。同样发生在计算密钥散列的散列计算中。哈希用于在哈希表中查找和存储。我们通常不考虑散列时间或计算。

You may give a shot to Trie, Which should be almost equal performance, may be little bit faster( if hash value is computed differently for say

您可以尝试一下 Trie，这应该是几乎相同的性能，可能会快一点（如果哈希值的计算方式不同，例如

HASH[i] = (HASH[i-1] + key[i-1]*256^i % BUCKET_SIZE ) % BUCKET_SIZE

or something similar due to collision we need to use 256^i.

或类似的由于碰撞我们需要使用 256^i。

You can try to store them in Trie and see how it performs.

您可以尝试将它们存储在 Trie 中并查看其性能。

Python 有什么比 dict() 更快的吗？

提问by alec_djinn

回答by Kasramvd

回答by akash karothiya

回答by sonus21

相关推荐

最近更新

标签

Python 有什么比 dict() 更快的吗？

提问by alec_djinn

回答by Kasramvd

回答by akash karothiya

回答by sonus21

相关推荐

Python Spyder 缺少对象检查器

Python 何时缓存 DataFrame？

OpenCV Python：标准化图像

删除在python3中使用venv创建的虚拟环境

相关推荐

最近更新

标签