Working with huge lists in Python

Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original source: StackOverflow, http://stackoverflow.com/questions/15283893/

Date: 2020-08-18 19:41:00  Source: igfitidea

Working with HUGE lists in Python

Tags: python, itertools, poker

Asked by scott

How can I manage a huge list of 100+ million strings? How can I begin to work with such a huge list?

Example large list:

cards = [
            "2s","3s","4s","5s","6s","7s","8s","9s","10s","Js","Qs","Ks","As",
            "2h","3h","4h","5h","6h","7h","8h","9h","10h","Jh","Qh","Kh","Ah",
            "2d","3d","4d","5d","6d","7d","8d","9d","10d","Jd","Qd","Kd","Ad",
            "2c","3c","4c","5c","6c","7c","8c","9c","10c","Jc","Qc","Kc","Ac",
           ]

from itertools import combinations

cardsInHand = 7
hands = list(combinations(cards,  cardsInHand))

print(str(len(hands)) + " hand combinations in texas holdem poker")

Accepted answer by David Wolever

With lots and lots of memory. Python lists and strings are actually reasonably efficient, so provided you've got the memory, it shouldn't be an issue.

That said, if what you're storing are specifically poker hands, you can definitely come up with more compact representations. For example, you can use one byte to encode each card, which means you only need one 64 bit int to store an entire hand. You could then store these in a NumPy array, which would be significantly more efficient than a Python list.
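As a rough sketch of the one-byte-per-card idea (not code from the original answer), the snippet below maps each card to a number from 0 to 51 and packs a 7-card hand into a single Python int that fits in 64 bits. The names `CARDS`, `CARD_TO_NUM`, `encode_hand` and `decode_hand` are made up for illustration, and the card order is assumed to follow the list in the question.

```python
RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["s", "h", "d", "c"]
# 52 distinct strings, in the same order as the list in the question.
CARDS = [rank + suit for suit in SUITS for rank in RANKS]
CARD_TO_NUM = {card: num for num, card in enumerate(CARDS)}


def encode_hand(hand):
    """Pack up to 8 card strings into one int, using one byte per card."""
    return int.from_bytes(bytes(CARD_TO_NUM[card] for card in hand), "little")


def decode_hand(code, size=7):
    """Recover the card strings from an int produced by encode_hand."""
    return tuple(CARDS[b] for b in code.to_bytes(size, "little"))


hand = ("2s", "Ks", "10h", "Ah", "3d", "Qc", "Ac")
code = encode_hand(hand)
print(code < 2**64, decode_hand(code) == hand)
```

Seven cards use only 56 of the 64 bits, so one byte is left unused in each encoded hand.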


For example:


>>> import itertools
>>> import numpy as np
>>> cards_to_bytes = dict((card, num) for (num, card) in enumerate(cards))
>>> hands = np.zeros(133784560, dtype='7int8') # 133784560 == 52c7
>>> for num, hand in enumerate(itertools.combinations(cards, 7)):
...     hands[num] = [cards_to_bytes[card] for card in hand]

And to speed up that last line a bit: hands[num] = list(map(cards_to_bytes.__getitem__, hand)) (the list() call is needed on Python 3, where map returns an iterator rather than a list)

This will only require 7 * 133784560 = ~1gb of memory… And that could be cut down by packing the cards more tightly: each of the 52 cards needs only 6 bits, so a 7-card hand fits in 42 bits, i.e. 6 bytes instead of 7 (I don't know the syntax for doing that off the top of my head…)
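One possible way to pack cards more tightly, sketched here with plain Python integers and bit shifts: since 52 < 64, each card number fits in 6 bits, so seven cards fit in 42 bits. The helper names `pack_hand` and `unpack_hand` are hypothetical, and this is only an illustration, not the answerer's method.

```python
BITS_PER_CARD = 6                   # 2**6 = 64 values, enough for 52 cards
MASK = (1 << BITS_PER_CARD) - 1     # 0b111111


def pack_hand(card_nums):
    """Pack card numbers (each in 0..51) into one int, 6 bits per card."""
    packed = 0
    for position, num in enumerate(card_nums):
        packed |= num << (BITS_PER_CARD * position)
    return packed


def unpack_hand(packed, size=7):
    """Split a packed int back into its 6-bit card numbers."""
    return [(packed >> (BITS_PER_CARD * i)) & MASK for i in range(size)]


hand_nums = [0, 11, 21, 25, 27, 49, 51]
packed = pack_hand(hand_nums)
print(packed.bit_length() <= 42, unpack_hand(packed) == hand_nums)
```

To actually save space, the 42-bit value would have to be stored in 6 bytes, for example with `packed.to_bytes(6, "little")` written into a `bytearray`. A NumPy `uint64` takes 8 bytes per hand, which is more than the 7-byte layout above.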

Answer by Leopd

There's often a trade-off between how long you spend coding and how long your code takes to run. If you're just trying to get something done quickly and don't expect it to run frequently, an approach like you're suggesting is fine. Just make the list huge -- if you don't have enough RAM, your system will churn virtual memory, but you'll probably get your answer faster than learning how to write a more sophisticated solution.


But if this is a system that you expect to be used on a regular basis, you should figure out something other than storing everything in RAM. An SQL database is probably what you want. They can be very complex, but because they are nearly ubiquitous there are plenty of excellent tutorials out there.


You might look to a well-documented framework like Django, which simplifies access to a database through an ORM layer.
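To give a feel for the database route without any framework, here is a minimal sketch using the `sqlite3` module from the standard library. The table layout (`hands` with columns `c1`, `c2`, `c3`) and the tiny six-card deck are assumptions chosen so the example runs instantly; a real schema for 7-card hands would need more thought.

```python
import sqlite3
from itertools import combinations

# Small hypothetical deck so the example finishes immediately.
cards = ["As", "Ks", "Qs", "Ah", "Kh", "Qh"]

conn = sqlite3.connect(":memory:")  # pass a file path to keep the data on disk
conn.execute("CREATE TABLE hands (c1 TEXT, c2 TEXT, c3 TEXT)")
# executemany pulls rows from the iterator, so no giant Python list is built.
conn.executemany("INSERT INTO hands VALUES (?, ?, ?)", combinations(cards, 3))
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM hands").fetchone()[0]
with_ace_of_spades = conn.execute(
    "SELECT COUNT(*) FROM hands WHERE 'As' IN (c1, c2, c3)"
).fetchone()[0]
print(total, with_ace_of_spades)
conn.close()
```

Queries then run inside the database engine, so only the results, not the full set of hands, need to come back into Python.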

Answer by Junuxx

If you just want to loop over all possible hands to count them or to find one with a certain property, there is no need to store them all in memory.


You can just use the iterator and not convert to a list:


from itertools import combinations

cardsInHand = 7
hands = combinations(cards,  cardsInHand)

n = 0
for h in hands:
    n += 1
    # or do some other stuff here

print(n, "hand combinations in texas holdem poker.")

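85900584 hand combinations in texas holdem poker.

(85900584 is C(49, 7), not C(52, 7) = 133784560: it is the count produced when the rows of the cards list are not separated by commas, so that Python concatenates adjacent string literals such as "As" "2h" and the list has only 49 entries. With 52 separate cards the loop counts 133784560 hands.)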
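If only the number of hands is needed, it can also be computed directly as a binomial coefficient, with no loop at all. A small sketch, assuming Python 3.8 or later for `math.comb`:

```python
from itertools import combinations
from math import comb  # available in Python 3.8+

full_deck_hands = comb(52, 7)  # number of 7-card hands from 52 distinct cards
print(full_deck_hands)

# Sanity check on a small case: looping and the formula agree.
small = range(10)
assert sum(1 for _ in combinations(small, 3)) == comb(10, 3)
```

Looping over the iterator is still the right tool when each hand has to be inspected, for example to test it for some property.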

Answer by Andrew Prock

Another option that avoids storing the hands, and lets you create a stream of data to process however you like, is to use generators. For example:

Print the total number of hands:

print(sum(1 for x in combinations(cards, 7)))

Print the number of hands with the ace of clubs in it:

print(sum(1 for x in combinations(cards, 7) if 'Ac' in x))
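The same idea can be written as a generator function, which makes it easier to chain several processing steps. The sketch below uses hypothetical helper names (`hands_containing`, `count`) and a reduced ten-card deck so that it finishes instantly; with the full deck the code is the same but takes much longer to run.

```python
from itertools import combinations


def hands_containing(cards, hand_size, wanted):
    """Yield each hand that contains the wanted card, one at a time."""
    for hand in combinations(cards, hand_size):
        if wanted in hand:
            yield hand


def count(iterable):
    """Count items in any iterable without storing them."""
    return sum(1 for _ in iterable)


# Hypothetical reduced deck: five ranks in two suits.
small_deck = [rank + suit for suit in "sc" for rank in ["10", "J", "Q", "K", "A"]]
hands_with_ace_of_clubs = count(hands_containing(small_deck, 5, "Ac"))
print(hands_with_ace_of_clubs)
```

At no point does more than one hand exist in memory, because each hand is produced, tested and discarded before the next one is generated.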

Answer by Lee Daniel Crocker

My public domain OneJoker library has some combinatoric functions that would be handy. It has an Iterator class that can give you information about the set of combinations without storing them or even running through them. For example:

  import onejoker as oj
  deck = oj.Sequence(52)
  deck.fill()

  hands = oj.Iterator(deck, 5)    # I want combinations of 5 cards out of that deck

  t = hands.total                 # How many are there?
  r = hands.rank("AcKsThAd3c")    # At what position will this hand appear?
  h = hands.hand_at(1000)         # What will the 1000th hand be?

  for h in hands.all():           # Do something with all of them
      dosomething(h)

You could use the Iterator.rank() function to reduce each hand to a single int, store those in a compact array, then use Iterator.hand_at() to produce them on demand.
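For readers without the library, the rank/unrank idea can be sketched in pure Python. This is not OneJoker's implementation, and its ordering may differ from the library's: the hypothetical functions `rank` and `unrank` below work on sorted tuples of card indices and follow the order in which `itertools.combinations(range(n), k)` yields them.

```python
from itertools import combinations
from math import comb  # Python 3.8+


def rank(combo, n):
    """Position of a sorted index tuple in combinations(range(n), k) order."""
    k = len(combo)
    position = 0
    previous = -1
    for i, c in enumerate(combo):
        # Count the combinations that start with a smaller index at slot i.
        for skipped in range(previous + 1, c):
            position += comb(n - 1 - skipped, k - 1 - i)
        previous = c
    return position


def unrank(position, n, k):
    """Inverse of rank: the index tuple found at the given position."""
    combo = []
    candidate = 0
    for i in range(k):
        while True:
            block = comb(n - 1 - candidate, k - 1 - i)
            if position < block:
                break
            position -= block
            candidate += 1
        combo.append(candidate)
        candidate += 1
    return tuple(combo)


# Check both directions against itertools on a small case.
for index, combo in enumerate(combinations(range(7), 3)):
    assert rank(combo, 7) == index
    assert unrank(index, 7, 3) == combo
```

Each hand then becomes a single integer below C(52, 7), which fits comfortably in a 32-bit unsigned value, and the cards can be regenerated from that integer on demand.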
