performance 优化康威的“生命游戏”

Question

提问by OysterD

To experiment, I've (long ago) implemented Conway's Game of Life(and I'm aware of thisrelated question!).

为了实验，我（很久以前）实施了康威的生命游戏（我知道这个相关问题！）。

My implementation worked by keeping 2 arrays of booleans, representing the 'last state', and the 'state being updated' (the 2 arrays being swapped at each iteration). While this is reasonably fast, I've often wondered about how to optimize this.

我的实现通过保留 2 个布尔数组来工作，代表“最后状态”和“正在更新的状态”（每次迭代时交换 2 个数组）。虽然这相当快，但我经常想知道如何优化它。

One idea, for example, would be to precompute at iteration N the zones that couldbe modified at iteration (N+1) (so that if a cell does not belong to such a zone, it won't even be considered for modification at iteration (N+1)). I'm aware that this is very vague, and I never took time to go into the details...

例如，一种想法是在迭代 N 时预先计算可以在迭代 (N+1) 时修改的区域（这样，如果一个单元格不属于这样的区域，则在迭代 (N+1))。我知道这是非常模糊的，我从来没有花时间深入细节......

Do you have any ideas (or experience!) of how to go about optimizing (for speed) Game of Life iterations?

您对如何优化（速度）生命游戏迭代有任何想法（或经验！）？

Answer 1

回答by Chris Marasti-Georg

I am going to quote my answer from the other question, because the chapters I mention have some very interesting and fine-tuned solutions. Some of the implementation details are in c and/or assembly, yes, but for the most part the algorithms can work in any language:

我将引用另一个问题的答案，因为我提到的章节有一些非常有趣且经过微调的解决方案。一些实现细节在 c 和/或汇编中，是的，但在大多数情况下，算法可以在任何语言中工作：

Chapters 17and 18of Michael Abrash's Graphics Programmer's Black Bookare one of the most interesting reads I have ever had. It is a lesson in thinking outside the box. The whole book is great really, but the final optimized solutions to the Game of Life are incredible bits of programming.

章17和18迈克尔·亚伯拉什的的图形程序员的黑皮书是最有趣的人读我有过。这是一个跳出框框思考的教训。整本书真的很棒，但生命游戏的最终优化解决方案是令人难以置信的编程部分。

Answer 2

回答by 1800 INFORMATION

There are some super-fast implementations that (from memory) represent cells of 8 or more adjacent squares as bit patterns and use that as an index into a large array of precalculated values to determine in a single machine instruction if a cell is live or dead.

有一些超快速实现（从内存中）将 8 个或更多相邻方块的单元格表示为位模式，并将其用作大量预计算值的索引，以在单个机器指令中确定单元格是活的还是死的.

Check out here:

在这里查看：

http://dotat.at/prog/life/life.html

Also XLife:

还有 XLife：

http://linux.maruhn.com/sec/xlife.html

Answer 3

回答by A. Rex

You should look into Hashlife, the ultimate optimization. It uses the quadtreeapproach that skinp mentioned.

你应该研究Hashlife，终极优化。它使用了skinp 提到的四叉树方法。

Answer 4

回答by Omran Shilunte

what is the most efficient algo mainly depends on the initial state.

什么是最有效的算法主要取决于初始状态。

if the majority of cells is dead, you could save a lot of CPU time by skipping empty parts and not calculating stuff cell by cell.

如果大多数单元格都死了，您可以通过跳过空部分而不是逐个单元格计算东西来节省大量 CPU 时间。

im my opinion it can make sense to check for completely dead spaces first, when your initial state is something like "random, but with chance for life lower than 5%."

我认为，当您的初始状态类似于“随机，但生命机会低于 5%”时，首先检查完全死区是有意义的。

i would just divide the matrix up into halves and start checking the bigger ones first.

我只是将矩阵分成两半，然后开始检查较大的矩阵。

so if you have a field of 10,000 * 10,000, you′d first accumulate the states of the upper left quarter of 5,000 * 5,000.

所以如果你有一个 10,000 * 10,000 的字段，你首先要累积 5,000 * 5,000 左上四分之一的状态。

and if the sum of states is zero in the first quarter, you can ignore this first quarter completely now and check the upper right 5,000 * 5,000 for life next.

如果第一季度的状态总和为零，您现在可以完全忽略第一季度，然后检查右上角的 5,000 * 5,000 以获取生命。

if its sum of states is >0, you will now divide up the second quarter into 4 pieces again - and repeat this check for life for each of these subspaces.

如果它的状态总和 > 0，您现在将再次将第二季度分成 4 个部分 - 并对每个子空间重复此检查。

you could go down to subframes of 8*8 or 10*10 (not sure what makes the most sense here) now.

你现在可以降到 8*8 或 10*10 的子帧（不确定这里什么最有意义）。

whenever you find life, you mark these subspaces as "has life".

每当你找到生命时，你就将这些子空间标记为“有生命”。

only spaces which "have life" need to be divided into smaller subspaces - the empty ones can be skipped.

只有“有生命”的空间需要被划分为更小的子空间——可以跳过空的子空间。

when you are finished assigning the "has life" attribute to all possible subspaces, you end up with a list of subspaces which you now simply extend by +1 to each direction - with empty cells - and perform the regular (or modified) game of life rules to them.

当您完成将“has life”属性分配给所有可能的子空间后，您最终会得到一个子空间列表，您现在只需将其扩展到每个方向 +1 - 带有空单元 - 并执行常规（或修改）游戏生活规则给他们。

you might think that dividn up a 10,000*10,000 spae into subspaces of 8*8 is a lot os tasks - but accumulating their states values is in fact much, much less computing work than performing the GoL algo to each cell plus their 8 neighbours plus comparing the number and storing the new state for the net iteration somewhere...

您可能认为将 10,000*10,000 空间划分为 8*8 的子空间是很多操作系统任务 - 但累加它们的状态值实际上比对每个单元加上它们的 8 个相邻单元执行 GoL 算法要少得多的计算工作比较数字并在某处存储网络迭代的新状态......

but like i said above, for a random init state with 30% population this wont make much sense, as there will be not many completely dead 8*8 subspaces to find (leave alone dead 256*256 subpaces)

但就像我上面说的，对于 30% 人口的随机初始状态，这没有多大意义，因为不会有太多完全死的 8*8 子空间要找到（更不用说死的 256*256 子空间）

and of course, the way of perfect optimisation will last but not least depend on your language.

当然，完美优化的方式将持续但并非最不重要的取决于您的语言。

-110

Answer 5

回答by Owen Knight

The algorithm itself is inherently parallelizable. Using the same double-buffered method in an unoptimized CUDA kernel, I'm getting around 25ms per generation in a 4096x4096 wrapped world.

算法本身是可并行化的。在未优化的 CUDA 内核中使用相同的双缓冲方法，我在 4096x4096 包装的世界中每代大约 25 毫秒。

Answer 6

回答by 1729

As mentioned in Arbash's Black Book, one of the most simple and straight forward ways to get a huge speedup is to keep a change list.

正如 Arbash 的 Black Book 中提到的，获得巨大加速的最简单直接的方法之一是保留更改列表。

Instead of iterating through the entire cell grid each time, keep a copy of all the cells that you change.

不要每次都遍历整个单元格网格，而是保留您更改的所有单元格的副本。

This will narrow down the work you have to do on each iteration.

这将缩小您在每次迭代中必须完成的工作。

Answer 7

回答by Tyler

Two ideas:

两个想法：

(1) Many configurations are mostly empty space. Keep a linked list (not necessarily in order, that would take more time) of the live cells, and during an update, only update around the live cells (this is similar to your vague suggestion, OysterD :)

(1) 很多配置多为空位。保留活细胞的链接列表（不一定按顺序，这会花费更多时间），并且在更新期间，仅在活细胞周围更新（这类似于您模糊的建议，OysterD :)

(2) Keep an extra array which stores the # of live cells in each row of 3 positions (left-center-right). Now when you compute the new dead/live value of a cell, you need only 4 read operations (top/bottom rows and the center-side positions), and 4 write operations (update the 3 affected row summary values, and the dead/live value of the new cell). This is a slight improvement from 8 reads and 1 write, assuming writes are no slower than reads. I'm guessing you might be able to be more clever with such configurations and arrive at an even better improvement along these lines.

(2) 保留一个额外的数组，用于存储每行 3 个位置（左-中-右）的活细胞数量。现在，当您计算单元格的新死/活值时，您只需要 4 次读取操作（顶部/底部行和中心侧位置）和 4 次写入操作（更新 3 个受影响的行摘要值，以及死/新单元格的实时值）。假设写入不比读取慢，这比 8 次读取和 1 次写入略有改进。我猜你可能会更聪明地使用这样的配置，并沿着这些路线获得更好的改进。

Answer 8

回答by skinp

Don't exactly know how this can be done, but I remember some of my friends had to represent this game's grid with a Quadtree for a assignment. I'm guess it's real good for optimizing the space of the grid since you basically only represent the occupied cells. I don't know about execution speed though.

不完全知道如何做到这一点，但我记得我的一些朋友不得不用四叉树来代表这个游戏的网格来完成任务。我猜这对于优化网格空间非常有用，因为您基本上只代表被占用的单元格。我不知道执行速度。

Answer 9

回答by Joseph Bui

It's a two dimensional automaton, so you can probably look up optimization techniques. Your notion seems to be about compressing the number of cells you need to check at each step. Since you only ever need to check cells that are occupied or adjacent to an occupied cell, perhaps you could keep a buffer of all such cells, updating it at each step as you process each cell.

这是一个二维自动机，因此您可能会查找优化技术。您的想法似乎是关于压缩每一步需要检查的单元格数量。由于您只需要检查被占用或与被占用的单元格相邻的单元格，也许您可以保留所有这些单元格的缓冲区，在处理每个单元格时在每一步更新它。

If your field is initially empty, this will be much faster. You probably can find some balance point at which maintaining the buffer is more costly than processing all the cells.

如果您的字段最初为空，则速度会快得多。您可能会找到一些平衡点，在此平衡点维护缓冲区比处理所有单元格的成本更高。

Answer 10

回答by Lasse V. Karlsen

There are table-driven solutions for this that resolve multiple cells in each table lookup. A google query should give you some examples.

有针对此的表驱动解决方案，可以解析每个表查找中的多个单元格。谷歌查询应该给你一些例子。

performance 优化康威的“生命游戏”

提问by OysterD

回答by Chris Marasti-Georg

回答by 1800 INFORMATION

回答by A. Rex

回答by Omran Shilunte

回答by Owen Knight

回答by 1729

回答by Tyler

回答by skinp

回答by Joseph Bui

回答by Lasse V. Karlsen

相关推荐

最近更新

标签

performance 优化康威的“生命游戏”

提问by OysterD

回答by Chris Marasti-Georg

回答by 1800 INFORMATION

回答by A. Rex

回答by Omran Shilunte

回答by Owen Knight

回答by 1729

回答by Tyler

回答by skinp

回答by Joseph Bui

回答by Lasse V. Karlsen

相关推荐

python-2.7 用于 python 2.7 的 mod_python

Firebase 和索引/搜索

firebase 在 forEach 中使用条件语句时，如何检测是否每个孩子都未通过条件？

检测 Firebase 连接是否丢失/恢复

相关推荐

最近更新

标签