Python Minimax 为一个白痴解释

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3956258/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-18 13:33:36  来源:igfitidea点击:

Minimax explained for an idiot

pythonminimaxtic-tac-toe

提问by orokusaki

I've wasted my entire day trying to use the minimax algorithm to make an unbeatable tictactoe AI. I missed something along the way (brain fried).

我已经浪费了我一整天的时间试图使用极大极小算法来制作无与伦比的 tictactoe AI。一路上我错过了一些东西(脑炸)。

I'm not looking for code here, just a better explanation of where I went wrong.

我不是在这里寻找代码,只是更好地解释我哪里出错了。

Here is my current code (the minimax method always returns 0 for some reason):

这是我当前的代码(由于某种原因,minimax 方法总是返回 0):

from copy import deepcopy


class Square(object):
    def __init__(self, player=None):
        self.player = player

    @property
    def empty(self):
        return self.player is None


class Board(object):
    winning_combos = (
        [0, 1, 2], [3, 4, 5], [6, 7, 8], [0, 3, 6], [1, 4, 7], [2, 5, 8],
        [0, 4, 8], [2, 4, 6],
    )

    def __init__(self, squares={}):
        self.squares = squares
        for i in range(9):
            if self.squares.get(i) is None:
                self.squares[i] = Square()

    @property
    def available_moves(self):
        return [k for k, v in self.squares.iteritems() if v.empty]

    @property
    def complete(self):
        for combo in self.winning_combos:
            combo_available = True
            for pos in combo:
                if not pos in self.available_moves:
                    combo_available = False
            if combo_available:
                return self.winner is not None
        return True

    @property
    def player_won(self):
        return self.winner == 'X'

    @property
    def computer_won(self):
        return self.winner == 'O'

    @property
    def tied(self):
        return self.complete == True and self.winner is None

    @property
    def winner(self):
        for player in ('X', 'O'):
            positions = self.get_squares(player)
            for combo in self.winning_combos:
                win = True
                for pos in combo:
                    if pos not in positions:
                        win = False
                if win:
                    return player
        return None

    @property
    def heuristic(self):
        if self.player_won:
            return -1
        elif self.tied:
            return 0
        elif self.computer_won:
            return 1

    def get_squares(self, player):
        return [k for k,v in self.squares.iteritems() if v.player == player]

    def make_move(self, position, player):
        self.squares[position] = Square(player)

    def minimax(self, node, player):
        if node.complete:
            return node.heuristic
        a = -1e10000
        for move in node.available_moves:
            child = deepcopy(node)
            child.make_move(move, player)
            a = max([a, -self.minimax(child, get_enemy(player))])
        return a


def get_enemy(player):
    if player == 'X':
        return 'O'
    return 'X'

采纳答案by Conspicuous Compiler

Your complete function is not working as expected, causing games to be declared tied before anything can happen. For instance, consider this setup:

您的完整功能没有按预期工作,导致游戏在任何事情发生之前就被声明为绑定。例如,考虑这个设置:

>> oWinning = {
 1: Square('X'),
 3: Square('O'), 4: Square('X'),
 6: Square('O'), 8: Square('X'),
}
>> nb = Board(oWinning)
>> nb.complete
True
>> nb.tied
True

This should be a win for the computer on the next move. Instead, it says the game is tied.

这应该是计算机下一步的胜利。相反,它说游戏是绑定的。

The problem is that your logic in complete, right now, checks to see if all of the squares in a combo are free. If any of them are not, it presumes that that combo can't be won with. What it needs to do is check if any positions in that combo are occupied, and so long as all of those combos are either None or the same player, that combo should be considered still available.

问题是您的逻辑现在完整地检查组合中的所有方块是否都是空闲的。如果其中任何一个不是,则假定无法赢得该组合。它需要做的是检查该组合中是否有任何位置被占用,只要所有这些组合都是 None 或同一玩家,该组合就应被视为仍然可用。

e.g.

例如

def available_combos(self, player):
    return self.available_moves + self.get_squares(player)

@property
def complete(self):
    for player in ('X', 'O'):
        for combo in self.winning_combos:
            combo_available = True
            for pos in combo:
                if not pos in self.available_combos(player):
                   combo_available = False
            if combo_available:
                return self.winner is not None
    return True

Now that I properly tested this with the updated code I'm getting the expected result on this test case:

现在我用更新的代码正确地测试了这个,我在这个测试用例上得到了预期的结果:

>>> nb.minimax(nb, 'O')
-1
>>> nb.minimax(nb, 'X')
1

回答by Jean

As you already know the idea of Minimax is to deep search for the best value, assuming the opponent will always play the move with the worst value (worst for us, so best for them).

正如您已经知道的,Minimax 的想法是深入搜索最佳价值,假设对手总是以最坏的价值走棋(对我们来说最坏,对他们来说是最好的)。

The idea is, you will try to give a value to each position. The position where you lose is negative (we don't want that) and the position where you win is positive. You assume you will always try for the highest-value position, but you also assume the opponent will always aim at the lowest-value position, which has the worst outcome for us, and the best for them (they win, we lose). So you put yourself in their shoes, try to play as good as you can as them, and assume they will do that.
So if you find out you have possible two moves, one giving them the choice to win or to lose, one resulting in a draw anyway, you assume they will go for the move that will have them win if you let them do that. So it's better to go for the draw.

这个想法是,您将尝试为每个职位赋予价值。你输的位置是负的(我们不希望那样),你赢的位置是正的。你假设你总是会尝试最高价值的位置,但你也假设对手总是瞄准最低价值的位置,这对我们来说是最糟糕的结果,而对他们来说是最好的(他们赢了,我们输了)。所以你把自己放在他们的鞋子里,尽量像他们一样打得好,并假设他们会这样做。
所以如果你发现你有两种可能的走法,一个让他们选择赢或输,一个无论如何都会导致平局,你假设他们会选择如果你让他们这样做就会让他们获胜。所以最好去抽奖。

Now for a more "algorithmic" view.

现在来看一个更“算法”的观点。

Imagine your grid is nearly full except for two possible positions.
Consider what happens when you play the first one :
The opponent will play the other one. It's their only possible move so we don't have to consider other moves from them. Look at the result, associate a resulting value (+∞ if won, 0 if draw, -∞ if lost : for tic tac toe you can represent those as +1 0 and -1).
Now consider what happens when you play the second one :
(same thing here, opponent has only one move, look at the resulting position, value the position).

想象一下,除了两个可能的位置之外,您的网格几乎已满。
考虑一下当你玩第一个时会发生什么:
对手会玩另一个。这是他们唯一可能的举动,所以我们不必考虑他们的其他举动。查看结果,关联结果值(+∞ 如果赢了,0 如果平局,-∞ 如果输了:对于井字游戏,您可以将它们表示为 +1 0 和 -1)。
现在考虑当你玩第二个时会发生什么:(
同样的事情,对手只有一步,看看结果的位置,重视这个位置)。

You need to choose between the two moves. It's our move, so we want the best result (this is the "max" in minimax). Choose the one with the higher result as our "best" move. That's it for the "2 moves from end" example.

您需要在两个动作之间进行选择。这是我们的举动,所以我们想要最好的结果(这是 minimax 中的“max”)。选择结果更高的作为我们的“最佳”举措。这就是“从终点开始的 2 个移动”示例。

Now imagine you have not 2 but 3 moves left.
The principle is the same, you want to assign a value to each of your 3 possible moves, so that you can choose the best.
So you start by considering one of the three moves.
You are now in the situation above, with only 2 possible moves, but it's the opponent's turn. Then you start considering one of the possible moves for the opponent, like we did above. Likewise, you look at each of the possible moves, and you find an outcome value for both of them. It's the opponent move, so we assume they will play the "best" move for them, the one with the worst turnout for us, so it's the one with the lesser value (this is the "min" in minimax). Ignore the other one ; assume they will play what you found was best for them anyway. This is what your move will yield, so it's the value you assign to the first of your three moves.

现在想象你还剩 3 步而不是 2 步。
原理是一样的,你要给你的 3 个可能的动作中的每一个都分配一个值,这样你就可以选择最好的。
所以你首先考虑三个动作之一。
您现在处于上述情况,只有 2 个可能的移动,但轮到对手了。然后你开始考虑对手可能采取的行动之一,就像我们上面所做的那样。同样,您查看每个可能的移动,并找到它们的结果值。这是对手的走法,所以我们假设他们会为他们走“最好”的走法,对我们来说投票率最差的走法,所以这是价值较小的走法(这是 minimax 中的“min”)。忽略另一个;假设他们会演奏你发现的最适合他们的东西。这就是您的移动将产生的结果,因此它是您分配给三个移动中的第一个的值。

Now you consider each of your other possible 2 moves. You give them a value in the same manner. And from your three moves, you choose the one with the max value.

现在,您考虑其他可能的 2 个移动。你以同样的方式给他们一个值。从你的三个动作中,你选择一个具有最大值的动作。

Now consider what happens with 4 moves. For each of your 4 moves, you look what happens for the 3 moves of your opponent, and for each of them you assume they will choose the one that gives you the worst possible outcome of the best of the 2 remaining moves for you.

现在考虑 4 次移动会发生什么。对于你的 4 步中的每一步,你会观察对手的 3 步会发生什么,并且对于每一个你假设他们会选择一个能给你带来最坏结果的那一个,这是剩下的 2 步中最好的。

You see where this is headed. To evaluate a move n steps from the end, you look at what may happen for each of the n possible moves, trying to give them a value so that you can pick the best. In the process, you will have to try to find the best move for the player that plays at n-1 : the opponent, and choose the step with the lesser value. In the process of evaluating the n-1 move, you have to choose between the possible n-2 moves, which will be ours, and assume we will play as well as we can at this step. Etc.

你看这是怎么回事。要评估距离最后 n 步的移动,您会查看 n 种可能移动中的每一个可能发生的情况,尝试给它们一个值,以便您可以选择最好的。在此过程中,您将不得不尝试为在 n-1 处下棋的玩家(对手)找到最佳移动,并选择具有较小值的步骤。在评估 n-1 步的过程中,您必须在可能的 n-2 步之间进行选择,这将是我们的,并假设我们将在此步骤中尽可能地发挥作用。等等。

This is why this algorithm is inherently recursive. Whatever n, at step n you evaluate all possible steps at n-1. Rinse and repeat.

这就是为什么这个算法本质上是递归的。无论 n,在第 n 步,您都会评估 n-1 处所有可能的步骤。冲洗并重复。

For tic-tac-toe todays machines are far powerful enough to compute all possible outcomes right off from the start of the game, because there are only a few hundred of them. When you look to implement it for a more complex game, you will have to stop computing at some point because it will take too long. So for a complex game, you will also have to write code that decides whether to continue looking for all possible next moves or to try to give a value to the position now and return early. It means you will also have to compute a value for position that is not final - for example for chess you would take into account how much material each opponent has on the board, the immediate possibilities of check without mate, how many tiles you control and all, which makes it not trivial.

对于井字游戏,今天的机器已经足够强大,可以从游戏一开始就计算出所有可能的结果,因为只有几百个。当您希望为更复杂的游戏实现它时,您将不得不在某个时候停止计算,因为这将花费太长时间。因此,对于复杂的游戏,您还必须编写代码来决定是继续寻找所有可能的下一步动作,还是尝试立即为位置赋值并尽早返回。这意味着您还必须计算一个非最终位置的值——例如,对于国际象棋,您会考虑每个对手在棋盘上有多少材料,没有队友的直接过牌可能性,您控制的瓷砖数量以及所有,这使得它不是微不足道的。

回答by Guy Sirton

Step 1: Build your game tree

第 1 步:构建您的游戏树

Starting from the current board generate all possible moves your opponent can make. Then for each of those generate all the possible moves you can make. For Tic-Tac-Toe simply continue until no one can play. In other games you'll generally stop after a given time or depth.

从当前棋盘开始,生成对手可以做出的所有可能的动作。然后为其中的每一个生成您可以进行的所有可能的移动。对于 Tic-Tac-Toe,只需继续直到无人玩为止。在其他游戏中,您通常会在给定的时间或深度后停止。

This looks like a tree, draw it yourself on a piece of paper, current board at top, all opponent moves one layer below, all your possible moves in response one layer below etc.

这看起来像一棵树,你自己在一张纸上画它,当前板在顶部,所有对手都移动到下面一层,所有可能的移动响应下面一层等等。

Step 2: Score all boards at the bottom of the tree

第 2 步:对树底部的所有板进行评分

For a simple game like Tic-Tac-Toe make the score 0 if you lose, 50 tie, 100 win.

对于像 Tic-Tac-Toe 这样的简单游戏,如果您输了,则得分为 0,平局 50,赢了 100。

Step 3: Propagate the score up the tree

第 3 步:将分数向上传播

This is where the min-max come into play. The score of a previously unscored board depends on its children and who gets to play. Assume both you and your opponent always choose the best possible move at the given state. The best move for the opponent is the move that gives youthe worst score. Likewise, your best move is the move that gives you the highest score. In case of the opponent's turn, you choose the child with the minimum score (that maximizes his benefit). If it is your turn you assume you'll make the best possible move, so you choose the maximum.

这就是 min-max 发挥作用的地方。以前未计分的棋盘的得分取决于它的孩子和谁可以玩。假设你和你的对手总是在给定的状态下选择最好的移动。对对手来说最好的举动是给最差分数的举动。同样,你最好的举动是给你最高分的举动。在轮到对手的情况下,您选择分数最低的孩子(使他的利益最大化)。如果轮到你了,你会假设你会做出最好的举动,所以你选择了最大值。

Step 4: Pick your best move

第 4 步:选择你最好的动作

Now play the move that results in the best propagated score among all your possible plays from the current position.

现在从当前位置开始在所有可能的游戏中播放产生最佳传播分数的移动。

Try it on a piece of paper, if starting from a blank board is too much for you start from some advanced Tic-Tac-Toe position.

在一张纸上尝试一下,如果从空白板开始对于您从一些高级 Tic-Tac-Toe 位置开始来说太多了。

Using recursion:Very often this can be simplified by using recursion. The "scoring" function is called recursively at each depth and depending on whether or not the depth is odd or even it select max or min respectively for all possible moves. When no moves are possible it evaluates the static score of the board. Recursive solutions (e.g. the example code) can be a bit trickier to grasp.

使用递归:通常这可以通过使用递归来简化。“计分”函数在每个深度被递归调用,并且取决于深度是奇数还是偶数,它分别为所有可能的移动选择最大值或最小值。当不可能移动时,它会评估棋盘的静态分数。递归解决方案(例如示例代码)可能有点难以掌握。