Java: Finding the best move using MinMax with Alpha-Beta pruning

Question

Asked by StepTNT

I'm working on an AI for a game and I want to use the MinMax algorithm with Alpha-Beta pruning.

I have a rough idea of how it works, but I'm still not able to write the code from scratch, so I've spent the last two days looking for some kind of pseudocode online.

My problem is that every pseudocode I've found online seems to be based on finding the value for the best move while I need to return the best move itself and not a number.

My current code is based on this pseudocode (source):

minimax(level, player, alpha, beta){  // player may be "computer" or "opponent"
    if (gameover || level == 0)
       return score
    children = all valid moves for this "player"
    if (player is computer, i.e., max's turn){
       // Find max and store in alpha
       for each child {
          score = minimax(level - 1, opponent, alpha, beta)
          if (score > alpha) alpha = score
          if (alpha >= beta) break;  // beta cut-off
       }
       return alpha
    } else {  // player is opponent, i.e., min's turn
       // Find min and store in beta
       for each child {
          score = minimax(level - 1, computer, alpha, beta)
          if (score < beta) beta = score
          if (alpha >= beta) break;  // alpha cut-off
       }
       return beta
    }
}

// Initial call with alpha=-inf and beta=inf
minimax(2, computer, -inf, +inf)

As you can see, this code returns a number and I guess that this is needed to make everything work (since the returned number is used during the recursion).

So I thought that I might use an external variable to store the best move, and this is how I changed the previous code:

minimax(level, player, alpha, beta){  // player may be "computer" or "opponent"
    if (gameover || level == 0)
       return score
    children = all valid moves for this "player"
    if (player is computer, i.e., max's turn){
       // Find max and store in alpha
       for each child {
          score = minimax(level - 1, opponent, alpha, beta)
          if (score > alpha) {
              alpha = score
              bestMove = current child // ROW THAT I ADDED TO UPDATE THE BEST MOVE
          }
          if (alpha >= beta) break;  // beta cut-off
       }
       return alpha
    } else {  // player is opponent, i.e., min's turn
       // Find min and store in beta
       for each child {
          score = minimax(level - 1, computer, alpha, beta)
          if (score < beta) beta = score
          if (alpha >= beta) break;  // alpha cut-off
       }
       return beta
    }
}

// Initial call with alpha=-inf and beta=inf
minimax(2, computer, -inf, +inf)

This makes sense to me, because we need to update the best move only if it's the player's turn and the move is better than the previous one.

So, while I think that this one is correct (even if I'm not 100% sure), the source also has a Java implementation which updates bestMove even in the score < beta case, and I don't understand why.

Trying that implementation led my code to choose a move belonging to the opposing player as the best move, which doesn't seem correct (assuming that I'm the black player, I'm looking for the best move that I can make, so I'm expecting a "black" move and not a "white" one).

I don't know if my pseudocode (the second one) is the correct way to find the best move using MinMax with alpha-beta pruning, or if I need to update the best move even in the score < beta case.

Please feel free to suggest any new and better pseudocode if you prefer; I'm not bound to anything and I don't mind rewriting some code if it's better than mine.

EDIT:

Since I can't understand the replies, I guess the question doesn't ask what I actually want to know, so I'll try to phrase it better here.

Provided that I want to get the best move for only one player, and that this player, the maximizer, is passed to the MinMax function every time I need a new move (so that minmax(2, black, a, b) returns the best move for the black player while minmax(2, white, a, b) returns the best one for the white player), how would you change the first pseudocode (or the Java implementation in the source) to store this best move somewhere?

EDIT 2:

Let's see if we can make it work this way.

This is my implementation. Can you please tell me if it's correct?

//PlayerType is an enum with just White and Black values, opponent() returns the opposite player type
protected int minMax(int alpha, int beta, int maxDepth, PlayerType player) {        
    if (!canContinue()) {
        return 0;
    }
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    int value = 0;
    boolean isMaximizer = (player.equals(playerType)); // playerType is the player used by the AI        
    if (maxDepth == 0 || board.isGameOver()) {
        value = evaluateBoard();
        return value;
    }
    while (movesIterator.hasNext()) {
        Move currentMove = movesIterator.next();
        board.applyMove(currentMove);
        value = minMax(alpha, beta, maxDepth - 1, player.opponent());
        board.undoLastMove();
        if (isMaximizer) {
            if (value > alpha) {
                selectedMove = currentMove;
                alpha = value;
            }
        } else {
            if (value < beta) {
                beta = value;
            }
        }
        if (alpha >= beta) {
            break;
        }
    }
    return (isMaximizer) ? alpha : beta;
}

EDIT 3:

New implementation based on @Codor's answer/comments:

private class MoveValue {
    public Move move;
    public int value;

    public MoveValue() {
        move = null;
        value = 0;
    }

    public MoveValue(Move move, int value) {
        this.move = move;
        this.value = value;
    }

    @Override
    public String toString() {
        return "MoveValue{" + "move=" + move + ", value=" + value + '}';
    }

}

protected MoveValue minMax(int alpha, int beta, int maxDepth, PlayerType player) {
    if (!canContinue()) {
        return new MoveValue();
    }
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    MoveValue moveValue = new MoveValue();
    boolean isMaximizer = (player.equals(playerType));
    if (maxDepth == 0 || board.isGameOver()) {            
        moveValue.value = evaluateBoard();
        return moveValue;
    }
    while (movesIterator.hasNext()) {
        Move currentMove = movesIterator.next();
        board.applyMove(currentMove);
        moveValue = minMax(alpha, beta, maxDepth - 1, player.opponent());
        board.undoLastMove();
        if (isMaximizer) {
            if (moveValue.value > alpha) {
                selectedMove = currentMove;
                alpha = moveValue.value;
            }
        } else {
            if (moveValue.value < beta) {
                beta = moveValue.value;
                selectedMove = currentMove;
            }
        }
        if (alpha >= beta) {
            break;
        }
    }
    return (isMaximizer) ? new MoveValue(selectedMove, alpha) : new MoveValue(selectedMove, beta);
}

I don't know if I got it right or if I did something wrong, but I'm back to the problem I had when I posted the question:

Calling minMax(Integer.MIN_VALUE, Integer.MAX_VALUE, 1, PlayerType.Black) returns a move that can only be made by the white player, which is not what I need.

I need the best move for the given player, not the best move for the whole board.

Answer 1

Accepted answer by StepTNT

After some research and a lot of time spent on this problem, I came up with this solution, which seems to work:

private class MoveValue {

    public double returnValue;
    public Move returnMove;

    public MoveValue() {
        returnValue = 0;
    }

    public MoveValue(double returnValue) {
        this.returnValue = returnValue;
    }

    public MoveValue(double returnValue, Move returnMove) {
        this.returnValue = returnValue;
        this.returnMove = returnMove;
    }

}


protected MoveValue minMax(double alpha, double beta, int maxDepth, MarbleType player) {       
    if (!canContinue()) {
        return new MoveValue();
    }        
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    double value = 0;
    boolean isMaximizer = (player.equals(playerType)); 
    if (maxDepth == 0 || board.isGameOver()) {            
        value = evaluateBoard();            
        return new MoveValue(value);
    }
    MoveValue returnMove;
    MoveValue bestMove = null;
    if (isMaximizer) {           
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if ((bestMove == null) || (bestMove.returnValue < returnMove.returnValue)) {
                bestMove = returnMove;
                bestMove.returnMove = currentMove;
            }
            if (returnMove.returnValue > alpha) {
                alpha = returnMove.returnValue;
                bestMove = returnMove;
            }
            if (beta <= alpha) {
                bestMove.returnValue = beta;
                bestMove.returnMove = null;
                return bestMove; // pruning
            }
        }
        return bestMove;
    } else {
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if ((bestMove == null) || (bestMove.returnValue > returnMove.returnValue)) {
                bestMove = returnMove;
                bestMove.returnMove = currentMove;
            }
            if (returnMove.returnValue < beta) {
                beta = returnMove.returnValue;
                bestMove = returnMove;
            }
            if (beta <= alpha) {
                bestMove.returnValue = alpha;
                bestMove.returnMove = null;
                return bestMove; // pruning
            }
        }
        return bestMove;
    }   
}
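
For readers who want to try the idea in isolation, here is a minimal, self-contained sketch of alpha-beta search that returns the chosen move together with its value. It is not the code above: it swaps the board for a small hand-built tree, and the names AlphaBetaTree, Node, leaves, visited and alphaBeta are assumptions made purely for illustration. Each call records only the index of its own best child and discards the move reported by the child, so the move returned at the root always belongs to the player who made the initial call.

```java
class AlphaBetaTree {
    /** Hypothetical game-tree node: a leaf carries a score, an inner node carries children. */
    static final class Node {
        final int value;
        final Node[] children;
        Node(int value) { this.value = value; this.children = new Node[0]; }
        Node(Node... children) { this.value = 0; this.children = children; }
    }

    /** Pairs a score with the index of the child move that produced it. */
    static final class MoveValue {
        final int move;   // index of the best child, or -1 at a leaf
        final int value;
        MoveValue(int move, int value) { this.move = move; this.value = value; }
    }

    static int visited = 0; // number of leaves evaluated, to make pruning visible

    /** Builds an inner node whose children are leaves with the given scores. */
    static Node leaves(int... scores) {
        Node[] kids = new Node[scores.length];
        for (int i = 0; i < scores.length; i++) kids[i] = new Node(scores[i]);
        return new Node(kids);
    }

    static MoveValue alphaBeta(Node node, int alpha, int beta, boolean maximizing) {
        if (node.children.length == 0) {
            visited++;
            return new MoveValue(-1, node.value);
        }
        int bestMove = -1;
        int bestValue = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (int i = 0; i < node.children.length; i++) {
            // Only the child's value is used; its move belongs to the other player.
            int value = alphaBeta(node.children[i], alpha, beta, !maximizing).value;
            if (maximizing) {
                if (value > bestValue) { bestValue = value; bestMove = i; }
                alpha = Math.max(alpha, bestValue);
            } else {
                if (value < bestValue) { bestValue = value; bestMove = i; }
                beta = Math.min(beta, bestValue);
            }
            if (alpha >= beta) break; // remaining siblings cannot change the result
        }
        return new MoveValue(bestMove, bestValue);
    }

    public static void main(String[] args) {
        Node root = new Node(leaves(3, 12, 8), leaves(2, 4, 6), leaves(14, 5, 2));
        MoveValue best = alphaBeta(root, Integer.MIN_VALUE, Integer.MAX_VALUE, true);
        // prints "best move index 0, value 3, leaves visited 7"
        System.out.println("best move index " + best.move + ", value " + best.value
                + ", leaves visited " + visited);
    }
}
```

In this tree the second subtree is cut off after its first leaf (2 is already worse than the 3 guaranteed by the first subtree), which is why only 7 of the 9 leaves are evaluated.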

Answer 2

Answered by Codor

This is a bit difficult, as the given code is not an actual Java implementation; in order to achieve what you want, there must be concrete types to represent a move and a position in the game tree. Usually the game tree is not explicitly encoded but navigated in a sparse representation, where the implementation actually performs the move in question, evaluates the resulting smaller problem recursively and undoes the move, thus performing a depth-first search in which the call stack represents the current path.

To obtain the actual best move, simply return from your method the move that maximizes the subsequent evaluation. It might be helpful to first implement the Minimax algorithm without alpha-beta pruning, and to add pruning in a subsequent step once the basic structure works.
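
As a rough illustration of that first step, the following sketch implements plain minimax (no pruning) and returns the move along with its value. The game is an assumption chosen for brevity, not the one from the question: a pile of stones from which each player takes 1 to 3, where whoever takes the last stone wins. The names PlainMinimax, MoveValue, legalMoves and minimax are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PlainMinimax {
    /** Result pair: the value of a position and the move that achieves it. */
    static final class MoveValue {
        final int move;   // stones to take, or -1 when the game is already over
        final int value;  // +1 = maximizer wins, -1 = maximizer loses
        MoveValue(int move, int value) { this.move = move; this.value = value; }
    }

    /** A player may take 1, 2 or 3 stones, but never more than remain. */
    static List<Integer> legalMoves(int stones) {
        List<Integer> moves = new ArrayList<>();
        for (int take = 1; take <= 3 && take <= stones; take++) moves.add(take);
        return moves;
    }

    static MoveValue minimax(int stones, boolean maximizing) {
        if (stones == 0) {
            // The previous player took the last stone, so the player to move has lost.
            return new MoveValue(-1, maximizing ? -1 : +1);
        }
        int bestMove = -1;
        int bestValue = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (int take : legalMoves(stones)) {
            // "Apply" the move by recursing on the smaller pile; nothing to undo here.
            int value = minimax(stones - take, !maximizing).value;
            if (maximizing ? value > bestValue : value < bestValue) {
                bestValue = value;
                bestMove = take; // this level's own move, not the child's
            }
        }
        return new MoveValue(bestMove, bestValue);
    }

    public static void main(String[] args) {
        MoveValue best = minimax(10, true);
        // prints "take 2, value 1"
        System.out.println("take " + best.move + ", value " + best.value);
    }
}
```

Taking 2 from a pile of 10 leaves 8 stones, a multiple of 4, from which the opponent cannot avoid losing; the search finds this without any game-specific knowledge.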

The implementation from the link in the question (Section 1.5) actually returns the best move, as indicated in the following comment taken from there.

/** Recursive minimax at level of depth for either
    maximizing or minimizing player.
    Return int[3] of {score, row, col}  */

Here no user-defined type is used to represent the move. Instead, the method returns three values: the evaluated best score and the coordinates to which the player would move to actually perform the best move (which the implementation has already done to obtain the score). The coordinates are a representation of the actual move.
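
To make the {score, row, col} convention concrete, here is a small sketch for Tic-Tac-Toe with alpha-beta pruning added. It is written from scratch under assumptions and is not the linked implementation: the class name TicTacToeSearch, the board encoding ('X' for the computer, 'O' for the opponent, ' ' for empty) and the scores +1/0/-1 are all illustrative choices.

```java
import java.util.Arrays;

class TicTacToeSearch {
    static final char EMPTY = ' ', COMPUTER = 'X', OPPONENT = 'O';

    /** Returns true when player p occupies a full row, column, or diagonal. */
    static boolean hasWon(char[][] b, char p) {
        for (int i = 0; i < 3; i++) {
            if (b[i][0] == p && b[i][1] == p && b[i][2] == p) return true;
            if (b[0][i] == p && b[1][i] == p && b[2][i] == p) return true;
        }
        return (b[0][0] == p && b[1][1] == p && b[2][2] == p)
            || (b[0][2] == p && b[1][1] == p && b[2][0] == p);
    }

    /** Returns {score, row, col}; row and col are -1 when the game is over. */
    static int[] minimax(char[][] b, char player, int alpha, int beta) {
        if (hasWon(b, COMPUTER)) return new int[] {+1, -1, -1};
        if (hasWon(b, OPPONENT)) return new int[] {-1, -1, -1};
        boolean maximizing = (player == COMPUTER);
        int bestScore = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        int bestRow = -1, bestCol = -1;
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) {
                if (b[r][c] != EMPTY) continue;
                b[r][c] = player;                       // perform the move
                int score = minimax(b, maximizing ? OPPONENT : COMPUTER, alpha, beta)[0];
                b[r][c] = EMPTY;                        // undo the move
                if (maximizing ? score > bestScore : score < bestScore) {
                    bestScore = score;
                    bestRow = r;                        // coordinates of this level's move
                    bestCol = c;
                }
                if (maximizing) alpha = Math.max(alpha, bestScore);
                else beta = Math.min(beta, bestScore);
                if (alpha >= beta) return new int[] {bestScore, bestRow, bestCol}; // cut-off
            }
        }
        if (bestRow == -1) return new int[] {0, -1, -1}; // board full, nobody won: draw
        return new int[] {bestScore, bestRow, bestCol};
    }

    public static void main(String[] args) {
        char[][] board = {
            {'X', 'X', ' '},
            {'O', 'O', ' '},
            {' ', ' ', ' '}
        };
        int[] best = minimax(board, COMPUTER, Integer.MIN_VALUE, Integer.MAX_VALUE);
        // prints "[1, 0, 2]": score +1 by completing the top row at row 0, col 2
        System.out.println(Arrays.toString(best));
    }
}
```

Both the maximizing and the minimizing branch record coordinates, but each call only reports the move made at its own level. The caller reads element 0 (the score) from recursive calls and ignores the coordinates, so the top-level result is always a move for the player passed to the initial call.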
