She's one of the few people who can regularly beat me at war games. My favorite example is the time she carefully misled me into believing that she hadn't found the other end of a wormhole that opened near my home-world in a game of Space Empires 4X. She spent the whole game exploring and increasing ship speed and weapons just enough, waiting for me to commit my heavily armed but slow-ish ships. Then she giggled and sighed with relief as she paid the overnight rate to send an armada to my doorstep. Oh, and the time she slipped a WMD into the US in Labyrinth. She likes everything from Codenames to those Rosenberg games that are so heavy they should come with an OSHA training poster. Her last obsession was Tyrants of the Underdark. In Stratego she had bluffed me into believing her flag was in a corner. But then she moved a piece that I assumed was a bomb and her face gave away everything. But a momentary lapse of Stratego-face wasn't the issue.

I really like the section on initial piece deployment:

> The Flag is almost always put on the back row, and often protected by Bombs. However, DeepNash will not surround the Flag with Bombs. Human experts (e.g. Vincent de Boer, 3-fold World Champion) believe that it is indeed good to occasionally not protect the Flag because this unpredictability makes it harder for the opponent in the end-game. Observed is that the highest pieces, the 10 and 9, are often deployed on different sides of the board. Additionally, the Spy is quite often located not too far away from the 9 (or 8), which complies with the behavior seen from strong human players. DeepNash does not often deploy Bombs on the front row, which complies with the behavior seen from strong human players. The 3's (Miner), which can defuse Bombs, are often placed on the back row, which makes sense because their importance typically increases throughout a game as more opponent Bombs and potential Flag positions get discovered. The eight 2's (Scout) are typically deployed both in the front and more in the back, allowing to scout opponent pieces initially but also in later phases of the game.

Just to give a sense of hardness, Stratego's game tree is infinitely larger than Go's game tree, because in imperfect information you select a continuous strategy vector over the action space, whereas in Go you select a discrete action. Meanwhile, Stratego is also infinitely more complex than Go because in Go there are fewer legal moves with every move played, but in Stratego moves don't monotonically decrease the remaining game length. Beyond those two infinities of greater complexity, Stratego is imperfect information, so it is played relative to the infostate, not the state. In Stratego there are thirty-three pieces with unknown information on the first move. We could definitely get a lower upper bound by applying abstraction via domain knowledge, but just so I don't have to deal with that complexity I'll state that there are at most 8683317618811886495518194401280000000 different states associated with the infostate of your first move. Meanwhile, on your first move in Go, you are in exactly one state. In practice, the average length of a game of Go is ~200 moves, while the average length of a game of Stratego is ~400. Much like chess, I would expect optionality to be an important strategic consideration in Stratego, so branching factors are likely selected for, getting pushed higher by good agents.
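To sanity-check the count above: a minimal sketch, assuming the quoted bound is simply 33!, the number of arrangements of 33 hidden pieces treated as fully distinguishable:

```python
import math

# 33 hidden pieces, treated as fully distinguishable, can be arranged in
# 33! ways, which is exactly the number quoted above. Collapsing pieces
# of the same rank (abstraction via domain knowledge) would divide this
# down, giving the "lower upper bound" the comment mentions.
assert math.factorial(33) == 8683317618811886495518194401280000000
print(f"33! = {math.factorial(33):,}")
```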
I'm still trying to grok and implement the paper, but I studied AlphaGo/AlphaZero/MuZero during my PhD. The core contribution here is the Nash equilibrium component: reaching it in an imperfect-information game using only self-play. Note, there is no MCTS being done in this paper. This differs from counterfactual regret methods (like the most famous poker AIs), which need to compute over all possible "information sets" and therefore become intractable for even moderately complicated poker variants. It should also be noted (as they do in the paper) that this is more incremental than methodologically innovative in the way AlphaGo was: it is the AlphaZero-style step increment applied to NeuRD. As is my general critique of their previous papers, they omit many engineering details that prove to be very important. Here, they admit that fine-tuning is vitally important (one of the three core steps), but the details are relegated to the supplementary materials. That also opens up the question of whether this new "fine-tuned" policy still guarantees a Nash equilibrium, which it obviously does not, since some actions in the equilibrium's mixed strategies will have probabilities small enough to be pruned. I wish researchers would be more honest with "this is a hack to get things to work on a computer because neural networks have floating-point inaccuracies". It doesn't ruin any of the theory, and no one is going to hold it against you. But it causes all sorts of confusion when trying to reimplement.
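To make the pruning concern concrete, here is a minimal sketch of that style of policy fine-tuning, assuming a simple probability-threshold rule; `finetune_policy` and the `1e-3` threshold are illustrative stand-ins, not DeepNash's actual procedure from the supplementary materials:

```python
import numpy as np

def finetune_policy(pi: np.ndarray, threshold: float = 1e-3) -> np.ndarray:
    """Zero out action probabilities below `threshold` and renormalize.

    Illustrative assumption only: the exact rule and threshold used by
    DeepNash are described in the paper's supplementary materials.
    """
    pruned = np.where(pi < threshold, 0.0, pi)
    if pruned.sum() == 0.0:
        raise ValueError("threshold removed all probability mass")
    return pruned / pruned.sum()

# A near-equilibrium mixed strategy with two tiny-probability actions:
pi = np.array([0.6000, 0.3990, 0.0009, 0.0001])
print(finetune_policy(pi))  # ~[0.6006, 0.3994, 0.0, 0.0]
```

Any action the equilibrium plays with probability below the threshold is dropped outright, so the fine-tuned strategy is, strictly speaking, no longer a Nash equilibrium, which is exactly the reimplementation confusion described above.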