Facebook has developed a new poker AI that outperforms Libratus!
Apr 20, 2021
Researchers at Facebook have developed a general artificial intelligence framework called "Recursive Belief-based Learning" (ReBeL), demonstrated by its strong performance in a game that has long been difficult for AI programs: Texas Hold'em.
The ReBeL framework introduces a new concept that lets it handle the partial-information aspect of poker even better than the previous superhuman poker AI, Libratus.
In recent years, artificial intelligence systems have shown remarkable ability to crack complex games. DeepMind's AlphaZero taught itself chess, shogi (Japanese chess), and Go from only the basic rules, using self-play to reach new heights in all three games within hours.
Libratus also used self-play to learn heads-up no-limit poker. ReBeL does the same, but adds a new notion of what a "game state" is, allowing the AI to better understand hidden-information games during self-play.
ReBeL considers the visible information about the state of the game, such as the known cards, the bet sizes, and even the range of hands the opponent may hold. In addition, it considers each player's "belief" about the state, similar to how a human might consider whether their opponent thinks they are ahead or behind in the hand.
To do this, ReBeL trains two different AI models through self-play reinforcement learning: a value network and a policy network. The AI then operates on what the researchers call public belief states (PBS). In a perfect-information game like chess, the game state alone is enough to make a perfect decision. A PBS considers not only the game state but also factors such as both players' policies, yielding a complete probabilistic model of all the actions the players could take and the outcomes of those actions.
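As a rough illustration of the idea (all class and field names below are hypothetical sketches, not Facebook's actual implementation), a public belief state can be thought of as the publicly visible game state paired with each player's "range": a probability distribution over the private hands they could hold, updated after every public action.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical sketch of a public belief state (PBS) for heads-up poker.
# It pairs the publicly visible game state with each player's "range":
# a probability distribution over the private hands they could hold.
@dataclass
class PublicBeliefState:
    board_cards: List[str]   # community cards visible to everyone
    pot: int                 # chips currently in the pot
    bet_history: List[int]   # public betting actions so far
    # One range per player: private hand -> probability, summing to 1.
    ranges: Tuple[Dict[str, float], Dict[str, float]] = field(
        default_factory=lambda: ({}, {})
    )

    def update_range(self, player: int, likelihood: Dict[str, float]) -> None:
        """Bayesian update of a player's range after a public action:
        multiply each hand's prior probability by how likely that hand
        was to take the observed action, then renormalize."""
        prior = self.ranges[player]
        posterior = {h: p * likelihood.get(h, 0.0) for h, p in prior.items()}
        total = sum(posterior.values())
        if total > 0:
            for h in posterior:
                posterior[h] /= total
        self.ranges[player].clear()
        self.ranges[player].update(posterior)
```

For example, if a player is equally likely to hold "AA" or "KK", and the action they just took is nine times more likely with "AA", the updated range shifts to 90% "AA" and 10% "KK". ReBeL's value and policy networks take such belief states as input rather than raw game states.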
ReBeL performs better than Libratus against human opponents
Playing against Dong Kim, one of the best heads-up poker players in the world, ReBeL averaged under 2 seconds per hand over 7,500 hands and never needed more than 5 seconds for a decision. Facebook's previous poker system, Libratus, peaked at 147 thousandths of a big blind (the game's forced bet) per game against humans, while ReBeL averaged 165 (standard deviation 69).
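The win-rate metric above, thousandths of a big blind per game (often written mbb/g), is simply total winnings measured in big blinds, scaled by 1,000 and averaged over the number of hands played. A minimal sketch (the function name is illustrative, not from the paper):

```python
def mbb_per_game(winnings_in_big_blinds: float, num_hands: int) -> float:
    """Milli-big-blinds per game: total winnings (in big blinds)
    times 1000, averaged over the number of hands played."""
    return 1000.0 * winnings_in_big_blinds / num_hands
```

Under this metric, a score of 165 mbb/g over 7,500 hands would correspond to winning about 1,237.5 big blinds in total.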
Worried about enabling cheating in the gambling industry, Facebook decided not to open-source the poker code
In experiments, ReBeL performed well in imperfect-information games. The Facebook team had ReBeL play heads-up (two-player) variants of Hold'em, Turn Endgame Hold'em (a simplified version of the game with no raising in the first two betting rounds), and Liar's Dice.
The research team used up to 128 computers, each with 8 graphics cards, to generate simulated game data, and randomly varied the bet and stack sizes (from 5,000 to 25,000 chips) during training. ReBeL was trained on the full game with $20,000 to bet.
Out of concern about cheating, the Facebook team decided not to release the ReBeL codebase for poker. Instead, they open-sourced their implementation for Liar's Dice. Facebook's researchers believe ReBeL will make Texas Hold'em more popular in the field of reinforcement learning research.
"Although artificial intelligence algorithms already exist that can achieve superhuman performance in poker, these algorithms usually assume that the participants have a fixed number of chips or use fixed bet sizes."
In real play, chip counts vary, so such algorithms would need retraining, making real-time play difficult. ReBeL, however, can compute a strategy for any stack size and any bet size in a matter of seconds.