Meta researchers have developed an artificial intelligence that can play Diplomacy and beat most human players.
The Meta AI blog describes Diplomacy as a “near-impossible grand challenge in AI” because of the skills required to play it. The goal of the game is to control the majority of the board. To get there, players negotiate with one another through conversations, form alliances and make deals. They also need to be able to detect when others are lying.
Why is Cicero so special?
Researchers describe Cicero as a combination of strategic reasoning and dialogue models. Earlier AIs that won games against humans had one thing in common: all of those games were “winner-takes-it-all” contests. Diplomacy is different. Seven players fight to control supply centres, and each player is constantly negotiating with the others. Cicero is a game-changer because it was able to pick up the fundamentals of human interaction in Diplomacy, whether cooperation or deception.
Meta claims that Cicero achieved more than double the average score of the human players across 40 anonymous Diplomacy games. Cicero was also among the top 10% of participants who had played more than one game. Cicero decides how to move next based on how each player is doing in the game and on its text conversations with them.
AI that can beat the best players in games like poker, chess, and Go is typically trained by self-play reinforcement learning. This method is not well suited to Diplomacy, which requires cooperation with human players. Meta says that supervised learning is often used in such games, but that on its own it creates an agent that is “relatively weak” and can be exploited. Meta instead used an iterative planning algorithm, which “balances dialog consistency with rationality.”
Diplomacy is a board game in which players compete to dominate Europe in a loose version of WW1. You maneuver a limited number of armies around the board each turn, but you also make alliances. You tell Geoff that the two of you must band together against Margaret’s Germany and agree to support his troops into Berlin, and then you secretly give your support to Margaret, who has promised to storm through Paris. Meta writes that Diplomacy is “a game about people, not pieces”.
None of that stops Cicero from demonstrating its shrewdness in this area. Diplomacy is still a game in which you must convince others to cooperate with your efforts, and Cicero is able to do that.
Meta’s blog post and the team’s research paper provide more details, and Mike Lewis’s tweet thread picks out the most striking bits.
Meta rose to the occasion by creating and training an AI called Cicero. It now ranks in the top 10% of Diplomacy players who have played more than one game on webDiplomacy.net. This was possible because of the combination of two areas of AI research: natural language processing and strategic reasoning.
While AI agents can be trained for games such as chess by self-play reinforcement learning, modeling cooperative play in Diplomacy required an entirely different approach. Meta states that the classic approach would be supervised learning, where an agent is trained on labeled data from previous Diplomacy games. However, supervised learning alone produced an AI agent that was gullible and could easily be manipulated by players who lie.
Cicero also uses an iterative planning algorithm called piKL. It starts from an initial prediction of the other players’ policies, refines that prediction, and plans Cicero’s own moves using the dialogue between the bot and the other players. Along the way, the algorithm evaluates alternative options that could produce better results and uses them to improve the anticipated moves of the other players.
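The trade-off behind this kind of planning can be illustrated with a small sketch. It is not Meta's implementation. It only shows the single-decision idea of KL-regularized action selection: choose the policy that maximizes expected value minus a penalty for drifting away from a human-like "anchor" policy, which has the closed form policy(a) ∝ anchor(a) · exp(value(a) / λ). The function name, the action names, and all numbers are hypothetical, and the full iterative, multi-player procedure is not reproduced here.

```python
import math

def kl_regularized_policy(anchor, values, lam):
    """Policy maximizing E[value] - lam * KL(policy || anchor).

    anchor: dict action -> probability under a human-like (imitation) policy
    values: dict action -> estimated expected value of that action
    lam:    regularization strength (> 0); larger keeps the policy near anchor
    """
    # Closed form: policy(a) is proportional to anchor(a) * exp(value(a) / lam).
    max_v = max(values.values())  # subtract the max for numerical stability
    weights = {
        a: anchor[a] * math.exp((values[a] - max_v) / lam) for a in anchor
    }
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical numbers for one decision with three candidate orders.
anchor = {"hold": 0.6, "support_ally": 0.3, "attack": 0.1}
values = {"hold": 0.2, "support_ally": 0.5, "attack": 0.4}

# Strong regularization: the result stays close to the human-like anchor.
close_to_human = kl_regularized_policy(anchor, values, lam=10.0)
# Weak regularization: the result shifts toward the highest-value action.
more_greedy = kl_regularized_policy(anchor, values, lam=0.05)

print(max(close_to_human, key=close_to_human.get))
print(max(more_greedy, key=more_greedy.get))
```

With a large λ the most likely action remains the one humans would usually pick, while a small λ moves the policy toward whatever the value estimates favor. A purely imitative agent and a purely value-maximizing agent sit at the two ends of that dial.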
Andrew Goff, three-time Diplomacy world champion, released a statement praising Cicero for its dispassionate approach to the game. Goff stated that “a lot of human players will soften or get motivated by revenge,” but Cicero does not do this. It just sees the situation and reacts accordingly. It is very ruthless in executing its strategy, but not in a way that annoys other players.
Cicero participated anonymously in 40 Diplomacy games on webDiplomacy.net. The games were part of a “blitz league” held on August 19-20, 2022. Cicero finished among the top 10% of those who played more than one game, and it placed second among the 19 players who played at least five games. Across all 40 games, Cicero had a mean score of 25.8 percent, more than twice the 12.4 percent average of its 82 competitors.
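The "more than twice" comparison can be checked directly from the two reported averages. The variable names below are illustrative, and only the two percentages come from the reported results.

```python
# Mean scores reported for the 40-game blitz league, in percent.
cicero_mean = 25.8
opponents_mean = 12.4

# Ratio of Cicero's mean score to the average of its 82 opponents.
ratio = cicero_mean / opponents_mean
print(f"Cicero scored {ratio:.2f}x the opponents' average")  # 2.08x
```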