LLMs make Deals

Testing frontier LLMs in a game similar to Monopoly Deal*.

GitHub

Leaderboard

Rank | Model | Provider | W/L | Elo
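The page does not state how the Elo column is computed. For reference, a standard sequential Elo update from head-to-head results can be sketched as below. The K-factor of 32, the 1500 starting rating, the (winner, loser) game-log format and the model names are illustrative assumptions, not DealBench's documented settings.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one game.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    The update is zero-sum: whatever A gains, B loses.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


def rate_games(games, initial: float = 1500.0, k: float = 32.0):
    """Replay (winner, loser) pairs in order; return ratings and W/L records."""
    ratings = defaultdict(lambda: initial)
    record = defaultdict(lambda: [0, 0])  # model -> [wins, losses]
    for winner, loser in games:
        ratings[winner], ratings[loser] = update_elo(ratings[winner], ratings[loser], 1.0, k)
        record[winner][0] += 1
        record[loser][1] += 1
    return dict(ratings), {m: tuple(wl) for m, wl in record.items()}


if __name__ == "__main__":
    # Hypothetical game log with placeholder names; these are not real results.
    games = [("model-a", "model-b"), ("model-a", "model-c"), ("model-c", "model-b")]
    ratings, record = rate_games(games)
    ranked = sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
    for rank, (model, elo) in enumerate(ranked, start=1):
        wins, losses = record[model]
        print(f"{rank} | {model} | {wins}/{losses} | {elo:.0f}")
```

Because sequential Elo depends on the order in which games are replayed, some leaderboards average over several orderings or fit a Bradley-Terry model instead; which approach this leaderboard uses is not stated.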

About DealBench

DealBench is an AI evaluation benchmark that pits large language models against each other in Monopoly Deal-style card games.


Why a game like Monopoly Deal*?

A game like Deal tests both long-term strategy and a player's ability to improvise. Player momentum in Deal often swings sharply from one turn to the next, so winning requires models to plan carefully while keeping enough situational awareness to adapt. This combination of long-term planning and constant adjustment is closer to real-world decision making, and it poses a different challenge from static long-horizon benchmarks such as SWE-bench or WebArena. Games in general are also more robust to overfitting because of the large number of possible rollouts.


Key Observations

  1. Claude Sonnet 4's low performance is primarily due to the large number of invalid moves it makes: for example, it repeatedly tried to collect rent using its opponent's properties instead of its own. Anthropic's lack of support for enforced structured outputs may also be hurting its performance.
  2. Open-source models such as DeepSeek-R1 and Qwen3-235B also struggled with the rules and the output format. DeepSeek stops emitting reasoning when structured outputs are enabled, which results in moves that are valid but of poor quality.
  3. Qwen3 reasoned for 17k tokens (!) on the first turn before erroring out. More generally, extremely long reasoning traces made the model slow and hard to use reliably.
  4. All the models take a very large number of turns (15 to 30) to win, whereas a standard game between humans usually ends within 10 turns. This may indicate that the models are still far from human-level play (a direction for future work).
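The first two observations both come down to output that either fails to parse or parses but breaks the rules. When a provider cannot enforce structured outputs, the game harness has to catch both cases itself. A minimal sketch of such a check for rent actions follows; the JSON move format, `PlayerState`, `RENT_CARDS` and the card ids are hypothetical and are not DealBench's actual schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PlayerState:
    """Hypothetical minimal view of the acting player's state."""
    hand: list = field(default_factory=list)        # card ids currently in hand
    properties: dict = field(default_factory=dict)  # color -> properties this player has on the table


# Hypothetical rent cards: card id -> colors that card may charge rent for.
RENT_CARDS = {
    "rent_blue_green": {"blue", "green"},
    "rent_red_yellow": {"red", "yellow"},
}


def validate_rent_move(raw: str, me: PlayerState) -> tuple[bool, str]:
    """Parse a model's JSON move and check a rent action against the mover's own state.

    Returns (ok, reason); on rejection the reason says what was wrong.
    """
    try:
        move = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"unparseable output: {exc.msg}"
    if not isinstance(move, dict) or move.get("action") != "rent":
        return False, "expected a JSON object with action 'rent'"
    card, color = move.get("card"), move.get("color")
    if not isinstance(card, str) or not isinstance(color, str):
        return False, "'card' and 'color' must be strings"
    if card not in me.hand:
        return False, f"card {card!r} is not in your hand"
    if color not in RENT_CARDS.get(card, set()):
        return False, f"card {card!r} cannot charge rent for {color!r}"
    # The failure mode from observation 1: rent is charged on the mover's own properties.
    if me.properties.get(color, 0) == 0:
        return False, f"you own no {color} properties; rent is charged on your own sets"
    return True, "ok"


me = PlayerState(hand=["rent_blue_green"], properties={"green": 2})
print(validate_rent_move('{"action": "rent", "card": "rent_blue_green", "color": "blue"}', me))
```

Returning a reason rather than a bare boolean leaves room to re-prompt the model with the error, which is one common way to recover from invalid moves; whether DealBench retries, skips or penalizes an invalid move is not stated on this page.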

Method


Future Work


Example prompts

Here is an example of the system prompt and a sample user prompt.


*This project implements a simulation based on the publicly known rules of Monopoly Deal. Monopoly Deal and Monopoly are registered trademarks of Hasbro, Inc. This project is not affiliated with, endorsed by, or associated with Hasbro in any way.