---
title: Agentic Codenames Arena
emoji: 📊
colorFrom: blue
colorTo: blue
python_version: 3.12.6
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Time for the LLMs to have some fun with Codenames!
tags:
- mcp-in-action-track-creative
- mcp-in-action-track-consumer
- Google
- Gemini
- Anthropic
- OpenAI
- HuggingFace
- ElevenLabs
---
# 🧠 Agentic Codenames Arena
![Meme](assets/meme.png)
**Watch, or join, LLMs battling it out in Codenames.**
**New to Codenames? No problem.** Go to the [How to Play section](#how-to-play) below or check out the example in the _How to Play_ tab in the app to get started.
---
## ✅ Hackathon Requirements:
`Demo`: [Video on YouTube](https://youtu.be/E3IvBN8SqdA)
`Social media post`: [My post on LinkedIn](https://www.linkedin.com/posts/luca-di-palma-99024a1b7_most-of-us-use-llms-to-create-reports-write-activity-7400225424770932736-OTPU?utm_source=share&utm_medium=member_desktop&rcm=ACoAADJnVPwBh-8LoV25AQVeclIBTKNuOP6rr08)
`My HuggingFace Username`: lucadipalma1998
---
## 🧩 What This App Does
**Agentic Codenames Arena** is an interactive dashboard where teams of LLMs compete in the game of *Codenames*.
Two teams, **Red** and **Blue**, face off in a **4v4 setup**, with each team composed of:
* **1 Boss**: Provides the clue and clue number for each turn.
* **1 Captain**: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
* **2 Players**: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.
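
As a rough illustration of the composition above, here is a minimal Python sketch of a team and a check that it has exactly 1 Boss, 1 Captain, and 2 Players. The names `Role`, `Agent`, `Team`, and the `model-a`…`model-d` labels are hypothetical placeholders, not the app's actual classes or model IDs.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    BOSS = "boss"
    CAPTAIN = "captain"
    PLAYER = "player"


@dataclass(frozen=True)
class Agent:
    role: Role
    model: str  # placeholder label, not a real model identifier


@dataclass
class Team:
    color: str
    agents: list[Agent]

    def validate(self) -> None:
        """Raise ValueError unless the team is 1 Boss, 1 Captain, 2 Players."""
        counts = {role: 0 for role in Role}
        for agent in self.agents:
            counts[agent.role] += 1
        expected = {Role.BOSS: 1, Role.CAPTAIN: 1, Role.PLAYER: 2}
        if counts != expected:
            raise ValueError(f"invalid composition for {self.color} team")


red = Team("red", [
    Agent(Role.BOSS, "model-a"),
    Agent(Role.CAPTAIN, "model-b"),
    Agent(Role.PLAYER, "model-c"),
    Agent(Role.PLAYER, "model-d"),
])
red.validate()  # passes silently for a valid 4-agent team
```
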
The internal **communication and coordination architecture is built using LangGraph**, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.
Below is the LangGraph diagram illustrating how the different roles communicate during each turn:
![LangGraph Architecture](graph.png)
You can either **sit back and watch fully autonomous LLM teams play**, or **step in as a human Boss** to lead your AI teammates with your own clues.
---
## 🤖 How It Works
### **LLM Teams**
Build teams from several providers: OpenAI, Google, Anthropic, HuggingFace...
Each model plays autonomously using its own reasoning chain and game strategy.
### **Two Gameplay Modalities**
#### **1️⃣ Observation Mode — Watch AIs Battle**
Sit back and spectate.
See how different models reason about clues, decide associations, and occasionally produce *hilariously misaligned* guesses.
You'll see:
* Model-to-model conversations
* Reasoning traces
* Turn-by-turn decisions
* How each team coordinates across multiple rounds
Perfect for AI benchmarking, research, or just entertainment.
#### **2️⃣ Human Boss Mode — Enter the Fight**
Become the Boss for either team and give your own clue + number.
Your AI teammates will interpret your hint and take their guesses.
---
## 🧠 Why It’s Interesting
* **Compare LLM reasoning styles:**
Watch how different models interpret associations, analogies, and subtle semantic cues.
* **Analyze team dynamics:**
Some models coordinate beautifully. Others… not so much.
Observe emergent cooperation, miscommunication, or unexpected strategies.
* **Experiment with human–AI collaboration:**
Test how effective your clues are with LLM teammates.
Try pushing the limits with creative, cryptic, or minimalist hints.
---
## 🕹️ Main Features
* **Build teams by selecting providers** or choose `random` to generate a mixed-model team.
* **Switch between AI vs AI** and **Human vs AI** modes
* **Detailed per-turn logs** for all model decisions
* **Transparent reasoning chains**
* **Interactive UI** for watching matches play out
* **Match history & analytics dashboard**
---
## 📊 Stats & Analytics
All games played in the Arena are stored in a database.
The Stats section of the app includes:
* **Model win/loss rates** across all recorded matches
* **Performance comparisons** between model families (OpenAI vs Google vs …)
* **Historical match logs** for replay & analysis
* **Leaderboards** highlighting the best-performing models
This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
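
For illustration, per-model win rates could be derived from stored matches roughly as follows. This is a sketch only: the record layout (`red`, `blue`, `winner` keys) and the model labels are assumptions, not the app's actual database schema.

```python
from collections import defaultdict


def win_rates(matches: list[dict]) -> dict[str, float]:
    """Return wins / games for every model that appears in the match records."""
    games: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for match in matches:
        for color in ("red", "blue"):
            # Count each model once per team, even if it fills several roles.
            for model in set(match[color]):
                games[model] += 1
                if match["winner"] == color:
                    wins[model] += 1
    return {model: wins[model] / games[model] for model in games}


# Hypothetical records: each team lists the models it was built from.
matches = [
    {"red": ["model-a", "model-b"], "blue": ["model-c"], "winner": "red"},
    {"red": ["model-c"], "blue": ["model-a"], "winner": "red"},
]
rates = win_rates(matches)  # model-a: 0.5, model-b: 1.0, model-c: 0.5
```
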
---
<a id="how-to-play"></a>
## ❓ How to Play
### 📝 Summary
Codenames is a word-association game where two teams compete to guess all their secret words before the opponents do. Each team has a Boss who **can see** a hidden color-coded board showing which words belong to their team, which belong to the other team, which are neutral, and which single word is the deadly assassin. The Boss gives one-word clues paired with a number, hinting at how many words on the board relate to that clue. Their teammates, **who cannot see any colors**, must discuss, interpret the clue, and decide which words the Boss is pointing toward. Choosing their own words brings them closer to victory, while accidentally selecting an opponent’s word, a neutral word, or the assassin can derail their progress or end the game instantly. The goal is simple: interpret clues wisely, avoid dangerous words, and be the first team to uncover all your hidden words.
### 💡 Let's see an example
What Bosses see (above) vs. what the other players see (below):
<img src="assets/example.png" alt="Example board as seen by the Bosses" width="400">
<img src="assets/no-color-board.png" alt="Example board without colors, as seen by the other players" width="400">
### 👥 Team Roles
Each team has four members with distinct responsibilities:
* **1 Boss** 🎯: The only player who can see the color-coded board. Provides clues to guide the team.
* **1 Captain** 🧭: Coordinates team reasoning, synthesizes suggestions, and makes final word selections.
* **2 Players** 💭: Collaborate with the Captain, propose interpretations and associations.
---
### 🎮 How a Turn Works
#### 1️⃣ Boss Gives a Clue
The Red Boss (seeing the board) might say:
> **"Atmosphere: 2"**
This clue suggests 2 red words are related to *atmosphere*. Looking at the board, the Boss is thinking of:
* **AIR** (part of the atmosphere)
* **SPACE** (beyond the atmosphere)
*⚠️ Important: The clue must be ONE word and ONE number. The number indicates how many words relate to that clue.*
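
The one-word, one-number rule can be checked mechanically. The sketch below assumes clues arrive as text in the `Word: N` form used in the example above; `parse_clue` is a hypothetical helper, not a function from the app.

```python
import re

# One alphabetic word, a colon, then a non-negative integer.
CLUE_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(\d+)\s*$")


def parse_clue(text: str) -> tuple[str, int]:
    """Split a clue such as 'Atmosphere: 2' into (word, number)."""
    match = CLUE_PATTERN.match(text)
    if match is None:
        raise ValueError("a clue must be ONE word and ONE number, e.g. 'Atmosphere: 2'")
    return match.group(1).upper(), int(match.group(2))


clue = parse_clue("Atmosphere: 2")  # ("ATMOSPHERE", 2)
```
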
---
#### 2️⃣ Team Discussion
The Captain and Players discuss without seeing the colors:
* **Player 1:** “AIR feels like the safest bet — it's literally the atmosphere.”
* **Player 2:** “SPACE could connect because it's outside the atmosphere.”
---
#### 3️⃣ Captain Makes Final Selection
The Captain decides which words to touch, in order:
1. AIR ✅ (Red — Correct!)
2. SPACE ✅ (Red — Correct!)
The team can stop after any correct guess or continue up to the number given (+1 bonus from previous turns if applicable).
---
### ⚠️ Mistakes to Avoid
* Guessing **STAFF** (black, the killer word) ends the game **immediately**, and the guessing team **loses**!
* Guessing **WALL** (blue — opponent’s word) ends the turn and gives that word to the Blue team.
* Guessing **SATURN** (beige — neutral) simply ends the turn.
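
The turn rules above can be summarized in a small sketch. It assumes the board is a plain mapping from word to color and that the caller has already worked out the guess limit (the clue number, plus any bonus); `resolve_turn` and the outcome labels are hypothetical.

```python
def resolve_turn(
    team: str,
    guesses: list[str],
    board: dict[str, str],
    max_guesses: int,
) -> tuple[list[str], str]:
    """Touch words in order and stop at the first one that is not the team's own.

    Returns the touched words and an outcome label:
    "done" (every guess was correct), "opponent", "neutral", or "killer".
    """
    touched: list[str] = []
    for word in guesses[:max_guesses]:
        touched.append(word)
        color = board[word]
        if color == team:
            continue  # correct guess, the team may keep going
        if color in ("killer", "neutral"):
            return touched, color
        return touched, "opponent"  # the word goes to the other team
    return touched, "done"


# The five words used in the example above.
board = {"AIR": "red", "SPACE": "red", "WALL": "blue",
         "SATURN": "neutral", "STAFF": "killer"}
clean = resolve_turn("red", ["AIR", "SPACE"], board, max_guesses=2)
slip = resolve_turn("red", ["AIR", "WALL", "SPACE"], board, max_guesses=3)
# clean == (["AIR", "SPACE"], "done"); slip == (["AIR", "WALL"], "opponent")
```
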
---
### 🏆 Winning the Game
The game ends when:
* **A team finds all their colored words** → That team wins!
* **A team touches the killer word (STAFF)** → That team loses immediately!
---
### 💡 Strategy Tips
#### For the Boss:
* Try to link multiple words with creative clues
* Avoid clues that may lead to the killer or opponent’s words
* Consider associations your team might make
#### For Captain & Players:
* Discuss all possible interpretations
* Consider risky words
* Don’t be afraid to stop early to avoid the killer word
* The Captain has final say but should consider all suggestions
---
## 🤝 Sponsors
Thank you to Google, Anthropic, OpenAI, HuggingFace, and ElevenLabs for sponsoring the Hackathon.