---
title: Agentic Codenames Arena
emoji: 📊
colorFrom: blue
colorTo: blue
python_version: 3.12.6
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
short_description: Time for the LLMs to have some fun with Codenames!
tags:
- mcp-in-action-track-creative
- mcp-in-action-track-consumer
- Google
- Gemini
- Anthropic
- OpenAI
- HuggingFace
- ElevenLabs
---

# 🧠 Agentic Codenames Arena

![Agentic Codenames Arena Banner](assets/banner.png)

**Watch, or join, LLMs battling it out in Codenames.**

**New to Codenames? No problem.** Go to the [How to Play section](#how-to-play) below or check out the example in the _How to Play_ tab in the app to get started.

---

## ✅ Hackathon Requirements:

`Demo`: [Video on YouTube](https://youtu.be/E3IvBN8SqdA)

`Social media post`: [My post on LinkedIn](https://www.linkedin.com/posts/luca-di-palma-99024a1b7_most-of-us-use-llms-to-create-reports-write-activity-7400225424770932736-OTPU?utm_source=share&utm_medium=member_desktop&rcm=ACoAADJnVPwBh-8LoV25AQVeclIBTKNuOP6rr08)

`My HuggingFace Username`: lucadipalma1998

---

## 🧩 What This App Does

**Agentic Codenames Arena** is an interactive dashboard where teams of LLMs compete in the game of *Codenames*.
Two teams, **Red** and **Blue**, face off in a **4v4 setup**, with each team composed of:

* **1 Boss**: Provides the clue and clue number for each turn.
* **1 Captain**: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
* **2 Players**: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.

The internal **communication and coordination architecture is built using LangGraph**, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.

Below is the LangGraph diagram illustrating how the different roles communicate during each turn:

![LangGraph Architecture Diagram](assets/graph.png)

You can either **sit back and watch fully autonomous LLM teams play**, or **step in as a human Boss** to lead your AI teammates with your own clues.

---

## 🤖 How It Works

### **LLM Teams**

Build teams from several providers, including OpenAI, Google, Anthropic, and HuggingFace.
Each model plays autonomously using its own reasoning chain and game strategy.

### **Two Gameplay Modes**

#### **1️⃣ Observation Mode: Watch AIs Battle**

Sit back and spectate.
See how different models reason about clues, decide on associations, and occasionally produce *hilariously misaligned* guesses.

You'll see:

* Model-to-model conversations
* Reasoning traces
* Turn-by-turn decisions
* How each team coordinates across multiple rounds

Perfect for AI benchmarking, research, or just entertainment.

#### **2️⃣ Human Boss Mode: Enter the Fight**

Become the Boss for either team and give your own clue + number.
Your AI teammates will interpret your hint and take their guesses.

---

## 🧠 Why It’s Interesting

* **Compare LLM reasoning styles:**
  Watch how different models interpret associations, analogies, and subtle semantic cues.

* **Analyze team dynamics:**
  Some models coordinate beautifully. Others… not so much.
  Observe emergent cooperation, miscommunication, or unexpected strategies.

* **Experiment with human–AI collaboration:**
  Test how effective your clues are with LLM teammates.
  Try pushing the limits with creative, cryptic, or minimalist hints.

---

## 🕹️ Main Features

* **Build teams by selecting providers**, or choose `random` to generate a mixed-model team
* **Switch between AI vs AI** and **Human vs AI** modes
* **Detailed per-turn logs** for all model decisions
* **Transparent reasoning chains**
* **Interactive UI** for watching matches play out
* **Match history & analytics dashboard**

---

## 📊 Stats & Analytics

All games played in the Arena are stored in a database.
The Stats section of the app includes:

* **Model win/loss rates** across all recorded matches
* **Performance comparisons** between model families (OpenAI vs Google vs …)
* **Historical match logs** for replay & analysis
* **Leaderboards** highlighting the best-performing models

This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
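
As an illustration of how win rates and a leaderboard could be derived from stored matches, here is a minimal sketch. The record layout (`red`, `blue`, `winner` keys) and the model names are placeholders invented for this example, not the app's actual database schema or real results.

```python
from collections import Counter


def win_rates(matches: list[dict[str, str]]) -> dict[str, float]:
    """Fraction of games won by each model across all recorded matches."""
    played: Counter[str] = Counter()
    won: Counter[str] = Counter()
    for match in matches:
        for team in ("red", "blue"):
            model = match[team]
            played[model] += 1
            if match["winner"] == team:
                won[model] += 1
    return {model: won[model] / played[model] for model in played}


# Placeholder records for illustration only.
matches = [
    {"red": "model-a", "blue": "model-b", "winner": "red"},
    {"red": "model-b", "blue": "model-a", "winner": "red"},
    {"red": "model-a", "blue": "model-b", "winner": "red"},
]
rates = win_rates(matches)
leaderboard = sorted(rates, key=rates.get, reverse=True)
print(rates, leaderboard)
```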

---

<a id="how-to-play"></a>
## ❓ How to Play

### 📝 Summary

Codenames is a word-association game where two teams compete to guess all their secret words before the opponents do. Each team has a Boss who **can see** a hidden color-coded board showing which words belong to their team, which belong to the other team, which are neutral, and which single word is the deadly assassin. The Boss gives one-word clues paired with a number, hinting at how many words on the board relate to that clue. Their teammates, **who cannot see any colors**, must discuss, interpret the clue, and decide which words the Boss is pointing toward. Choosing their own words brings them closer to victory, while accidentally selecting an opponent’s word, a neutral word, or the assassin can derail their progress or end the game instantly. The goal is simple: interpret clues wisely, avoid dangerous words, and be the first team to uncover all your hidden words.

### 💡 Let's see an example

What Bosses see (above) vs. what the other players see (below):

<img src="assets/example.png" alt="Example board" width="400">
<img src="assets/no-color-board.png" alt="Example board without colors" width="400">

### 👥 Team Roles

Each team has four members with distinct responsibilities:

* **1 Boss** 🎯: The only player who can see the color-coded board. Provides clues to guide the team.
* **1 Captain** 🧭: Coordinates team reasoning, synthesizes suggestions, and makes final word selections.
* **2 Players** 💭: Collaborate with the Captain, proposing interpretations and associations.

---

### 🎮 How a Turn Works

#### 1️⃣ Boss Gives a Clue

The Red Boss (seeing the board) might say:

> **"Atmosphere: 2"**

This clue tells the team that 2 red words are related to *atmosphere*. Looking at the board, the Boss is thinking of:

* **AIR** (part of the atmosphere)
* **SPACE** (beyond the atmosphere)

*⚠️ Important: The clue must be ONE word and ONE number. The number indicates how many words relate to that clue.*
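
The one-word, one-number rule is easy to check mechanically. The sketch below assumes clues are written as `word: number`, as in the example above; the `parse_clue` helper and its regular expression are illustrative and not part of the app's documented interface.

```python
import re

# One run of letters, a colon, then one non-negative integer (assumed format).
CLUE_RE = re.compile(r"^\s*([^\W\d_]+)\s*:\s*(\d+)\s*$")


def parse_clue(text: str) -> tuple[str, int]:
    """Return (word, number), or raise ValueError if the clue is malformed."""
    match = CLUE_RE.match(text)
    if match is None:
        raise ValueError(f"expected 'word: number', got {text!r}")
    return match.group(1), int(match.group(2))


print(parse_clue("Atmosphere: 2"))
```

A clue such as `Outer space: 2` is rejected because it contains two words.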

---

#### 2️⃣ Team Discussion

The Captain and Players discuss without seeing the colors:

* **Player 1:** “AIR feels like the safest bet. It's literally the atmosphere.”
* **Player 2:** “SPACE could connect because it's outside the atmosphere.”

---

#### 3️⃣ Captain Makes Final Selection

The Captain decides which words to touch, in order:

1. AIR ✅ (Red, correct!)
2. SPACE ✅ (Red, correct!)

The team may stop after any correct guess, or keep guessing up to the number given (plus 1 bonus guess from previous turns, if applicable).

---

### ⚠️ Mistakes to Avoid

* Guessing **STAFF** (black, the killer word) ends the game **immediately**, and the guessing team **loses**!
* Guessing **WALL** (blue, an opponent’s word) ends the turn and gives that word to the Blue team.
* Guessing **SATURN** (beige, neutral) simply ends the turn.

---

### 🏆 Winning the Game

The game ends when:

* ✅ **A team finds all their colored words** → That team wins!
* ❌ **A team touches the killer word (STAFF)** → That team loses immediately!

---

### 💡 Strategy Tips

#### For the Boss:

* Try to link multiple words with creative clues
* Avoid clues that could point to the killer word or to the opponent’s words
* Consider associations your team might make

#### For Captain & Players:

* Discuss all possible interpretations
* Weigh how risky each candidate word is
* Don’t be afraid to stop early to avoid the killer word
* The Captain has final say but should consider all suggestions

---

## 🤝 Sponsors

Thank you to Google, Anthropic, OpenAI, HuggingFace, and ElevenLabs for sponsoring the Hackathon.