


	---
	title: Agentic Codenames Arena
	emoji: 📊
	colorFrom: blue
	colorTo: blue
	python_version: 3.12.6
	sdk: gradio
	sdk_version: 5.49.1
	app_file: app.py
	pinned: false
	short_description: Time for the LLMs to have some fun with Codenames!
	tags:
	- mcp-in-action-track-creative
	- mcp-in-action-track-consumer
	- Google
	- Gemini
	- Anthropic
	- OpenAI
	- HuggingFace
	- ElevenLabs
	---


	# 🧠 Agentic Codenames Arena

	![Meme](assets/meme.png)

	Watch, or join, LLMs battling it out in Codenames.

	New to Codenames? No problem. Go to the [How to Play section](#how-to-play) below or check out the example in the _How to Play_ tab in the app to get started.

	---

	## ✅ Hackathon Requirements:

	`Demo`: [Video on YouTube](https://youtu.be/E3IvBN8SqdA)

	`Social media post`: [My post on LinkedIn](https://www.linkedin.com/posts/luca-di-palma-99024a1b7_most-of-us-use-llms-to-create-reports-write-activity-7400225424770932736-OTPU?utm_source=share&utm_medium=member_desktop&rcm=ACoAADJnVPwBh-8LoV25AQVeclIBTKNuOP6rr08)

	`My HuggingFace Username`: lucadipalma1998

	---

	## 🧩 What This App Does

	Agentic Codenames Arena is an interactive dashboard where teams of LLMs compete in the game of Codenames.
	Two teams, Red and Blue, face off in a 4v4 setup, each composed of:

	* 1 Boss: Provides the clue and clue number for each turn.
	* 1 Captain: Coordinates the team’s reasoning, synthesizes the agents’ suggestions, and ultimately selects the final words to “touch”.
	* 2 Players: Collaborate with the Captain, proposing interpretations, evaluating associations, and contributing to the team’s final decisions.


	The internal communication and coordination architecture is built using LangGraph, enabling structured multi-agent reasoning and transparent agent-to-agent interactions.

	Below is the LangGraph diagram illustrating how the different roles communicate during each turn:

	![LangGraph Architecture](graph.png)
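The turn flow in the diagram can be approximated in plain Python to show what each role contributes. This is a minimal sketch, not the app's LangGraph code: the `TurnState` fields, `run_turn`, and the toy `boss`, `player`, and `captain` callables are hypothetical stand-ins for the LLM calls. In this sketch the Captain keeps distinct suggestions in first-seen order, capped at the clue number.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical shared state passed between the roles during one turn.
@dataclass
class TurnState:
    clue: str = ""
    number: int = 0
    suggestions: list[list[str]] = field(default_factory=list)
    selected: list[str] = field(default_factory=list)


def run_turn(
    boss: Callable[[TurnState], tuple[str, int]],
    players: list[Callable[[TurnState], list[str]]],
    captain: Callable[[TurnState], list[str]],
) -> TurnState:
    """Boss gives a clue, each Player proposes words, the Captain decides."""
    state = TurnState()
    state.clue, state.number = boss(state)
    for player in players:
        state.suggestions.append(player(state))
    state.selected = captain(state)
    return state


# Toy stand-ins for the LLM-backed roles (hypothetical).
def toy_boss(state: TurnState) -> tuple[str, int]:
    return "Atmosphere", 2


def player_1(state: TurnState) -> list[str]:
    return ["AIR", "SPACE"]


def player_2(state: TurnState) -> list[str]:
    return ["SPACE", "SATURN"]


def toy_captain(state: TurnState) -> list[str]:
    # Keep every distinct suggestion in first-seen order, capped at the clue number.
    ordered: list[str] = []
    for words in state.suggestions:
        for word in words:
            if word not in ordered:
                ordered.append(word)
    return ordered[: state.number]


result = run_turn(toy_boss, [player_1, player_2], toy_captain)
print(result.selected)  # ['AIR', 'SPACE']
```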

	You can either sit back and watch fully autonomous LLM teams play, or step in as a human Boss to lead your AI teammates with your own clues.

	---

	## 🤖 How It Works

	### LLM Teams

	Build teams from several providers: OpenAI, Google, Anthropic, HuggingFace...
	Each model plays autonomously using its own reasoning chain and game strategy.

	### Two Gameplay Modalities

	#### 1️⃣ Observation Mode — Watch AIs Battle

	Sit back and spectate.
	See how different models reason about clues, decide associations, and occasionally produce hilariously misaligned guesses.

	You'll see:

	* Model-to-model conversations
	* Reasoning traces
	* Turn-by-turn decisions
	* How each team coordinates across multiple rounds

	Perfect for AI benchmarking, research, or just entertainment.

	#### 2️⃣ Human Boss Mode — Enter the Fight

	Become the Boss of either team and give your own clue and number.
	Your AI teammates will interpret your hint and take their guesses.

	---

	## 🧠 Why It’s Interesting

	* Compare LLM reasoning styles:
	Watch how different models interpret associations, analogies, and subtle semantic cues.

	* Analyze team dynamics:
	Some models coordinate beautifully. Others… not so much.
	Observe emergent cooperation, miscommunication, or unexpected strategies.

	* Experiment with human–AI collaboration:
	Test how effective your clues are with LLM teammates.
	Try pushing the limits with creative, cryptic, or minimalist hints.

	---

	## 🕹️ Main Features

	* Build teams by selecting providers or choose `random` to generate a mixed-model team.
	* Switch between AI vs AI and Human vs AI modes
	* Detailed per-turn logs for all model decisions
	* Transparent reasoning chains
	* Interactive UI for watching matches play out
	* Match history & analytics dashboard
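Team building with the `random` option could look roughly like this. The provider list comes from this README; the seat names, the `build_team` helper, and the rule that `random` picks a provider independently for each seat are assumptions made for illustration, not the app's confirmed behavior.

```python
import random

# Providers mentioned in this README; the real app may offer more.
PROVIDERS = ["OpenAI", "Google", "Anthropic", "HuggingFace"]
# Hypothetical seat names for the four members of one team.
SEATS = ["boss", "captain", "player_1", "player_2"]


def build_team(choices: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Map each seat to a provider; "random" (or a missing seat) is resolved per seat."""
    team = {}
    for seat in SEATS:
        choice = choices.get(seat, "random")
        if choice == "random":
            choice = rng.choice(PROVIDERS)
        elif choice not in PROVIDERS:
            raise ValueError(f"unknown provider: {choice}")
        team[seat] = choice
    return team


rng = random.Random(42)  # seeded so the mixed team is reproducible
red = build_team({"boss": "Anthropic", "captain": "random"}, rng)
print(red)
```

Seeding a dedicated `random.Random` instance keeps a match's lineup reproducible without touching the global random state.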

	---

	## 📊 Stats & Analytics

	All games played in the Arena are stored in a database.
	The Stats section of the app includes:

	* Model win/loss rates across all recorded matches
	* Performance comparisons between model families (OpenAI vs Google vs …)
	* Historical match logs for replay & analysis
	* Leaderboards highlighting the best-performing models

	This turns the Arena into a dynamic benchmarking tool for evaluating LLM semantic reasoning, coordination abilities, and reliability under pressure.
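Win rates and a leaderboard like those in the Stats section can come from a single aggregate query. This sketch uses Python's built-in `sqlite3` with an in-memory database; the `results` table, its columns, and the sample rows are hypothetical and do not reflect the app's actual schema or recorded matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per model per match it took part in.
conn.execute("CREATE TABLE results (match_id INTEGER, model TEXT, won INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [  # made-up sample rows for illustration only
        (1, "model-a", 1), (1, "model-b", 0),
        (2, "model-a", 0), (2, "model-c", 1),
        (3, "model-b", 1), (3, "model-c", 0),
        (4, "model-a", 1), (4, "model-b", 0),
    ],
)

# Leaderboard: win rate per model, best first, ties broken by model name.
leaderboard = conn.execute(
    """
    SELECT model,
           COUNT(*) AS played,
           SUM(won) AS wins,
           ROUND(1.0 * SUM(won) / COUNT(*), 2) AS win_rate
    FROM results
    GROUP BY model
    ORDER BY win_rate DESC, model
    """
).fetchall()

for row in leaderboard:
    print(row)
```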

	---
	<a id="how-to-play"></a>
	## ❓ How to Play


	### 📝 Summary

	Codenames is a word-association game in which two teams compete to guess all of their secret words before the opponents do. Each team has a Boss who can see a hidden color-coded board showing which words belong to their team, which belong to the other team, which are neutral, and which single word is the deadly assassin (the killer word). The Boss gives one-word clues paired with a number that says how many words on the board relate to the clue. Their teammates, who cannot see any colors, must discuss and interpret the clue and decide which words the Boss is pointing toward. Choosing their own words brings them closer to victory, while accidentally selecting an opponent’s word or a neutral word ends their turn, and selecting the assassin ends the game instantly. The goal is simple: interpret clues wisely, avoid dangerous words, and be the first team to uncover all your hidden words.
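The hidden color-coded board can be modeled as a mapping from each word to its color. This sketch assumes the board game's standard layout of 25 words (9 for the starting team, 8 for the other team, 7 neutral, 1 assassin) and reuses this README's color names (red, blue, beige, black); the app's actual board size and split may differ, and `make_board` is a hypothetical helper.

```python
import random
from collections import Counter


def make_board(words: list[str], starting_team: str, rng: random.Random) -> dict[str, str]:
    """Assign a hidden color to each of 25 words using the standard 9/8/7/1 split."""
    if starting_team not in ("red", "blue"):
        raise ValueError("starting_team must be 'red' or 'blue'")
    if len(words) != 25 or len(set(words)) != 25:
        raise ValueError("need 25 distinct words")
    other_team = "blue" if starting_team == "red" else "red"
    colors = [starting_team] * 9 + [other_team] * 8 + ["beige"] * 7 + ["black"]
    rng.shuffle(colors)  # the shuffle is the secret the Boss gets to see
    return dict(zip(words, colors))


words = [f"WORD{i}" for i in range(25)]  # placeholder words
board = make_board(words, "red", random.Random(0))
counts = Counter(board.values())
print(counts["red"], counts["blue"], counts["beige"], counts["black"])  # 9 8 7 1
```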

	### 💡 Let's see an example

	What the Boss sees (first image) vs. what the other players see (second image):

	<img src="assets/example.png" alt="Example board with colors" width="400">
	<img src="assets/no-color-board.png" alt="Example board without colors" width="400">
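The only difference between the two views is whether colors are shown. A minimal sketch, assuming the board is a word-to-color dict and that a word's color becomes visible to everyone once it has been touched (as in the board game); `view_for` is a hypothetical helper, not part of the app.

```python
def view_for(board: dict[str, str], revealed: set[str], is_boss: bool) -> dict[str, str]:
    """Bosses see every color; everyone else sees only colors of touched words."""
    return {
        word: color if is_boss or word in revealed else "?"
        for word, color in board.items()
    }


board = {"AIR": "red", "WALL": "blue", "SATURN": "beige", "STAFF": "black"}
print(view_for(board, revealed={"AIR"}, is_boss=False))
# {'AIR': 'red', 'WALL': '?', 'SATURN': '?', 'STAFF': '?'}
```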

	### 👥 Team Roles

	Each team has four members with distinct responsibilities:

	* 1 Boss 🎯: The only player who can see the color-coded board. Provides clues to guide the team.
	* 1 Captain 🧭: Coordinates team reasoning, synthesizes suggestions, and makes final word selections.
	* 2 Players 💭: Collaborate with the Captain, propose interpretations and associations.

	---

	### 🎮 How a Turn Works

	#### 1️⃣ Boss Gives a Clue

	The Red Boss (seeing the board) might say:

	> "Atmosphere: 2"

	This clue suggests 2 red words are related to atmosphere. Looking at the board, the Boss is thinking of:

	* AIR (part of the atmosphere)
	* SPACE (beyond the atmosphere)

	⚠️ Important: The clue must be ONE word and ONE number. The number indicates how many words relate to that clue.
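The ONE word and ONE number rule is easy to check mechanically, for example when a human Boss types a clue. This sketch parses clues in the `Atmosphere: 2` format shown above; the exact input format the app accepts is an assumption, and `parse_clue` is a hypothetical name.

```python
import re

# One alphabetic word (hyphens/apostrophes allowed), a colon, then a whole number.
CLUE_PATTERN = re.compile(r"^\s*([A-Za-z][A-Za-z'-]*)\s*:\s*(\d+)\s*$")


def parse_clue(text: str) -> tuple[str, int]:
    """Split a clue like "Atmosphere: 2" into (word, number), or raise ValueError."""
    match = CLUE_PATTERN.match(text)
    if match is None:
        raise ValueError(f"clue must be ONE word and ONE number, got {text!r}")
    return match.group(1), int(match.group(2))


print(parse_clue("Atmosphere: 2"))  # ('Atmosphere', 2)
```

Multi-word clues such as `Outer space: 2` fail the pattern and raise `ValueError`.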

	---

	#### 2️⃣ Team Discussion

	The Captain and Players discuss without seeing the colors:

	* Player 1: “AIR feels like the safest bet — it's literally the atmosphere.”
	* Player 2: “SPACE could connect because it's outside the atmosphere.”

	---

	#### 3️⃣ Captain Makes Final Selection

	The Captain decides which words to touch, in order:

	1. AIR ✅ (Red — Correct!)
	2. SPACE ✅ (Red — Correct!)

	The team can stop after any correct guess or keep going up to the number given (plus one bonus guess carried over from previous turns, if applicable).
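The Captain's ordered picks and the stop rule can be sketched as a loop that touches words one at a time. This is a hypothetical illustration: `play_guesses` is not the app's function, and it reads the "+1 bonus" as one extra guess allowed only when earlier clues still have unfound words, which is an interpretation of the rule above rather than the app's confirmed logic.

```python
def play_guesses(
    ordered_picks: list[str],
    board: dict[str, str],
    team: str,
    clue_number: int,
    has_leftover_words: bool,
) -> list[str]:
    """Touch the Captain's picks in order; stop on a miss or at the guess limit."""
    limit = clue_number + (1 if has_leftover_words else 0)  # assumed bonus rule
    touched = []
    for word in ordered_picks[:limit]:
        touched.append(word)
        if board[word] != team:  # any word that isn't the team's ends the turn
            break
    return touched


board = {"AIR": "red", "SPACE": "red", "WALL": "blue", "SATURN": "beige"}
print(play_guesses(["AIR", "SPACE", "WALL"], board, "red", 2, False))  # ['AIR', 'SPACE']
```

Stopping early after a correct guess simply means the Captain passes a shorter `ordered_picks` list.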

	---

	### ⚠️ Mistakes to Avoid

	* Guessing STAFF (black, the killer word) ends the game immediately, and the guessing team loses!
	* Guessing WALL (blue — opponent’s word) ends the turn and gives that word to the Blue team.
	* Guessing SATURN (beige — neutral) simply ends the turn.
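These three cases map directly onto the touched word's hidden color. A sketch, assuming the color names used in this README (red, blue, beige, black); the `Outcome` enum and `resolve_guess` are hypothetical names, not the app's API.

```python
from enum import Enum


class Outcome(Enum):
    CORRECT = "correct: keep guessing"
    LOSE = "killer word: the guessing team loses"
    OPPONENT = "opponent's word: turn ends, the word goes to the other team"
    NEUTRAL = "neutral word: turn ends"


def resolve_guess(color: str, team: str) -> Outcome:
    """Classify a touched word by its hidden color, from the guessing team's view."""
    if color == "black":
        return Outcome.LOSE
    if color == "beige":
        return Outcome.NEUTRAL
    if color == team:
        return Outcome.CORRECT
    return Outcome.OPPONENT


for word, color in [("STAFF", "black"), ("WALL", "blue"), ("SATURN", "beige")]:
    print(word, resolve_guess(color, "red").name)
# STAFF LOSE
# WALL OPPONENT
# SATURN NEUTRAL
```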

	---

	### 🏆 Winning the Game

	The game ends when:

	* ✅ A team finds all their colored words → That team wins!
	* ❌ A team touches the killer word (STAFF) → That team loses immediately!
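Both end conditions can be checked after every touch. A minimal sketch, assuming the board is a word-to-color dict and `revealed` holds the words touched so far; `winner` is a hypothetical helper, and it needs to know which team made the latest touch so it can assign the loss for the killer word.

```python
from typing import Optional


def winner(board: dict[str, str], revealed: set[str], last_team: str) -> Optional[str]:
    """Return the winning team ("red"/"blue"), or None if the game continues.

    last_team is the team that made the most recent touch.
    """
    other = "blue" if last_team == "red" else "red"
    # Touching the killer word loses immediately, so the other team wins.
    if any(board[w] == "black" for w in revealed):
        return other
    # Otherwise a team wins once all of its colored words are revealed.
    for team in (last_team, other):
        team_words = {w for w, c in board.items() if c == team}
        if team_words <= revealed:
            return team
    return None


board = {"AIR": "red", "SPACE": "red", "WALL": "blue", "SATURN": "beige", "STAFF": "black"}
print(winner(board, {"AIR"}, "red"))           # None
print(winner(board, {"AIR", "SPACE"}, "red"))  # red
print(winner(board, {"AIR", "STAFF"}, "red"))  # blue
```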

	---

	### 💡 Strategy Tips

	#### For the Boss:

	* Try to link multiple words with creative clues
	* Avoid clues that could lead your team to the killer word or the opponent’s words
	* Consider associations your team might make

	#### For Captain & Players:

	* Discuss all possible interpretations
	* Weigh how risky each candidate word is before touching it
	* Don’t be afraid to stop early to avoid the killer word
	* The Captain has final say but should consider all suggestions

	---

	## 🤝 Sponsors

	Thank you to Google, Anthropic, OpenAI, HuggingFace, and ElevenLabs for sponsoring the Hackathon.