---
title: Token Attention Viewer
emoji: 📈
colorFrom: gray
colorTo: pink
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: mit
short_description: Interactive visualization of attention weights in LLMs word-
---
# Token-Attention-Viewer

Token Attention Viewer is an interactive Gradio app that visualizes the self-attention weights inside transformer language models for every generated token. It helps researchers, students, and developers explore how models like GPT-2 or LLaMA focus on different parts of the input as they generate text.

The app **generates text with a causal language model** and **visualizes attention word by word**. Each word in the generated continuation is shown as part of a paragraph; the **background opacity** behind a word reflects the **sum of attention weights** that the selected (query) word assigns to the context. You can also switch between many popular Hugging Face models.
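As a concrete illustration of the aggregation described above, here is a minimal NumPy sketch with a toy attention tensor and a made-up token-to-word mapping; variable names are illustrative and not taken from `app.py`:

```python
import numpy as np

# Toy example: 2 layers, 2 heads, 5 tokens (not a real model's output).
rng = np.random.default_rng(0)
attn = rng.random((2, 2, 5, 5))           # (layers, heads, query, key)
attn /= attn.sum(axis=-1, keepdims=True)  # normalize rows like softmax output

# Suppose tokens 0-1 form word 0, token 2 is word 1, tokens 3-4 form word 2.
token_to_word = [0, 0, 1, 2, 2]

mean_attn = attn.mean(axis=(0, 1))  # average across layers and heads
query_row = mean_attn[4]            # attention the selected (last) token assigns

# Sum the query's attention over each word's tokens -> one weight per word.
n_words = max(token_to_word) + 1
word_weights = np.zeros(n_words)
for tok, word in enumerate(token_to_word):
    word_weights[word] += query_row[tok]

opacities = word_weights / word_weights.max()  # scale to [0, 1] for CSS alpha
```

Because each attention row is a probability distribution, the word-level weights still sum to 1; dividing by the maximum just maps them onto usable background opacities.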
---

## ✨ What the app does

* **Generate** a continuation from your prompt using a selected causal LM (GPT-2, OPT, Mistral, etc.).
* **Select a generated word** to inspect.
* **Visualize attention** as a semi-transparent background behind words (no plotting libraries such as matplotlib required).
* **Mean across layers/heads**, or inspect a specific layer/head.
* **Proper detokenization** to real words (regex-based), with **EOS tokens stripped** (no `<|endoftext|>` clutter).
* **Paragraph wrapping**: words wrap to new lines automatically inside the box.
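The opacity-based highlighting and automatic wrapping above need nothing more than inline HTML spans, which Gradio can display directly. A minimal sketch (the function name and highlight color are hypothetical, not taken from `app.py`):

```python
def words_to_html(words, opacities):
    """Render words as inline spans whose background alpha encodes attention."""
    spans = []
    for word, alpha in zip(words, opacities):
        spans.append(
            f'<span style="background-color: rgba(255, 165, 0, {alpha:.2f}); '
            f'padding: 2px; border-radius: 3px;">{word}</span>'
        )
    # Joining with spaces lets the browser wrap words to new lines naturally.
    return " ".join(spans)

html = words_to_html(["The", "cat", "sat"], [0.1, 0.9, 0.4])
```

Since the spans are inline elements separated by spaces, the "paragraph wrapping" behavior comes for free from the browser's line layout.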
---
## 🚀 Quickstart

### 1) Clone

```bash
git clone https://github.com/devMuniz02/Token-Attention-Viewer
cd Token-Attention-Viewer
```

### 2) (Optional) Create a virtual environment

**Windows (PowerShell):**

```powershell
python -m venv venv
.\venv\Scripts\Activate.ps1
```

**macOS / Linux (bash/zsh):**

```bash
python3 -m venv venv
source venv/bin/activate
```
### 3) Install requirements

```bash
pip install -r requirements.txt
```
### 4) Run the app

```bash
python app.py
```

You should see Gradio report a local URL similar to:

```
Running on local URL: http://127.0.0.1:7860
```

### 5) Open in your browser

Open the printed URL (default `http://127.0.0.1:7860`) in your browser.
---

## 🧭 How to use

1. **Model**: pick a model from the dropdown and click **Load / Switch Model**.
   * Small models (e.g., `distilgpt2`, `gpt2`) run on CPU.
   * Larger models (e.g., `mistralai/Mistral-7B-v0.1`) generally need a GPU with enough VRAM.
2. **Prompt**: enter your starting text.
3. **Generate**: click **Generate** to produce a continuation.
4. **Inspect**: select any **generated word** (radio buttons).
   * The paragraph box highlights where that word attends.
   * Toggle **Mean Across Layers/Heads** or choose a specific **layer/head**.
5. Repeat with different models or prompts.
---

## 🧩 Files

* `app.py` — Gradio application (UI, model loading, and attention visualization).
* `requirements.txt` — Python dependencies.
* `README.md` — this file.
---

## 🛠️ Troubleshooting

* **Radio/choices error**: if you switch models and see a Gradio “value not in choices” error, make sure the app resets the radio with `value=None` (the included code already does this).
* **`<|endoftext|>` shows up**: the app strips **trailing** special tokens from the generated segment, so EOS shouldn’t appear. If you still see it in the middle of the text, the model genuinely generated it as a token.
* **OOM / model too large**:
  * Try a smaller model (`distilgpt2`, `gpt2`, `facebook/opt-125m`).
  * Reduce `Max New Tokens`.
  * Use CPU for smaller models, or a GPU with more VRAM for bigger ones.
* **Slow generation**: CPU mode and larger models are slower; consider a GPU and the `accelerate` package.
* **Missing tokenizer pad token**: the app sets `pad_token_id = eos_token_id` automatically when needed.
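The pad-token fallback and the per-token attention tensors the app relies on can be sketched together with Transformers. This is a minimal sketch, not the app's actual code; it assumes network access to download `distilgpt2`, and `attn_implementation="eager"` is used because recent Transformers versions default to SDPA kernels that do not return attention weights:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
# "eager" attention materializes the weight matrices we want to visualize.
model = AutoModelForCausalLM.from_pretrained(
    "distilgpt2", attn_implementation="eager"
)

# GPT-2-family tokenizers ship without a pad token; fall back to EOS.
if tokenizer.pad_token_id is None:
    tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer("The cat sat on the", return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=5,
        do_sample=False,
        output_attentions=True,
        return_dict_in_generate=True,
        pad_token_id=tokenizer.pad_token_id,
    )

# out.attentions holds one entry per generated token; each entry is a tuple
# of per-layer tensors shaped (batch, heads, query_len, key_len).
```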
---

## 🔒 Access-gated models

Some model families (e.g., **LLaMA**, **Gemma**) require you to accept a license or request access on Hugging Face. Make sure your Hugging Face account has access before trying to load those models.

---

## 📣 Acknowledgments

* Built with [Gradio](https://www.gradio.app/) and [Hugging Face Transformers](https://huggingface.co/docs/transformers).
* Attention visualization based on the standard causal LM attention tensors returned by `generate(output_attentions=True)`.