| title: Funny Image Captioner | |
| emoji: 🚀 | |
| colorFrom: pink | |
| colorTo: gray | |
| sdk: gradio | |
| sdk_version: 5.22.0 | |
| app_file: app.py | |
| pinned: true | |
| short_description: App that gives funny descriptions of images | |
| # Fun Image Caption | |
| A delightful app that captions your images through the voice of unique characters. Built with Gradio, LangGraph, and Hugging Face models. | |
| ## Description | |
| This project creates an interactive AI application that captions and describes images in entertaining character voices. It combines modern vision-language models with a user-friendly interface to make image descriptions more engaging and fun. | |
| ## Features | |
| - Upload any image for captioning | |
| - Choose from multiple voice personas: | |
|   - Scurvy-ridden pirate | |
|   - Forgetful wizard | |
|   - Sarcastic teenager | |
| - Two-step LangGraph workflow: | |
|   - Image captioning with a vision-language model | |
|   - Creative description of the caption in the selected persona's voice | |
| - Built on efficient 4-bit quantized models for ZeroGPU environments | |
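The two-step workflow listed above can be sketched in plain Python. This is a minimal illustration rather than the app's actual code: it does not import LangGraph, both model calls are replaced by stubs, and every name (`CaptionState`, `PERSONA_PROMPTS`, `caption_node`, `voice_node`, `run_workflow`) is hypothetical. It only shows the shape of the pipeline, in which each step reads a shared state and returns the keys it updates, the same pattern LangGraph nodes follow.

```python
from typing import Callable, TypedDict


class CaptionState(TypedDict, total=False):
    """Shared state handed from one step to the next (hypothetical schema)."""
    image_path: str
    persona: str
    caption: str
    description: str


# Hypothetical persona instructions; the app's real prompts are not shown in this README.
PERSONA_PROMPTS = {
    "pirate": "Describe this as a scurvy-ridden pirate",
    "wizard": "Describe this as a forgetful wizard",
    "teenager": "Describe this as a sarcastic teenager",
}


def caption_node(state: CaptionState) -> CaptionState:
    # Step 1: a vision-language model would caption the image here (stubbed).
    return {"caption": f"a photo loaded from {state['image_path']}"}


def voice_node(state: CaptionState) -> CaptionState:
    # Step 2: a text model would rewrite the caption in the persona's voice (stubbed).
    prompt = PERSONA_PROMPTS[state["persona"]]
    return {"description": f"{prompt}: {state['caption']}"}


def run_workflow(
    state: CaptionState,
    nodes: list[Callable[[CaptionState], CaptionState]],
) -> CaptionState:
    # Run the nodes in order, merging each node's partial update into the state.
    for node in nodes:
        state = {**state, **node(state)}
    return state


result = run_workflow(
    {"image_path": "cat.png", "persona": "pirate"},
    [caption_node, voice_node],
)
print(result["description"])
```

Keeping the persona out of the captioning step means the caption is computed once per image and only the second step has to rerun when the persona changes.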
| ## Useful Poetry Commands | |
| - Show all installed packages: `poetry show` | |
| - Show detailed info about a specific package: `poetry show <package>` | |
| - Show package location and details: `poetry show -v <package>` | |
| - List virtual environments: `poetry env list` | |
| - Show current environment info: `poetry env info` | |
| - Export dependencies to requirements.txt (this command uses uv, not Poetry): `uv pip compile pyproject.toml -o requirements.txt` | |
| ## Requirements | |
| - Python 3.10+ | |
| - Poetry (Python package manager) | |
| - Git | |
| - CUDA-compatible GPU | |
| ## Installation | |
| 1. Install Poetry if you haven't already: | |
| ```bash | |
| curl -sSL https://install.python-poetry.org | python3 - | |
| ``` | |
| 2. Clone the repository: | |
| ```bash | |
| git clone https://github.com/yourusername/fun-image-caption.git | |
| cd fun-image-caption | |
| ``` | |
| 3. Create and activate a new Poetry environment: | |
| ```bash | |
| poetry env use python3.10 | |
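| # On Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin; | |
| # `eval $(poetry env activate)` is the built-in alternative. | |
| poetry shell | |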
| ``` | |
| 4. Install dependencies: | |
| ```bash | |
| poetry install | |
| ``` | |
| 5. Verify installation: | |
| ```bash | |
| poetry show | |
| ``` | |
| ## Install the Hugging Face Hub CLI | |
| ```bash | |
| pip install huggingface_hub | |
| huggingface-cli login | |
| ``` | |
| ## Key Dependencies | |
| - accelerate==1.2.1: Hugging Face library for device placement and loading large models | |
| - bitsandbytes==0.41.3.post2: Quantization library used for 4-bit model loading | |
| - torch==2.4.0: PyTorch, the underlying tensor and deep learning framework | |
| - transformers==4.49.0: Hugging Face Transformers library for loading and running the models | |
| - gradio: Web interface framework | |
| - langgraph: Workflow orchestration for language model pipelines | |
| - pillow: Pillow, the maintained fork of the Python Imaging Library (PIL) | |
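Because four of these packages are pinned to exact versions, it can help to confirm that the active environment matches the pins. The sketch below uses only the standard library's `importlib.metadata`; the helper names `installed_version` and `check_pins` are hypothetical and not part of this project. It assumes that a local build suffix such as `+cu121` on a PyTorch wheel should not count as a mismatch.

```python
from importlib import metadata
from typing import Callable, Optional

# Pins copied from the list above.
PINS = {
    "accelerate": "1.2.1",
    "bitsandbytes": "0.41.3.post2",
    "torch": "2.4.0",
    "transformers": "4.49.0",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(
    pins: dict,
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> dict:
    """Map each mismatched package to (expected, found); an empty dict means all pins match."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        # Drop a local build suffix (for example "+cu121") before comparing.
        public = found.split("+", 1)[0] if found else None
        if public != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in check_pins(PINS).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

Run inside the Poetry environment, the script prints nothing when every pin matches.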
| ## Usage | |
| 1. Run the application: | |
| ```bash | |
| python app.py | |
| ``` | |
| 2. Open your browser and navigate to the provided URL (typically http://127.0.0.1:7860) | |
| 3. Upload an image using the interface | |
| 4. Select a voice persona from the dropdown menu | |
| 5. Click "Generate Description" to see the results | |
| 6. Enjoy your image description in the selected character voice! | |
| ## Models | |
| The application uses the following models: | |
| - Image Captioning: google/gemma-3-12b-vision (4-bit quantized) | |
| - Voice Description: google/gemma-3-12b (4-bit quantized) | |
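To give a sense of why 4-bit quantization matters here, the arithmetic below estimates the memory needed for the weights of one model. It is a rough sketch under stated assumptions: "12b" is taken to mean about 12 billion parameters, and the estimate covers weights only, ignoring quantization metadata, activations, and the KV cache. The helper `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


PARAMS = 12e9  # assumption: "12b" in the model name means about 12 billion parameters

for label, bits in [("bf16", 16), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```

Under these assumptions the weights take roughly 22.4 GiB at 16 bits and 5.6 GiB at 4 bits per model, a fourfold reduction.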
| ## Author | |
| [Your name and contact information] | |
| ## License | |
| [License information to be added] | |