fun-image-caption


krsnewwave

Update README.md

e55e742 verified 9 months ago


	---
	title: Funny Image Captioner
	emoji: 🚀
	colorFrom: pink
	colorTo: gray
	sdk: gradio
	sdk_version: 5.22.0
	app_file: app.py
	pinned: true
	short_description: App that gives funny descriptions of images
	---

	# Fun Image Caption

	A delightful app that captions your images through the voice of unique characters. Built with Gradio, LangGraph, and Hugging Face models.

	## Description

	This project creates an interactive AI application that captions and describes images in entertaining character voices. It combines modern vision-language models with a user-friendly interface to make image descriptions more engaging and fun.

	## Features

	- Upload any image for captioning
	- Choose from multiple voice personas:
	  - Scurvy-ridden pirate
	  - Forgetful wizard
	  - Sarcastic teenager
	- Two-step LangGraph workflow:
	  - Image captioning with a vision-language model
	  - Creative voice-based description
	- Built on efficient 4-bit quantized models for ZeroGPU environments
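
The two-step flow listed above can be sketched in plain Python. This is a minimal illustration, not the app's actual code: it uses ordinary function composition in place of LangGraph, and the names `PERSONA_STYLES`, `CaptionState`, `caption_step`, `voice_step`, and `run_pipeline`, as well as the prompt wording, are all hypothetical. The two models are passed in as callables so the sketch runs without a GPU.

```python
from typing import Callable, TypedDict

# Hypothetical persona instructions; the app's real prompts are not shown here.
PERSONA_STYLES = {
    "pirate": "Describe the scene as a scurvy-ridden pirate.",
    "wizard": "Describe the scene as a forgetful wizard.",
    "teenager": "Describe the scene as a sarcastic teenager.",
}


class CaptionState(TypedDict, total=False):
    """State handed from one step to the next."""
    image_path: str
    persona: str
    caption: str      # filled in by step 1
    description: str  # filled in by step 2


def caption_step(state: CaptionState, captioner: Callable[[str], str]) -> CaptionState:
    """Step 1: produce a plain caption for the image."""
    return {**state, "caption": captioner(state["image_path"])}


def build_voice_prompt(state: CaptionState) -> str:
    """Combine the persona instruction with the plain caption."""
    style = PERSONA_STYLES[state["persona"]]
    return f"{style}\nCaption: {state['caption']}"


def voice_step(state: CaptionState, writer: Callable[[str], str]) -> CaptionState:
    """Step 2: rewrite the caption in the selected character voice."""
    return {**state, "description": writer(build_voice_prompt(state))}


def run_pipeline(image_path: str, persona: str,
                 captioner: Callable[[str], str],
                 writer: Callable[[str], str]) -> CaptionState:
    """Run both steps in order and return the final state."""
    state: CaptionState = {"image_path": image_path, "persona": persona}
    state = caption_step(state, captioner)
    return voice_step(state, writer)


if __name__ == "__main__":
    # Stand-in callables; a real run would call the vision and text models.
    result = run_pipeline(
        "cat.jpg",
        "pirate",
        captioner=lambda path: "a cat sleeping on a sofa",
        writer=lambda prompt: prompt.upper(),
    )
    print(result["description"])
```

Keeping each step as a function that takes the state and returns an updated copy mirrors how a graph-based workflow passes state between nodes, which should make it straightforward to move the same functions into graph nodes.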

	## Useful Poetry Commands

	- Show all installed packages: `poetry show`
	- Show detailed info about a specific package: `poetry show <package>`
	- Show package location and details: `poetry show -v <package>`
	- List virtual environments: `poetry env list`
	- Show current environment info: `poetry env info`
	- Export dependencies to requirements.txt (this command uses uv, not Poetry): `uv pip compile pyproject.toml -o requirements.txt`

	## Requirements

	- Python 3.10+
	- Poetry (Python package manager)
	- Git
	- CUDA-compatible GPU
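
A quick preflight check for the requirements above might look like the following sketch. The function name `check_requirements` is hypothetical, and looking for `nvidia-smi` on the PATH is only a rough proxy for a CUDA-compatible GPU: it suggests an NVIDIA driver is installed but does not prove that PyTorch can use the device.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10)):
    """Return a mapping of requirement name -> whether it looks satisfied."""
    return {
        "python": sys.version_info[:2] >= min_python,
        "poetry": shutil.which("poetry") is not None,
        "git": shutil.which("git") is not None,
        # Rough proxy only: the tool being on PATH hints at an NVIDIA driver.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```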

	## Installation

	1. Install Poetry if you haven't already:
	```bash
	curl -sSL https://install.python-poetry.org | python3 -
	```

	2. Clone the repository:
	```bash
	git clone https://github.com/yourusername/fun-image-caption.git
	cd fun-image-caption
	```

	3. Create and activate a new Poetry environment:
	```bash
	poetry env use python3.10
	# On Poetry 2.x, `poetry shell` requires the poetry-plugin-shell plugin
	poetry shell
	```

	4. Install dependencies:
	```bash
	poetry install
	```

	5. Verify installation:
	```bash
	poetry show
	```

	## Install the Hugging Face Hub CLI
	```bash
	pip install huggingface_hub

	huggingface-cli login
	```

	## Key Dependencies

	- accelerate==1.2.1: Framework for efficient model deployment
	- bitsandbytes==0.41.3.post2: Quantization library for model optimization
	- torch==2.4.0: PyTorch for ML operations
	- transformers==4.49.0: Hugging Face transformers library
	- gradio: Web interface framework
	- langgraph: Workflow orchestration for language model pipelines
	- pillow: Python Imaging Library
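
To confirm that an environment matches the pinned versions listed above, a small check with the standard library's `importlib.metadata` can help. This is a sketch: the names `PINNED`, `base_version`, and `compare_pins` are hypothetical, and stripping the local version suffix (for example `+cu121` on some PyTorch builds) is an assumption about how you want to compare versions.

```python
from importlib import metadata

# Pins copied from the dependency list; unpinned packages are omitted.
PINNED = {
    "accelerate": "1.2.1",
    "bitsandbytes": "0.41.3.post2",
    "torch": "2.4.0",
    "transformers": "4.49.0",
}


def base_version(version):
    """Drop a local version suffix such as '+cu121' before comparing."""
    return version.split("+", 1)[0]


def compare_pins(pins):
    """Return {package: (installed_or_None, expected, matches)}."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        matches = installed is not None and base_version(installed) == expected
        report[name] = (installed, expected, matches)
    return report


if __name__ == "__main__":
    for name, (installed, expected, ok) in compare_pins(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={expected} {status}")
```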

	## Usage

	1. Run the application:
	```bash
	python app.py
	```

	2. Open your browser and navigate to the provided URL (typically http://127.0.0.1:7860)

	3. Upload an image using the interface

	4. Select a voice persona from the dropdown menu

	5. Click "Generate Description" to see the results

	6. Enjoy your image description in the selected character voice!

	## Models

	The application uses the following models:
	- Image Captioning: google/gemma-3-12b-vision (4-bit quantized)
	- Voice Description: google/gemma-3-12b (4-bit quantized)
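
As a rough guide to why 4-bit quantization matters here, the arithmetic below estimates the memory needed for model weights alone. It assumes "12b" means about 12 billion parameters and ignores quantization metadata, activations, and the KV cache, so real usage will be higher. The function name `quantized_weight_bytes` is hypothetical.

```python
def quantized_weight_bytes(num_params, bits_per_param=4):
    """Bytes needed to store the weights alone at the given precision."""
    return num_params * bits_per_param / 8


# Assumption: "12b" in the model name means roughly 12 billion parameters.
PARAMS_12B = 12e9

if __name__ == "__main__":
    for bits in (16, 4):
        gigabytes = quantized_weight_bytes(PARAMS_12B, bits) / 1e9
        print(f"{bits}-bit weights: ~{gigabytes:.0f} GB")
    # prints:
    # 16-bit weights: ~24 GB
    # 4-bit weights: ~6 GB
```

If both models are held in memory at the same time, the weight estimate doubles.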

	## Author

	[Your name and contact information]

	## License

	[License information to be added]