Coffee Bean Detection - Fine-tuned Mask R-CNN
Model Details
Model Description
This is a fine-tuned Mask R-CNN model for detecting and segmenting individual coffee beans in images. The model performs instance segmentation, producing a pixel-level mask and a bounding box for each detected bean.
- Developed by: Mark Kunitomi
- Model type: Instance Segmentation (Mask R-CNN)
- Architecture: ResNet-50 FPN backbone
- License: apache-2.0
Example Results
Left: original image of green coffee beans. Right: detection and segmentation results, showing each bean's mask with its confidence score.
How to Get Started with the Model
Quick Start with Jupyter Notebook
A complete inference notebook (inference_demo.ipynb) is included with this model.
Command Line Interface
For quick inference, use the included predict_beans.py script:
```bash
# Install dependencies
pip install -r requirements.txt

# Basic inference on a single image
python predict_beans.py --model maskrcnn_coffeebeans_v1.safetensors --images your_image.jpg

# Process multiple images with custom settings
python predict_beans.py --model maskrcnn_coffeebeans_v1.safetensors \
    --images *.jpg \
    --confidence 0.5 \
    --nms_threshold 0.3 \
    --smooth_polygons \
    --filter_edge_beans \
    --output_dir results \
    --export_format coco

# For all available options
python predict_beans.py --help
```
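The second command above writes COCO-format output. This card does not document the exact schema predict_beans.py emits, so the following is only a sketch of how one detected bean polygon maps onto COCO annotation conventions: segmentation as a flat [x1, y1, x2, y2, ...] list, bbox as [x, y, width, height], and an area (computed here from the polygon with the shoelace formula, not from a pixel mask), plus a score field as detection results usually carry. The helper names and the category_id of 1 for "bean" are hypothetical.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon given as [(x, y), ...]."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_to_coco_result(points, image_id, score, category_id=1):
    """Build one COCO-style entry for a detected bean (hypothetical helper).

    category_id=1 for "bean" is an assumption, not taken from the model card.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    return {
        "image_id": image_id,
        "category_id": category_id,
        # COCO polygons are flat coordinate lists: [x1, y1, x2, y2, ...]
        "segmentation": [[coord for point in points for coord in point]],
        # COCO boxes are [x, y, width, height], not two corners
        "bbox": [x_min, y_min, max(xs) - x_min, max(ys) - y_min],
        "area": polygon_area(points),
        "score": score,
    }
```

For example, polygon_to_coco_result([(10, 10), (30, 10), (30, 30), (10, 30)], image_id=1, score=0.98) yields a bbox of [10, 10, 20, 20] and an area of 400.0. Note that torchvision's detection models return boxes as [x1, y1, x2, y2], so a conversion like the one above is needed when writing COCO files.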
Key CLI Options:
- --confidence: Detection confidence threshold (default: 0.5)
- --nms_threshold: Non-maximum suppression threshold (default: 0.3)
- --smooth_polygons: Apply polygon smoothing for cleaner masks
- --filter_edge_beans: Remove beans touching image edges
- --export_format: Output format (json, coco, labelme, or all)
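The card does not describe how predict_beans.py applies --confidence, --nms_threshold and --filter_edge_beans internally. The following is a minimal sketch of one plausible order, under stated assumptions: each detection is a (box, score) pair with the box in [x1, y1, x2, y2] pixels, suppression is greedy box-IoU NMS (the script might use mask IoU instead), and a bean counts as touching the edge when its box comes within a hypothetical margin of 1 pixel of the border. torchvision's Mask R-CNN already runs its own NMS inside the model (box_nms_thresh defaults to 0.5), so a 0.3 threshold applied afterwards would act as a stricter second pass.

```python
def box_iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2] pixel coordinates."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(detections, image_w, image_h,
                confidence=0.5, nms_threshold=0.3,
                filter_edge_beans=False, margin=1):
    """Filter (box, score) detections. A sketch, not the script's actual code.

    `margin` is a hypothetical parameter for the edge test.
    """
    # 1. Drop detections below the confidence threshold.
    kept = [d for d in detections if d[1] >= confidence]
    # 2. Greedy NMS: visit boxes from highest to lowest score and keep a box
    #    only if it overlaps every already-kept box by at most nms_threshold.
    kept.sort(key=lambda d: d[1], reverse=True)
    survivors = []
    for box, score in kept:
        if all(box_iou(box, other) <= nms_threshold for other, _ in survivors):
            survivors.append((box, score))
    # 3. Optionally drop beans whose box lies within `margin` px of the border.
    if filter_edge_beans:
        survivors = [
            (box, score) for box, score in survivors
            if box[0] >= margin and box[1] >= margin
            and box[2] <= image_w - margin and box[3] <= image_h - margin
        ]
    return survivors
```

Filtering by confidence before NMS keeps low-scoring duplicates from suppressing anything, and doing the edge check last means a partial bean at the border can still suppress its own duplicates before it is removed.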
Recommendations
- Use a confidence threshold of 0.5 or higher for production applications
- Validate on your specific bean varieties before deployment
- Consider additional fine-tuning for specialized use cases
- Implement human verification for critical applications
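Validating on your own bean varieties largely comes down to checking how precision and recall move as the confidence threshold changes. A minimal sketch, assuming you have already matched each detection on your own annotated images against ground truth, so that every detection is labeled as a true or false positive; sweep_thresholds and the threshold values are illustrative and not part of the repository.

```python
def sweep_thresholds(scored, num_ground_truth, thresholds=(0.5, 0.7, 0.9)):
    """Precision and recall at several confidence thresholds (hypothetical helper).

    scored: list of (score, is_true_positive) pairs for every detection.
    num_ground_truth: total number of annotated beans in the validation images.
    """
    results = {}
    for t in thresholds:
        # Keep only detections at or above this threshold.
        kept = [is_tp for score, is_tp in scored if score >= t]
        true_positives = sum(kept)  # True counts as 1
        precision = true_positives / len(kept) if kept else 0.0
        recall = true_positives / num_ground_truth if num_ground_truth else 0.0
        results[t] = (precision, recall)
    return results
```

For example, with detections [(0.95, True), (0.85, True), (0.6, False), (0.55, True), (0.3, False)] and 4 annotated beans, raising the threshold from 0.5 to 0.7 lifts precision from 0.75 to 1.0 but drops recall from 0.75 to 0.5. Which trade-off is acceptable depends on your application.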
Training Details
Training Data
The model was trained on a custom dataset of coffee bean images:
- 128 training images with detailed COCO-format annotations
- Multiple coffee varieties and roast levels
- Various lighting conditions and backgrounds
- Manually annotated polygon masks for each bean
- Data augmentation: rotation, scaling, color jittering, horizontal/vertical flips
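Flip augmentation only helps if the polygon annotations are transformed together with the pixels. The training code is not included in this card; the hypothetical helpers below sketch the coordinate transform for horizontal and vertical flips, assuming continuous pixel coordinates in which the image spans [0, width] by [0, height].

```python
def flip_polygon(points, width, height, horizontal=True):
    """Mirror a polygon [(x, y), ...] to match a flipped image (hypothetical helper).

    Horizontal flip maps x to width - x; vertical flip maps y to height - y.
    """
    if horizontal:
        return [(width - x, y) for x, y in points]
    return [(x, height - y) for x, y in points]


def flip_box(box, width, height, horizontal=True):
    """Mirror an [x1, y1, x2, y2] box; the corners swap so that x1 <= x2 and y1 <= y2 still hold."""
    x1, y1, x2, y2 = box
    if horizontal:
        return [width - x2, y1, width - x1, y2]
    return [x1, height - y2, x2, height - y1]
```

Flipping reverses the polygon's winding order, which COCO-style polygon masks do not depend on. Applying the same flip twice returns the original coordinates, which is a quick sanity check for any augmentation pipeline.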
Preprocessing
- Images resized with their aspect ratio preserved
- Normalization with ImageNet statistics
- Random augmentations during training
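The card does not state target sizes. If the model relies on torchvision's default Mask R-CNN transform (an assumption), each image is scaled so its shorter side reaches 800 pixels unless that would push the longer side past 1333, and each channel is scaled to [0, 1] and then normalized with the ImageNet mean (0.485, 0.456, 0.406) and standard deviation (0.229, 0.224, 0.225). A dependency-free sketch of both steps; the function names are illustrative.

```python
# ImageNet channel statistics (R, G, B), as used by torchvision's detection models.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def resize_scale(width, height, min_size=800, max_size=1333):
    """Scale that brings the short side to min_size without the long side exceeding max_size.

    800 and 1333 are torchvision's Mask R-CNN defaults; whether this model
    used them is an assumption.
    """
    short_side, long_side = min(width, height), max(width, height)
    return min(min_size / short_side, max_size / long_side)


def resized_shape(width, height, min_size=800, max_size=1333):
    """Output (width, height) after aspect-preserving scaling; int() truncates."""
    scale = resize_scale(width, height, min_size, max_size)
    return int(width * scale), int(height * scale)


def normalize_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to ImageNet-normalized floats."""
    return tuple((value / 255.0 - mean) / std
                 for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))
```

For a 400 x 400 image the scale is 2.0, giving 800 x 800. For a 1000 x 500 image the long-side limit wins (1333 / 1000 = 1.333 is smaller than 800 / 500 = 1.6), so the short side ends up below 800.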
Evaluation
Testing Data
Evaluated on a held-out validation set:
- 4,952 ground truth bean instances
- Diverse bean arrangements and densities
- Various roast levels and lighting conditions
Metrics
| Metric | Value |
|---|---|
| Precision | 99.92% |
| Recall | 96.71% |
| Average IoU | 90.93% |
| Detection Rate | 96.71% |
| Average Confidence | 99.82% |
| Mask Loss | 0.1333 |
| Validation Loss | 0.2464 |
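The card does not say how predictions were matched to ground truth. One common protocol, sketched here under the assumption of greedy one-to-one matching at a mask IoU of at least 0.5, computes precision as matched predictions over all predictions, recall as matched ground-truth beans over all ground-truth instances (4,952 here), and average IoU over the matched pairs. Under that reading, Detection Rate and Recall measure the same thing, which would explain why they coincide in the table. Masks are sets of (row, col) pixels to keep the example dependency-free, and square_mask is a toy helper.

```python
def mask_iou(a, b):
    """IoU of two masks given as sets of (row, col) pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def evaluate(predictions, ground_truths, iou_threshold=0.5):
    """Greedy one-to-one matching of predicted masks to ground-truth masks.

    predictions: list of (mask, score); ground_truths: list of masks.
    Returns (precision, recall, mean IoU over matched pairs).
    The 0.5 IoU threshold is an assumption, not taken from the model card.
    """
    matched_gt = set()
    matched_ious = []
    # Visit predictions from most to least confident.
    for mask, _score in sorted(predictions, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truths):
            if idx in matched_gt:
                continue  # each ground-truth bean can be matched only once
            iou = mask_iou(mask, gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched_gt.add(best_idx)
            matched_ious.append(best_iou)
    tp = len(matched_ious)
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truths) if ground_truths else 0.0
    mean_iou = sum(matched_ious) / tp if tp else 0.0
    return precision, recall, mean_iou


def square_mask(row, col, size):
    """Toy helper: a size x size square mask with its top-left pixel at (row, col)."""
    return {(r, c) for r in range(row, row + size) for c in range(col, col + size)}
```

With three ground-truth squares and three predictions, one exact match, one shifted by a row (IoU 90/110), and one spurious, the sketch reports precision and recall of 2/3 each.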
Technical Specifications
Model Architecture and Objective
- Architecture: Mask R-CNN with ResNet-50 Feature Pyramid Network
- Input: RGB images (any size)
- Output: Instance masks, bounding boxes, class labels, confidence scores
- Objective: Minimize combined classification, box regression, and mask segmentation losses
- Model Size: 176.1 MB (SafeTensors format)
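The objective follows the multi-task loss of the original Mask R-CNN paper, applied to each sampled region of interest, and the stated file size implies a rough parameter count. Both are sketched below; equal weighting of the loss terms is the paper's formulation and an assumption for this fine-tune, and the parameter estimate assumes 32-bit float weights (4 bytes each), 1 MB = 10^6 bytes, and negligible file-header overhead.

```latex
\mathcal{L} = \mathcal{L}_{\text{cls}} + \mathcal{L}_{\text{box}} + \mathcal{L}_{\text{mask}}

N_{\text{params}} \approx \frac{176.1 \times 10^{6}\ \text{bytes}}{4\ \text{bytes/parameter}} \approx 4.40 \times 10^{7}
```

In torchvision's implementation, a training-mode forward pass returns these terms as separate entries (loss_classifier, loss_box_reg, loss_mask) alongside two region proposal network losses (loss_objectness, loss_rpn_box_reg), which training loops typically sum. Roughly 44 million parameters is in line with torchvision's maskrcnn_resnet50_fpn.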
Compute Infrastructure
Hardware
- CPU: Mac Mini M2 with 8GB RAM
- Training time: ~2 hours for fine-tuning
Software
- PyTorch 2.0+
- TorchVision 0.15+
- CUDA 11.8+ (optional; only needed for NVIDIA GPU acceleration)
- Python 3.8+
Evaluation results
- Precision on Coffee Bean Dataset (self-reported): 99.92%
- Recall on Coffee Bean Dataset (self-reported): 96.71%
- Average IoU on Coffee Bean Dataset (self-reported): 90.93%
- Detection Rate on Coffee Bean Dataset (self-reported): 96.71%