Coffee Bean Detection - Fine-tuned Mask R-CNN

Model Details

Model Description

This is a fine-tuned Mask R-CNN model specialized for detecting and segmenting individual coffee beans in images. The model performs instance segmentation, providing precise pixel-level masks and bounding boxes for each detected coffee bean.

Developed by: Mark Kunitomi
Model type: Instance Segmentation (Mask R-CNN)
Architecture: ResNet-50 FPN backbone
License: apache-2.0

Example Results

Left: Original green coffee beans image. Right: Detection and segmentation results showing individual bean masks with confidence scores.

How to Get Started with the Model

Quick Start with Jupyter Notebook

A complete inference notebook (inference_demo.ipynb) is included with this model.

Command Line Interface

For quick inference, use the included predict_beans.py script:

# Install dependencies
pip install -r requirements.txt

# Basic inference on a single image
python predict_beans.py --model maskrcnn_coffeebeans_v1.safetensors --images your_image.jpg

# Process multiple images with custom settings
python predict_beans.py --model maskrcnn_coffeebeans_v1.safetensors \
  --images *.jpg \
  --confidence 0.5 \
  --nms_threshold 0.3 \
  --smooth_polygons \
  --filter_edge_beans \
  --output_dir results \
  --export_format coco

# For all available options
python predict_beans.py --help

Key CLI Options:

--confidence: Detection confidence threshold (default: 0.5)
--nms_threshold: Non-maximum suppression threshold (default: 0.3)
--smooth_polygons: Apply polygon smoothing for cleaner masks
--filter_edge_beans: Remove beans touching image edges
--export_format: Output format (json, coco, labelme, all)
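
To make the threshold options concrete, here is a minimal pure-Python sketch of what confidence filtering, greedy non-maximum suppression, and edge filtering typically do. It is not taken from predict_beans.py: the function names (box_iou, filter_detections, touches_edge), the (x1, y1, x2, y2) box layout, and the margin parameter are assumptions made for illustration, and the script itself may work on masks or use a library routine instead.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, confidence=0.5, nms_threshold=0.3):
    """Return indices kept after score thresholding and greedy NMS."""
    # Drop low-confidence detections, then visit the rest from best to worst.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= confidence),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    for i in order:
        # Keep a box only if its overlap with every kept box is small enough.
        if all(box_iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept


def touches_edge(box, width, height, margin=0):
    """True if the box reaches within `margin` pixels of the image border."""
    x1, y1, x2, y2 = box
    return (x1 <= margin or y1 <= margin
            or x2 >= width - margin or y2 >= height - margin)


# Hypothetical detections on a 100x100 image (illustrative values only).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (0, 60, 30, 90), (70, 70, 95, 95)]
scores = [0.99, 0.90, 0.97, 0.40]

kept = filter_detections(boxes, scores)
interior = [i for i in kept if not touches_edge(boxes[i], width=100, height=100)]
print(kept, interior)
```

In this toy input, box 3 falls below the confidence threshold, box 1 is suppressed because it overlaps the higher-scoring box 0, and box 2 is then dropped by the edge filter because it touches the left border. A lower --nms_threshold suppresses more aggressively, which suits beans that rarely overlap heavily.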

Recommendations

Use a confidence threshold of 0.5 or higher for production applications
Validate on your specific bean varieties before deployment
Consider additional fine-tuning for specialized use cases
Implement human verification for critical applications

Training Details

Training Data

The model was trained on a custom dataset of coffee bean images:

128 training images with detailed COCO-format annotations
Multiple coffee varieties and roast levels
Various lighting conditions and backgrounds
Manually annotated polygon masks for each bean
Data augmentation: rotation, scaling, color jittering, horizontal/vertical flips
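
As a rough illustration of what a COCO-format polygon annotation contains, and why flips must be applied to the annotations as well as the pixels, the sketch below builds one annotation record and mirrors it horizontally. The field names follow the COCO annotation layout; the helper names, the sample polygon, and the single category id 1 are assumptions for this example, not details of the author's dataset or training code.

```python
def polygon_area(flat):
    """Shoelace area of a polygon given as [x1, y1, x2, y2, ...]."""
    xs, ys = flat[0::2], flat[1::2]
    n = len(xs)
    twice = sum(xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i]
                for i in range(n))
    return abs(twice) / 2.0


def polygon_bbox(flat):
    """COCO-style bbox [x, y, width, height] enclosing the polygon."""
    xs, ys = flat[0::2], flat[1::2]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def make_annotation(ann_id, image_id, flat, category_id=1):
    """Build one COCO annotation dict for a single-polygon instance."""
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,   # assumed: 1 = coffee bean
        "segmentation": [flat],       # list of flat polygons
        "area": polygon_area(flat),
        "bbox": polygon_bbox(flat),
        "iscrowd": 0,
    }


def hflip_polygon(flat, image_width):
    """Mirror a polygon horizontally (continuous coords: x -> width - x)."""
    return [image_width - v if i % 2 == 0 else v for i, v in enumerate(flat)]


# Hypothetical rectangular "bean" outline in a 100-pixel-wide image.
polygon = [10, 20, 50, 20, 50, 40, 10, 40]
original = make_annotation(1, 1, polygon)
flipped = make_annotation(2, 1, hflip_polygon(polygon, image_width=100))
print(original["bbox"], flipped["bbox"])
```

The flip changes the bounding box position but leaves the area unchanged, which is a quick sanity check for any augmentation pipeline that transforms images and masks together.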

Preprocessing

Images resized with aspect ratio preserved
Normalization with ImageNet statistics
Random augmentations during training
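
The resizing and normalization steps can be sketched in a few lines. The mean and standard deviation below are the standard ImageNet statistics; the 800/1333 size limits are the defaults of TorchVision's Mask R-CNN transform and are only assumed here, since the card does not state which limits were used. The function names are illustrative.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def resized_shape(height, width, min_size=800, max_size=1333):
    """Scale so the short side reaches min_size without the long side
    exceeding max_size; both sides use the same factor."""
    scale = min(min_size / min(height, width), max_size / max(height, width))
    return round(height * scale), round(width * scale), scale


def normalize_pixel(rgb):
    """Map an 8-bit RGB pixel to ImageNet-normalized floats."""
    return tuple((c / 255.0 - m) / s
                 for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))


# Hypothetical 3000x4000 (height x width) photo.
new_h, new_w, scale = resized_shape(3000, 4000)
white = normalize_pixel((255, 255, 255))
print(new_h, new_w, round(scale, 4), white)
```

Because one scale factor is applied to both sides, beans keep their shape; any predicted boxes and masks need to be mapped back by the inverse factor when reporting results in original pixel coordinates.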

Evaluation

Testing Data

Evaluated on a held-out validation set:

4,952 ground truth bean instances
Diverse bean arrangements and densities
Various roast levels and lighting conditions

Metrics

Metric	Value
Precision	99.92%
Recall	96.71%
Average IoU	90.93%
Detection Rate	96.71%
Average Confidence	99.82%
Mask Loss	0.1333
Validation Loss	0.2464
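
For readers who want to reproduce this kind of table on their own images, the following sketch shows one common way to compute precision, recall, and average IoU for instance masks. The card does not describe its matching procedure, so the greedy one-to-one matching and the 0.5 IoU match threshold here are assumptions, as are the function names and the toy masks. Under this definition a "detection rate" of TP divided by ground-truth instances is the same quantity as recall, which would be consistent with the two identical values above.

```python
def mask_iou(a, b):
    """IoU of two binary masks given as equal-sized nested lists of 0/1."""
    inter = sum(p & q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    union = sum(p | q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return inter / union if union else 0.0


def evaluate(gt_masks, pred_masks, iou_threshold=0.5):
    """Greedily match predictions (assumed sorted by confidence) to
    ground-truth masks and summarize the result."""
    unmatched_gt = list(range(len(gt_masks)))
    matched_ious = []
    for pred in pred_masks:
        best_iou, best_gt = 0.0, None
        for g in unmatched_gt:
            iou = mask_iou(gt_masks[g], pred)
            if iou > best_iou:
                best_iou, best_gt = iou, g
        if best_gt is not None and best_iou >= iou_threshold:
            unmatched_gt.remove(best_gt)   # each ground truth matches once
            matched_ious.append(best_iou)
    tp = len(matched_ious)
    fp = len(pred_masks) - tp
    fn = len(gt_masks) - tp
    return {
        "precision": tp / (tp + fp) if pred_masks else 0.0,
        "recall": tp / (tp + fn) if gt_masks else 0.0,
        "average_iou": sum(matched_ious) / tp if tp else 0.0,
    }


# Toy 2x4 example: two ground-truth beans, one (slightly small) prediction.
gt = [
    [[1, 1, 0, 0], [1, 1, 0, 0]],
    [[0, 0, 1, 1], [0, 0, 1, 1]],
]
pred = [
    [[1, 1, 0, 0], [1, 0, 0, 0]],
]
metrics = evaluate(gt, pred)
print(metrics)
```

The toy case mirrors the pattern in the table: every prediction is correct (high precision) while some beans are missed (lower recall).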

Technical Specifications

Model Architecture and Objective

Architecture: Mask R-CNN with ResNet-50 Feature Pyramid Network
Input: RGB images (any size)
Output: Instance masks, bounding boxes, class labels, confidence scores
Objective: Minimize combined classification, box regression, and mask segmentation losses
Model Size: 176.1 MB (SafeTensors format)
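
The combined objective can be pictured as a sum over the individual loss terms. In TorchVision's Mask R-CNN, a forward pass in training mode returns a dictionary with the keys used below; whether the author weighted the terms equally is not stated, so the optional weights argument is an assumption, and the numeric values are placeholders rather than losses from this model.

```python
def total_loss(loss_dict, weights=None):
    """Weighted sum of named loss terms (weight defaults to 1.0 each)."""
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value
               for name, value in loss_dict.items())


# Placeholder values for illustration only; not results from this model.
example_losses = {
    "loss_classifier": 0.10,   # bean vs. background classification
    "loss_box_reg": 0.20,      # bounding-box regression
    "loss_mask": 0.30,         # per-pixel mask segmentation
    "loss_objectness": 0.05,   # RPN: object vs. background
    "loss_rpn_box_reg": 0.05,  # RPN: proposal box regression
}

combined = total_loss(example_losses)
mask_heavy = total_loss(example_losses, weights={"loss_mask": 2.0})
print(combined, mask_heavy)
```

Summing with equal weights is the usual starting point; raising the mask weight is one option to try if box quality is fine but mask boundaries are not.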

Compute Infrastructure

Hardware

CPU: Apple M2 (Mac Mini, 8 GB RAM)
Training time: ~2 hours for fine-tuning

Software

PyTorch 2.0+
TorchVision 0.15+
CUDA 11.8+ (NVIDIA GPUs only; CUDA is not available on Apple Silicon such as the M2 listed above)
Python 3.8+
