Coffee Bean Detection - Fine-tuned Mask R-CNN

Model Details

Model Description

This is a fine-tuned Mask R-CNN model specialized for detecting and segmenting individual coffee beans in images. The model performs instance segmentation, providing precise pixel-level masks and bounding boxes for each detected coffee bean.

  • Developed by: Mark Kunitomi
  • Model type: Instance Segmentation (Mask R-CNN)
  • Architecture: ResNet-50 FPN backbone
  • License: apache-2.0

Example Results

Left: Original image of green coffee beans. Right: Detection and segmentation results, showing an individual mask and confidence score for each bean.

How to Get Started with the Model

Quick Start with Jupyter Notebook

A complete inference notebook (inference_demo.ipynb) is included with this model.

Command Line Interface

For quick inference, use the included predict_beans.py script:

# Install dependencies
pip install -r requirements.txt

# Basic inference on a single image
python predict_beans.py --model maskrcnn_coffeebeans_v1.safetensors --images your_image.jpg

# Process multiple images with custom settings
python predict_beans.py --model maskrcnn_coffeebeans_v1.safetensors \
  --images *.jpg \
  --confidence 0.5 \
  --nms_threshold 0.3 \
  --smooth_polygons \
  --filter_edge_beans \
  --output_dir results \
  --export_format coco

# For all available options
python predict_beans.py --help
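
Once results have been exported with --export_format coco, they can be inspected with the standard library alone. The sketch below assumes the export follows the usual COCO layout (top-level "images", "annotations", and "categories" keys); the sample dictionary and the path in the trailing comment are hypothetical, so check what predict_beans.py actually writes into the output directory.

```python
import json
from collections import Counter

def count_beans_per_image(coco: dict) -> dict:
    """Map each image file name to the number of bean annotations it has."""
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    counts = Counter(ann["image_id"] for ann in coco["annotations"])
    return {name: counts.get(img_id, 0) for img_id, name in names.items()}

# Hypothetical, hand-written export used only to exercise the function.
sample = {
    "images": [
        {"id": 1, "file_name": "tray_a.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "tray_b.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 30, 40]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [100, 50, 28, 35]},
    ],
    "categories": [{"id": 1, "name": "bean"}],
}
print(count_beans_per_image(sample))

# With a real export (the file name is an assumption):
# with open("results/annotations.json") as f:
#     print(count_beans_per_image(json.load(f)))
```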

Key CLI Options:

  • --confidence: Detection confidence threshold (default: 0.5)
  • --nms_threshold: Non-maximum suppression threshold (default: 0.3)
  • --smooth_polygons: Apply polygon smoothing for cleaner masks
  • --filter_edge_beans: Remove beans touching image edges
  • --export_format: Output format (json, coco, labelme, all)
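
To make the thresholds above concrete, here is a minimal pure-Python sketch of confidence filtering followed by greedy non-maximum suppression, plus an edge test. It illustrates the general technique only; the function names are hypothetical and the actual logic inside predict_beans.py may differ (for example, it may operate on masks rather than boxes).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def filter_detections(boxes, scores, confidence=0.5, nms_threshold=0.3):
    """Return indices kept after score filtering and greedy NMS."""
    # Drop low-confidence detections, then visit the rest from best to worst.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= confidence),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in keep):
            keep.append(i)
    return keep

def touches_edge(box, width, height):
    """True if the box reaches the image border (one possible edge criterion)."""
    x1, y1, x2, y2 = box
    return x1 <= 0 or y1 <= 0 or x2 >= width or y2 >= height

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.95, 0.90, 0.80, 0.30]
kept = filter_detections(boxes, scores)
print(kept)
print([i for i in kept if not touches_edge(boxes[i], 100, 100)])
```

A lower --nms_threshold suppresses more overlapping detections, which suits beans that rarely overlap; tightly packed beans may need a higher value.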

Recommendations

  • Use confidence threshold of 0.5 or higher for production applications
  • Validate on your specific bean varieties before deployment
  • Consider additional fine-tuning for specialized use cases
  • Implement human verification for critical applications
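
One way to combine the confidence threshold with human verification is to split detections into three buckets. This is only a sketch: the 0.5 floor comes from the recommendation above, while the 0.9 auto-approve cutoff and the dictionary layout of a detection are assumptions to be tuned for your own data.

```python
def triage(detections, accept=0.5, auto_approve=0.9):
    """Split detections into approved, needs-review, and rejected lists."""
    approved, review, rejected = [], [], []
    for det in detections:
        score = det["score"]
        if score >= auto_approve:
            approved.append(det)   # confident enough to use directly
        elif score >= accept:
            review.append(det)     # plausible, but send to a human
        else:
            rejected.append(det)   # below the recommended threshold
    return approved, review, rejected

# Hypothetical detections used only to exercise the function.
detections = [
    {"id": 1, "score": 0.99},
    {"id": 2, "score": 0.72},
    {"id": 3, "score": 0.41},
]
approved, review, rejected = triage(detections)
print(len(approved), len(review), len(rejected))
```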

Training Details

Training Data

The model was trained on a custom dataset of coffee bean images:

  • 128 training images with detailed COCO-format annotations
  • Multiple coffee varieties and roast levels
  • Various lighting conditions and backgrounds
  • Manually annotated polygon masks for each bean
  • Data augmentation: rotation, scaling, color jittering, horizontal/vertical flips
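
For instance segmentation, geometric augmentations have to be applied to the annotations as well as to the pixels. The sketch below shows a horizontal flip for a bounding box and a flat COCO-style polygon; the helper names are hypothetical and the author's training pipeline is not described in enough detail to say how it actually implements this.

```python
def hflip_box(box, width):
    """Flip an (x1, y1, x2, y2) box horizontally within an image of given width."""
    x1, y1, x2, y2 = box
    # The old right edge becomes the new left edge, and vice versa.
    return (width - x2, y1, width - x1, y2)

def hflip_polygon(polygon, width):
    """Flip a flat polygon [x0, y0, x1, y1, ...] horizontally."""
    # Even indices are x coordinates, odd indices are y coordinates.
    return [width - v if i % 2 == 0 else v for i, v in enumerate(polygon)]

box = (10, 20, 30, 40)
polygon = [10, 20, 30, 20, 30, 40]
print(hflip_box(box, 100))
print(hflip_polygon(polygon, 100))
```

Flipping twice returns the original annotation, which makes a convenient sanity check when writing augmentation code.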

Preprocessing

  • Images resized while preserving aspect ratio
  • Normalization with ImageNet statistics
  • Random augmentations during training
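
The resize and normalization steps can be sketched as follows. The mean and standard deviation are the standard ImageNet statistics; the 800/1333 size limits are TorchVision's Mask R-CNN defaults and are an assumption here, since the card does not state which sizes were used.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def resized_shape(height, width, min_size=800, max_size=1333):
    """Scale so the short side reaches min_size unless the long side would exceed max_size."""
    scale = min(min_size / min(height, width), max_size / max(height, width))
    return round(height * scale), round(width * scale)

def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return tuple((c - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

print(resized_shape(480, 640))     # short side limited by min_size
print(resized_shape(1000, 4000))   # long side limited by max_size
print(normalize_pixel((1.0, 1.0, 1.0)))
```

Note that TorchVision detection models perform this resize and normalization internally, so inputs are typically passed as plain [0, 1] tensors.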

Evaluation

Testing Data

Evaluated on a held-out validation set:

  • 4,952 ground truth bean instances
  • Diverse bean arrangements and densities
  • Various roast levels and lighting conditions

Metrics

Metric               Value
Precision            99.92%
Recall               96.71%
Average IoU          90.93%
Detection Rate       96.71%
Average Confidence   99.82%
Mask Loss            0.1333
Validation Loss      0.2464
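
For readers reproducing these numbers, the sketch below shows the usual definitions of precision, recall, and mask IoU. The matching rule between predictions and ground truth is not stated in this card, and the counts are illustrative: 4,789 true positives is simply what a 96.71% recall on 4,952 instances works out to, and the false-positive count of 4 is an assumption chosen to be consistent with the reported precision.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def mask_iou(mask_a, mask_b):
    """IoU of two masks given as sets of (row, col) pixel coordinates."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0

ground_truth = 4952          # from the evaluation set described above
tp = 4789                    # illustrative: about 96.71% of 4,952
fn = ground_truth - tp       # missed beans
fp = 4                       # assumed, not reported in the source

precision, recall = precision_recall(tp, fp, fn)
print(f"precision={precision:.2%} recall={recall:.2%}")
print(mask_iou({(0, 0), (0, 1)}, {(0, 1), (1, 1)}))
```

Detection Rate and Recall are identical in the table, which suggests both are computed as matched beans divided by ground-truth beans.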

Technical Specifications

Model Architecture and Objective

  • Architecture: Mask R-CNN with ResNet-50 Feature Pyramid Network
  • Input: RGB images (any size)
  • Output: Instance masks, bounding boxes, class labels, confidence scores
  • Objective: Minimize combined classification, box regression, and mask segmentation losses
  • Model Size: 176.1 MB (SafeTensors format)
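
The combined objective can be illustrated with the loss dictionary that TorchVision's Mask R-CNN returns in training mode, whose keys are loss_classifier, loss_box_reg, loss_mask, loss_objectness, and loss_rpn_box_reg. The sketch below uses plain floats with made-up values and an unweighted sum, which is the common convention; whether the author weighted the terms differently is not stated.

```python
def total_loss(loss_dict):
    """Sum the individual loss terms into the single value that is minimized."""
    return sum(loss_dict.values())

# Hypothetical values for illustration only; not taken from this model's training run.
losses = {
    "loss_classifier": 0.08,   # bean vs. background classification
    "loss_box_reg": 0.10,      # bounding box regression
    "loss_mask": 0.25,         # per-pixel mask segmentation
    "loss_objectness": 0.03,   # region proposal network: object vs. not
    "loss_rpn_box_reg": 0.04,  # region proposal network: box refinement
}
print(round(total_loss(losses), 4))
```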

Compute Infrastructure

Hardware

  • Machine: Mac Mini M2 with 8 GB RAM
  • Training time: ~2 hours for fine-tuning

Software

  • PyTorch 2.0+
  • TorchVision 0.15+
  • CUDA 11.8+ (optional; only relevant for NVIDIA GPU acceleration)
  • Python 3.8+