---
title: Zephyr 7B CyberSecurity Trainer
emoji: 🔐
colorFrom: red
colorTo: yellow
sdk: docker
app_file: train.py
pinned: false
license: mit
---

# Zephyr 7B CyberSecurity Fine-tuning

Fine-tuning Zephyr 7B on a curated collection of cybersecurity datasets.

## Overview

This project fine-tunes the **Zephyr 7B** model on 18 cybersecurity-focused datasets from the [thelordofweb CyberSecurity collection](https://huggingface.co/collections/thelordofweb/cybersecurity-dataset-6869079fc8cd15bfb8bb02a1), creating a specialized model for cybersecurity tasks.

## Datasets Included

- AlicanKiraz0/All-CVE-Records-Training-Dataset
- AlicanKiraz0/Cybersecurity-Dataset-v1
- Bouquets/Cybersecurity-LLM-CVE
- CyberNative/CyberSecurityEval
- Mohabahmed03/Alpaca_Dataset_CyberSecurity_Smaller
- CyberNative/github_cybersecurity_READMEs
- AlicanKiraz0/Cybersecurity-Dataset-Heimdall-v1.1
- jcordon5/cybersecurity-rules
- Bouquets/DeepSeek-V3-Distill-Cybersecurity-en
- Seerene/cybersecurity_dataset
- ahmedds10/finetuning_alpaca_Cybersecurity
- Tiamz/cybersecurity-instruction-dataset
- OhWayTee/Cybersecurity-News_3
- Trendyol/All-CVE-Chat-MultiTurn-1999-2025-Dataset
- Vanessasml/cyber-reports-news-analysis-llama2-3k
- Vanessasml/cybersecurity_32k_instruction_input_output
- Vanessasml/enisa_cyber_news_dataset
- Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset

## Training Configuration

- **Base Model**: HuggingFaceH4/zephyr-7b-beta
- **Method**: QLoRA (4-bit quantization)
- **LoRA Config**: r=16, alpha=32
- **Epochs**: 3
- **Batch Size**: 4 (per device)
- **Gradient Accumulation**: 4 steps
- **Learning Rate**: 2e-4
- **Optimizer**: paged_adamw_8bit

## Running on Hugging Face Spaces

This training script is designed to run on Hugging Face Spaces with GPU support.

### Requirements

- Hugging Face Space with GPU (A100 recommended)
- Write access token

### Setup

1. Create a new Space with GPU support
2. Upload all files from this directory
3. Set your HF_TOKEN as a Space secret
4. Run the training script

## Output

The fine-tuned model will be saved to: `Jcalemcg/zephyr-7b-cybersecurity-finetuned`

## License

Follows the licensing of the base Zephyr 7B model and included datasets.