---
title: Zephyr 7B CyberSecurity Trainer
emoji: 🔐
colorFrom: red
colorTo: yellow
sdk: docker
app_file: train.py
pinned: false
license: mit
---

# Zephyr 7B CyberSecurity Fine-tuning

Fine-tuning Zephyr 7B on a curated collection of cybersecurity datasets.

## Overview

This project fine-tunes the **Zephyr 7B** model on 18 cybersecurity-focused datasets from the [thelordofweb CyberSecurity collection](https://huggingface.co/collections/thelordofweb/cybersecurity-dataset-6869079fc8cd15bfb8bb02a1), creating a specialized model for cybersecurity tasks.

## Datasets Included

- AlicanKiraz0/All-CVE-Records-Training-Dataset
- AlicanKiraz0/Cybersecurity-Dataset-v1
- Bouquets/Cybersecurity-LLM-CVE
- CyberNative/CyberSecurityEval
- Mohabahmed03/Alpaca_Dataset_CyberSecurity_Smaller
- CyberNative/github_cybersecurity_READMEs
- AlicanKiraz0/Cybersecurity-Dataset-Heimdall-v1.1
- jcordon5/cybersecurity-rules
- Bouquets/DeepSeek-V3-Distill-Cybersecurity-en
- Seerene/cybersecurity_dataset
- ahmedds10/finetuning_alpaca_Cybersecurity
- Tiamz/cybersecurity-instruction-dataset
- OhWayTee/Cybersecurity-News_3
- Trendyol/All-CVE-Chat-MultiTurn-1999-2025-Dataset
- Vanessasml/cyber-reports-news-analysis-llama2-3k
- Vanessasml/cybersecurity_32k_instruction_input_output
- Vanessasml/enisa_cyber_news_dataset
- Trendyol/Trendyol-Cybersecurity-Instruction-Tuning-Dataset

## Training Configuration

- **Base Model**: HuggingFaceH4/zephyr-7b-beta
- **Method**: QLoRA (4-bit quantization)
- **LoRA Config**: r=16, alpha=32
- **Epochs**: 3
- **Batch Size**: 4 (per device)
- **Gradient Accumulation**: 4 steps
- **Learning Rate**: 2e-4
- **Optimizer**: paged_adamw_8bit

## Running on Hugging Face Spaces

This training script is designed to run on Hugging Face Spaces with GPU support.

### Requirements

- Hugging Face Space with GPU (A100 recommended)
- Write access token

### Setup

1. Create a new Space with GPU support
2. Upload all files from this directory
3. Set your HF_TOKEN as a Space secret
4. Run the training script

## Output

The fine-tuned model will be saved to: `Jcalemcg/zephyr-7b-cybersecurity-finetuned`

## License

Follows the licensing of the base Zephyr 7B model and included datasets.
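
## Example: Configuration Sketch

The values listed under Training Configuration can be gathered in one place and checked before a run. The sketch below is illustrative only and is not taken from `train.py`, whose contents are not shown here. The class `FineTuneConfig` and the helper functions are hypothetical names. The dictionary keys mirror parameter names from `peft.LoraConfig` (`r`, `lora_alpha`) and `transformers.TrainingArguments`, but plain dictionaries are used so the snippet runs without either library installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hypothetical container for the hyperparameters listed in this README."""
    base_model: str = "HuggingFaceH4/zephyr-7b-beta"
    load_in_4bit: bool = True          # QLoRA: base model loaded in 4-bit
    lora_r: int = 16
    lora_alpha: int = 32
    epochs: int = 3
    per_device_batch_size: int = 4
    gradient_accumulation_steps: int = 4
    learning_rate: float = 2e-4
    optimizer: str = "paged_adamw_8bit"


def effective_batch_size(cfg: FineTuneConfig, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step."""
    return cfg.per_device_batch_size * cfg.gradient_accumulation_steps * num_devices


def lora_scaling(cfg: FineTuneConfig) -> float:
    """Standard LoRA scaling factor, alpha / r."""
    return cfg.lora_alpha / cfg.lora_r


def lora_kwargs(cfg: FineTuneConfig) -> dict:
    """Keys follow peft.LoraConfig parameter names."""
    return {"r": cfg.lora_r, "lora_alpha": cfg.lora_alpha}


def training_kwargs(cfg: FineTuneConfig) -> dict:
    """Keys follow transformers.TrainingArguments parameter names."""
    return {
        "num_train_epochs": cfg.epochs,
        "per_device_train_batch_size": cfg.per_device_batch_size,
        "gradient_accumulation_steps": cfg.gradient_accumulation_steps,
        "learning_rate": cfg.learning_rate,
        "optim": cfg.optimizer,
    }


if __name__ == "__main__":
    cfg = FineTuneConfig()
    print("effective batch size:", effective_batch_size(cfg))  # 4 * 4 = 16
    print("LoRA scaling:", lora_scaling(cfg))                  # 32 / 16 = 2.0
    print(training_kwargs(cfg))
```

On a single GPU these settings give an effective batch size of 16 per optimizer step. If the dictionaries are passed on to the real libraries, for example as `LoraConfig(**lora_kwargs(cfg))`, any further arguments those classes need would have to be supplied separately.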