HuggingFaceTB
/

SmolLM-135M-Instruct

Text Generation

alignment-handbook

text-generation-inference

Model card Files Files and versions

loubnabnl HF Staff commited on Aug 17, 2024

Commit

cd22213

·

verified ·

1 Parent(s): 6e6cf67

Update README.md

Files changed (1) hide show

README.md +3 -2

README.md CHANGED Viewed

@@ -33,10 +33,11 @@ To build SmolLM-Instruct, we finetune the base models on publicly available data
 ## Changelog
 |Release|Description|
 |-|-|
-|v0.1| Initial release of SmolLM-Instruct. We finetune on the permissive subset of the WebInstructSub dataset, combined with StarCoder2-Self-OSS-Instruct. Then, we perform DPO (Direct Preference Optimization) for one epoch on HelpSteer for the 135M and 1.7B models, and argilla/dpo-mix-7k for the 360M model.|
-|v0.2| We changed the finetuning mix to datasets more suitable for smol models. We train on a new dataset of 2k simple everyday conversations we generated by llama3.1-70B [everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k/), [Magpie-Pro-300K-Filtere](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered), [self-oss-instruct-sc2-exec-filter-50k](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k), and a small subset of [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)|
 ## Usage

 ## Changelog
 |Release|Description|
 |-|-|
+|v0.1| Initial release of SmolLM-Instruct. We finetune on the permissive subset of the [WebInstructSub](https://huggingface.co/datasets/TIGER-Lab/WebInstructSub) dataset, combined with [StarCoder2-Self-OSS-Instruct](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k). Then, we perform DPO (Direct Preference Optimization) for one epoch on [HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer) for the 135M and 1.7B models, and [argilla/dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k) for the 360M model.|
+|v0.2| We changed the finetuning mix to datasets more suitable for smol models. We train on a new dataset of 2k simple everyday conversations we generated by llama3.1-70B [everyday-conversations-llama3.1-2k](https://huggingface.co/datasets/HuggingFaceTB/everyday-conversations-llama3.1-2k/), [Magpie-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Pro-300K-Filtered), [StarCoder2-Self-OSS-Instruct](https://huggingface.co/datasets/bigcode/self-oss-instruct-sc2-exec-filter-50k), and a small subset of [OpenHermes-2.5](https://huggingface.co/datasets/teknium/OpenHermes-2.5)|
 ## Usage