---
license: other
base_model: deepseek-ai/deepseek-coder-1.3b-base
tags:
- axolotl
- generated_from_trainer
model-index:
- name: deepseek-coder-1.3b-typescript
  results: []
datasets:
- bigcode/the-stack-dedup
widget:
- text: "class Person {\n constructor(public name:"
  example_title: "class"
- text: "function quickSort"
  example_title: "function"
---

<p align="center">
<img width="1000px" alt="CodeGPT: DeepSeek Coder - Typescript" src="codegpt-deepseek-typescript.png?raw=true">
</p>
<p align="center"><a href="https://codegpt.co/">[CodeGPT.co]</a> | <a href="https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript">[🦙 Ollama]</a> | <a href="https://discord.gg/fKyyJX5pne">[Discord]</a> | <a href="https://marketplace.visualstudio.com/items?itemName=DanielSanMedium.dscodegpt">[VSCode Extension]</a></p>
<hr>

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.3.0`
```yaml
base_model: deepseek-ai/deepseek-coder-1.3b-base
model_type: AutoModelForCausalLM
trust_remote_code: true
load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: CodeGPTPlus/typescript-0-500000-seq1024
    type: completion
    field: text

val_set_size: 0.001
output_dir: ./fft-out

sequence_len: 1024

adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:
lora_modules_to_save:

wandb_project: deepseek_1.3_fft
wandb_entity:
wandb_watch:
wandb_name: aws_a10g
wandb_log_model: end

gradient_accumulation_steps: 2
micro_batch_size: 20
num_epochs: 1
optimizer: adamw_bnb_8bit
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 0.000001
max_grad_norm: 1.0
weight_decay: 0.1
lr_scheduler: cosine
learning_rate: 0.00002
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

loss_watchdog_threshold: 5.0
loss_watchdog_patience: 3

hub_model_id: CodeGPTPlus/deepseek_coder_1.3b_typescript
hub_strategy: every_save
warmup_ratio: 0.01
evals_per_epoch: 20
saves_per_epoch: 3
debug:
deepspeed:

fsdp:
fsdp_config:
special_tokens:
  bos_token: "<|begin▁of▁sentence|>"
  eos_token: "<|end▁of▁sentence|>"
  pad_token: "<|end▁of▁sentence|>"
```

</details><br>

# deepseek-coder-1.3b-typescript

CodeGPTPlus/deepseek-coder-1.3b-typescript is a fine-tuned version of [deepseek-ai/deepseek-coder-1.3b-base](https://huggingface.co/deepseek-ai/deepseek-coder-1.3b-base), built by the CodeGPT team to specialize in TypeScript code generation. Fine-tuned on a 0.5B-token TypeScript dataset, the model produces precise and efficient completions for this language.

The model uses a 16K context window and an additional fill-in-the-middle (FIM) task to deliver project-level code completion.

This makes it a strong choice for anyone looking for a code generator specialized in TypeScript, backed by the CodeGPT team.

It achieves the following results on the evaluation set:
- Loss: 0.7681

**Model Developers** CodeGPT Team

**Variations** 1.3B

**Input** Models input text only.

**Output** Models generate text only.

## How to Use
This model is intended for code completion only. Below are some examples of how to use it.

#### Running the model on a GPU
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript",
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("CodeGPTPlus/deepseek-coder-1.3b-typescript",
                                             trust_remote_code=True).cuda()

# Fill-in-the-middle prompt: the model completes the code at <|fim▁hole|>,
# conditioned on both the prefix and the suffix.
input_text = """<|fim▁begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim▁hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim▁end|>"""

inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
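The call above echoes the prompt before the completion. If you only want the infilled code, one option (a minimal sketch, reusing `inputs` and `outputs` from the snippet above) is to decode just the newly generated tokens:

```python
# Slice off the prompt tokens so only the model's infill suggestion remains.
prompt_length = inputs["input_ids"].shape[1]
generated_tokens = outputs[0][prompt_length:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
```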

### Running with Ollama
**Model:** https://ollama.ai/codegpt/deepseek-coder-1.3b-typescript

```ollama run codegpt/deepseek-coder-1.3b-typescript```
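
Once the model is running under Ollama, you can also query it programmatically. Below is a minimal sketch using Ollama's local REST API; it assumes Ollama is serving on its default port (11434) and that the model has already been pulled:

```python
import requests

# Ask the locally running Ollama server for a completion.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "codegpt/deepseek-coder-1.3b-typescript",
        "prompt": "function quickSort",
        "stream": False,  # return the whole completion as a single JSON object
    },
)
print(response.json()["response"])
```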

### Running with Ollama and CodeGPT Autocomplete in VSCode

**Documentation:** https://docs.codegpt.co/docs/tutorial-features/code_autocompletion

Select "Ollama - codegpt/deepseek-coder-1.3b-typescript" in the autocomplete model selector.

Then write any code or comment in the VSCode editor, and the model will provide code suggestions through CodeGPT's autocomplete.

<img width="1000px" alt="CodeGPT: DeepSeek Coder - Typescript" src="ollama_autocomplete_codegpt.gif">

### Fill In the Middle (FIM)
The model expects FIM prompts in the following format, using the special tokens configured at training time:
```
<|fim▁begin|>function quickSort(arr: number[]): number[] {
  if (arr.length <= 1) {
    return arr;
  }
  const pivot = arr[0];
  const left = [];
  const right = [];
<|fim▁hole|>
  return [...quickSort(left), pivot, ...quickSort(right)];
}<|fim▁end|>
```
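
For programmatic use, a tiny helper like the following (hypothetical, not part of the model repo) can assemble a FIM prompt from a prefix/suffix pair:

```python
def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap a prefix/suffix pair in the model's FIM special tokens."""
    return f"<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"

prompt = build_fim_prompt(
    prefix="function quickSort(arr: number[]): number[] {\n",
    suffix="\n  return [...quickSort(left), pivot, ...quickSort(right)];\n}",
)
```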

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough PyTorch equivalent is sketched after the list):
- learning_rate: 2e-05
- train_batch_size: 20
- eval_batch_size: 20
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 40
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 261
- num_epochs: 1
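
As a rough sketch, these settings map to the following PyTorch/transformers objects. Training actually ran through Axolotl with bitsandbytes' 8-bit AdamW, so this is illustrative only, and the total-step count is an assumption derived from the results table below:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

optimizer = torch.optim.AdamW(
    model.parameters(),  # `model` as loaded in the earlier example
    lr=2e-5,
    betas=(0.9, 0.999),
    eps=1e-6,
    weight_decay=0.1,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=261,
    num_training_steps=26_160,  # ~1 epoch: 1308 steps per 0.05 epoch (see table)
)
```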

### Training results

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.0745        | 0.0   | 1     | 0.8681          |
| 1.2267        | 0.05  | 1308  | 0.8130          |
| 1.1594        | 0.1   | 2616  | 0.8018          |
| 0.7674        | 0.15  | 3924  | 0.7942          |
| 0.6443        | 0.2   | 5232  | 0.7889          |
| 0.9155        | 0.25  | 6540  | 0.7847          |
| 0.7501        | 0.3   | 7848  | 0.7819          |
| 0.8835        | 0.35  | 9156  | 0.7792          |
| 0.7261        | 0.4   | 10464 | 0.7769          |
| 0.9746        | 0.45  | 11772 | 0.7748          |
| 0.6884        | 0.5   | 13080 | 0.7734          |
| 0.6104        | 0.55  | 14388 | 0.7722          |
| 0.8876        | 0.6   | 15696 | 0.7710          |
| 0.9567        | 0.65  | 17004 | 0.7703          |
| 0.6915        | 0.7   | 18312 | 0.7696          |
| 0.8874        | 0.75  | 19620 | 0.7691          |
| 0.6124        | 0.8   | 20928 | 0.7686          |
| 0.8147        | 0.85  | 22236 | 0.7684          |
| 0.8021        | 0.9   | 23544 | 0.7683          |
| 0.8665        | 0.95  | 24852 | 0.7681          |

### Framework versions

- Transformers 4.37.0.dev0
- Pytorch 2.0.1+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0