axolotl version: 0.13.0.dev0
adapter: qlora
base_model: Qwen/Qwen3-4B
bf16: false
dataset_prepared_path: null
datasets:
  - path: zypchn/MedQuAD-TR-alpaca
    type: alpaca
debug: null
deepspeed: null
early_stopping_patience: null
eval_steps: 0.01
flash_attention: null
fp16: true
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 1
gradient_checkpointing: true
group_by_length: false
hub_model_id: TurkishMedQwen3-4B-v1
is_llama_derived_model: false
learning_rate: 0.0002
load_in_4bit: true
load_in_8bit: false
local_rank: null
logging_steps: 1
lora_alpha: 16
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 32
lora_target_linear: true
lora_target_modules: null
lr_scheduler: cosine
micro_batch_size: 2
model_type: AutoModelForCausalLM
num_epochs: 5
optimizer: paged_adamw_32bit
output_dir: ./qlora-out
pad_to_sequence_len: true
resume_from_checkpoint: null
sample_packing: true
save_steps: null
save_strategy: epoch
sequence_len: 4096
special_tokens:
  eos_token: <|im_end|>
  pad_token: <|endoftext|>
strict: false
tf32: false
tokenizer_type: Qwen2Tokenizer
train_on_inputs: false
val_set_size: 0.02
wandb_entity: null
wandb_log_model: null
wandb_project: null
wandb_run_id: null
wandb_watch: null
warmup_steps: 100
weight_decay: 0.0
xformers_attention: null
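Given this configuration (QLoRA adapter, 4-bit base model, Alpaca-style prompts), a minimal inference sketch might look like the following. The adapter repo id `zypchn/TurkishMedQwen3-4B-v1` is an assumption derived from the `hub_model_id` and the dataset namespace above; adjust it to wherever the adapter weights are actually published.

```python
# Minimal inference sketch (assumption: the LoRA adapter lives at
# "zypchn/TurkishMedQwen3-4B-v1"; adjust the repo id as needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "Qwen/Qwen3-4B"
adapter_id = "zypchn/TurkishMedQwen3-4B-v1"  # assumed repo id

# 4-bit loading mirrors the training setup (load_in_4bit: true, fp16: true).
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# The dataset is in Alpaca format, so prompting follows the standard
# Alpaca (no-input) template.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nHipertansiyon nedir?\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```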
This model is a fine-tuned version of Qwen/Qwen3-4B on the zypchn/MedQuAD-TR-alpaca dataset. It achieves the following results on the evaluation set:
- Loss: 1.4742
Intended Use: Medical Question-Answering
Limitations: Please be aware that the model may produce low-quality outputs for some inputs due to the quality of the training dataset.
The training data consists of real patient-doctor interactions scraped from a website; these QA pairs were then reconstructed into the Alpaca format.
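For reference, here is a sketch of how an Alpaca-format record is typically rendered into a training prompt, assuming the standard Stanford Alpaca template and the usual `instruction` / `input` / `output` field names; the example record content is hypothetical.

```python
# Sketch of the standard Alpaca prompt template (assumed field names:
# "instruction" / "input" / "output"; the example record is hypothetical).
def build_alpaca_prompt(record: dict) -> str:
    if record.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input that "
            "provides further context. Write a response that appropriately completes "
            "the request.\n\n"
            f"### Instruction:\n{record['instruction']}\n\n"
            f"### Input:\n{record['input']}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{record['instruction']}\n\n"
        "### Response:\n"
    )

example = {
    "instruction": "Hipertansiyon belirtileri nelerdir?",  # "What are the symptoms of hypertension?"
    "input": "",
    "output": "Hipertansiyon genellikle belirti vermez ...",  # target text during training
}
print(build_alpaca_prompt(example) + example["output"])
```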
The following hyperparameters were used during training:
- learning_rate: 0.0002
- micro_batch_size: 2
- gradient_accumulation_steps: 1
- optimizer: paged_adamw_32bit
- lr_scheduler: cosine
- warmup_steps: 100
- num_epochs: 5
- weight_decay: 0.0

Training results:
| Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 4.6851 | 14.51 | 14.51 | 15.64 |
| 3.8213 | 0.2243 | 24 | 3.9841 | 18.36 | 18.36 | 21.79 |
| 2.8934 | 0.4486 | 48 | 3.0245 | 18.36 | 18.36 | 21.79 |
| 2.4021 | 0.6729 | 72 | 2.2597 | 18.36 | 18.36 | 21.79 |
| 1.7734 | 0.8972 | 96 | 1.8842 | 18.36 | 18.36 | 21.79 |
| 1.6769 | 1.1308 | 120 | 1.7202 | 18.36 | 18.36 | 21.79 |
| 1.6458 | 1.3551 | 144 | 1.6444 | 18.36 | 18.36 | 21.79 |
| 1.5338 | 1.5794 | 168 | 1.5903 | 18.36 | 18.36 | 21.79 |
| 1.5638 | 1.8037 | 192 | 1.5541 | 18.36 | 18.36 | 21.79 |
| 1.3773 | 2.0374 | 216 | 1.5338 | 18.36 | 18.36 | 21.79 |
| 1.4174 | 2.2617 | 240 | 1.5257 | 18.36 | 18.36 | 21.79 |
| 1.4269 | 2.4860 | 264 | 1.5171 | 18.36 | 18.36 | 21.79 |
| 1.3517 | 2.7103 | 288 | 1.4938 | 18.36 | 18.36 | 21.79 |
| 1.2708 | 2.9346 | 312 | 1.4792 | 18.36 | 18.36 | 21.79 |
| 1.0778 | 3.1682 | 336 | 1.4886 | 18.36 | 18.36 | 21.79 |
| 1.2576 | 3.3925 | 360 | 1.4828 | 18.36 | 18.36 | 21.79 |
| 1.3479 | 3.6168 | 384 | 1.4776 | 18.36 | 18.36 | 21.79 |
| 1.2556 | 3.8411 | 408 | 1.4704 | 18.36 | 18.36 | 21.79 |
| 1.108 | 4.0654 | 432 | 1.4710 | 18.36 | 18.36 | 21.79 |
| 1.2222 | 4.2897 | 456 | 1.4744 | 18.36 | 18.36 | 21.79 |
| 1.1457 | 4.5140 | 480 | 1.4743 | 18.36 | 18.36 | 21.79 |
| 1.1319 | 4.7383 | 504 | 1.4746 | 18.36 | 18.36 | 21.79 |
| 1.3381 | 4.9626 | 528 | 1.4742 | 18.36 | 18.36 | 21.79 |