Zen Coder Flash ⚡

Zen AI

The Flagship Zen Coder Model

License: MIT

Overview

Zen Coder Flash is the flagship code-focused model in the Zen AI family. Built on GLM-4.7-Flash's cutting-edge Mixture of Experts architecture, it delivers frontier coding performance with practical efficiency.

| Attribute      | Value                       |
|----------------|-----------------------------|
| Parameters     | 31B total / 3B active (MoE) |
| Context Length | 131,072 tokens              |
| Base Model     | GLM-4.7-Flash               |
| License        | MIT                         |
| Languages      | 100+ programming languages  |

Why Zen Coder Flash?

  • 59.2% on SWE-bench Verified vs 22% for Qwen3-30B: nearly 3x better at real coding tasks
  • Efficient MoE: 31B params but only 3B active per token
  • 131K context: Handle entire codebases in a single prompt
  • Native tool calling: Built-in function execution support
  • Reasoning mode: Extended chain-of-thought for complex problems

Performance

| Benchmark          | Score | vs Qwen3-30B  |
|--------------------|-------|---------------|
| SWE-bench Verified | 59.2% | +37.2% (2.7x) |
| AIME 2025          | 91.6% | +6.6%         |
| GPQA               | 75.2% | +1.8%         |
| τ²-Bench           | 79.5% | +30.5%        |

Zen Coder Family

| Tier     | Model           | Parameters | Active | Use Case    |
|----------|-----------------|------------|--------|-------------|
| Small    | zen-coder-4b    | 4B         | 4B     | Edge/mobile |
| Flagship | zen-coder-flash | 31B MoE    | 3B     | Balanced    |
| Max      | zen-max         | 671B MoE   | 14B    | Frontier    |

Quick Start

Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zenlm/zen-coder-flash"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Write a Python function to find all prime numbers up to n using the Sieve of Eratosthenes"}]

# Render the conversation with the model's chat template.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)

vLLM (Recommended for Production)

vllm serve zenlm/zen-coder-flash \
    --tensor-parallel-size 4 \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 1}' \
    --tool-call-parser glm47 \
    --reasoning-parser glm45 \
    --enable-auto-tool-choice
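
Once running, the server exposes an OpenAI-compatible API. A minimal client sketch, assuming the vLLM default of http://localhost:8000 and no API key (the SGLang server below works the same way, defaulting to port 30000):

# Minimal client sketch for the server above. The base URL and API key
# are assumptions; adjust both for your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="zenlm/zen-coder-flash",
    messages=[{"role": "user", "content": "Write a Go function that reverses a linked list"}],
    temperature=0.7,
    max_tokens=512,
)
print(completion.choices[0].message.content)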

SGLang

python -m sglang.launch_server \
    --model-path zenlm/zen-coder-flash \
    --tp-size 4 \
    --tool-call-parser glm47 \
    --reasoning-parser glm45 \
    --speculative-algorithm EAGLE \
    --speculative-num-steps 3

MLX (Apple Silicon)

from mlx_lm import load, generate

model, tokenizer = load("zenlm/zen-coder-flash")
response = generate(model, tokenizer, prompt="Write a Rust function for binary search", max_tokens=256)
print(response)
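
For instruction-style requests, it is usually better to route the prompt through the model's chat template rather than passing a raw string. A minimal sketch of the same request using the template:

from mlx_lm import load, generate

model, tokenizer = load("zenlm/zen-coder-flash")

# Render the request with the model's chat template before generating.
messages = [{"role": "user", "content": "Write a Rust function for binary search"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)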

Capabilities

Code Generation

  • 100+ programming languages
  • Framework-aware completions
  • Test generation
  • Documentation generation

Debugging & Analysis

  • Bug detection and fixes
  • Code review
  • Performance optimization
  • Security analysis

Software Engineering

  • Architecture design
  • API design
  • Refactoring suggestions
  • Migration assistance

Tool Calling

# Native function calling support
tools = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run test suite",
            "parameters": {"type": "object", "properties": {}}
        }
    }
]
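
The schema above follows the OpenAI function-calling format. A minimal sketch of passing it through the Transformers setup from Quick Start; apply_chat_template accepts a tools argument, and the exact tool-call markup in the output depends on the model's chat template:

# Sketch: pass the tool schema through the chat template (reuses the
# model, tokenizer, and tools defined above).
messages = [{"role": "user", "content": "Run the test suite and summarize any failures."}]

inputs = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# The response contains the model's tool-call markup; parse it per the
# chat template, or let vLLM/SGLang's tool-call parser handle it.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))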

Identity

I am Zen Coder Flash, the flagship code-focused model in the Zen AI family. I combine GLM-4.7's cutting-edge MoE architecture with Zen's philosophy of clarity and efficiency. With 31 billion parameters (only 3B active per token) and 131K context, I deliver frontier coding capability that's practical to deploy.

Training

Zen Coder Flash is built through identity fine-tuning on GLM-4.7-Flash using MLX LoRA on Apple Silicon (an illustrative command sketch follows the list below). The training emphasizes:

  • Zen identity and persona
  • Code-focused instruction following
  • Tool calling capabilities
  • Extended reasoning patterns
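
For reference, a hedged sketch of what such a run looks like with the mlx_lm LoRA CLI. The base-model path, data directory, and hyperparameters here are illustrative assumptions, not the published recipe:

# Illustrative only: the checkpoint path, dataset directory, and
# hyperparameters are assumptions, not the actual training recipe.
python -m mlx_lm.lora \
    --model ./GLM-4.7-Flash \
    --train \
    --data ./zen_identity_data \
    --iters 1000 \
    --batch-size 4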

Citation

@misc{zen-coder-flash-2025,
  title={Zen Coder Flash: Efficient Frontier Code Generation},
  author={Hanzo AI},
  year={2025},
  url={https://huggingface.co/zenlm/zen-coder-flash}
}

Links

  • HuggingFace: https://huggingface.co/zenlm/zen-coder-flash

License

MIT License, inherited from the GLM-4.7-Flash base model.


Zen AI: Clarity Through Intelligence
