AlphaMoE
Foundational model architecture for Mixture of Experts by aquif. State-of-the-art performance and economical training.
aquif-AlphaMoE is the first foundational model designed entirely by aquif AI, marking a shift from the third-party architectures used in aquif-3 and aquif-3.5 toward an in-house architecture family. Released on October 1, 2025, AlphaMoE debuts the AquifAlphaMoEForCausalLM design, a scalable Mixture of Experts (MoE) framework that balances efficiency, reasoning, and multilingual capability.
This release represents aquif AI’s first step into independent foundational model architecture design, with a focus on modular expert scaling, long-context performance, and efficient parameter utilization.
| Model | HuggingFace Repository |
|---|---|
| aquif-AlphaMoE-7.5B-A3B | aquif-ai/aquif-AlphaMoE-7.5B-A3B |
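A minimal loading sketch for the repository above, assuming the checkpoint exposes its custom AquifAlphaMoEForCausalLM implementation through the standard `transformers` remote-code path (the generation settings are illustrative, not prescribed by aquif):

```python
# Illustrative usage sketch; assumes the repo ships custom architecture code
# loadable via transformers' trust_remote_code mechanism.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "aquif-ai/aquif-AlphaMoE-7.5B-A3B"

tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    torch_dtype="auto",       # use the checkpoint's native precision
    device_map="auto",        # place layers on available devices
    trust_remote_code=True,   # required for custom architecture classes
)

prompt = "Explain Mixture of Experts routing in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```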
| Model | Total Params (B) | Active Params (B) | Experts (Total / Active) | Context | Attention | Vocab Size | MMLU | GPQA-D | LiveCodeBench | Math-500 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| aquif-AlphaMoE-7.5B-A3B | 7.47 | 2.92 | 64 / 4 | 164k | GQA (16 heads) | 128k | 86.7 | 60.1 | 35.9 | 87.3 | 67.5 |
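To make the 64-expert / 4-active configuration in the table concrete, the sketch below shows the generic top-k routing pattern such an MoE layer typically follows. It is not aquif's implementation; the hidden and FFN sizes, the SiLU expert MLPs, and the class name are assumptions chosen only for illustration.

```python
# Generic top-k MoE routing sketch (illustrative only; not aquif's actual code).
# Each token is routed to 4 of 64 expert MLPs and their outputs are combined
# with softmax-normalized router weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    def __init__(self, hidden_size=2048, ffn_size=1024, num_experts=64, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.SiLU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):                      # x: (tokens, hidden_size)
        logits = self.router(x)                # (tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in indices[:, slot].unique():
                mask = indices[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

# Example: route 8 tokens through the layer
layer = TopKMoELayer()
tokens = torch.randn(8, 2048)
print(layer(tokens).shape)  # torch.Size([8, 2048])
```

Only the selected experts run for each token, which is how a 7.47B-parameter model can activate roughly 2.92B parameters per forward pass.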
| Metric | AlphaMoE (7.5B A3B) | aquif-3-moe (17B A2.8B) | Ling-mini-2.0 (16B A1.4B) | Qwen3-Instruct-2507 (4B) | aquif-3.5 (7.3B) | Granite-4.0-HS (32B A9B) | Gemma-3 (12.2B) |
|---|---|---|---|---|---|---|---|
| MMLU | 84.3 | 83.2 | 80.9 | 81.6 | 78.5 | 78.5 | 78.5 |
| GPQA-Diamond | 57.5 | 56.7 | 54.3 | 49.6 | 42.3 | 41.6 | 34.9 |
| LiveCodeBench | 35.9 | 28.6 | 34.8 | 31.9 | 21.3 | 25.1 | 13.7 |
| Math-500 | 87.3 | 91.4 | 89.4 | 84.4 | 90.2 | 85.4 | 82.4 |
| Average | 66.3 | 65.0 | 64.9 | 61.9 | 58.1 | 57.7 | 52.4 |
Architecture: AquifAlphaMoEForCausalLM

This project is released under the MIT license (previously Apache 2.0). See the LICENSE file for details.
Made in 🇧🇷
© 2025 aquif AI. All rights reserved.