Rıfkı-V3 Technical Report
1. Overview
Rıfkı-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token. Inspired by the DeepSeek-V3 architecture, Rıfkı adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture to achieve high-performance inference while keeping training costs economical. FP8 mixed-precision training further reduces memory usage and accelerates training.
2. Architecture Summary
The Rıfkı model architecture is built upon the Transformer framework, with key optimizations for efficiency at scale.
Multi-head Latent Attention (MLA)
Traditional Key-Value (KV) caching in Transformers consumes significant memory during generation. Rıfkı uses MLA to compress keys and values into a compact latent representation, significantly reducing memory overhead during generation and allowing for context windows of up to 128K tokens.
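Conceptually, the compression projects each token's hidden state down into a small latent vector, and only that latent is cached; full keys and values are reconstructed from it at attention time. The PyTorch sketch below illustrates this low-rank KV compression under assumed dimensions and projection names; it is not Rıfkı's actual configuration.

```python
# Minimal sketch of MLA-style KV-cache compression (illustrative only;
# all dimensions and projection names are assumptions).
import torch
import torch.nn as nn

d_model = 7168        # hidden size (assumed)
d_latent = 512        # compressed KV latent dimension (assumed)
n_heads = 16          # number of attention heads (assumed)
d_head = 128          # per-head dimension (assumed)

# Down-projection: each token's hidden state is compressed into a small
# latent vector, and only this latent is stored in the KV cache.
w_down_kv = nn.Linear(d_model, d_latent, bias=False)

# Up-projections: keys and values are reconstructed from the latent at attention time.
w_up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)
w_up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)

hidden = torch.randn(1, 1024, d_model)        # (batch, seq_len, d_model)
kv_latent = w_down_kv(hidden)                 # (batch, seq_len, d_latent) -> this is what gets cached
k = w_up_k(kv_latent).view(1, 1024, n_heads, d_head)
v = w_up_v(kv_latent).view(1, 1024, n_heads, d_head)

# Cache savings: store d_latent floats per token instead of 2 * n_heads * d_head.
print(d_latent / (2 * n_heads * d_head))      # 0.125 of a standard KV cache in this toy config
```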
DeepSeekMoE Mixture-of-Experts
Instead of activating all parameters for every token, Rıfkı uses an MoE router to select only the most relevant experts for each token. As a result, only 37B of the 671B total parameters (roughly 5.5%) are active per token, dramatically reducing per-token compute compared with a dense model of the same size, as sketched below.
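The following PyTorch sketch shows the basic top-k routing idea: score every expert per token, keep only the top-k, and combine their outputs with renormalized gate weights. The expert count, hidden sizes, and k value are illustrative assumptions, not Rıfkı's actual configuration.

```python
# Minimal sketch of top-k MoE routing (illustrative; sizes are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, n_experts, top_k = 1024, 64, 6       # assumed sizes
router = nn.Linear(d_model, n_experts, bias=False)
experts = nn.ModuleList([
    nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.SiLU(), nn.Linear(4 * d_model, d_model))
    for _ in range(n_experts)
])

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    # x: (tokens, d_model). Score every expert, but run only the top_k per token.
    scores = F.softmax(router(x), dim=-1)                  # (tokens, n_experts)
    weights, indices = scores.topk(top_k, dim=-1)          # (tokens, top_k)
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize gate weights
    out = torch.zeros_like(x)
    for tok in range(x.size(0)):
        for slot in range(top_k):
            e = indices[tok, slot].item()
            out[tok] += weights[tok, slot] * experts[e](x[tok])
    return out

print(moe_forward(torch.randn(4, d_model)).shape)          # torch.Size([4, 1024])
```

Production implementations dispatch tokens to experts in batches rather than looping per token; the loop here is only for readability.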
3. Benchmarks
Rıfkı-V3 has been rigorously tested across standard benchmarks including MMLU, GSM8K, and HumanEval.
4. Usage
API Integration
Rıfkı provides an OpenAI-compatible API endpoint.
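Because the endpoint follows the OpenAI API schema, the official openai Python client can be pointed at it by overriding the base URL. The URL, API key, and model identifier below are placeholders, not documented Rıfkı values.

```python
# Hedged example: calling an OpenAI-compatible endpoint with the openai client.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",   # replace with the actual Rıfkı endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="rifki-v3",                        # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize MLA in one sentence."}],
)
print(response.choices[0].message.content)
```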
Local Run
You can run Rıfkı locally using standard inference engines like vLLM or SGLang.
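Below is a minimal offline-inference sketch using vLLM's Python API. The model path is a placeholder, and a checkpoint of this size would additionally require multi-GPU tensor/pipeline parallelism, which this sketch omits.

```python
# Minimal vLLM offline-inference sketch (model path is a placeholder).
from vllm import LLM, SamplingParams

llm = LLM(model="path/to/rifki-v3")          # placeholder local path or HF repo id
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain Mixture-of-Experts briefly."], params)
print(outputs[0].outputs[0].text)
```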
5. Citation
If you use Rıfkı in your research, please cite: