FastContext-NVFP4

A high-performance NVFP4-quantized version of the FastContext repo-exploration model.

Source: github.com
Built with: GLM-5.2 (evidence: Strong)

GLM-5.2 is explicitly referenced in the linked source documentation and discussion as the baseline alternative for repository exploration.

The source post names it directly: 'Without FastContext: stuff the whole demo repo into GLM-5.2'.
Build evidence: Strong

The repository contains working scripts for quantization and vLLM serving, and the README reports performance benchmarks.

Creator: r0b0tlab (@r0b0tlab)
Shipped: 1h ago · model from Jun 16, 2026

FastContext-NVFP4 provides optimized weights and serving configurations for Microsoft's FastContext, a 4B-parameter model designed to help coding agents explore repositories efficiently. This release uses NVIDIA ModelOpt for NVFP4 (4-bit floating-point) quantization, which significantly improves decode throughput and reduces memory footprint relative to the base BF16 model, making it better suited to local coding-agent workflows.
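As a rough illustration of where the memory savings come from, the sketch below estimates weight storage for a 4B-parameter model in BF16 versus NVFP4. It assumes every weight is quantized (in practice some layers may stay in higher precision) and that NVFP4 stores 4-bit values plus one 8-bit scale per 16-element block, ignoring the per-tensor scale, KV cache, and activations. These are back-of-the-envelope numbers, not the repository's benchmark results.

```python
# Rough weight-only memory estimate for a 4B-parameter model.
# Assumptions (not taken from the repository):
#   - every weight is quantized (some layers may stay in BF16 in practice)
#   - NVFP4 = 4-bit values + one 8-bit scale shared by each 16-element block
#   - the per-tensor scale, KV cache, and activations are ignored

NUM_PARAMS = 4_000_000_000  # "4B-parameter model" from the description


def bf16_bytes(num_params: int) -> float:
    """BF16 stores each weight in 16 bits (2 bytes)."""
    return num_params * 16 / 8


def nvfp4_bytes(num_params: int, block_size: int = 16, scale_bits: int = 8) -> float:
    """4 bits per weight plus one shared scale amortized over each block."""
    bits_per_weight = 4 + scale_bits / block_size  # 4.5 bits with the defaults
    return num_params * bits_per_weight / 8


if __name__ == "__main__":
    bf16 = bf16_bytes(NUM_PARAMS)
    fp4 = nvfp4_bytes(NUM_PARAMS)
    print(f"BF16 weights:  {bf16 / 1e9:.2f} GB")   # 8.00 GB
    print(f"NVFP4 weights: {fp4 / 1e9:.2f} GB")    # 2.25 GB
    print(f"Reduction:     {bf16 / fp4:.2f}x")     # 3.56x
```

Because decoding is typically limited by how fast weights can be read from memory, a smaller weight footprint is also the main reason 4-bit formats tend to raise decode throughput.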

