FastContext-NVFP4
A high-performance NVFP4-quantized version of the FastContext repo-exploration model.
github.com
The model is explicitly referenced in the linked source documentation and discussion as the benchmark/alternative model for repository exploration.
The repository contains functional scripts for quantization and vLLM serving, with clear performance benchmarks provided in the README.
FastContext-NVFP4 provides optimized weights and serving configurations for Microsoft's FastContext, a 4B-parameter model designed to help coding agents explore repositories efficiently. This release uses NVIDIA ModelOpt for FP4 quantization, which significantly improves decode throughput and reduces the memory footprint relative to the base BF16 model. That makes it better suited to local coding agent workflows.
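To give a rough sense of where the memory saving comes from, the sketch below estimates weight storage for a 4B-parameter model in BF16 versus NVFP4. It is a back-of-the-envelope calculation, not a measurement of this release. It assumes exactly 4,000,000,000 parameters, assumes every weight is quantized (in practice some layers are often left in higher precision), and uses the NVFP4 layout of 4-bit values with one 8-bit scale per block of 16 values. The KV cache, activations, and per-tensor scales are ignored.

```python
# Rough weight-memory estimate: BF16 vs NVFP4.
# All figures are illustrative assumptions, not numbers from the repository.

NUM_PARAMS = 4_000_000_000  # "4B" from the description; exact count is assumed

BF16_BITS = 16.0            # BF16 stores each weight in 16 bits

# NVFP4 stores 4-bit values plus one 8-bit (FP8) scale per 16-value block,
# so the effective cost is 4 + 8/16 = 4.5 bits per weight.
NVFP4_BLOCK_SIZE = 16
NVFP4_BITS = 4.0 + 8.0 / NVFP4_BLOCK_SIZE


def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Return the bytes needed to store num_params weights."""
    return num_params * bits_per_weight / 8


bf16_bytes = weight_bytes(NUM_PARAMS, BF16_BITS)
nvfp4_bytes = weight_bytes(NUM_PARAMS, NVFP4_BITS)

print(f"BF16:  {bf16_bytes / 1e9:.2f} GB")          # BF16:  8.00 GB
print(f"NVFP4: {nvfp4_bytes / 1e9:.2f} GB")         # NVFP4: 2.25 GB
print(f"Ratio: {bf16_bytes / nvfp4_bytes:.2f}x")    # Ratio: 3.56x
```

Under these assumptions the weights shrink by roughly 3.5x. The size of the actual checkpoint depends on which layers the quantization scripts leave unquantized, so treat the output as an upper bound on the saving.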



