Home - DK's Blog

Parse-Render Roundtrip: Prefix Cache Design Principles for Open-Source Chat Templates
May 13, 2026

Proposes the Parse-Render Roundtrip design principle and analyzes chat template and tool parser designs across Qwen3.6, GLM-5.1, Kimi K2.6, DeepSeek V4 Pro, and GPT-OSS, verifying token-level prefix consistency in multi-turn agent conversations.

LLM, Inference, Agent

Towards Low-Precision Computation in LLMs: From BF16 to FP4
May 1, 2026

A practical walkthrough of low-precision numerical formats, scaling strategies, and precision loss mitigation in LLMs, grounded in recent work from DeepSeek-V3/V4 and GPT-OSS.

LLM, Low-Precision Computing, Quantization

Hello World
April 25, 2026

The first post on this blog.

essay