- Parse-Render Roundtrip: Prefix Cache Design Principles for Open-Source Chat Templates
Proposes the Parse-Render Roundtrip design principle and analyzes chat template and tool parser designs across Qwen3.6, GLM-5.1, Kimi K2.6, DeepSeek V4 Pro, and GPT-OSS, verifying token-level prefix consistency for multi-turn agent conversations.
- Towards Low-Precision Computation in LLMs: From BF16 to FP4
A practical walkthrough of low-precision numerical formats, scaling strategies, and precision loss mitigation in LLMs, grounded in recent work from DeepSeek-V3/V4 and GPT-OSS.
- Hello World
The first post of this blog.