DK's Blog
  • Home
  • About
  • 中文 (Chinese)
  • Parse-Render Roundtrip: Prefix Cache Design Principles for Open-Source Chat Templates
    May 13, 2026

    Proposes the Parse-Render Roundtrip design principle, analyzes the chat template and tool parser designs of Qwen3.6, GLM-5.1, Kimi K2.6, DeepSeek V4 Pro, and GPT-OSS, and verifies token-level prefix consistency in multi-turn agent conversations.

    LLM · Inference · Agent
  • Towards Low-Precision Computation in LLMs: From BF16 to FP4
    May 1, 2026

    A practical walkthrough of low-precision numerical formats, scaling strategies, and precision loss mitigation in LLMs, grounded in recent work from DeepSeek-V3/V4 and GPT-OSS.

    LLM · Low-Precision Computing · Quantization
  • Hello World
    April 25, 2026

    The first post on this blog.

    essay
GitHub · Email
© 2026 dangkai.dk