Kubernetes v1.36 Enhances Memory QoS with Tiered Protection and Opt-In Reservations

<h2 id="introduction">Introduction</h2> <p>Predictable performance for containerized workloads remains a top priority as Kubernetes evolves. In version 1.36, the Memory QoS feature (still in alpha) receives significant enhancements, giving cluster operators finer control over how the kernel handles container memory under pressure. Originally introduced in v1.22 and refined in v1.27, this feature uses the cgroup v2 memory controller to tell the kernel which container memory to protect and when to throttle. This article explores the key updates in v1.36: opt-in memory reservation, tiered protection by Quality of Service (QoS) class, new observability metrics, and a kernel‑version warning for <code>memory.high</code>.</p><figure style="margin:20px 0"><img src="https://picsum.photos/seed/2989915411/800/450" alt="Kubernetes v1.36 Enhances Memory QoS with Tiered Protection and Opt-In Reservations" style="width:100%;height:auto;border-radius:8px" loading="lazy"><figcaption style="font-size:12px;color:#666;margin-top:5px"></figcaption></figure> <h2 id="whats-new">What’s New in v1.36</h2> <p>The Memory QoS feature has been re‑architected to separate throttling from memory reservation. Previously, enabling the feature gate immediately configured <code>memory.min</code> for every container with a memory request, a hard reservation that the kernel never reclaims. In v1.36, operators can enable throttling and reservation as separate steps.</p> <h3 id="opt-in-reservation">Opt‑In Memory Reservation with <code>memoryReservationPolicy</code></h3> <p>Starting in v1.36, the kubelet configuration gains a new field, <code>memoryReservationPolicy</code>, which controls whether the kubelet writes <code>memory.min</code> or <code>memory.low</code> for pods. This field offers two options:</p> <ul> <li><strong><code>None</code> (default):</strong> The kubelet does <em>not</em> write <code>memory.min</code> or <code>memory.low</code> for any container. 
Throttling via <code>memory.high</code> still works, controlled by the <code>memoryThrottlingFactor</code> (default 0.9). This lets operators enable throttling first to observe workload behavior without hard guarantees.</li> <li><strong><code>TieredReservation</code>:</strong> The kubelet writes memory protection values tiered by the Pod’s QoS class, so each class gets a different level of kernel‑enforced memory reservation.</li> </ul> <h2 id="tiered-protection">Tiered Protection by QoS Class</h2> <p>When <code>memoryReservationPolicy</code> is set to <code>TieredReservation</code>, the kubelet applies distinct cgroup v2 parameters for each QoS class:</p> <ul> <li><strong>Guaranteed Pods:</strong> Receive hard protection via <code>memory.min</code>. For example, a Guaranteed Pod requesting 512 MiB of memory will have <code>memory.min</code> set to 536870912 bytes (512 MiB). The kernel will <em>never</em> reclaim this memory under any circumstances; if it cannot honor the guarantee, it invokes the OOM killer on other processes to free pages.</li> <li><strong>Burstable Pods:</strong> Receive soft protection via <code>memory.low</code>. For the same 512 MiB request on a Burstable Pod, the cgroup file shows <code>memory.low</code> = 536870912. The kernel <em>avoids</em> reclaiming this memory under normal memory pressure but may reclaim it if the alternative is a system‑wide OOM.</li> <li><strong>BestEffort Pods:</strong> Receive neither <code>memory.min</code> nor <code>memory.low</code>. 
Their memory remains fully reclaimable, giving the kernel the most flexibility under pressure.</li> </ul> <p>This tiered approach ensures that the most critical workloads receive the strongest guarantee, while Burstable Pods get a preference that can be relaxed in extreme situations.</p> <h2 id="comparison-prior">Comparison with Previous Behavior</h2> <p>In versions before v1.36, enabling the MemoryQoS feature gate immediately set <code>memory.min</code> for every container with a memory request, regardless of QoS class. <code>memory.min</code> is a hard reservation: the kernel will not reclaim that memory, even under extreme pressure. Consider a node with 8 GiB of RAM where Burstable Pod requests total 7 GiB. In earlier versions, that 7 GiB would be locked as <code>memory.min</code>, leaving only about 1 GiB for the kernel, system daemons, and BestEffort workloads. This increased the risk of OOM kills and left the kernel little room to reclaim memory.</p> <p>With v1.36’s tiered reservation, Burstable requests map to <code>memory.low</code> instead of <code>memory.min</code>. Under normal pressure, the kernel still protects that memory, but under extreme pressure it can reclaim a portion to avoid system‑wide OOM. Only Guaranteed Pods use <code>memory.min</code>, which keeps the hard‑reservation footprint lower. 
With the <code>memoryReservationPolicy</code> field, operators can first enable throttling (<code>None</code>), observe workload behavior, and then opt into reservation (<code>TieredReservation</code>) when the node has enough headroom.</p> <h2 id="observability-metrics">Observability Metrics</h2> <p>To help administrators monitor the impact of Memory QoS, v1.36 exposes two alpha‑stability metrics on the kubelet’s <code>/metrics</code> endpoint:</p> <table> <thead> <tr><th>Metric</th><th>Description</th></tr> </thead> <tbody> <tr><td><code>kubelet_memory_qos_node_memory_min_bytes</code></td><td>Total amount of <code>memory.min</code> reserved across all pods on the node.</td></tr> <tr><td><code>kubelet_memory_qos_node_memory_low_bytes</code></td><td>Total amount of <code>memory.low</code> reserved across all burstable pods on the node.</td></tr> </tbody> </table> <p>These metrics allow operators to track how much memory is hard‑reserved vs. soft‑reserved, making it easier to tune the <code>memoryReservationPolicy</code> and <code>memoryThrottlingFactor</code> for optimal resource utilization.</p> <h2 id="kernel-warning">Kernel‑Version Warning for <code>memory.high</code></h2> <p>One subtle but important addition in v1.36 is a startup warning when the host kernel is older than 5.11. The <code>memory.high</code> cgroup file, used for throttling, only works reliably as a limiter starting from kernel 5.11 (and was completely absent in some older kernels). The kubelet now checks the kernel version and emits a warning if the feature might behave unexpectedly. This helps administrators avoid confusing behavior when deploying Memory QoS on nodes running an older kernel.</p> <h2 id="conclusion">Conclusion</h2> <p>Kubernetes v1.36 marks a significant step forward for Memory QoS. 
By separating throttling from reservation, introducing tiered protection based on QoS class, and providing observability metrics, the feature gives administrators the flexibility to balance performance and resource efficiency. The opt‑in approach supports a staged rollout: enable throttling first, monitor, then opt into <code>TieredReservation</code> once the node has enough headroom. Combined with the kernel‑version warning, these changes make the feature easier to evaluate safely. It remains alpha, so test it in non‑critical environments before wide deployment.</p>
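<h2 id="example-tiered-mapping">Example: Tiered Mapping Sketch</h2>
<p>The tiered mapping described under Tiered Protection by QoS Class can be modeled in a few lines of Python. This is an illustrative sketch, not kubelet code: the helper name <code>reservation_for</code> and the returned dictionary layout are hypothetical, and it assumes the protected amount equals the Pod’s memory request in bytes, as in the 512 MiB example.</p>
<pre><code class="language-python">MIB = 1024 * 1024


def reservation_for(qos_class, request_bytes, policy="None"):
    """Model which cgroup v2 protection file would be written for a Pod.

    Hypothetical helper that mirrors the behavior described in this
    article; it is not the kubelet's actual implementation.
    """
    # With the default policy, neither memory.min nor memory.low is written.
    if policy != "TieredReservation":
        return {}
    if qos_class == "Guaranteed":
        # Hard protection: the kernel never reclaims this memory.
        return {"memory.min": request_bytes}
    if qos_class == "Burstable":
        # Soft protection: reclaimed only to avoid a system-wide OOM.
        return {"memory.low": request_bytes}
    # BestEffort Pods stay fully reclaimable.
    return {}


print(reservation_for("Guaranteed", 512 * MIB, "TieredReservation"))
# prints {'memory.min': 536870912}
</code></pre>
<p>Calling the same helper with <code>"Burstable"</code> returns a <code>memory.low</code> entry instead, and leaving the policy at its default returns an empty mapping for every QoS class.</p>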