Unlocking AKS Performance: Identify and Resolve CPU Saturation Issues in Azure

CPU saturation in Azure Kubernetes Service (AKS) nodes can lead to significant performance bottlenecks, affecting application responsiveness, increasing latency, and potentially causing instability across your cluster. When a node’s CPU resources are fully consumed, pods scheduled on that node may be throttled, leading to slower processing, increased error rates, and overall degraded service quality. Identifying and resolving these saturation issues is crucial for maintaining a healthy and performant AKS environment. This guide outlines the steps to diagnose and fix CPU saturation.

Understanding CPU Saturation in AKS

CPU saturation occurs when the demand for CPU resources on a node exceeds its available capacity. In Kubernetes, CPU is measured in ‘cores’, with 1 core equivalent to 1000 millicores (m). Pods can request and limit CPU resources, which influences how the Kubernetes scheduler places them and how the Kubelet enforces resource limits.

When node CPU usage consistently approaches or reaches 100%, several negative consequences arise:
* Performance Degradation: Applications running on saturated nodes slow down significantly due to CPU throttling.
* Increased Latency: Requests take longer to process, impacting user experience.
* Instability: High CPU usage can lead to node unresponsiveness, potentially causing pods to be evicted or nodes to become unhealthy.
* Scaling Issues: New pods might fail to schedule if nodes are already saturated, preventing horizontal scaling of applications.
* Increased Costs: Inefficient resource utilization can lead to running more nodes than necessary to compensate for saturation.

Common causes include resource-hungry applications, misconfigured pod resource requests and limits, inefficient code, unexpected traffic spikes, or simply an undersized node pool for the workload.

Monitoring and Identifying CPU Saturation

The first step in addressing CPU saturation is accurately identifying where and when it is occurring. Azure Monitor, specifically Container Insights, provides the necessary tools to gain visibility into your AKS cluster’s performance.

Utilizing Azure Monitor and Container Insights

Container Insights allows you to visualize and analyze the performance of your AKS cluster nodes and pods. To identify nodes experiencing high CPU usage:

Navigate to your AKS cluster in the Azure portal, then select Monitoring -> Insights. Within Insights, select the Nodes tab.

Here, you can configure the metrics to display. Set the Metric dropdown to CPU Usage (millicores) or CPU Usage (%). For quick identification of problematic nodes, it’s often useful to set the sample type to Max.

Use the sort feature on the column representing maximum CPU usage (Max% or Max Millicores) to order the nodes from highest to lowest usage. This immediately highlights the nodes that have experienced peak CPU load.

Screenshot showing AKS Node CPU Usage

Example visual representation within Azure Monitor showing node CPU utilization over time. A node displaying a graph that frequently spikes or remains near 100% requires investigation. Observe the graphs to understand if the high usage is a constant state or sporadic spikes. Consistently high usage indicates a sustained workload exceeding capacity, while spikes might point to specific events or batch jobs.

While the percentage metric is useful for a quick overview, looking at CPU Usage (millicores) provides a more concrete measure of the actual resource consumption. Remember that the percentage view for nodes is typically based on the node’s allocatable CPU resources, not its total physical CPU capacity, which can sometimes lead to confusion. Allocatable resources are the total resources minus those reserved by the Kubernetes system components.
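
If you prefer the command line, the same capacity versus allocatable distinction is visible via kubectl. The following is a minimal sketch (assuming kubectl access to the cluster; the column names are illustrative):

# Compare each node's total CPU capacity with its allocatable CPU
# (allocatable = capacity minus reservations for Kubernetes system components)
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU_CAPACITY:.status.capacity.cpu,CPU_ALLOCATABLE:.status.allocatable.cpu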

For more granular analysis or historical trends, you can utilize Azure Monitor Logs (Log Analytics Workspace) and Kusto Query Language (KQL). Queries against the Perf and InsightsMetrics tables populated by Container Insights can provide detailed performance data over custom time ranges, allowing you to pinpoint saturation events precisely.

// Example KQL query: average node CPU usage over the last hour, highest consumers first
Perf
| where ObjectName == "K8SNode"
| where CounterName == "cpuUsageNanoCores"
| where TimeGenerated > ago(1h)
| summarize AvgCpuNanoCores = avg(CounterValue) by Computer
| order by AvgCpuNanoCores desc

Analyzing this data allows you to confirm if saturation is a persistent problem or a transient issue.

Drilling Down to Identify High-CPU Pods

Once you’ve identified the nodes experiencing high CPU usage, the next logical step is to determine which pods running on those nodes are consuming the majority of the CPU resources.

From the same Container Insights view, select the problematic node(s). This action typically drills down into the performance details for that specific node, showing a list of pods running on it.

Within the node detail view, you can again sort the pods by their CPU usage. Similar to nodes, you will see metrics for pods like CPU Usage (millicores) and CPU Usage (%).

Important Note: For pods, the CPU Usage (%) is calculated relative to the CPU request specified for that container in its pod definition, not the total CPU available on the node. For example, a pod requesting 500m CPU might show 200% CPU usage if it’s consuming 1000m (1 core). While this tells you if a pod is exceeding its requested amount, it doesn’t directly tell you its impact on the node’s saturation. To understand the pod’s contribution to node saturation, always look at the CPU Usage (millicores) metric. This value represents the actual CPU cycles the pod is consuming, regardless of its request.

By examining the CPU Usage (millicores) for each pod on the saturated node and sorting, you can quickly pinpoint the top CPU consumers. These are the applications or system components that are most likely causing or contributing to the node’s saturation problem.

Identify the specific pods and the deployments, stateful sets, or other workloads they belong to. This mapping is essential for understanding which applications require further investigation.
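
If you have kubectl access, the Metrics Server (deployed by default on AKS) offers a quick command-line cross-check of the same data. A minimal sketch; <node_name> is a placeholder for the saturated node, and --sort-by requires a reasonably recent kubectl:

# Nodes ordered by current CPU usage
kubectl top nodes --sort-by=cpu

# Pods across all namespaces ordered by current CPU usage (millicores)
kubectl top pods --all-namespaces --sort-by=cpu

# List the pods scheduled on the saturated node to narrow the search
kubectl get pods --all-namespaces --field-selector spec.nodeName=<node_name> -o wide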

Deep Dive into Pod Resource Configuration

Kubernetes uses resource requests and limits to manage compute resources (CPU and memory) for containers. Requests are used by the scheduler to decide which node a pod should run on, ensuring the node has sufficient available resources. Limits are enforced by the Kubelet to restrict how much of a resource a container can consume, preventing a single container from monopolizing a node’s resources (the “noisy neighbor” problem).

Inspecting Pod Resource Requests and Limits

After identifying high-CPU pods, check their configured resource requests and limits. You can do this using kubectl describe.

First, to understand the overall resource allocation on a saturated node, use:

kubectl describe node <node_name>

Look for sections like Allocatable and Allocated resources.

Capacity:
  cpu:                2000m
  ephemeral-storage:  201135138Ki
  memory:             6904232Ki
  pods:               110
Allocatable:
  cpu:                1930m
  ephemeral-storage:  178274469761
  memory:             6099632Ki
  pods:               110

...

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                1200m      3000m
  memory             2000Mi     4000Mi
  ephemeral-storage  0          0

This output shows the total CPU and memory allocatable on the node (capacity minus Kubernetes system overhead) and the total requested and limited resources by all pods currently scheduled on this node.

  • The Requests column tells you how much CPU the scheduler reserved for pods on this node. If the total requests for CPU are close to or exceed the Allocatable CPU, the node is likely densely packed, and even minor spikes could cause saturation.
  • The Limits column shows the maximum total CPU that pods on this node are allowed to consume. Note that total limits can exceed allocatable resources because the scheduler places pods based only on their requests; limits are not considered during scheduling, so a node can be overcommitted. If limits are much higher than requests, or not set at all, a few pods could potentially consume a large amount of CPU, impacting others.

Next, examine the resource configuration for the specific high-CPU pods you identified.

kubectl describe pod <pod_name> -n <namespace>

Look for the Containers section and check the Resources block for each container in the pod:

Containers:
  my-container:
    ...
    Resources:
      Requests:
        cpu: 200m
        memory: 256Mi
      Limits:
        cpu: 500m
        memory: 512Mi
    ...

Comparing the actual CPU usage (from Container Insights) with the configured requests and limits provides critical clues:

  • Actual Usage at or near Limit: The container is hitting its CPU limit and being throttled. While throttling prevents it from consuming more CPU, it indicates the limit is too low for its workload, leading to performance issues within that pod.
  • Actual Usage > Request (but < Limit): The container is using more CPU than it requested. This is normal behaviour if the node has available capacity. However, if many pods on the same node behave this way without limits, they can collectively saturate the node.
  • Actual Usage >> Request (and No Limit): The container is consuming significantly more CPU than requested and has no upper bound enforced. This pod is a prime suspect for causing high node CPU saturation as it can consume any available resource, potentially starving other pods.
  • High Actual Usage in absolute terms: Regardless of the request/limit settings, the pod is consuming a large amount of CPU (millicores). This indicates that the application running in the pod has a high computational demand, either due to its nature or an underlying issue.

Understanding this relationship between usage, requests, and limits is vital for diagnosing whether the issue is a misconfiguration of resources, an inherently resource-intensive workload, or an application-level problem.
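
As a command-line complement to Container Insights, the sketch below prints a container's configured requests and limits next to its live usage from the Metrics Server (pod and namespace names are placeholders):

# Print each container's configured resource requests and limits
kubectl get pod <pod_name> -n <namespace> \
  -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources}{"\n"}{end}'

# Show the current CPU and memory usage of each container in the pod
kubectl top pod <pod_name> -n <namespace> --containers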

Diagnosing the Root Cause of High CPU Usage

Identifying which pods are consuming CPU is only part of the solution. The next step is to understand why those specific pods require so much CPU. This often involves diving deeper into the application or workload running within the high-CPU pod.

Application-Level Issues

The most common reason for a pod consuming excessive CPU is the application code itself.

  • Inefficient Code: Poorly optimized algorithms, excessive loops, or blocking operations can cause threads to consume CPU cycles unnecessarily.
  • High Traffic/Load: A sudden or sustained increase in incoming requests can overwhelm the application, causing it to use more CPU to process the load.
  • Background Processing/Batch Jobs: Certain tasks, like data processing, complex calculations, or reporting, are inherently CPU-intensive and might run periodically or on demand.
  • Garbage Collection Pressure: Applications with high object allocation/deallocation rates can spend a significant amount of CPU time performing garbage collection.
  • Configuration Issues: Sometimes, application settings, like overly verbose logging levels or incorrect connection pool configurations, can inadvertently increase CPU usage.
  • Dependency Bottlenecks: If an application is waiting on a slow dependency (like a database or external service), its threads might spin or consume CPU while waiting, especially if not implemented asynchronously.

Configuration and Environment Issues

Beyond the application code, the way the application is configured and the environment it runs in can contribute to high CPU.

  • Misconfigured HPA: If the Horizontal Pod Autoscaler (HPA) is configured incorrectly (e.g., threshold too high or metrics server issues), it might not scale up replicas quickly enough to handle increasing load, causing existing pods to saturate. A quick status check is sketched after this list.
  • External System Issues: Problems with databases, message queues, or other external systems can cause applications to retry operations repeatedly or get stuck in tight loops, increasing CPU usage.
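
As referenced above, HPA behaviour can be verified quickly from the command line. A minimal sketch (HPA and namespace names are placeholders):

# Current targets, thresholds, and replica counts for all HPAs in a namespace
kubectl get hpa -n <namespace>

# Detailed conditions and events, such as failures to read metrics or to reach the desired replica count
kubectl describe hpa <hpa_name> -n <namespace>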

Diagnosing these root causes often requires application-specific troubleshooting: examining application logs, profiling the running application, reviewing recent code changes, and analyzing application-level metrics (if available).
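
kubectl also provides convenient starting points for this kind of application-level investigation. A minimal sketch, assuming the container image ships a shell and a top utility:

# Recent logs from the high-CPU pod (add -c <container> for multi-container pods)
kubectl logs <pod_name> -n <namespace> --since=1h

# Logs from the previous container instance if the pod has restarted
kubectl logs <pod_name> -n <namespace> --previous

# Inspect processes inside the container interactively
kubectl exec -it <pod_name> -n <namespace> -- top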

Implementing Solutions to Resolve Saturation

Based on the diagnosis, you can apply targeted solutions to alleviate CPU saturation. Solutions often involve a combination of optimizing the application, adjusting resource allocations, and scaling the cluster infrastructure.

Optimizing Application Performance

If the root cause is inefficient application code or configuration, focus on improving the application itself.

  • Code Profiling: Use profiling tools to identify CPU hotspots within the application code. Optimize algorithms, reduce unnecessary computations, and improve concurrency models.
  • Review Configuration: Check application configuration for potentially CPU-intensive settings like logging verbosity, connection pooling, or caching strategies.
  • Handle Dependencies Gracefully: Ensure applications handle slow or unresponsive dependencies without consuming excessive CPU (e.g., using asynchronous operations, proper timeouts, circuit breakers).
  • Tune Garbage Collection: For managed runtimes (like Java, .NET), tune GC settings if profiling indicates high GC CPU time.

Optimizing application performance is often the most effective long-term solution, as it addresses the source of the demand rather than just adding more resources.

Adjusting Resource Requests and Limits

Correctly setting CPU requests and limits is crucial for efficient scheduling and preventing saturation.

  • Increase Requests: If monitoring shows that pods frequently exceed their CPU requests and this correlates with node saturation, consider increasing the CPU requests. This helps the scheduler place pods on nodes with sufficient available resources, reducing the chance of scheduling conflicts and oversubscription.
  • Set or Adjust Limits: Implement CPU limits for all containers. If existing limits are too low and pods are being throttled, increase the limit. If limits are not set or are excessively high, set reasonable limits to prevent a single misbehaving pod from consuming all CPU resources on a node. Start with limits slightly above typical peak usage and adjust based on monitoring.
  • Iterative Process: Setting requests and limits is often an iterative process. Start with educated guesses based on testing or historical data, monitor performance, and adjust as needed.

It is essential to test resource changes in a non-production environment first to understand their impact.
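
For experimentation, requests and limits can be adjusted either in the workload manifest or imperatively. The following is a rough sketch using kubectl set resources with illustrative names and values; in practice these settings belong in your version-controlled manifests:

# Raise the CPU request and limit for one container in a deployment
kubectl set resources deployment <deployment_name> -n <namespace> \
  --containers=<container_name> \
  --requests=cpu=500m --limits=cpu=1000m

# Confirm the change has rolled out
kubectl rollout status deployment <deployment_name> -n <namespace>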

Scaling Your AKS Cluster

If the workload demand genuinely exceeds the current infrastructure capacity, scaling the AKS cluster is necessary.

  • Horizontal Pod Autoscaling (HPA): Configure HPA for your deployments to automatically scale the number of pod replicas based on CPU usage (or other metrics). If pods are saturating nodes due to increased load, HPA will create more replicas, distributing the load across more pods and potentially new nodes (if Cluster Autoscaler is enabled). Ensure HPA thresholds are set appropriately based on pod resource requests/limits and desired performance.
  • Cluster Autoscaler (CA): Enable and configure the Cluster Autoscaler for your node pools. CA monitors for pending pods (pods that couldn’t be scheduled due to insufficient resources) and automatically scales the number of nodes in the node pool up or down. If HPA creates more pods but there aren’t enough resources on existing nodes, CA will add new nodes, allowing the pending pods to schedule and relieving pressure on saturated nodes.
  • Manual Node Pool Scaling: If autoscaling is not suitable or for planned capacity increases, manually scale the number of nodes in the affected node pool. Example commands for these scaling options are sketched after this list.
  • Choose Appropriate VM SKUs: Ensure your node pool uses VM sizes (SKUs) that are suitable for your workloads’ CPU requirements. For highly CPU-intensive applications, consider CPU-optimized VM series.
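
The commands below sketch each of these scaling options using kubectl and the Azure CLI; the resource group, cluster, and node pool names are placeholders, and the thresholds and counts are illustrative:

# HPA: scale a deployment between 2 and 10 replicas, targeting 70% of the pods' requested CPU
kubectl autoscale deployment <deployment_name> -n <namespace> --cpu-percent=70 --min=2 --max=10

# Cluster Autoscaler: enable autoscaling on an existing node pool
az aks nodepool update --resource-group <rg> --cluster-name <aks_cluster> \
  --name <node_pool> --enable-cluster-autoscaler --min-count 3 --max-count 10

# Manual scaling: set an explicit node count (only for pools without the cluster autoscaler enabled)
az aks nodepool scale --resource-group <rg> --cluster-name <aks_cluster> \
  --name <node_pool> --node-count 5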



Scaling needs to be balanced with cost considerations. Adding more nodes increases operational costs, so it’s important to ensure that the scaling is truly necessary and not just masking underlying application inefficiencies.

Advanced Troubleshooting Techniques

For complex issues, more advanced techniques might be required:

  • Analyze Application Logs: Detailed application logs can reveal patterns or errors that correlate with high CPU usage.
  • Distributed Tracing: Implementing distributed tracing helps visualize the flow of requests through services and identify specific operations or services causing delays and high CPU.
  • Profiling in Kubernetes: Tools and techniques exist to profile applications running directly within Kubernetes pods, offering deep insights into runtime behaviour.

Proactive Strategies for Maintaining Performance

Preventing CPU saturation is more efficient than reacting to it. Implementing proactive measures ensures a more stable and performant AKS cluster.

  • Comprehensive Monitoring and Alerting: Set up detailed monitoring for node and pod CPU usage. Configure alerts for when CPU usage exceeds predefined thresholds, allowing you to identify and address potential saturation before it impacts users. An example alert rule is sketched after this list.
  • Establish Resource Governance: Implement policies that require all deployments to specify CPU requests and limits. This prevents runaway pods and helps the scheduler make informed decisions.
  • Regular Performance Reviews: Periodically review the performance of your applications and the AKS cluster. Identify trends in resource usage and proactively scale or optimize before saturation becomes a problem.
  • Load Testing: Conduct regular load tests on your applications in a staging environment that mimics production. This helps identify performance bottlenecks and determine appropriate resource requests/limits and scaling configurations under realistic load.
  • CI/CD Integration: Integrate performance testing and resource request/limit validation into your CI/CD pipelines to catch potential issues early.
  • Developer Training: Educate development teams on writing resource-efficient code and the importance of setting appropriate resource requests and limits in their Kubernetes manifests.
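
As an example of the alerting point above, the sketch below creates a metric alert on the AKS platform metric node_cpu_usage_percentage using the Azure CLI; the 80% threshold, names, and resource IDs are placeholders to tune for your environment:

# Alert when average node CPU across the cluster exceeds 80% over a 5-minute window
az monitor metrics alert create \
  --name "aks-node-cpu-high" \
  --resource-group <rg> \
  --scopes <aks_cluster_resource_id> \
  --condition "avg node_cpu_usage_percentage > 80" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --action <action_group_id>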

Addressing CPU saturation in AKS is a multi-faceted process involving monitoring, diagnosis, and targeted solutions. By following these steps and implementing proactive strategies, you can ensure your AKS cluster remains healthy, performant, and cost-effective.

What challenges have you faced with CPU saturation in your AKS clusters, and what strategies have you found most effective in resolving them? Share your experiences in the comments below!
