Sources: CNCF 2025 Annual Cloud Native Survey, AWS Containers Blog (KubeCon EU 2026), Microsoft Azure Tech Community, SiliconAngle KubeCon EU preview, Nutanix KubeCon Amsterdam 2026 analysis, Red Hat AI Enterprise announcement, KubeCon EU 2026 official schedule. KubeCon + CloudNativeCon Europe 2026 runs March 23-26 in Amsterdam.
Table of Contents
- Kubernetes Is Not a Container Orchestrator Anymore
- 82% in Production, 66% Running AI: The Survey Numbers That Define the Shift
- The GPU Scheduling Problem That Is Quietly Breaking Production Clusters
- Kubernetes 1.35: What Actually Changed for AI Workloads
- Amazon EKS at 100,000 Nodes: What Had to Be Rebuilt to Get There
- The “Invisible Kubernetes” Vision: Karpenter, kro, and Cedar
- The Return of Stateful Architecture and Why Serverless Is Losing Ground
- What KubeCon Amsterdam Is Actually About This Year
- The Cloud Bill Problem: FinOps Is Now a Platform Engineering Problem
- Data Sovereignty and Why European Cloud Is a Different Conversation
- What Every Cloud Engineer Should Take Away From This Week
Kubernetes Is Not a Container Orchestrator Anymore
That description was accurate in 2018. In 2026 it is the way people who have not worked with Kubernetes recently still explain it to each other. The CNCF’s own 2025 Annual Cloud Native Survey describes Kubernetes as the “de-facto operating system for AI,” which is a meaningfully different framing. An operating system is the layer everything else runs on. A container orchestrator is a tool for managing specific workload types. The gap between those two descriptions is the entire story of what happened to Kubernetes over the last few years.
KubeCon + CloudNativeCon Europe 2026 starts this Monday, March 23, in Amsterdam. It is the largest cloud native conference in the world and the sessions, keynotes, and booth conversations that happen there over four days tend to be the most accurate preview of where cloud infrastructure is heading for the next twelve months. This year the central question is not whether AI can run on Kubernetes. That question was settled. The question is whether the current version of Kubernetes is actually ready for AI workloads at the scale and reliability standard that production deployment demands, and what needs to change in the clusters, the scheduling layer, the storage architecture, and the operational tooling to get there.
The answer from AWS, Microsoft, Red Hat, Google, and essentially every major cloud provider showing up in Amsterdam this week is: a lot has already changed, more is changing right now, and the changes are significant enough that teams running Kubernetes clusters designed for traditional stateless microservices workloads are going to need to rethink some foundational assumptions.
82% in Production, 66% Running AI: The Survey Numbers That Define the Shift
The CNCF’s 2025 Annual Cloud Native Survey is the most comprehensive snapshot of how organizations actually use Kubernetes at scale. Two numbers from that survey define the current moment. First: 82 percent of container users now run Kubernetes in production environments. That is not experimental adoption or pilot programs. That is production infrastructure for more than four in five organizations that use containers seriously. Kubernetes has effectively won the container orchestration market and is now the infrastructure substrate that most serious cloud-native applications run on.
Second, and more significant for understanding what is happening right now: 66 percent of those production Kubernetes users are using it to host generative AI workloads. Two-thirds of organizations that run Kubernetes in production are running AI on it. Not planning to. Not evaluating whether they should. Running it now.
That 66 percent figure is what makes KubeCon Amsterdam different from every previous KubeCon. The conversations about AI infrastructure at prior events were forward-looking, about how to prepare for AI workloads, how to add GPU node pools, how to think about the new requirements. This year those conversations are operational. Organizations are already running AI in production on Kubernetes and hitting the specific limitations that come from running workloads the system was not originally designed for. The sessions on the KubeCon schedule reflect this: GPU scheduling optimization, AI/ML lifecycle security, hardware-aware scheduling for heterogeneous accelerators, distributed transactions on Kubernetes, and stateful architecture patterns are all on the agenda specifically because these are problems organizations are solving right now, not problems they are anticipating.
The CNCF numbers: 82% of container users run Kubernetes in production (2025 Annual Cloud Native Survey). 66% use Kubernetes to host GenAI workloads. IBM’s generative AI book of business surpassed $12.5 billion in Q4 2025, up from $9.5 billion the prior quarter. Nearly all of that workload runs on Kubernetes-based infrastructure. These are not forecasts. They are current production deployments.
The GPU Scheduling Problem That Is Quietly Breaking Production Clusters
Running traditional stateless microservices on Kubernetes is a well-understood problem with well-understood solutions. You define your deployment, set your resource requests and limits for CPU and memory, let the scheduler place pods on appropriate nodes, and horizontal pod autoscaling handles traffic spikes. The tooling for this is mature and reliable. Most of the hard problems were solved between 2019 and 2023.
GPU workloads break most of these assumptions in specific ways that are not immediately obvious until you are running them at scale. The first problem is that GPUs are expensive, discrete, and non-fungible in ways that CPU cores are not. When you request 100 millicores of CPU, the scheduler has enormous flexibility in how to satisfy that request across a large cluster. When you request a specific GPU type for a training job or inference server, the scheduler has to find a node with that specific hardware, in sufficient quantity, with appropriate topology awareness to ensure the GPUs can communicate with each other at full bandwidth. A training job that places GPU pods on nodes where the GPUs are not connected through NVLink or high-bandwidth interconnect runs significantly slower than one where topology placement is correct. That performance difference is measured not in percentages but in multiples, and the cost of getting it wrong is measured in GPU-hours, which are expensive.
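To make the topology point concrete, here is a toy placement scorer in Python. The node data, bandwidth figures, and scoring rule are illustrative assumptions, not any real scheduler's API; the point is only that a placement whose GPUs fall outside a shared high-bandwidth interconnect domain loses a multiple of its effective bandwidth.

```python
# Hypothetical sketch: prefer GPU sets that share an NVLink domain when
# placing a multi-GPU training pod. All numbers are illustrative.

def score_node(node, gpus_requested):
    """Return effective inter-GPU bandwidth (GB/s) for a placement, or None."""
    if node["free_gpus"] < gpus_requested:
        return None  # not enough free GPUs on this node
    # GPUs inside one NVLink domain talk at full bandwidth; beyond it,
    # traffic falls back to PCIe, which is several times slower.
    if gpus_requested <= node["nvlink_domain_size"]:
        return node["nvlink_bw_gbps"]
    return node["pcie_bw_gbps"]

def pick_node(nodes, gpus_requested):
    """Choose the node giving the highest effective bandwidth."""
    scored = [(score_node(n, gpus_requested), n["name"]) for n in nodes]
    scored = [(s, name) for s, name in scored if s is not None]
    return max(scored)[1] if scored else None

nodes = [
    {"name": "a", "free_gpus": 8, "nvlink_domain_size": 8,
     "nvlink_bw_gbps": 900, "pcie_bw_gbps": 64},
    {"name": "b", "free_gpus": 8, "nvlink_domain_size": 4,
     "nvlink_bw_gbps": 900, "pcie_bw_gbps": 64},
]
print(pick_node(nodes, 8))  # "a": all eight GPUs share one NVLink domain
```

An 8-GPU job on node "b" would run at the PCIe fallback number, which is the "multiples, not percentages" penalty the paragraph above describes.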
The second problem is that AI training jobs are bursty in a way that stateless web services are not. A web service sees gradual traffic ramp-up that autoscaling can respond to predictably. A training job needs hundreds of GPU nodes immediately, uses them intensively for hours or days, and then releases them all at once. The node provisioning latency that is acceptable for web traffic, where spending five minutes waiting for new nodes to join the cluster costs you some user experience, is not acceptable for training jobs where every minute of cluster preparation time costs the same as minutes of active GPU usage. Provisioning GPU nodes fast enough to not waste expensive GPU time while not over-provisioning and paying for idle GPU capacity is a scheduling problem that traditional Kubernetes autoscaling was not designed to solve.
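The arithmetic behind that claim fits in a few lines. Prices and wait times here are purely illustrative assumptions, but the shape of the cost is real: every minute of provisioning latency is billed across the entire node fleet.

```python
# Back-of-the-envelope sketch of why node provisioning latency matters
# for GPU training jobs. All prices and times are assumptions.

def provisioning_waste(nodes, gpu_node_hourly_usd, provision_minutes):
    """Cost of GPU nodes billed but idle while the job waits for capacity."""
    idle_hours = provision_minutes / 60.0
    return nodes * gpu_node_hourly_usd * idle_hours

# A 200-node job on $30/hour GPU nodes, waiting 5 minutes for capacity:
print(round(provisioning_waste(200, 30.0, 5), 2))  # 500.0
```

Five minutes of waiting costs roughly $500 per occurrence under these assumptions; the same latency on cheap CPU nodes serving web traffic is an order of magnitude less painful, which is why the traditional autoscaler tradeoffs no longer apply.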
The third problem is that inference workloads have different latency requirements from training workloads and different resource profiles from web services. A model inference server needs to maintain loaded model weights in GPU memory continuously, cannot be scheduled and rescheduled as freely as a stateless container, and has tail latency requirements that require careful thought about resource contention with other workloads sharing the same node. Multi-tenant Kubernetes clusters where AI inference and traditional services share hardware require much more careful resource isolation than clusters running uniform workload types.
Kubernetes 1.35, released in early 2026, addresses several of these problems directly; previous versions addressed them only partially. Understanding what changed and why it matters is the technical substance behind the “Kubernetes for AI” conversation happening in Amsterdam this week.
Kubernetes 1.35: What Actually Changed for AI Workloads
The most significant change in Kubernetes 1.35 for AI workloads is the move toward dynamic, workload-aware orchestration for expensive accelerators. The specific mechanism is in-place pod resource resize, which allows the resource allocation for a running pod to be modified without killing and restarting the pod. This sounds like a minor quality-of-life improvement until you think about what it means for GPU workloads specifically.
Previously, if an inference server needed more GPU memory because request patterns changed, the only option was to terminate the pod, update the spec, and restart it. For a stateless web service, restarting a pod is routine. For an inference server that has loaded a large model into GPU memory, restarting means evicting the model, bringing up a new pod, and reloading the model from storage, which can take minutes for large models and means serving degraded latency during that window. In-place resize allows resource adjustments without this disruption, which makes AI inference workloads significantly easier to operate in practice.
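The operational impact of a restart-based resize is easy to quantify in rough terms. The load time and traffic rate below are illustrative assumptions, not measurements:

```python
# Illustrative arithmetic: requests affected when resizing an inference
# pod by restart versus in place. Load time and traffic are assumptions.

def requests_disrupted(model_load_seconds, requests_per_second, replicas_down=1):
    """Requests that hit a cold or missing replica during a restart-based resize."""
    return model_load_seconds * requests_per_second * replicas_down

# Reloading a large model for ~3 minutes at 50 req/s to that replica:
print(requests_disrupted(180, 50))  # 9000
# With in-place resize the pod keeps serving, so the disrupted count is ~0.
```

Nine thousand degraded or rerouted requests per resize, per replica, is the kind of number that turns a routine configuration change into a scheduled maintenance event; in-place resize removes that constraint.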
The AI-optimized scheduling improvements in 1.35 address the topology awareness problem directly. Hardware-aware scheduling that understands GPU interconnect topology, NUMA node boundaries, and accelerator types allows the scheduler to make placement decisions that maximize the effective bandwidth available to distributed training jobs. This is infrastructure plumbing that does not show up in demo videos but is the difference between a training cluster that achieves 80 percent of theoretical bandwidth and one that achieves 40 percent because pods were placed on nodes that cannot communicate efficiently.
The mutable resource management changes represent a broader philosophical shift that the Art of CTO’s analysis captured well: “static capacity planning is giving way to dynamic, workload-aware orchestration for expensive accelerators and spiky inference and training patterns.” This matters because the economics of GPU compute make static over-provisioning much more expensive than the economics of CPU compute ever did. Organizations that can provision GPU capacity precisely when they need it and release it immediately when they do not save meaningfully compared to organizations that hold reserved GPU capacity to ensure it is available when a training job starts.
Amazon EKS at 100,000 Nodes: What Had to Be Rebuilt to Get There
In the summer of 2025, Amazon announced that EKS supports up to 100,000 worker nodes in a single cluster, equivalent to 1.6 million AWS Trainium accelerators or 800,000 Nvidia GPUs. That number is large enough to require a moment to contextualize. A cluster of 800,000 Nvidia GPUs is significantly larger than most of the largest AI training clusters that exist anywhere in the world. The xAI Colossus system described in previous CyberDevHub articles runs 100,000 GPUs. Amazon’s announcement means you can run a cluster eight times that size in EKS if you have the hardware.
Getting to 100,000 nodes did not happen by simply running more instances. It required architectural changes to core Kubernetes components that AWS engineers contributed back to the upstream community. Two specific changes are worth understanding. The first is a reimagined etcd storage layer. etcd is the distributed key-value store that holds all of Kubernetes’ cluster state: every pod, every service, every deployment configuration, every node registration. At small cluster sizes, etcd’s performance is not a concern. At 100,000 nodes with millions of pods being scheduled, updated, and terminated, etcd becomes the central bottleneck. The reimagined storage layer improves efficient state management at this scale in ways that the default etcd configuration cannot provide.
The second change is an optimized control plane capable of handling millions of scheduling, discovery, and repair operations. The Kubernetes control plane processes every event in the cluster: pod placements, node health checks, service endpoint updates, configuration changes. At 100,000 nodes, the event rate is high enough that a control plane designed for thousands of nodes cannot keep up. The optimizations AWS contributed allow the control plane to process this event volume while maintaining the scheduling latency that AI training jobs require, where a slow scheduler means idle GPUs waiting for work to be assigned.
AWS is presenting a keynote at KubeCon Amsterdam on March 24 titled “From Complexity to Clarity: Engineering an Invisible Kubernetes,” delivered by Jesse Butler, the principal product manager for Amazon EKS. The talk covers three community-driven upstream innovations that AWS is positioning as the path to Kubernetes becoming infrastructure that developers do not have to think about: Karpenter, kro, and Cedar. These three together represent what AWS thinks Kubernetes needs to look like to handle the next decade of workloads.
The “Invisible Kubernetes” Vision: Karpenter, kro, and Cedar
The “invisible Kubernetes” framing from AWS’s KubeCon keynote description captures something real about where the community is trying to take the platform. The goal is for Kubernetes to fade into the infrastructure stack the same way TCP/IP has. Nobody thinks about TCP/IP when they use the internet. It is there, it works, and the complexity is hidden behind abstractions that work reliably. Kubernetes is trying to reach that same level of invisibility, where developers define what they want to run and the cluster figures out how to run it without requiring deep Kubernetes expertise.
Karpenter is the node autoprovisioner that represents the current best implementation of this idea for compute. Rather than relying on the cluster autoscaler, which adds and removes nodes based on pending pod scheduling failures, Karpenter makes node provisioning decisions based on the actual requirements of workloads before pods are stuck in a pending state. It considers cost, capacity availability, and consolidation opportunities in real time, provisions the right node type for each workload, and terminates unused nodes aggressively. For GPU workloads, Karpenter’s ability to provision heterogeneous node types, selecting the specific GPU instance type that matches the workload’s resource request rather than only the instance types pre-configured in a node group, is operationally significant. Microsoft is also presenting Karpenter-based GPU capacity scheduling at KubeCon, covering cross-cloud AI inference with elastic autoscaling.
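The core of that selection logic can be sketched in a few lines. This is a toy version of the idea, not Karpenter's actual NodePool machinery; the instance catalog, names, and prices are made up for illustration.

```python
# Toy sketch of Karpenter's core idea: pick the cheapest instance type
# that satisfies a pending pod's resource requests, rather than scaling
# a fixed node group. Catalog entries and prices are illustrative.

CATALOG = [
    {"type": "cpu-large", "vcpu": 16, "gpus": 0, "gpu_kind": None,   "usd_hr": 0.7},
    {"type": "gpu-small", "vcpu": 8,  "gpus": 1, "gpu_kind": "a10",  "usd_hr": 1.2},
    {"type": "gpu-large", "vcpu": 48, "gpus": 4, "gpu_kind": "h100", "usd_hr": 16.0},
]

def select_instance(pod):
    """Cheapest catalog entry covering the pod's vCPU/GPU requests."""
    fits = [i for i in CATALOG
            if i["vcpu"] >= pod["vcpu"]
            and i["gpus"] >= pod["gpus"]
            and (pod["gpu_kind"] is None or i["gpu_kind"] == pod["gpu_kind"])]
    return min(fits, key=lambda i: i["usd_hr"])["type"] if fits else None

print(select_instance({"vcpu": 4,  "gpus": 1, "gpu_kind": "a10"}))   # gpu-small
print(select_instance({"vcpu": 32, "gpus": 4, "gpu_kind": "h100"}))  # gpu-large
```

The real system layers on spot versus on-demand pricing, consolidation, and drift detection, but the contrast with node groups is already visible here: the workload's request drives the instance choice, not a pre-configured list.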
kro, which stands for Kube Resource Orchestrator, is a framework for building higher-level abstractions on top of Kubernetes primitives. The problem it addresses is that deploying a real application on Kubernetes typically requires creating many interdependent resources: deployments, services, config maps, service accounts, ingress rules, horizontal pod autoscalers, and more. Each of these is a separate Kubernetes object. Keeping them in sync, understanding the dependencies between them, and exposing them to application teams without requiring those teams to understand every Kubernetes primitive is an operational challenge that grows with cluster size. kro allows platform teams to define composite application templates that application developers can instantiate with a simple, high-level configuration, hiding the underlying Kubernetes complexity.
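The shape of that abstraction is easy to illustrate. The expansion below is a plain-Python toy, not kro's actual API: the template name, fields, and generated objects are all assumptions chosen to show the pattern of one high-level definition fanning out into multiple Kubernetes resources.

```python
# Toy illustration of the kro idea: a platform team defines a composite
# template; an app team supplies a few high-level values; tooling expands
# them into the underlying Kubernetes objects. Not kro's real API.

def expand_web_app(name, image, replicas, port):
    """Expand one high-level app definition into Deployment + Service dicts."""
    deployment = {
        "apiVersion": "apps/v1", "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {"replicas": replicas,
                 "template": {"spec": {"containers": [
                     {"name": name, "image": image,
                      "ports": [{"containerPort": port}]}]}}},
    }
    service = {
        "apiVersion": "v1", "kind": "Service",
        "metadata": {"name": name},
        "spec": {"ports": [{"port": 80, "targetPort": port}]},
    }
    return [deployment, service]

objs = expand_web_app("checkout", "registry.example/checkout:1.4", 3, 8080)
print([o["kind"] for o in objs])  # ['Deployment', 'Service']
```

The application team sees four parameters; the platform team owns everything else, including whatever ingress, autoscaling, and policy objects a fuller template would emit.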
Cedar is a policy language that AWS uses for fine-grained authorization, and its inclusion in this trio reflects the governance challenge that comes with large, multi-tenant Kubernetes clusters. When hundreds of teams are deploying to the same cluster, consistent enforcement of security policies, resource quotas, naming conventions, and compliance requirements requires a programmatic policy layer that is consistent across the entire cluster. Cedar’s expressive policy language and the tooling around it provide this at a level of precision that older admission controller approaches make difficult to maintain.
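Cedar's documented evaluation model is default-deny with explicit forbid overriding any permit. The toy evaluator below captures that model in miniature; it is not Cedar syntax or its engine, and the policy fields and team names are invented for illustration.

```python
# Toy evaluator illustrating the shape of Cedar-style authorization:
# default deny, and any matching forbid wins over any matching permit.
# This is not Cedar's syntax or engine, just the model in miniature.

def authorize(policies, principal, action, resource):
    """Default-deny; a matching forbid overrides matching permits."""
    def matches(p):
        return (p["principal"] in (principal, "*")
                and p["action"] in (action, "*")
                and p["resource"] in (resource, "*"))
    matched = [p for p in policies if matches(p)]
    if any(p["effect"] == "forbid" for p in matched):
        return "deny"
    return "allow" if any(p["effect"] == "permit" for p in matched) else "deny"

policies = [
    {"effect": "permit", "principal": "team-payments", "action": "deploy",
     "resource": "namespace:payments"},
    {"effect": "forbid", "principal": "*", "action": "deploy",
     "resource": "namespace:kube-system"},
]
print(authorize(policies, "team-payments", "deploy", "namespace:payments"))     # allow
print(authorize(policies, "team-payments", "deploy", "namespace:kube-system"))  # deny
```

The value of this model at cluster scale is that a single cluster-wide forbid, like the kube-system rule above, cannot be accidentally overridden by a team-level permit.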
The Return of Stateful Architecture and Why Serverless Is Losing Ground
One of the more interesting patterns visible in the KubeCon session schedule and in technical community discussions heading into this week is a partial reversal of the serverless-first orthodoxy that dominated cloud architecture conversations from roughly 2019 through 2024. The specific trigger for this reversal is AI workloads, but the underlying reasoning generalizes beyond AI.
Serverless architectures (functions as a service, managed cloud databases, stateless containers that boot and terminate per request) were designed around the assumption that the cost of state is high and should be minimized wherever possible. For web APIs, data transformation pipelines, and event handling, this is largely true. Stateless services scale horizontally without coordination overhead, tolerate failures gracefully, and require minimal operational attention.
AI inference does not fit this pattern. A model inference server is fundamentally stateful. It loads model weights into GPU memory when it starts, and those weights are the expensive, slow-to-load resource that makes inference fast. A serverless function that loads a large language model from storage on every invocation spends more time loading the model than it does serving the request. This is not a solvable problem with better cold start optimization. It is a structural mismatch between the serverless execution model and the resource characteristics of AI inference.
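The structural mismatch shows up immediately in the per-invocation arithmetic. The load and serve times below are assumptions for illustration, but the asymmetry they demonstrate is the core of the problem:

```python
# Illustrative numbers for the serverless/inference mismatch: time spent
# loading weights versus serving, per cold invocation. Times are assumed.

def load_overhead_fraction(model_load_s, request_serve_s):
    """Fraction of a cold invocation spent just loading the model."""
    return model_load_s / (model_load_s + request_serve_s)

# A 30-second weight load ahead of a 0.5-second generation:
print(round(load_overhead_fraction(30.0, 0.5), 3))  # 0.984
```

When more than 98 percent of a cold invocation is model loading, no amount of cold-start optimization changes the conclusion: the weights have to stay resident, which means the process has to stay alive, which means the workload is stateful.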
Unkey’s publicly documented decision to migrate from Cloudflare Workers, a serverless platform, back to stateful Go servers is one of the concrete examples that has circulated in the technical community as an illustration of this pattern. Their reasoning was explicit: authentication, rate limiting, and policy enforcement sit on the critical performance path and require predictable performance characteristics that serverless invocation models cannot guarantee when latency sensitivity matters. The Art of CTO’s analysis of this noted a broader “paved road versus bespoke path tension” where frameworks designed for maximum scale impose constraints that become unacceptable once AI-driven traffic patterns and latency sensitivity appear.
The KubeCon session titled “Beyond Stateless: Distributed Transactions with Autoscaling and Consistency on Kubernetes” is directly addressing this pattern. If AI inference needs to be stateful, and AI training definitely needs to be stateful, and the applications calling AI services increasingly need strong consistency guarantees, then Kubernetes needs to be genuinely good at stateful workloads rather than merely tolerating them. That is a different set of requirements from the stateless microservices use case that Kubernetes originally optimized for.
What KubeCon Amsterdam Is Actually About This Year
Looking at the session schedule for KubeCon EU 2026, five themes appear consistently enough to represent the actual state of what the community is working on, as opposed to what vendors are marketing.
GPU orchestration and AI scheduling is the dominant technical theme. Sessions cover hardware-aware scheduling, GPU resource management, distributed workload scheduling, and the Certified Kubernetes AI Conformance specification that AWS helped establish at KubeCon North America 2025. The certification validates that a Kubernetes distribution supports the specific capabilities required for AI workloads: GPU resource management, distributed workload scheduling, intelligent accelerator scaling, and integrated infrastructure monitoring. It is the AI equivalent of the standard Kubernetes conformance tests, and its existence signals that the community has defined what “Kubernetes for AI” means at a technical specification level.
Platform engineering maturation is the second theme. Multiple sessions address how platform teams are building internal developer platforms that abstract Kubernetes complexity for application teams. The SiliconAngle pre-conference analysis described platform engineering as “maturing from an aspirational concept into an operational discipline.” The sessions at KubeCon reflect this: they cover production platform experiences and lessons learned rather than theoretical frameworks for how platform engineering should work.
Observability standardization around OpenTelemetry is the third theme. As clusters run more diverse workload types, with web services, AI training, AI inference, data pipelines, and agent systems potentially sharing the same infrastructure, the observability challenge grows proportionally. OpenTelemetry is emerging as the standard that makes it possible to have consistent logs, metrics, and traces across this heterogeneous environment. The push toward standardization is being driven by operational necessity: teams cannot monitor AI workloads effectively when their observability tools only understand the behavior of traditional stateless services.
Supply chain security and MLSecOps is the fourth theme, directly relevant given the GlassWorm campaign covered in CyberDevHub’s earlier cybersecurity article this week. The session on “Securing the AI/ML Lifecycle with MLSecOps” addresses a specific gap: the tools and practices for securing software supply chains were developed for traditional application code and do not automatically extend to AI model artifacts, training data pipelines, and the notebooks and scripts that ML practitioners use. Ensuring that the AI components of a Kubernetes-hosted application are subject to the same supply chain security standards as the application code is an open problem the community is actively working on.
The fifth theme is data sovereignty, which is especially prominent at a European KubeCon. Regulations around where data can be stored and processed, including GDPR and sector-specific rules for healthcare and financial services, create constraints that hyperscaler-first infrastructure sometimes struggles to satisfy. European cloud providers like UpCloud and the European arms of global providers are discussing sovereign cloud infrastructure specifically designed to ensure that organizations can keep data within jurisdictional boundaries without sacrificing Kubernetes-native operational capabilities.
The Cloud Bill Problem: FinOps Is Now a Platform Engineering Problem
The State of FinOps 2025 report identified workload optimization and waste reduction as the top priorities for FinOps practitioners. That result deserves context: for years, the top FinOps priority was simply getting visibility into what was being spent. The shift to optimization as the primary concern means most mature organizations have solved the visibility problem and are now focused on actually reducing cost based on what they can see.
Kubernetes makes this both harder than it used to be and, with the right tooling, potentially easier. The “harder” part comes from the flexibility that makes Kubernetes powerful. Containers can be scheduled on any compatible node, autoscaling can spin up resources in seconds, and multi-tenant clusters share underlying infrastructure across teams and workloads. The result is that answering “where is the money actually going” in a large Kubernetes environment is non-trivial. Pod-level cost attribution requires tracking which pods ran on which nodes for how long, mapping that to node costs that vary by instance type and utilization, and allocating shared cluster overhead across the workloads that consumed it.
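A minimal sketch of that attribution shows why it is non-trivial even in the simplest case. The allocation rule, workload names, and prices below are illustrative assumptions; real tools also price CPU, memory, and GPU separately and handle shared overhead and idle capacity.

```python
# Minimal sketch of pod-level cost attribution: split one node's cost for
# a billing window across the pods that ran on it, proportional to
# requested CPU multiplied by runtime. All numbers are illustrative.

def attribute_costs(node_usd_per_hour, window_hours, pods):
    """pods: list of {'name', 'cpu_request', 'hours'} that ran on the node."""
    node_cost = node_usd_per_hour * window_hours
    total_cpu_hours = sum(p["cpu_request"] * p["hours"] for p in pods)
    return {p["name"]: round(node_cost * p["cpu_request"] * p["hours"]
                             / total_cpu_hours, 2)
            for p in pods}

pods = [
    {"name": "api",   "cpu_request": 2.0, "hours": 24.0},  # small, always on
    {"name": "batch", "cpu_request": 6.0, "hours": 8.0},   # large, short-lived
]
print(attribute_costs(0.40, 24.0, pods))  # {'api': 4.8, 'batch': 4.8}
```

The always-on small service and the short-lived large job end up costing the same here, which is exactly the kind of result that surprises teams reasoning from instance counts alone, and it gets harder once pods move between nodes and clusters span clouds.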
The GPU dimension makes this significantly more expensive and therefore more important to get right. A GPU node costs an order of magnitude more than a comparable CPU node. A GPU that is not being utilized because of poor scheduling, waiting for a training job to start, or sitting idle after a job completes is burning money in a way that an idle CPU node does not. The Nutanix KubeCon preview analysis made this specific: “The same flexibility that makes containers so powerful has broken the old ways of managing costs. In environments where clusters are spread across clouds, regions, and on-premises locations, it is often hard to answer a simple question: where is the money actually going?”
Platform engineering is the organizational mechanism that connects FinOps principles to actual Kubernetes behavior. A platform team that builds cost visibility directly into the internal developer platform, showing application teams the cost impact of their resource requests before they are deployed, makes FinOps a real-time engineering discipline rather than a retroactive finance report. The integration of cost metrics into scaling policies, so that autoscaling decisions consider cost alongside performance, is a Kubernetes operational pattern that the KubeCon community is actively developing tooling to support.
Data Sovereignty and Why European Cloud Is a Different Conversation
KubeCon in Amsterdam takes place in a specific regulatory context that makes some conversations there different from the same conversations at KubeCon North America. European organizations are subject to GDPR requirements around data residency and processing, sector-specific regulations for healthcare and financial services that add additional constraints, and an increasing political conversation about strategic technology independence from non-European providers.
The practical consequence for Kubernetes is that “run this on the closest AWS region” is not always a satisfactory answer for European enterprises. Organizations that need to guarantee their data never leaves the EU, or specifically never leaves a particular member state, need infrastructure that can make and honor that guarantee at the platform level rather than relying on contractual commitments from hyperscalers who ultimately operate infrastructure in the US and are subject to US legal jurisdiction regardless of where their European data centers are located.
The European cloud providers and the European deployments of global providers presenting at KubeCon Amsterdam are specifically addressing this. UpCloud’s session positioning is about “sovereign cloud infrastructure” for organizations navigating the complexity of maintaining data sovereignty without sacrificing Kubernetes-native operational capabilities. Red Hat’s presentation covers regulatory and enterprise requirements for AI workloads, explicitly including compliance as a first-class concern alongside performance and operational efficiency. The “data sovereignty under closer examination than ever” framing in the CNCF’s own KubeCon preview reflects a real policy environment that is tightening rather than relaxing.
For developers and architects building systems that serve European users or operate in regulated European industries, the KubeCon Amsterdam conversations about sovereign cloud architecture are directly relevant to infrastructure decisions they are making right now. The pattern of “use whatever cloud is cheapest and most convenient” that worked for early-stage startups runs into increasingly real regulatory constraints as organizations scale and move into regulated sectors.
What Every Cloud Engineer Should Take Away From This Week
KubeCon Amsterdam is not a conference you attend to find out whether Kubernetes matters. That question was settled by the 82 percent production adoption number. It is a conference where the people actually running the largest and most complex Kubernetes deployments on earth share what they have learned about making it work for workloads that the platform was not originally designed to support.
The things worth paying attention to from this week, whether you are attending in Amsterdam or following the coverage remotely, are the operational lessons from production AI on Kubernetes rather than the roadmap announcements. AWS scaling to 100,000 nodes required specific architectural changes. Red Hat’s production AI platform experience at their enterprise customer scale provides a different data point. The session on distributed transactions on Kubernetes addresses a real problem that a lot of teams running AI inference services are hitting without necessarily recognizing it as a solved problem in the research literature.
For developers who are not yet running AI workloads on Kubernetes but are planning to: the shift happening right now from static resource management to dynamic workload-aware orchestration is the most consequential architectural change to understand. If your cluster is sized based on average load with manual node pools and cluster autoscaler, the Karpenter model of just-in-time node provisioning with workload-aware instance selection is a meaningfully better approach for the bursty, expensive resource patterns that AI workloads create. The engineering investment to migrate to Karpenter is real, but so are the savings from not over-provisioning GPU capacity.
For platform engineers: the three-way intersection of GPU scheduling, FinOps cost visibility, and supply chain security is where the most interesting operational challenges are being solved right now. None of these problems are fully solved by any single tool or vendor. The KubeCon ecosystem presentations are where you get the honest assessment of what works, what does not, and what the community is still figuring out.
For engineering managers and architects: the stateful architecture discussion is the one that may require the most significant rethinking of existing decisions. If your organization made serverless-first architectural choices in the last few years based on the assumption that stateless services are always better, the emergence of AI inference as a first-class workload type that is fundamentally stateful is a reason to revisit those choices for the specific cases where it matters.
KubeCon keynotes and session recordings are posted publicly on the CNCF YouTube channel after the event. Worth watching the AWS “Invisible Kubernetes” keynote and the Red Hat OpenShift Commons recordings specifically, as those tend to contain the most candid assessments of where the technology actually stands versus where the marketing says it is.
What version of Kubernetes are you running in production and what is your biggest operational pain point with it right now? The gap between “what KubeCon talks about” and “what teams are actually struggling with” is always interesting and the comments here tend to surface the real problems faster than conference sessions do.
References (March 19, 2026):
CNCF 2025 Annual Cloud Native Survey (82% production, 66% GenAI, Kubernetes as AI OS): cncf.io
AWS at KubeCon EU 2026 (100K nodes, Karpenter/kro/Cedar, EKS AI conformance): aws.amazon.com/blogs/containers
Microsoft Azure at KubeCon Europe 2026 (GPU scheduling, cross-cloud inference, Brendan Burns keynote): techcommunity.microsoft.com
SiliconAngle: “Cloud-native AI on Kubernetes takes shape at KubeCon EU” (IBM $12.5B GenAI, Red Hat AI Enterprise): siliconangle.com
Nutanix: “Top 5 Kubernetes Trends to Watch at KubeCon Amsterdam 2026” (FinOps, GPU ops): nutanix.com
KubeCon + CloudNativeCon Europe 2026 official schedule: kccnceu2026.sched.com
The Art of CTO: “AI-Native Platforms and the Return of Stateful Architecture” (Kubernetes 1.35, in-place pod resize): theartofcto.com
State of FinOps Report 2025 (workload optimization as top priority): finops.org
Red Hat at KubeCon EU 2026 (OpenShift Commons, AI lifecycle security): redhat.com
In 2018 Kubernetes was a way to run containers.
In 2026 it is the operating system that 800,000 Nvidia GPUs run on. The conference starting Monday is about making sure those GPUs stay busy.