What Is Azure Kubernetes Service (AKS) and Why Should You Care?

In every cloud native architecture discussion you have had over the last few years, or will have in the coming years, you can almost guarantee that someone has introduced, or will introduce, Kubernetes as a hosting option for your solution.

There are also different options when Kubernetes enters the conversation – you can choose to run it yourself on your own infrastructure, or hand much of the operational burden to a managed offering.

Kubernetes promises portability, scalability, and resilience. In reality, operating Kubernetes yourself is anything but simple.

Have you ever wondered whether Kubernetes is worth the complexity – or how to move from experimentation to something you can confidently run in production?

Me too – so let’s try and answer that question. Anyone who knows me, or has followed me for a few years, knows that I like to get down to the basics and “start at the start”.

This is the first post of a blog series focused on Azure Kubernetes Service (AKS), with core Kubernetes concepts referenced along the way. The goal of this series is simple:

By the end (whenever that is – there is no set time or number of posts), we will have designed and built a production‑ready AKS cluster, aligned with the Azure Well‑Architected Framework, and suitable for real‑world enterprise workloads.

With the goal clearly defined, let’s start at the beginning—not by deploying workloads or tuning YAML, but by understanding:

  • Why AKS exists
  • What problems it solves
  • When it’s the right abstraction

What Is Azure Kubernetes Service (AKS)?

Azure Kubernetes Service (AKS) is a managed Kubernetes platform provided by Microsoft Azure. It delivers a fully supported Kubernetes control plane while abstracting away much of the operational complexity traditionally associated with running Kubernetes yourself.

At a high level:

  • Azure manages the Kubernetes control plane (API server, scheduler, etcd)
  • You manage the worker nodes (VM size, scaling rules, node pools)
  • Kubernetes manages your containers and workloads

This division of responsibility is deliberate. It allows teams to focus on applications and platforms rather than infrastructure mechanics.

You still get:

  • Native Kubernetes APIs
  • Open‑source tooling (kubectl, Helm, GitOps)
  • Portability across environments

But without needing to design, secure, patch, and operate Kubernetes from scratch.
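For example, once a cluster exists, the standard open-source tooling works unchanged. A minimal sketch – the resource group and cluster names below are placeholders:

```
# Fetch kubeconfig credentials for the cluster (names are hypothetical)
az aks get-credentials --resource-group rg-aks-demo --name aks-demo

# From here, plain Kubernetes tooling behaves as it would against any cluster
kubectl get nodes
helm list --all-namespaces
```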

Why Should You Care About AKS?

The short answer:

AKS enables teams to build scalable platforms without becoming Kubernetes operators.

The longer answer depends on the problems you’re solving.

AKS becomes compelling when:

  • You’re building microservices‑based or distributed applications
  • You need horizontal scaling driven by demand
  • You want rolling updates and self‑healing workloads
  • You’re standardising on containers across teams
  • You need deep integration with Azure networking, identity, and security

Compared to running containers directly on virtual machines, AKS introduces:

  • Declarative configuration
  • Built‑in orchestration
  • Fine‑grained resource management
  • A mature ecosystem of tools and patterns
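To make “declarative configuration” concrete, here is a minimal sketch of a Kubernetes Deployment manifest (the name is hypothetical; the image is one of Microsoft’s sample containers). You describe the desired state, and Kubernetes continuously reconciles towards it:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web            # hypothetical name
spec:
  replicas: 3                # desired state: Kubernetes keeps 3 pods running
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
        - name: web
          image: mcr.microsoft.com/azuredocs/aks-helloworld:v1
          ports:
            - containerPort: 80
```

Applying this with `kubectl apply -f` works identically on AKS and on any other Kubernetes cluster – which is the portability point above.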

However, this series is not about adopting AKS blindly. Understanding why AKS exists—and when it’s appropriate—is essential before we design anything production‑ready.


AKS vs Azure PaaS Services: Choosing the Right Abstraction

Another common—and more nuanced—question is:

“Why use AKS at all when Azure already has PaaS services like App Service or Azure Container Apps?”

This is an important decision point, and one that shows up frequently in the Azure Architecture Center.

Azure PaaS Services

Azure PaaS offerings such as App Service, Azure Functions, and Azure Container Apps work well when:

  • You want minimal infrastructure management responsibility
  • Your application fits well within opinionated hosting models
  • Scaling and availability can be largely abstracted away
  • You’re optimising for developer velocity over platform control

They provide:

  • Very low operational overhead – the service is an “out of the box” offering where developers can get started immediately.
  • Built-in scaling and availability – scaling comes as part of the service based on demand, and can be configured based on predicted loads.
  • Tight integration with Azure services – integration with tools such as Azure Monitor and Application Insights for monitoring, Microsoft Defender for security monitoring and alerting, and Microsoft Entra for identity.

For many workloads, this is exactly the right choice.

AKS

AKS becomes the right abstraction when:

  • You need deep control over networking, runtime, and scheduling
  • You’re running complex, multi-service architectures
  • You require custom security, compliance, or isolation models
  • You’re building a shared internal platform rather than a single application

AKS sits between IaaS and fully managed PaaS:

Azure PaaS abstracts the platform for you. AKS lets you build the platform yourself—safely.

This balance of control and abstraction is what makes AKS suitable for production platforms at scale.


Exploring AKS in the Azure Portal

Before designing anything that could be considered “production‑ready”, it’s important to understand what Azure exposes out of the box – so let’s spin up an AKS instance using the Azure Portal.

Step 1: Create an AKS Cluster

  • Sign in to the Azure Portal
  • In the search bar at the top, search for Kubernetes Service
  • When you get to the “Kubernetes center page”, click on “Clusters” on the left menu (it should bring you here automatically). Select Create, and select “Kubernetes cluster”. Note that there are also options for “Automatic Kubernetes cluster” and “Deploy application” – we’ll address those in a later post.
  • Choose your Subscription and Resource Group
  • Enter a Cluster preset configuration, Cluster name and select a Region. You can choose from four different preset configurations which have clear explanations based on your requirements
  • I’ve gone for Dev/Test for the purposes of spinning up this demo cluster.
  • Leave all other options as default for now and click “Next” – we’ll revisit these in detail in later posts.
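The same cluster can be created from the Azure CLI – a minimal sketch with hypothetical resource group and cluster names, where anything not specified falls back to defaults, just as in the portal:

```
az group create --name rg-aks-demo --location uksouth

# A small Dev/Test-style cluster; unspecified options take Azure's defaults
az aks create \
  --resource-group rg-aks-demo \
  --name aks-demo \
  --node-count 2 \
  --generate-ssh-keys
```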

Step 2: Configure the Node Pool

  • Under Node pools, there is an agentpool automatically added for us. You can change this if needed to select a different VM size, and set a low min/max node count

    This is your first exposure to separating capacity management from application deployment.
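As a sketch of that same separation from the CLI, you could add a hypothetical user node pool with its own VM size and autoscaling bounds, entirely independently of any application deployment:

```
az aks nodepool add \
  --resource-group rg-aks-demo \
  --cluster-name aks-demo \
  --name userpool \
  --node-vm-size Standard_DS2_v2 \
  --enable-cluster-autoscaler \
  --min-count 1 \
  --max-count 3
```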

Step 3: Networking

Under Networking, you will see options for Private/Public Access, and also for Container Networking. This is an important choice, as there are two clear options:

  • Azure CNI Overlay – Pods get IPs from a private CIDR address space that is separate from the node VNet.
  • Azure CNI Node Subnet – Pods get IPs directly from the same VNet subnet as the nodes.

You also have the option to integrate the cluster into your own VNet, which you can specify during the cluster creation process.

Again, we’ll talk more about these options in a later post, but it’s important to understand the distinction between the two.
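For reference, the overlay model maps to creation-time flags like these (a sketch with hypothetical names – the pod CIDR is private address space that never touches the node VNet):

```
az aks create \
  --resource-group rg-aks-demo \
  --name aks-demo \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --pod-cidr 192.168.0.0/16
```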

Step 4: Review and Create

Select Review + Create – note that at this point I have not selected any monitoring, security or integration with an Azure Container Registry, and am just taking the defaults. Again (you’re probably bored of reading this…), we’ll deal with each of these in a later post dedicated to the topic.

Once deployed, explore:

  • Node pools
  • Workloads
  • Services and ingresses
  • Cluster configuration
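The same exploration works from the CLI once you have pulled credentials with `az aks get-credentials`:

```
kubectl get nodes -o wide                # the node pool VMs
kubectl get pods --all-namespaces        # system workloads AKS runs for you
kubectl get services --all-namespaces    # cluster networking endpoints
```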

Notice how much complexity is hidden – if you scroll back up to the “Azure-managed v Customer-managed” diagram, you have responsibility for managing:

  • Cluster nodes
  • Networking
  • Workloads
  • Storage

Even though Azure abstracts away responsibility for things like the key-value store, scheduler, controllers and management of the cluster API, a large amount of responsibility still remains.


What Comes Next in the Series

This post sets the foundation for what AKS is and how it looks out of the box using a standard deployment with the “defaults”.

Over the course of the series, we’ll move through the various concepts which will help to inform us as we move towards making design decisions for production workloads:

  • Kubernetes Architecture Fundamentals (control plane, node pools, and cluster design), and how they look in AKS
  • Networking for Production AKS (VNets, CNI, ingress, and traffic flow)
  • Identity, Security, and Access Control
  • Scaling, Reliability, and Resilience
  • Cost Optimisation and Governance
  • Monitoring, Alerting and Visualisations
  • Alignment with the Azure Well‑Architected Framework
  • And lots more…

See you on the next post!

Azure Lab Services Is Retiring: What to Use Instead (and How to Plan Your Migration)

Microsoft has announced that Azure Lab Services will be retired on June 28, 2027. New customer sign-ups have already been disabled as of July 2025, which means the clock is officially ticking for anyone using the service today.

You can read the official announcement on Microsoft Learn here: https://learn.microsoft.com/en-us/azure/lab-services/retirement-guide

While 2027 may feel a long way off, now is the time to take action!

For those of you who have never heard of Azure Lab Services, let’s take a look at what it was and how you would have interacted with it (even if you didn’t know you were!).

What is/was Azure Lab Services?

Image: Microsoft Learn

Azure Lab Services allowed you to create labs with infrastructure managed by Azure. The service handled all the infrastructure management, from spinning up virtual machines (VMs) to handling errors and scaling the infrastructure.

If you’ve ever been on a Microsoft course, participated in a Virtual Training Days course, or attended a course run by a Microsoft MCT, Azure Lab Services is what the trainer would have used to facilitate:

  • Classrooms and training environments
  • Hands-on labs for workshops or certifications
  • Short-lived dev/test environments

Azure Lab Services was popular because it abstracted away a lot of complexity around building lab or classroom environments. Its retirement doesn’t mean Microsoft is stepping away from virtual labs – it means the responsibility shifts back to architecture choices based on the requirements you have.

If you or your company is using Azure Lab Services, the transition to a new service is one of those changes where early planning pays off – especially if your labs are tied to academic calendars, training programmes, or fixed budgets.

So what are the alternatives?

Microsoft has outlined several supported paths forward. None are a 1:1 replacement, so the “right” option depends on who your users are and how they work. While these solutions aren’t necessarily education-specific, they support a wide range of education and training scenarios.

Azure Virtual Desktop (AVD)

Image: Microsoft Learn

🔗 https://learn.microsoft.com/azure/virtual-desktop/

AVD is the most flexible option and the closest match for large-scale, shared lab environments. AVD is ideal for full desktop and app delivery scenarios and provides the following benefits:

  • Multi-session Windows 10/11, with either Full Desktop or Single App delivery options
  • Full control over networking, identity, and images. One of the great new features of AVD (still in preview) is that you can now use Guest Identities in your AVD environments, which can be really useful for training environments and takes the overhead of user management away.
  • Ideal for training labs with many concurrent users
  • Supports scaling plans to reduce costs outside working hours (check out my blog post on using Scaling Plans in your AVD environments)

I also wrote a set of blog posts about setting up your AVD environments from scratch, which you can find here and here.

Windows 365

🔗 https://learn.microsoft.com/windows-365/

Windows 365 offers a Cloud PC per user, abstracting away most infrastructure concerns. Cloud PC virtual machines are Microsoft Entra ID joined and support centralised end-to-end management using Microsoft Intune. You assign Cloud PCs by assigning a licence to each user, in the same way as you would assign Microsoft 365 licences. The benefits of Windows 365 are:

  • Simple to deploy and manage
  • Predictable per-user pricing
  • Well-suited to classrooms or longer-lived learning environments

The trade-off is that there is less flexibility and typically a higher cost per user than shared AVD environments, as the Cloud PCs are dedicated to their users and cannot be shared.

Azure DevTest Labs

Image: Microsoft Learn

🔗 https://learn.microsoft.com/azure/devtest-labs/

A strong option for developer-focused labs, Azure DevTest Labs is targeted at enterprise customers. It also has a key difference from the other alternatives: it’s the only one that offers access to Linux VMs as well as Windows VMs.

  • Supports Windows and Linux
  • Built-in auto-shutdown and cost controls
  • Works well for dev/test and experimentation scenarios

Microsoft Dev Box

🔗 https://learn.microsoft.com/dev-box/

Dev Box is aimed squarely at professional developers. It’s ideal for facilitating hands-on learning, where training leaders can use Dev Box supported images to create identical virtual machines for trainees. Dev Box virtual machines are Microsoft Entra ID joined and support centralised end-to-end management with Microsoft Intune.

  • High-performance, secure workstations
  • Integrated with developer tools and workflows
  • Excellent for enterprise engineering teams

However, it’s important to note that as of November 2025, Dev Box is being integrated into Windows 365. The service is built on top of Windows 365, so Microsoft has decided to unify the offerings. As of November 2025, Microsoft is no longer accepting new Dev Box customers – you can read more about the announcement here: https://learn.microsoft.com/en-us/azure/dev-box/dev-box-windows-365-announcement?wt.mc_id=AZ-MVP-5005255

When First-Party Options Aren’t Enough

If you relied heavily on the lab orchestration features of Azure Lab Services (user lifecycle, lab resets, guided experiences), you may want to evaluate partner platforms that build on Azure. These solutions provide:

  • Purpose-built virtual lab platforms
  • User management and lab automation
  • Training and certification-oriented workflows

They add cost, but also significantly reduce operational complexity.

Comparison: Azure Lab Services Alternatives

Let’s take a look at a comparison of each service showing cost, use cases and strengths:

| Service | Typical Cost Model | Best Use Cases | Key Strength | When 3rd Party Tools Are Needed |
| --- | --- | --- | --- | --- |
| Azure Virtual Desktop | Pay-per-use (compute + storage + licensing) | Large classrooms, shared labs, training environments | Maximum flexibility and scalability | For lab orchestration, user lifecycle, guided labs |
| Windows 365 | Per-user, per-month | Classrooms, longer-lived learning PCs | Simplicity and predictability | Rarely needed |
| Azure DevTest Labs | Pay-per-use with cost controls | Dev/test, experimentation, mixed OS labs | Cost governance | For classroom-style delivery |
| Microsoft Dev Box | Per-user, per-month | Enterprise developers | Performance and security | Not typical |
| Partner Platforms | Subscription + Azure consumption | Training providers, certification labs | Turnkey lab experiences | Core dependency |

Don’t Forget Hybrid Scenarios

If some labs or dependencies must remain on-premises, you can still modernise your management approach by deploying Azure Virtual Desktop locally and managing it with Azure Arc, which will allow you to:

  • Apply Azure governance and policies
  • Centralise monitoring and management
  • Transition gradually toward cloud-native designs

Start Planning Now

With several budget cycles between now and June 2027, the smartest move is to:

  1. Inventory existing labs and usage patterns
  2. Map them to the closest-fit replacement
  3. Pilot early with a small group of users

Azure Lab Services isn’t disappearing tomorrow – but waiting until the last minute will almost certainly increase cost, risk, and disruption.

If you treat this as an architectural evolution rather than a forced migration, you’ll end up with a platform that’s more scalable, more secure, and better aligned with how people actually learn and work today.