Control your Azure Virtual Desktop costs with Scaling Plans

Cloud Computing has changed the way we approach our enterprise infrastructure.

The amount of options available to us now means that we can finally ditch that dusty old server sitting the the bottom of the server rack (or in some cases at the back of a cupboard) for a modern secure solution that we don’t need to sit and pray in front of every time we need to restart it.

The Problem with the Cloud

But …. some people would prefer to keep old “Dusty Springfield” alive because the effort to migrate and in some cases re-architect the service is too much and too costly. And thats the thing we hear the most when a suggestion to migrate to a cloud service is raised – “the cloud is very expensive…”.

And lets be honest, it is …..

Money money money ……

There, I said it. Out Loud. In Print. Cloud Computing is expensive. There’s a helicopter hovering over my house at the minute but I’m sure its nothing to worry about ……

In all seriousness though, when scoping out a Cloud solution the first thing that is looked at is cost. You can argue as much as you want about the redundancy, the lower power and cooling costs, lack of hardware costs etc. The bean counters will look at the bottom line and say “we’re not paying that much now….”. And “Dusty Springfield” limps on defiantly in corner.

Of course, your cloud computing costs are defined by the options you select and what level of redundancy you need. Scale Sets, Storage redundancy across zones and regions. Or just keep it as locally redundant storage? Then you get into the sizing of your solutions.

How the Costs add up

Azure Virtual Desktop is one of those cool technologies that can help you provide a secure environment for your users to access Cloud or Hybrid environments in a consistent and unified experience. But because its built on underlying VMs which you need to size based on your requirements, the costs can mount up.

Lets take a look at an example of a standard Azure Virtual Desktop host pool that contains 10 Session Hosts which are delivering Remote Apps to 100 users. The Session Hosts are generally sized from the General Purpose VM type and the most common one used is the “Standard_D4s_v3”, which has 4 vCPU’s and 16GB memory.

The base cost for this VM if you create a standard Azure Virtual machine comes in at approx $160 per month.

Standard Virtual Machine Type

However, if we use this VM type for our Azure Virtual Desktop Session Hosts with Windows 10 Enterprise Multi-Session version 21H2 with Microsoft 365 Apps installed, the cost then jumps to $290 per month.

Azure Virtual Desktop Virtual Machine Type

So, lets go back to our 10 Session Hosts – at that price we’re talking $2900 per month, or just under $35000 per year. And thats for just 10 VMs in the environment. And thats why Cloud Computing is expensive! Of course, this doesn’t take into account reserved instances or spot instances, but you get the idea.

The $290 per month cost for a VM isn’t based on a cost per month – its based on 730 hours of usage or 24 hours multiplied by just over 30. This where you can start cutting into that $35000 per year cost, and where Scaling Plans applied to your Azure Virtual Desktop Host Pools can help.

Scaling Plans

Scaling Plans lets you scale your session host virtual machines (VMs) in a host pool up or down to optimize deployment costs. You can create a scaling plan based on:

  • Time of day
  • Specific days of the week
  • Session limits per session host

You follow the guidelines below when creating your scaling plan:

  • At the time of writing, you can only configure autoscale with existing Pooled host pools. This won’t work with Personal host pools
  • You must create the scaling plan in the same Azure region as the host pool you assign it to.
  • All host pools you use with autoscale must have a configured MaxSessionLimit parameter. Don’t use the default value.
  • You must grant Azure Virtual Desktop access to manage the power state of your session host VMs.

Create a custom RBAC role

Now that we know the benefits and rules, the first thing we need to do is create a custom RBAC role. This custom role and assignment will allow Azure Virtual Desktop to manage the power state of any VMs in those subscriptions. It will also let the service apply actions on both host pools and VMs when there are no active user sessions.

The steps for creating the Custom RBAC Role are as follows (this is the same for creating any Custom RBAC Role):

  • First, create a json file using whatever your favourite editor is (I’m using Sublime in this example). Save the file as avdscale.json and add the following information into it:
  • Open the Azure portal and go to Subscriptions and select a subscription that contains a host pool and session host VMs you want to use with autoscale. Select Access control (IAM). Select the + Add button, then select Add custom role from the drop-down menu.

  • On the “Basics” screen, go to Baseline permissions and browse to the avdscale.json file that you just created.
  • This will import all of your settings, so on the next screen you will see the permissions that you had specified in your json file.
  • Next, we have “Assignable Scopes”. You want to assign this at subscription level as assigning this custom role at any level lower than your subscription, such as the resource group, host pool, or VM, will prevent autoscale from working properly.
  • We can now skip to the “Review and Create” screen, as this will validate and list out our permissions for the RBAC role. Review these and then click “Create”:
  • And once thats created, we can see its been created as a Custom Role:

  • Now we need to add a Role Assignment for our RBAC Role. So we click on “Add role assignment”

  • We select our Custom RBAC role and in the members screen, we choose to assign access to a User, group or service principal. From the select members screen, search for “Windows Virtual Desktop”
  • Go to “Review and Assign” and click create:
  • And we can see that at subscription level the role has been assigned:

Create our Scaling Plan

Now that our RBAC role is done, we can create our scaling plan.

  • Open the Azure portal. In the search bar, type Azure Virtual Desktop and select the matching service entry. Select Scaling Plans, then select Create.
  • On the Basics screen, provide the following:
    • Subscription and Resource Group where the Scaling Plan will be created
    • Name
    • Location (remember this needs to be in the same region as your Host Pool)
    • Time Zone

The other entries are optional, however an important one to note is Exclusion Tags – you can use this in conjunction with Tags to excluse certain VMs from autoscaling operations

  • Click next and this will bring you to the Schedules screen. Click on Add Schedule
  • In the General screen, we enter a Schedule Name and also select the days we want the schedule to apply to.
  • In the Ramp-up screen, we specify a default starting point.
    • So in this instance, we want to have 20% (or 2 out of our 10 Session Hosts) powered on and ready to accept connections at 08:00.
    • We’ve selected “Breadth First” for Load balancing – this means users will be spread evenly across available hosts and is recommended for consistent performance.
    • Finally, we have set a Capacity threshold of 80%. If you recall, we set our hosts to accept a maximum of 10 connections. We have 2 hosts powered on, so once we reach 16 users across those 2 hosts, the next host will automatically power on.
  • Next up is Peak hours. For this we specify a starting time (which is normally when the majority of your users will be logging on) and we’ve also flipped the Load Balancing to “Depth-first”, which will load up all available hosts with user sessions (up to our 80% threshold) before bringing another one online. This is really up to you as to how you want to load balance, but as a reminder:
    • Breadth-first load balancing distributes new user sessions across all available session hosts in the host pool.
    • Depth-first load balancing distributes new sessions to any available session host with the highest number of connections that hasn’t reached its session limit yet.
  • Next up is Ramp-down, this is where we start deallocating hosts at the end of the working day and as you can see, the target is to get back down to 20% of the hosts. The important point to make here is the “Force logoff users” option. If this is enabled then the following applies:
    • This will choose the session host with the lowest number of user sessions to shut down. Autoscale will put the session host in drain mode, send all active user sessions a notification telling them they’ll be signed out, and then sign out all users after the specified wait time is over. After autoscale signs out all user sessions, it then deallocates the VM.
    • During ramp-down, autoscale will only shut down VMs if all existing user sessions in the host pool can be consolidated to fewer VMs without exceeding the capacity threshold.
  • Finally, we get to “Off-peak hours” which is the end of the “Ramp-down” period.
  • And thats our weekday schedule created. You can also go back in and create a weekend schedule where you can bring the number of hosts down to 10% and have a higher capacity threshold at weekends:
  • Once the schedules are created, we assign the Scaling Plan to our Host pool and click on “Enable autoscale”:
  • And now we can validate our options and click on “Review and create”:

Give all of this about an hour to kick in and you will see your Azure Virtual Desktop session hosts automatically deallocated as per your schedules if not in use!

Money money money ….

Earlier in this post, I gave a yearly figure of approx $35000 to run our 10 Session Host VMs. However, that figure is based on full consumption. So lets do some very quick calculations to see how our scaling plan affects that figure:

  • As we said, a single VM running at full consumption (or the full 730 hours) will cost us $290 per month.
  • Based on our schedules created above, we’re going to have 1 VM running full time for both weekdays and weekends. So thats $290 per month, or $3,480 per year.
  • We’re then guaranteed to have 1 VM running from Monday until Friday for 24 hours, and also on weekends for 12 hours each day (depending on how schedule is created). Thats effectively 6 days a week instead of 7. So we need to calculate that over a year which is a case of getting 6/7ths of our full price figure. Thats coming in at $2,983 per year for that VM.
  • Now, its back to the other 8 VMs and the 100 users who are using this. “If” those 100 users are logged on, the other 8 VMs will be up for 12 hours a day from Monday to Friday only as per our schedule. So for that, we need to get 5/7ths of our full price figure (which is $2,486) and then half it because we’re only using for 12 hours a day (and thats coming in at $1,243 per VM).

In summary, what we’ve got is:

  • $3,480 – 1 VM at full consumption
  • $2,983 – 1 VM at slightly reduced consumption for weekdays and weekends
  • $9,944 – 8 VMs running for 12 hours a day from Monday to Friday

Add those figures up and you get a total of $16,407. And we need to remember, that figure doesn’t available cost reductions like Reserved Instances or Hybrid Benefit.


So by implementing a Scaling Plan for the Host pool above, we’ve saved ourselves nearly $20,000. Again I’m going to stress the figures I’m quoting here are approximate, may not represent what you see in your own personal or enterprise subscriptions, and should not be taken as exact savings. Make sure to speak to your Microsoft TAM or Cloud Service Provider for more details. You can find out more about scaling plans here.

Hope you enjoyed this post, until next time!

Is it the (long overdue) end of the road for on-premises Exchange Servers?

A few weeks ago, I posted a Wired.com article on my LinkedIn feed entitled “Your Microsoft Exchange Server Is a Security Liability” by Andy Greenburg.

Image Credit – Priasoft

It was a great article that was released on the back of the most recent Exchange security vulnerability: this time the ProxyNotShell Zero-Day which oddly enough took almost 2 months to patch correctly. This has been released as part of the November Patch Tuesday release, and there are a few pre-requisites required (basically, be at the latest CU version for your Exchange environments and then apply the patch).

Image Credit – Microsoft

Its the latest in a long line of Exchange Server vulnerabilities. And its interesting to note this line in the Microsoft Tech Community Article that states:

These vulnerabilities affect Exchange Server. Exchange Online customers are already protected from the vulnerabilities addressed in these SUs and do not need to take any action other than updating any Exchange servers in their environment.

Well, of course Exchange Online isn’t affected. And in his Wired article, Andy Greenburg makes the point that Microsoft are happy to put all of their security efforts into protecting their Exchange Online services and customers as that makes up the majority of their customer base.

A brief history of Exchange Online

If we look back on the history of Exchange Online, it all started with BPOS way back in 2008. At the time of release, Microsoft had been privately offering customers a hosted email service since early 2007. That was around the time that Exchange Server 2007 was released, and it was also the time when Exchange started to get really complicated as regards the amount of different server roles involved and the overhead involved in maintaining them.

Now lets just put one thing on record. I would never dream of believing that Microsoft would conspire to over-complicate an on-premises solution with the intention of pushing more customers towards a cloud offering. I mean, they wouldn’t, would they?

There was always an option for having a Front-End sever separate, and the solution could sometimes be integrated with the long gone but not forgotten ISA Server.

A look at the diagram below shows us the evolution of how Exchange roles have changed since 2000/2003 versions, and have pretty much rolled back into less complicated instances with the release of 2016/2019 versions:

Image Credit – devco.re

Whether Microsoft intended to make Exchange Server more complicated or not, segregation of those roles was was needed due to the evolution of security threats and the rate of attacks that were happening on Exchange Server installations. What it did though was make Exchange a monster to manage from an adminstration perspective. Almost to the point that it made the decision to migrate to Exchange Online easier, as it offset the cost for some organisations of hiring a full time Exchange Administrator to manage that environment.

So I should Migrate?

The easy answer to that is yes, you should migrate. There’s a number of factors to take into consideration in answering that question:

  • As we saw in the recent ProxyNotShell Zero-Day and the length of time it took to remediate, Microsoft really doesn’t care about on-premises Exchange anymore. From Andy’s Wired article, the quote from Microsoft states that: "We strongly recommend customers migrate to the cloud to take advantage of real-time security and instant updates to help keep their systems protected from the latest threats".
  • The recent announcement that the next CU release will only be for Exchange Server 2019 (CU13). Because 2013 (which goes EOL in April 2023) and 2016 are now in Extened support, there will only be Security Updates released as required (such as the patch for the Zero-Day). But in order to install that and to get support from Microsoft, you must be in the most recent (and last) CU version.
  • There hasn’t been an Exchange Server 2022 release yet. This was touted as being released in late 2021, and early indication were that this would be a subscription based service. The latest update on this was released in this post in June 2022, where the updated roadmap is to release the next Exchange Server version in 2025. Are we really prepared to wait that long if the vulnerabilities continue at this rate? Again, the interesting quote to take ouit of this release is: The next version will require Server and CAL licenses and will be accessible only to customers with Software Assurance, similar to the SharePoint Server and Project Server Subscription Editions.
  • If you decide to migrate to Exchange Online, what does your business want to get out of the migration? Its the question thats rarely asked but its the most important one for any migration scenario. Because unlike 15 years ago when it was hosted Email and SharePoint with Live Meetings thrown in, Microsoft 365 is an extensive offering of Apps, Services and Licencing options and can open a gateway to a full cloud migration if planned correctly.
  • You can go for the Basic plans such as Business Basic or Office 365 E1 and “just” have Email, Sharepoint and Teams if you want. But go a little further, you take Office licensing into the equation, and maybe Defender, and then maybe Azure Virtual Desktop rights. The opportunities are there, it’s not just about lifting and shifting the tech anymore. You can check out my previous post on the different licensing options here.

Why can’t everyone just migrate to Exchange Online?

The majority of companies have already migrated to Exchange – nearly 350 million Office365 users running over 7 billion (yes, billion) mailboxes running on 300,000 Exchange Online instances on servers running in Microsoft Datacenters across the world.

There are those special cases who still need Exchange Servers On-Premises, and those servers need to be hardened or have specialist teams supporting them.

Then there are those companies that have specific Data Residency requirements. And thats really all they say ….. "We're not moving our data into the Cloud". It shows a lack of understanding of how Data Residency in Exchange Online works. Depending on where you are in the world, you can find out on this site the different options for where your Microsoft 365 data would be stored post migration, depending on the options you select at tenant creation and also in what datacenters the services are available around the world (for example, Forms is not available in all datacenters, only some US ones).


Having your data secured by Microsoft is better than having your data potentially exposed because of a mistrust or misunderstanding of what the cloud can offer as regards data residency. You also have the admin overhead of managing and securing your Exchange environment.

I think its the end of the road for Exchange Server – while a migration amy sound painful to some, a compromised server is much worse.

Hope you enjoyed this post, until next time!

Microsoft Ignite 2022 – Highlights of the Announcements (with a few personal opinions thrown in)!

For this year’s Microsoft Ignite, in-person conferences were held in cities around the world after two years of being online and I was fortunate enough to attend the Manchester Spotlight event last week.

At the conference Microsoft had their usual presentations, ‘Ask the Expert’ sessions, exhibition areas and a Cloud Skills Challenge. But of course it’s the announcements that everyone looks forward to the most, where improvements, changes and updates to the various technologies in the Microsoft product portfolio are revealed.

I’ve picked out my top highlights below!

  • Azure Stack HCI

I’m on both sides of the fence about the Azure Stack HCI announcements.

I love the Azure Stack HCI product and have been using it since the days when it was called Storage Spaces Direct and ran on Hyper-Converged Infrastructure in on-premises datacenters. As it has evolved, Microsoft has invested heavily in the Azure Stack HCI product, which allows you to run Azure Managed Infrastructure in your own datacentres and combine on-premises infrastructure with Azure Cloud Services.

One of the big announcements was around licensing, and gives Enterprise Agreement customers with Software Assurance the ability to exchange their existing licensed cores of Windows Server Datacentre to get Azure Stack HCI at no additional cost. This includes the right to run unlimited Azure Kubernetes Service and unlimited Windows Server guest workloads on the Azure Stack HCI cluster.

Speaking of Kubernetes, support for Azure Kubernetes Service on Azure Stack HCI is now available, meaning you can deploy and manage containerised apps side-by-side with your VMs on the same physical server or cluster. You can also now make provisioning for hybrid AKS clusters directly from Azure onto your Azure Stack HCI using Azure Arc

On the hardware side, you could previously purchase validated hardware for multiple vendors but in early 2023, Microsoft will begin offering an Azure Stack HCI integrated system based on hardware that’s designed, shipped, and supported by Microsoft (in partnership with Dell). 

This will be available in several configurations:

I mentioned both sides of the fence above, and the licensing announcement is one of the worrying ones, because like the recent announcements that Defender for Servers requires an Azure Subscription (Microsoft Defender for Endpoint (Server Version) is no longer available on the EA price list), we’re now potentially going down the route of Microsoft only allowing Windows Server Datacenter to run on Azure Stack HCI accredited hardware. Or potentially getting rid of the Windows Server Datacenter SKU entirely and having it as a “cloud-connected only” product. Only time will tell.

  • Azure Savings Plan for Compute

Azure Savings Plan for Compute is based on consumption, and allows you to by a one- or three-year savings plan and commit to a spend of $5 per hour per virtual machine (VM). This is based on Azure Advisor Recommendations in the Cost Management and Billing section of the Azure Portal.

Once purchased, this is applied on a hourly basis based on consumption and even if you go above the $5 spend, the initial commitment is still billed at the lower rate and any additional consumption is billed at a Pay-As-You-Go rate.

The main difference between this and Reserved Instances is that Reserved Instances is an up-front commitment whether the VM is powered on or not. Azure Savings Plan for Compute unlocks those lower savings based on consumption.

You can find more details in this article on the Microsoft Community Hub.

  • Azure Virtual Machine Scale Sets – Mixing Standard and Spot instances

Staying on the Cost Savings topic, you can now specify a % of Spot Instance VMs that you wish to run in a VM Scale Set.

This feature (which is in Preview) allows you to reduce compute infrastructure costs by leveraging the deep discounts that Spot VMs can provide while maintaining the compute capacity your workload needs. 

More information can be found here.

  • Microsoft 365 updates

A huge number of announcements were made about Microsoft 365 at this year’s Ignite, most notably:

  • The release of the Microsoft 365 app, which will replace the Office Mobile and Office for Windows App for all Microsoft 365 customers who use this as part of their subscriptions.
  • Teams Premium, which will be available to E5 subscriptions and will bring enhanced meeting features such as insights and live translation in more than 40 languages so that participants can read captions in their own language.
  • Microsoft Places, which will assist with the hybrid working model and let everyone know who will be in the office at what times, where colleagues are sitting, what meetings to attend in person; and how to book space on the days your team is planning to go into the office.

The Teams announcements are great, in particular the live translation option. For us as a multi-national and multi-language organisation, this is a massive step in fostering the inclusion of all users. There is an assumption in the world that spoken English is the native language of Tech, but it’s not everyone’s first language.

  • Microsoft Intune

Microsoft Endpoint Manager is being renamed to Microsoft Intune, which is what it was called before it was renamed to Endpoint Manager. This effectively bundles all Endpoint Management tools under a single brand, including Microsoft Configuration Manager. Some of the main features announced were:

  • ServiceNow Integration
  • Cloud LAPS for Azure Virtual Machines
  • Update Policies or MacOS and Linux Support
  • Endpoint Privileged Management – no more permanent admin permissions on devices!

For me, Endpoint Privileged Management is huge addition which removes the need for any permanent administrative permissions on devices. Cloud LAPS is also a huge security step.

  • Security

Finally on to Security, which was a big focus this year. This year’s updates to the Microsoft Security portfolio coincided with the announcement that Microsoft is now recognised as a leader in the Gartner Magic Quadrant for Security Information and Event Management.

First and foremost is Microsoft’s announcement of a limited-time sale of 50% off Defender for Endpoint Plan 1 and Plan 2 licenses, allowing organisations to do more and spend less by modernising their security with a leading endpoint protection platform. The offer runs until June 2023.

Microsoft 365 Defender now automatically disrupts ransomware attacks. This is possible because Microsoft 365 Defender collects and correlates signals across endpoints, identities, emails, documents and cloud apps into unified incidents and uses the breadth of signal to identify attacks early with a high level of confidence. Microsoft 365 Defender can automatically contain affected assets, such as endpoints or user identities. This helps stop ransomware from spreading laterally.

A number of new capabilities have been announced for Defender for Cloud:

  • Microsoft Defender for DevOps: A new solution that will provide visibility across multiple DevOps environments to centrally manage DevOps security, strengthen cloud resource configurations in code and help prioritise remediation of critical issues in code across multi-pipeline and multicloud environments. With this preview, leading platforms like GitHub and Azure DevOps are supported and other major DevOps platforms will be supported shortly.
  • Microsoft Defender Cloud Security Posture Management (CSPM): This solution, available in preview, will build on existing capabilities to deliver integrated insights across cloud resources, including DevOps, runtime infrastructure and external attack surfaces, and will provide contextual risk-based information to security teams. Defender CSPM provides proactive attack path analysis, built on the new cloud security graph, to help identify the most exploitable resources across connected workloads to help reduce recommendation noise by 99%.
  • Microsoft cloud security benchmark: A comprehensive multicloud security framework is now generally available with Microsoft Defender for Cloud as part of the free Cloud Security Posture Management experience. This built-in benchmark maps best practices across clouds and industry frameworks, enabling security teams to drive multicloud security compliance.
  • Expanded workload protection capabilities: Microsoft Defender for Servers will support agentless scanning, in addition to an agent-based approach to VMs in Azure and AWS. Defender for Servers P2 will provide Microsoft Defender Vulnerability Management premium capabilities.

If you’d like to read more about Microsoft’s Ignite announcements from the conference, then go to Microsoft’s Book of News here.

Hope you enjoyed this post, until next time!

MFA and Conditional Access alone won’t save us from Threat Actors

In the end of a week where we have had 2 very different incidents at high profile organisations across the globe, its interesting to look at these and compare them from the perspective of incident response and the “What we could have done to prevent this from happening” question.

Image Credit – PinClipart

Lets analyze that very question – in the aftermath of the majority of cases, the “What could we have done to prevent this from happening” question invariably leads in to the next question of “What measures can we put in place to prevent this from happening in the future”.

The problem with the 2 questions is that they are reactive and come about only because the incident has happened. And it seems that in both incidents, the required security systems were in place.

Or were they?

A brief analysis of the attacks

  • Holiday Inn

If we take the Holiday Inn attack, the hackers (TeaPea) have said in a statement that:

"Our attack was originally planned to be a ransomware but the company's IT team kept isolating servers before we had a chance to deploy it, so we thought to have some funny [sic]. We did a wiper attack instead," one of the hackers said.

This is interesting because it suggests that the Holiday Inn IT team had a mechanism to isolate the servers in an attempt to contain the attack. The problem was that once the attackers were inside their systems and they realized that the initial scope that their attack was based on wasn’t going to work, their focus changed from Cybercriminals who were trying to make a profit to Terrorism, where they decided to just destroy as much data as they could.

Image Credit – Northern Ireland Cyber Security Centre

Essentially, the problem here is two-fold – firstly, you can have a Data Loss Prevention system in place but its not going to report on or block “Delete” actions until its too late or in some cases not at all.

Second, they managed to access the systems using a weak password. So (am I’m making assumptions here), while the necessary defences and intrusion-detection technologies may have been in place, that single crack in the foundations was all it took.

So the how did they get in? The 2 part of their statement shown below explains it all:

TeaPea say they gained access to IHG's internal IT network by tricking an employee into downloading a malicious piece of software through a booby-trapped email attachment.

The criminals then say they accessed the most sensitive parts of IHG's computer system after finding login details for the company's internal password vault. The password was Qwerty1234.

Ouch ….. so the attack originated as a Social Engineering attack.

  • Uber

We know a lot more about the Uber hack and again this is a case of an attack that originated with Social Engineering. Here’s what we know at this point:

  1. The attack started with a social engineering campaign on Uber employees, which yielded access to a VPN, in turn granting access to Uber’s internal network *.corp.uber.com.
  2. Once on the network, the attacker found some PowerShell scripts, one of which contained hardcoded credentials for a domain admin account for Uber’s Privileged Access Management (PAM) solution.
  3. Using admin access, the attacker was able to log in and take over multiple services and internal tools used at Uber: AWS, GCP, Google Drive, Slack workspace, SentinelOne, HackerOne admin console, Uber’s internal employee dashboards, and a few code repositories.

Again, we’re going to work off the assumption (and we need to make this assumption as Uber had been targeted in both 2014 and 2016) that the necessary defences and intrusion detection was in place.

Once the attackers gained access, the big problem here is the one thats highlighted above – hardcoded domain admin credentials. Once they had those, they could then move across the network doing whatever they pleased. And undetected as well, as its not unusual for a domain admin account to have multiple access across the network. And it looks like Uber haven’t learned from their previous mistakes, because as Mackenzie Jackson of GitGuardian reported:

“There have been three reported breaches involving Uber in 2014, 2016, and now 2022. It appears that all three incidents critically involve hardcoded credentials (secrets) inside code and scripts”

So what can we learn?

What these attacks teach us is that we can put as much technology, intrusion and anomaly detection into our ecosystem as we like, but the human element is always going to be the one that fails us. Because as humans, we are fallible. Its not a stick to beat us with (and like most, I do have a lot of sympathy for those users in Uber, Holiday Inn and all of the other companies who have been victim to attakcs that began with Social Engineering).

Do we need constant training and CyberSecurity programmes in our organisations to ensure that our users are aware of these sorts of attacks? Well, they do now at Uber and Holiday Inn but as I said at the start of the article, this will be a reactive measure for these companies.

The thing is though, most of these programmes are put in as “one-offs” in response to an audit where a checkbox is required to say that such user training has been put in place. And once the box has been checked, they’re forgotten about until the next audit is needed.

We can also say that the priveleged account management processes failed in both companies (weak passwords in one, hardcoded credentials in another).


Multi-Factor Authentication. Conditional Access. Microsoft Defender. Anomaly Detection. EDR and XDR. Information Protection. SOC. SIEM. Priveleged Identity Management. Strong Password Policies.

We can tech the absolute sh*t out of our systems and processes, but don’t forget to train and protect the humans in the chain. Because ultimately when they break, the whole system breaks down.

And the Threat Actors out there know this all too well. They know the systems are there, but they need a human to get them past those walls. MFA and Conditional Access can only save us for so long.

100 Days of Cloud – Day 100: The End of the Beginning

Its Day 100 of my 100 Days of Cloud Journey.

Day 100….. I’ve made it! So its time to reflect on the journey, look back at what prompted me to do it, the original goals, how that changed over time and remind myself that this is definitely not the end!

Back before the Start…..

Before we start into the why’s of how “100 Days” came about, we need to go back to a different time for us all – March 2020, when none of us knew what was around the corner despite the reports that something wasn’t well in a part of China that none of us had ever heard of.

The story starts with me on my way home from my brothers’ wedding in Melbourne, and while waiting for the connection during the stopover in Dubai I came across an article informing me that Microsoft were retiring the MCSE certification for good in July 2020, and that there would be no 2019 version of the cert as it was all moving to Azure. I made note of the article and like most articles that interested me, I bookmarked it for future reading and probably emailed myself a copy to remind me to revisit it.

Damn you MCSE retirement!

And therein lies the problem – like most IT people, I have lots of great ideas and intentions and save them away for future reference. Its getting back to them and actually doing them thats the problem.

So anyway – 5 days after arriving home, the country went into lockdown and I was consigned to the makeshift desk in spare room. A few weeks went by, and having become Netflix-man and gotten bored with it, I was doom-scrolling through my emails late one night and came across the MCSE email I’d sent to myself.

It bothered me because I was using the technology on a daily basis,a nd also because I hadn’t pushed myself into doing an exam since I took one of the MCSE 2012 exams 3 years previously. So I have 3 months – I’ll at least get the MCSA portion done, right?

Not quite – I managed to clear the first 2 exams, but then Microsoft threw me a lifeline by extending the deadline to January 2021. Suddenly it was achievable again but I didn’t want to rest on my laurels and become complacent, so I pushed on. MCSA was achieved in July, and the MCSA duly completed by August. So goal achieved!

2 things happened in between MCSA and MCSE.

Firstly, I signed up to Cloudskills.io after seeing a Google Ad offering their Azure Admin Associate Course content for $7. I then signed up for the full platform after subscribing to their Podcast and realising that I needed to know the Fundamentals of Azure before diving deeper.

Secondly, I came across the 100 Days of Cloud website and Github hosted by people like Andrew Brown and Gwyneth Pena Siguenza. Wasn’t really ready for that jump yet, but did my usual bookmark and email to myself for future reference…..

2020 quickly became 2021, and by mid-2021 a number of things had happened:

  • I’d signed up for my free Azure Account and was experimenting with the services on offer.
  • Based on this I’d passed the Az-900 (Azure Fundamentals), AZ-104 (Azure Admin Associate), AZ-140 (Azure Virtual Desktop) and SC-300 (Identity and Access).
  • I’d gone deeper into CloudSkills.io, joined their Community, and started attending User Groups remotely.
  • I’d also changed job and started working for Ekco, a growing MSP based in Dublin.

The Driver behind 100 Days

So over the course of Summer 2021 I was attending user groups and getting involved in Cloudskills.io, and got the opportunity to meet Mike Pfeiffer on a call. I’ve always been a Mike Pfeiffer fan-boy, right back to the old days when he was blogging about Active Directory and Exchange right up to his Pluralsight content that I had used during my MCSE studies. During our conversation, Mike asked me 2 questions:

  • Are you producing any content?
  • Why not (response to answer provided)?

This introduced me to the concept of SODOTO, or:

  • See One – observe someone else teaching you about something.
  • Do One – can you do it yourself based on the teachings above.
  • Teach One – can you get a deep understanding of what you’ve learned and teach that back to an audience in either video or blog format so that they understand it.

This led me to start a blog and the original series on Monitoring Docker Containers with InfluxDB and Grafana. And that got me into the blogging bug for a few months – release a blog every week was the goal.

When that series finished, I wrote a few other smaller blogs, but eventually needed another goal – a longer term one that I could commit to. I’d always wanted to go deeper into Azure and learn more about all of the services that were offered. I played about in my own tenant and through Bootcamps had dived a bit deeper. But it was a big monster, how was I going to do it?

And during another one of my late night doom-scrolling sessions, I came across the 100 Days bookmark that I’d saved the previous year. And the lightbulb in my head turned on …

100 Days

And so I started. I knew I could transfer some of the skills I’d already learned across, so I started small and went for the basic IaaS stuff that I knew well.

At the start, the idea was that I would do 100 Days straight. It became very clear to me around Day 12 that is wasn’t going to be possible because I was doing this as a SODOTO model, and if I had tried to do it I was going to crash and burn quickly and regularly.

Thats the key, and a great piece of advice – learn at a pace that you are comfortable with and can sustain. Don’t rush it, it just won’t go in.

At the start, I was doing this for me – as a challenge that I wanted to finish, and doing it in the open held me accountable. It also meant that family, friends, work colleagues and social connections could enquire and joke about when the next blog was coming out. It wasn’t about likes or followers. It was about me learning and tying together components of the Azure and other cloud ecosystems and how they connect.

And at that point it evolves into being not just about me, but about the followers and giving something back to them and to the wider community.

Lets take Azure Virtual Desktop as an example – I have experience of working with Citrix, so the concepts are pretty transferrable. But think about all of the underlying concepts you need to know:

  • Virtual Machines
  • Storage
  • Authentication
  • MFA
  • Identity
  • Desktop and App Management

You very quicky realize that although all of those are standalone service offerings in Azure, they are not just intertwined in Azure Virtual Desktop but in hundreds of other Azure services. And knowing them as a baseline will give you a better understanding when you go to learn the rest of the services!

Time to give Thanks!

There were times when I never thought I’d reach this goal and doubted myself, but I had some unbelievable support and encouragement along the way.

Firstly and most importantly my wife and family, who put up with me disappearing back to the laptop most evenings and tolerated the late nights where I screamed curses at deployments that had gone wrong. Also for giving me “the eye” every time I flopped down on the couch in the evenings and encouraged me to embrace the challenge and keep going.

To my friends and work colleagues who kept me going with their encouragement, banter and interest in the blog. I’m not going to name you all becasue I’m sure to forget someone, but you’ve all been brilliant.

To my mentors. The opportunity to get to know people like Mike Pfeiffer, Robin Smorenburg, Derek Smith and Kevin Evans, and to be able to pick their brains and get tips and encouragement from them has been mind-blowing. There are many more who I haven’t mentioned, particularly the gang over at Cloudskills.io. You guys are all awesome and you know it – any success I have is down to you.

To the community who have chipped in with words of encouragement and support along the way. To people like Michal Marchlewski, Karl Cooke, Gregor Suttie, Daniel McLoughlin,John Lunn and many more – thanks for reaching out and for the support guys, it really meant a lot.

Finally to everyone who has read the blog and gotten in contact with messages of support and telling me that the blog has helped them and been useful to them. Thank you from the bottom of my heart, even helping one person would have made it all worthwhile, but the response has been genuinely amazing.

Conclusion and What happens next!

What happens next is I’m going to take a break from blogging for a few weeks! I’m going to Scottish Summit on June 10th, so if you see me there please do come over and stay hello! Or please feel free to reach out to me on my social channels. Once I get back from Scotland, I’ll come up with the next challenge, whatever that may be!

I hope you’ve enjoyed the 100 Days as much as I have and have found it useful. As the title says, this is not the end, its just the end of the beginning of the journey.

Until next time!

100 Days of Cloud – Day 99: Microsoft Build 2022

Its Day 99 of my 100 Days of Cloud journey and in todays post we’ll take a quick look at some of the announcements coming out of Microsoft Build.

Microsoft Build is an annual event that is primarily focused on the development side of the Microsoft ecosystem, however like all Microsoft events there are normally some really cool announcements around new technologies and updates to existing technologies.

I’m going to focus particularly on updates to the technologies that I’ve blogged about over the last 99 days! In effect, I’m providing some updates to the blog posts so that if you’ve followed me on the journey this far, you’ll get to here and have the latest news and features!

Azure Container Apps

Azure Container Apps is now Generally Available. This enables you to run microservices and containerized apps on a serverless platform.

Common uses of Azure Container Apps include:

  • Deploying API endpoints
  • Hosting background processing applications
  • Handling event-driven processing
  • Running microservices

Applications built on Azure Container Apps can dynamically scale based on the following characteristics:

  • HTTP traffic
  • Event-driven processing
  • CPU or memory load

We looked at Azure Container instances on Day 82. The key differences between the 2 are:

  • If you need to spin up multiple container (e.g. front end / backend / database), Azure Container Apps is a better choice as it comes with Dapr (Distributed Application Runtime) and it will auto retry the requests and add some telemetry data.
  • If you just need long running jobs or you don’t need multiple containers to communicate with each other, you can go with Azure Container Instances.

You can check out the blog post announcement here, and the offical Microsoft Docs page here for more information.

Azure Cosmos DB

We looked at Azure Cosmos DB back on Day 64 and learned that it is a fully managed NoSQL database provides high availability, globally-distributed access to data with very low latency. There are a number of APIs to choose from that best meets the needs of your database requirements.

Some of the new featres announced for CosmosDB are:

  • Increased serverless capacity to 1 TB.
  • Shared throughput across database partitions.
  • Support for hierarchical partition keys.
  • An improved 30-day free trial experience, now generally available, and support for MongoDB data in the Azure Cosmos DB Linux desktop emulator.
  • A new, free, continuous backup and point-in-time restore capability enables seven-day data recovery and restoration from accidental deletes
  • Role-based access control support for Azure Cosmos DB API for MongoDB offers enhanced security.

You can find out more about the Cosmos DB enhancements here.

Azure Stack HCI

Its timely that we only looked at Azure Stack HCI on Day 95 and commented that your Azure Stack HCI Cluster can contain between 2 and 16 physical servers.

The new single node Azure Stack HCI, now generally available, fulfills the growing needs of customers in remote locations while maintaining the innovation of native integration with Azure Arc. It offers customers the flexibility to deploy the stack in smaller spaces and with less processing needs, optimizing resources while still delivering quality and consistency.

Additional benefits include:

  • Smaller Azure Stack HCI solutions for environments with physical space constraints or that do not require built-in resiliency, like retail stores and branch offices.
  • A smaller footprint to reduce hardware and operational costs.
  • The same scale applies, so you can start at 1 and scale up to 16 nodes if required.

You can find out more about the AZure Stack HCI announcement here.

Azure Migrate

On Day 18 we looked at Azure Migrate, which is an Azure technology which automates planning and migration of your on-premise servers from Hyper-V, VMware or Physical Server environments.

Enhancements to the service now streamline and simlify cloud migration and modernization:

  • Agentless discovery and grouping of dependent Hyper-V virtual machines (VMs) and physical servers to ensure all required components are identified and included during a move to Azure. This feature is generally available.
  • Azure SQL assessment improvements for better customer experience. Assessments now include recommendations for SQL Server on Azure VMs and support for Hyper-V VMs and physical stacks, along with already existing assessments for Azure SQL Managed Instance and Azure SQL Database. This feature is in preview.
  • Pause and resume of migration function has been included to provide control over the migration window. This mechanism can be used to schedule migrations during off-peak periods. This feature is in preview.
  • Discovery, assessment and modernization of ASP.NET web apps to native Azure Application Service. Customers can discover and modernize an ASP.NET web app to Azure Kubernetes Service (AKS) Application Service Container and discover Java apps running on Apache Tomcat.


So thats a quick rundown of the main updates from Microsoft Build. You can find information on all of the updates that were released here in the Microsoft Build Book of News, and its also not too late to register and watch some of the recorded and on-demand sessions from Microsoft Build by signing up here.

As with all Microsoft Conferences, there’s a CloudSkills Challenge and you have until June 21st to sign up and complete the modules from one of the 8 challenges are available. As always, you can earn a free certification exam pass if you complete the challenge! You can sign up here and the list of rules and exams eligible is here!

Hope you enjoyed this post, until next time!

100 Days of Cloud – Day 98: Azure Bicep

Its Day 98 of my 100 Days of Cloud journey and in todays post we’ll take a quick look at Azure Bicep.

Azure Bicep is a domain-specific language (DSL) that uses a declarative syntax to deploy Azure resources. In a Bicep file, you define the infrastructure you want to deploy to Azure, and then use that file throughout the development lifecycle to repeatedly deploy your infrastructure. Your resources are deployed in a consistent manner.

Bicep v JSON

We’ve seen Azure Resource Manager Templates and how they can be used to define your infrastructure based on JSON Templates. Bicep is part of the Azure Resource Managet Template family – the difference is that Bicep is a launguage that uses .bicep files instead of .json files.

If we take a look at the differences between the 2 – below is a JSON template where we want to deploy a Storage Account:

And here we have a Bicep file deploying the same storage account:

You can see the difference in file size and the simpler syntax in use with Bicep over JSON. However, when you build Bicep templates and perform a deployment operation, it will transpile into an ARM template, and then Resource Manager will go and deploy your resources to Azure. So effectively the runtime is unchanged; Bicep only provides an abstract layer and reduces the pain of working with JSON.

You can also use the Bicep Playground to view Bicep and equivalent JSON side by side. This will allow you can compare the implementations of the same infrastructure. You can also decompile an existing ARM template to Bicep, see Decompiling ARM template JSON to Bicep.

Benefits of Azure Bicep

  • Authoring experience: When you use the Bicep Extension for VS Code to create your Bicep files, you get a first-class authoring experience. The editor provides rich type-safety, intellisense, and syntax validation.
  • Repeatable results: Repeatedly deploy your infrastructure throughout the development lifecycle and have confidence your resources are deployed in a consistent manner. Bicep files are idempotent, which means you can deploy the same file many times and get the same resource types in the same state. You can develop one file that represents the desired state, rather than developing lots of separate files to represent updates.
  • Orchestration: You don’t have to worry about the complexities of ordering operations. Resource Manager orchestrates the deployment of interdependent resources so they’re created in the correct order. When possible, Resource Manager deploys resources in parallel so your deployments finish faster than serial deployments. You deploy the file through one command, rather than through multiple imperative commands.
  • Modularity: You can break your Bicep code into manageable parts by using modules. The module deploys a set of related resources. Modules enable you to reuse code and simplify development. Add the module to a Bicep file anytime you need to deploy those resources.
  • Integration with Azure services: Bicep is integrated with Azure services such as Azure Policy, template specs, and Blueprints.
  • Preview changes: You can use the what-if operation to get a preview of changes before deploying the Bicep file. With what-if, you see which resources will be created, updated, or deleted, and any resource properties that will be changed. The what-if operation checks the current state of your environment and eliminates the need to manage state.
  • No state or state files to manage: All state is stored in Azure. Users can collaborate and have confidence their updates are handled as expected.
  • No cost and open source: Bicep is completely free. You don’t have to pay for premium capabilities. It’s also supported by Microsoft support.


  • Limited to Azure — Bicep isn’t going to fly if someone is using multi-cloud and wants to use the same language across multiple cloud providers. This where Terraform has the advantage in this space.
  • Learning Curve — Bicep is basically a new language that expects some learning and understanding in spite of being very simple. Most of the users can prefer to use JSON instead, and if you are familiar with traditional JSON ARM Templates you may decide to stick with that.


Azure Bicep is an exciting technology that promises to make deployments easier if you are using only Azure. There are some great resources out there to start your learning journey:

Hope you all enjoyed this post, until next time!

100 Days of Cloud – Day 97: Azure Terrafy

Its Day 97 of my 100 Days of Cloud journey and in todays post we’ll take a quick look at Azure Terrafy.

Azure Terrafy is a tool which you can use to bring your existing Azure resources into Terraform HCL and import it into Terraform state.

As we saw back on Day 36, Terraform state acts as a database to store information about what has been deployed in Azure. When running terraform plan, the tfstate file is refreshed to match what it can see in the environment. And if you use Terraform to manage resources, you should only use Terraform and not any other deployment tools as this can cause issues with your terraform configuration files and in turn issues with your Azure resources.

But if we’ve deployed infrastructure using means other than Terraform (such as ARM Templates, Bicep, PowerShell or manually using the Azure Portal), its difficult to keep those resources in a consistent state of configuration.

And thats where Azure Terrafy comes to the rescue!

So lets do a quick demo – on my local device I have the latest Terraform version installed and have also downloaded the Azure Terrafy binaries from GitHub. I also have Azure CLI installed and have authenticated to my subscription:

I’ve deployed a Resource Group in Azure and created a VM:

Let run aztfy.exe and see what options its giving us:

The main thing we see here is that we need to specify the resource group. So, we’ll run c:\aztfy\aztfy.exe md-aztfy-rg

Note – C:\aztfy is where I’ve downloaded the Azure Terrafy binary file to, the location C:\md-aztfy-rg that I’m running the command from is an empty directory where I want to store my terraform files once they get created.

Thats a good sign ….. so its initializing and is now interrogating the resource group to see what resources exist and if it can import them.

Once thats done, we get presented with this screen:

As we can see, the first line is the resource that has been identified, the second is the terraform provider that has been identified as being a match for the resource. As we can see, the majority have been identified except for one which is the virtual machine. If we scroll down using the controls listed at the bottom of the screen, there is an option to “show recommendation”.

For this one, its telling me no resource recommendation is available. Thats OK though, because we can hit enter and type in the correct resource:

Once thats done, we click enter to save that line, and then hit choose the option to import:

And as we can see thats started to import our configuration. And eventually we’ll get this screen:

And once thats finished we’ll see this:

So now lets open that directory from Visual Studio Code, and we’ll open the terraform.tfstate file:

Ok, so that looks great and everything looks to be good. But we need to test, so we’ll run terraform plan to see if its worked:

And its telling me my infrastructure matches the configuration! So we can now manage the resources using Terraform!


Azure Terrafy is in the early stages of its development, but we can see that its a massive step forward for those who want to manage their existing resources using Terraform.

There are some great resources out there on Azure Terrafy:

Hope you enjoyed this post, until next time!

100 Days of Cloud – Day 96: Azure Chaos Studio

Its Day 96 of my 100 Days of Cloud journey and in todays post we’ll take a quick look at Azure Chaos Studio and how it can test resiliency in your resources.

Before we delve into the capabilities of Azure Chaos Studio, lets take a step back and define Chaos Engineering and look at some examples of where this would have been useful traditionally.

Chaos Engineering

Chaos Engineering is one of the trending topics in the industry at present, and the ability to test resiliency in deployments is one of the most important steps you will take in ensuring that your Resources will scale correctly or recover from unforseen events such as high loads or unexpected loss of a cluster node.

I’m going to use one of my favourite examples to illustrate this, and thats an on-premises Exchange installation running in a highly available DAG configuration:

Image Credit – Microsoft

This is a standard DAG configuration with 2 Exchange Servers on each site, and a cloud Witness Server in Azure. There are also 2 Domain Controllers in each site and a DC located in Azure.

Lets assume that we have a mailbox database on each of the servers, and lets also assume that replication of the databases follow this pattern:

  • MBX1 replicates a copy to MBX3
  • MBX2 replicates a copy to MBX4
  • MBX4 replicates a copy to MBX1
  • MBX3 replicates a copy to MBX2

There are a number of scenarios here with the potential for failure, such as:

  • What if one of the Exchange Servers fails and the database fails over so that 2 Databases run on a single server, can that server manage the load?
  • What if one of the sites fails in it entirety? Will the single site handle both server load and network load?
  • What if the DAG Network fails and the sites can’t talk to each other so no replication occurs? Will the Witness Server handle the cluster communications?
  • What if the Site A network fails? Will the Witness Server bring up all of the databases on Site B. What happens when Site A comes back up and the servers on that site still think they are hold the “Live” databases?
  • What if the Domain Controllers fail? Will the users whose email is hosted on databases on the site where the DC’s have failed be able to authenticate?

All of the above are real scenarios that could happen. And thats where Chaos Engineering comes in. It is defined as:

the ability to deliberately inject faults that cause system components to fail. The goal is to observe, monitor, respond to, and improve your system’s reliability under adverse circumstances. For example, taking dependencies offline (stopping API apps, shutting down VMs, etc.), restricting access (enabling firewall rules, changing connection strings, etc.), or forcing failover (database level, Front Door, etc.), is a good way to validate that the application is able to handle faults gracefully.

High Availability is no guarantee that a service cannot fail, and designing your service to expect failure is a core approach to creating services and applications. Chaos Engineering strives to anticipate rare, unpredictable, and disruptive outcomes, so that you can minimize any potential impact on your customers.

Azure Chaos Studio

But all of that testing isn’t a problem for you though because you’ve moved to Azure! And you configured cool stuff like Geo-redundant Storage, Virtual Machine Scale Sets and Azure Site Recovery between regions! So all of your stuff is safe, right?

Wrong. And if you haven’t managed to fully test the resiliency of your cloud-based resources, thats where Azure Chaos Studio can help!

Chaos Studio enables you to orchestrate fault injection on your Azure resources in a safe and controlled way and this is done by using a chaos experiment, which describes the faults that should be run and the resources those faults should be run against.

Chaos Studio supports 2 types of faults:

  • Service-direct faults, which run directly against an Azure resource without any installation or instrumentation (for example, rebooting an Azure Cache for Redis cluster or adding network latency to AKS pods)
  • Agent-based faults, which run in virtual machines or virtual machine scale sets to perform in-guest failures (for example, applying virtual memory pressure or killing a process).

Each fault has specific parameters you can control. Lets take VM Scale Sets as an example – you have created an application that runs on VMs in a scale set and have configured this as follows:

  • Minimum of 2 VMs, Maximum of 10 in the scale set
  • Additional VM is created if:
    • CPU hits 90%
    • Memory hits 80%

But have you ever tested that this actually works? In Chaos Studio, you can import your Scale Set and then create an “Experiment” to apply specific faults to the scale set. We can see that there are a number of faults that are available to choose from:

We can also see that we have the concept of Steps and Branches. When you build an experiment:

  • You define one or more steps that execute sequentially, and each step containing one or more branches that run in parallel within the step.
  • Each branch contains one or more actions such as injecting a fault or waiting for a certain duration.
  • Finally, you organize the resources that each fault will be run against into groups called selectors.
Image Credit – Microsoft

When you run your expirement, you will see output around each fault and the results of each of the steps:

Image Credit – Microsoft

The benefits of Chaos Studio experiments on your infrastructure are:

  • Familiarize team members with monitoring tools.
  • Recognize outage patterns.
  • Learn how to assess the impact.
  • Determine the root cause and mitigate accordingly.
  • Practice log analysis.


So thats an overview of Chaos Engineering and how Azure Chaos Studio can help test for faults in your resources and build resiliency into your applications. You can find more details on Azure Chaos Studio here.

Hope you enjoyed this post, until next time!

100 Days of Cloud – Day 95: Azure Stack Edge, HCI and HUB

Its Day 95 of my 100 Days of Cloud journey and in todays post we’ll take a quick look at Azure Stack range of offerings, the differences between them and their capabilities.

Azure Stack HCI

I’m starting with Azure Stack HCI as its the one that going to be most familiar to anyone like me who’s coming from the on-premises Hyper-V and Failover Cluster world.

Azure Stack HCI is a hyperconvered infrastructure cluster solution that sits in your on-premises infrastructure. It hosts virtualized Windows and Linux workloads and their storage and networking in a hybrid environment that is registered with your Azure Tenant.

Azure Stack HCI has its own dedicated operating system, and you can run this on integrated systems from a Microsoft hardware partner with the Azure Stack HCI operating system pre-installed, or buy validated nodes from an approved manufacturer list and install the operating system yourself.

The Azure Stack HCI operating system contains built in Hyper-V, Storage Spaces Direct and Software-Defined Networking. This means the configuration is minimal and you are pretty much ready to go in getting your Clusters ready. A Azure Stack HCI Cluster can contain between 2 and 16 physical servers.

Image Credit – Microsoft

So its basically a traditional Hyper-V Failover Cluster with a new name, right? Wrong, its much more than that. Because it ships from Azure, the billing for your nodes and usage come as part of your Azure Subscription charges. You are also required to register your Azure Stack HCI cluster with Azure within 30 days of installation. This can be done by using Windows Admin Center or Azure PowerShell modules.

Why Azure Stack HCI?

There are lots of great reasons for choosing Azure Stack HCI:

  • Familiar tools and skillset for exsiting Hyper-V and server admins
  • Integration with existing tools such as Microsoft System Center, Active Directory, Group Policy, and PowerShell scripting.
  • Integration with majoriy of mainstream backup, security, and monitoring tools.
  • Wide range of vendor hardware choices allow customers to choose the vendor with the best service and support in their geography.
  • You get full integration with Azure Arc for managing your workloads centrally from Azure alongside other Azure services.

Use Cases

  • Branch office and edge – for branch office and edge workloads, you can minimize infrastructure costs by deploying two-node clusters with inexpensive witness options, such as Cloud Witness or a USB drive–based file share witness.
  • Virtual desktop infrastructure (VDI) – Azure Stack HCI clusters are well suited for large-scale VDI deployments with RDS or equivalent third-party offerings as the virtual desktop broker.
  • Highly performant SQL Server – Azure Stack HCI provides an additional layer of resiliency to highly available, mission-critical Always On availability groups-based deployments of SQL Server.
  • Trusted enterprise virtualization – Azure Stack HCI satisfies the trusted enterprise virtualization requirements through its built-in support for Virtualization-based Security (VBS).
  • Azure Kubernetes Service (AKS) – You can leverage Azure Stack HCI to host container-based deployments, which increases workload density and resource usage efficiency.
  • Scale-out storage – Using Storage Spaces Direct results in significant cost reductions compared with competing offers based on storage area network (SAN) or network-attached storage (NAS) technologies.
  • Disaster recovery for virtualized workloads- Stretched clustering provides automatic failover of virtualized workloads to a secondary site following a primary site failure. Synchronous replication ensures crash consistency of VM disks.
  • Data center consolidation and modernization – Refreshing and consolidating aging virtualization hosts with Azure Stack HCI can improve scalability and make your environment easier to manage and secure. It’s also an opportunity to retire legacy SAN storage to reduce footprint and total cost of ownership.
  • Run Azure services on-premises – Integration with Azure Arc allows you to run Azure services anywhere. This allows you to build consistent hybrid and multicloud application architectures by using Azure services that can run in Azure, on-premises, at the edge, or at other cloud providers.

Azure Stack Hub

Azure Stack Hub is similar to Azure stack HCI in that you install a cluster of between 4-16 physical servers from an approved Microsoft vendor hardware list in your on-premises environment. However, Azure Stack Hub is essentially an extension of the full Azure platform that brings the following services:

  • Azure VMs for Windows and Linux
  • Azure Web Apps and Functions
  • Azure Key Vault
  • Azure Resource Manager
  • Azure Marketplace
  • Containers
  • Admin tools (Plans, offers, RBAC, and so on)

All looks very familiar, but here’s where it gets interesting – Azure Stack Hub is used to provide Azure consistent services to an on-premises environment that is either connected to the internet (and Azure) or disconnected environments with no internet connectivity. When we look at the comparison below, we can see that while Azure Stack Hub contains all of the features offered by Azure Stack HCI, it also includes a full set of IaaS, PaaS and cloud platform admin tools:

Image Credit – Microsoft

The PaaS offering is optional because Azure Stack Hub isn’t operated by Microsoft, its operated by you when you deploy Azure Stack Hub in your environment. So lets say for example if you are a small MSP, you can use Azure Stack Hub to host a multi-tenant environment that services your own customers with a PaaS offering which abstracting away the underlying infrastructure and processes. These are some of the PaaS services you can offer:

  • App Service
  • SQL databases
  • MySQL databases
  • Service Fabric
  • Kubernetes Container Service
  • Ethereum Blockchain
  • Cloud Foundry

Azure Stack Edge

The last member of the family is Azure Stack Edge. This is a family of Azure -managed appliances and was originally a Data Box solution for importing data into Azure. It acted as a network storage gateway to performs high-speed transfers to Azure.

Now, Azure Stack Edge is used as a AI-enabled device that can be used on remote locations to enable data analytics and create machine learning models that can be integrated with Azure Machine Learning. The data all stays locally cached on the device in order for you to create and train your ML modelling before uploading the data to your Azure Subscription.

Image Credit – Neal Analytics

You can also use the full capabilities of VM and Containerized Compute workloads on these devices, and can run a maximum of 2 devices as a 2-node cluster with a Scale out file server option.


So thats a brief overview of the Azure Stack portfolio and some of the benefits it can bring to your on-premises and edge computing environments. You can find full details and documentation at the links below:

Hope you enjoyed this post, until next time!