Michael Durkan

Featured

Every new beginning comes from some other beginning’s end – a quick review of 2023

Today is a bit of a “dud day” – post-Xmas, post-birthdays (mine and my son’s), but before the start of a New Year and the inevitable return to work.

So, it’s a day for planning for 2024. And naturally, any planning requires some reflection and a look back on what I achieved over the last year.

Highlights from 2023

If I’m being honest, my head was in a bit of a spin at the start of 2023. I was coming off the high of submitting my first pre-recorded content session to Festive Tech Calendar, but in the back of my mind I also knew a change was coming, as I’d made the decision to change jobs.

I posted my list of goals for 2023 on LinkedIn and Twitter (when it was still called that…) on January 2nd, so let’s see how I did:

Present at both a Conference and User Group – check!
Mentor others, work towards MCT – Mentoring was one of the most fulfilling activities I undertook over the last year. Being able to connect with people in the community who need help, advice or just an outsider’s view is something I would recommend to anyone. I also learned that mentoring and training are not the same thing (I may look at the MCT in 2024) – mentoring is more about asking the right questions, being on the same wavelength as your mentees, and understanding their goals so that you can advise them on the right path.
Go deep on Azure Security, DevOps and DevOps Practices – starting a new job this year with a company that is DevSecOps and IAC focused was definitely a massive learning curve and one that I thoroughly enjoyed!
AZ-400 and SC-100 Certs – nope! The one certification I passed this year was AZ-500, but to follow on from the previous point, it’s not all about exams and certifications. I’d feel more confident having a go at the AZ-400 exam now that I have nearly a year’s experience in DevOps. It’s something I’ve been saying for a while now: hiring teams aren’t (well, they shouldn’t be!) interested in tons of certifications, they want to see actual experience in the subject that backs up the certification.
Create Tech Content – check! I was fortunate to be able to submit sessions to both online events and also present live at Global Azure Dublin and South Coast Summit this year. It was also the year when my first LinkedIn Learning course was published (shameless plug, check it out at this link).
Run Half Marathon – Sadly no to this one. I made a few attempts and was a week away from my first half-marathon back in March when my knee decided to give up the ghost. Due to work and family commitments I never returned to it, but it’s back on the list for 2024.
Get back to reading books to relax – This is something we all need to do: turn off that screen at night and find time to relax. I’ve read a mix of Tech and Fiction books and hope to continue this trend in 2024.

By far though, the biggest thing to happen for me this year was when this email landed in my inbox on April Fools Day …..

I thought it was an April Fools’ joke. And if my head was spinning before, you can imagine how fast it was spinning now!

For anyone involved in Microsoft technologies or solutions, being awarded the MVP title is a dream that we all aspire to. It’s recognition from Microsoft that you are not only a subject matter expert in your field, but someone who is looked up to by other community members for content. If we look at the official definition from Microsoft:

The Microsoft Most Valuable Professionals (MVP) program recognizes exceptional community leaders for their technical expertise, leadership, speaking experience, online influence, and commitment to solving real world problems.

I’m honoured to be part of this group, getting to know people I looked up to and still look up to, who push me to be a better person each and every day.

Onwards to 2024!

So what are my goals for 2024? Well unlike last year where I explicitly said what I was going to do and declared it, this year is different as I’m not entirely sure. But ultimately, it boils down to 3 main questions:

What are my community goals?

The first goal is to do enough to maintain and renew my MVP status for another year. I hope I’ve done enough and will keep working up to the deadline, but you never really know! I have another blog post in the works where I’ll talk about the MVP award, what it’s meant to me and some general advice from my experiences in my first year of the award.

I’ve gotten the bug for Public Speaking and want to submit some more sessions to conferences and user groups over the next year, so I plan to submit to some CFS. But if anyone wants to have me on a user group, please get in touch!

I’ve enjoyed mentoring others on their journey, and the fact that they keep coming back means that the mentees have found me useful as well!

Blogging – this is my 3rd blog post of the year, and my last one was in March! I want to get some consistency back into blogging as it’s something I enjoy doing.

What are my learning goals?

I think like everyone, the last 12 months have been a whirlwind of Copilots and AI. I plan to immerse myself in that over the coming year, while also growing my knowledge of Azure. Another goal is to learn some Power Platform – it’s a topic I know very little about, but I want to know more! After that, the exams and the certs will come!

What are my personal goals?

So unlike last year, I’m not going to declare that I’ll do a half marathon – at least not in public! The plan is to keep reading both tech and fiction books, keep making some time for myself, and to make the most of my time with my family. Because despite how much the job and the community pulls you back in, there is nothing more important and you’ll never have enough family time.

So that’s all from me for 2023 – you’ll be hearing from me again in 2024! Hope you’ve all had a good holiday, and Happy New Year to all!


Control your Azure Virtual Desktop costs with Scaling Plans

Cloud Computing has changed the way we approach our enterprise infrastructure.

The number of options available to us now means that we can finally ditch that dusty old server sitting at the bottom of the server rack (or in some cases at the back of a cupboard) for a modern, secure solution that we don’t need to sit and pray in front of every time we need to restart it.

The Problem with the Cloud

But …. some people would prefer to keep old “Dusty Springfield” alive because the effort to migrate, and in some cases re-architect, the service is too much and too costly. And that’s the thing we hear most when a suggestion to migrate to a cloud service is raised – “the cloud is very expensive…”.

And let’s be honest, it is …..

There, I said it. Out Loud. In Print. Cloud Computing is expensive. There’s a helicopter hovering over my house at the minute but I’m sure it’s nothing to worry about ……

In all seriousness though, when scoping out a Cloud solution the first thing that is looked at is cost. You can argue as much as you want about the redundancy, the lower power and cooling costs, lack of hardware costs etc. The bean counters will look at the bottom line and say “we’re not paying that much now….”. And “Dusty Springfield” limps on defiantly in the corner.

Of course, your cloud computing costs are defined by the options you select and what level of redundancy you need. Scale Sets, Storage redundancy across zones and regions. Or just keep it as locally redundant storage? Then you get into the sizing of your solutions.

How the Costs add up

Azure Virtual Desktop is one of those cool technologies that can help you provide a secure environment for your users to access Cloud or Hybrid environments in a consistent and unified experience. But because it’s built on underlying VMs that you need to size based on your requirements, the costs can mount up.

Let’s take a look at an example of a standard Azure Virtual Desktop host pool that contains 10 Session Hosts delivering Remote Apps to 100 users. Session Hosts are generally sized from the General Purpose VM family, and the most common size used is “Standard_D4s_v3”, which has 4 vCPUs and 16GB memory.

The base cost for this VM if you create a standard Azure Virtual machine comes in at approx $160 per month.

However, if we use this VM type for our Azure Virtual Desktop Session Hosts with Windows 10 Enterprise Multi-Session version 21H2 with Microsoft 365 Apps installed, the cost then jumps to $290 per month.


So, let’s go back to our 10 Session Hosts – at that price we’re talking $2,900 per month, or just under $35,000 per year ($34,800). And that’s for just 10 VMs in the environment. That’s why Cloud Computing is expensive! Of course, this doesn’t take into account reserved instances or spot instances, but you get the idea.

The $290 per month cost for a VM isn’t a flat monthly charge – it’s based on 730 hours of usage (24 hours multiplied by just over 30 days). This is where you can start cutting into that $35,000 per year cost, and where Scaling Plans applied to your Azure Virtual Desktop Host Pools can help.
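To make the arithmetic above concrete, here is a minimal Python sketch of the per-hour pricing model. It assumes the approximate $290 per month figure from this example and a 730-hour month; the constant and function names are hypothetical, and this is not an official pricing calculator.

```python
# Rough session host cost model using the approximate figures quoted in this post.
# Assumptions (not official pricing): $290 per month per Standard_D4s_v3 session
# host, billed per running hour over a 730-hour month, with no Reserved Instances,
# spot pricing or Hybrid Benefit applied.

HOURS_PER_MONTH = 730          # 24 hours x ~30.42 days
MONTHLY_PRICE_PER_VM = 290.0   # approximate AVD session host price from the example


def hourly_rate(monthly_price: float = MONTHLY_PRICE_PER_VM) -> float:
    """Convert a 730-hour monthly price into a per-hour price."""
    return monthly_price / HOURS_PER_MONTH


def yearly_cost(vm_count: int, running_fraction: float = 1.0) -> float:
    """Yearly cost for vm_count hosts that run for running_fraction of all hours."""
    return vm_count * MONTHLY_PRICE_PER_VM * 12 * running_fraction


if __name__ == "__main__":
    print(f"Per VM: ${hourly_rate():.2f} per running hour")
    print(f"10 hosts always on: ${yearly_cost(10):,.0f} per year")
    print(f"10 hosts running half the time: ${yearly_cost(10, 0.5):,.0f} per year")
```

Because cost scales linearly with running_fraction in this model, every hour a host spends deallocated comes off the compute bill. The sketch ignores storage, which is still charged while a VM is deallocated.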

Scaling Plans

Scaling Plans let you scale the session host virtual machines (VMs) in a host pool up or down to optimize deployment costs. You can create a scaling plan based on:

Time of day
Specific days of the week
Session limits per session host

Follow the guidelines below when creating your scaling plan:

At the time of writing, you can only configure autoscale with existing Pooled host pools. This won’t work with Personal host pools.
You must create the scaling plan in the same Azure region as the host pool you assign it to.
All host pools you use with autoscale must have a configured MaxSessionLimit parameter. Don’t use the default value.
You must grant Azure Virtual Desktop access to manage the power state of your session host VMs.

Create a custom RBAC role

Now that we know the benefits and rules, the first thing we need to do is create a custom RBAC role. This custom role and assignment will allow Azure Virtual Desktop to manage the power state of any VMs in those subscriptions. It will also let the service apply actions on both host pools and VMs when there are no active user sessions.

The steps for creating the Custom RBAC Role are as follows (this is the same for creating any Custom RBAC Role):

First, create a json file using whatever your favourite editor is (I’m using Sublime in this example). Save the file as avdscale.json and add the following information into it:

Open the Azure portal and go to Subscriptions and select a subscription that contains a host pool and session host VMs you want to use with autoscale. Select Access control (IAM). Select the + Add button, then select Add custom role from the drop-down menu.

On the “Basics” screen, go to Baseline permissions and browse to the avdscale.json file that you just created.

This will import all of your settings, so on the next screen you will see the permissions that you had specified in your json file.

Next, we have “Assignable Scopes”. You want to assign this at subscription level as assigning this custom role at any level lower than your subscription, such as the resource group, host pool, or VM, will prevent autoscale from working properly.

We can now skip to the “Review and Create” screen, as this will validate and list out our permissions for the RBAC role. Review these and then click “Create”:

And once that’s done, we can see it’s been created as a Custom Role:

Now we need to add a Role Assignment for our RBAC Role. So we click on “Add role assignment”

We select our Custom RBAC role and in the members screen, we choose to assign access to a User, group or service principal. From the select members screen, search for “Windows Virtual Desktop”

Go to “Review and Assign” and click create:

And we can see that at subscription level the role has been assigned:

Create our Scaling Plan

Now that our RBAC role is done, we can create our scaling plan.

Open the Azure portal. In the search bar, type Azure Virtual Desktop and select the matching service entry. Select Scaling Plans, then select Create.

On the Basics screen, provide the following:
- Subscription and Resource Group where the Scaling Plan will be created
- Name
- Location (remember this needs to be in the same region as your Host Pool)
- Time Zone

The other entries are optional; however, an important one to note is Exclusion Tags – you can use this in conjunction with Tags to exclude certain VMs from autoscaling operations.

Click next and this will bring you to the Schedules screen. Click on Add Schedule

In the General screen, we enter a Schedule Name and also select the days we want the schedule to apply to.

In the Ramp-up screen, we specify a default starting point.
- So in this instance, we want to have 20% (or 2 out of our 10 Session Hosts) powered on and ready to accept connections at 08:00.
- We’ve selected “Breadth-first” for Load balancing – this means users will be spread evenly across available hosts, and is recommended for consistent performance.
- Finally, we have set a Capacity threshold of 80%. Our hosts are set to accept a maximum of 10 connections each (100 users across 10 Session Hosts). We have 2 hosts powered on, so once we reach 16 users across those 2 hosts (80% of 20 sessions), the next host will automatically power on.
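The ramp-up arithmetic can be sketched in a few lines of Python. This is a simplified model of the capacity check as described above, not the autoscale service’s actual implementation: the function names are hypothetical, it treats reaching the threshold as the trigger (matching the 16-users example), and it assumes the minimum host percentage rounds up.

```python
# Simplified model of the ramp-up capacity check described above.
# Assumptions: reaching the threshold (not only exceeding it) starts another host,
# and the minimum host percentage is rounded up. The real autoscale service
# weighs more inputs than this (schedules, drain mode and so on).


def minimum_hosts(total_hosts: int, minimum_pct: int) -> int:
    """Hosts kept powered on for a minimum percentage, rounded up."""
    return -(-total_hosts * minimum_pct // 100)


def should_start_another_host(active_sessions: int, hosts_on: int,
                              max_sessions_per_host: int, threshold_pct: int,
                              total_hosts: int) -> bool:
    """True when used capacity has reached the threshold and a host is still off."""
    if hosts_on >= total_hosts:
        return False  # every host in the pool is already running
    capacity = hosts_on * max_sessions_per_host
    # Integer arithmetic avoids floating-point surprises at exactly 80%.
    return active_sessions * 100 >= threshold_pct * capacity


if __name__ == "__main__":
    hosts_on = minimum_hosts(10, 20)  # 20% of 10 hosts -> 2
    for users in (15, 16):
        print(users, "users ->", should_start_another_host(users, hosts_on, 10, 80, 10))
```

With 2 hosts of 10 sessions each, 15 users stays below 80% of capacity, while the 16th user reaches it and would bring a third host online.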

Next up is Peak hours. For this we specify a starting time (which is normally when the majority of your users will be logging on) and we’ve also flipped the Load Balancing to “Depth-first”, which will load up all available hosts with user sessions (up to our 80% threshold) before bringing another one online. This is really up to you as to how you want to load balance, but as a reminder:
- Breadth-first load balancing distributes new user sessions across all available session hosts in the host pool.
- Depth-first load balancing distributes new sessions to any available session host with the highest number of connections that hasn’t reached its session limit yet.
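As a rough illustration of the difference, the two strategies can be sketched as host-selection functions. The SessionHost type, the host names and the tie-breaking behaviour are hypothetical simplifications; in practice the service makes this decision for you based on the host pool’s load balancing setting.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SessionHost:
    name: str
    sessions: int
    max_sessions: int  # the host pool's MaxSessionLimit

    @property
    def has_capacity(self) -> bool:
        return self.sessions < self.max_sessions


def pick_breadth_first(hosts: List[SessionHost]) -> Optional[SessionHost]:
    """Breadth-first: place the new session on the available host with the fewest sessions."""
    available = [h for h in hosts if h.has_capacity]
    return min(available, key=lambda h: h.sessions) if available else None


def pick_depth_first(hosts: List[SessionHost]) -> Optional[SessionHost]:
    """Depth-first: place the new session on the available host with the most
    sessions that hasn't reached its session limit yet."""
    available = [h for h in hosts if h.has_capacity]
    return max(available, key=lambda h: h.sessions) if available else None


if __name__ == "__main__":
    pool = [SessionHost("avd-0", 3, 10), SessionHost("avd-1", 7, 10),
            SessionHost("avd-2", 10, 10)]
    print("Breadth-first ->", pick_breadth_first(pool).name)
    print("Depth-first   ->", pick_depth_first(pool).name)
```

Breadth-first picks the emptiest host (avd-0), while depth-first keeps filling the busiest host that still has room (avd-1, because avd-2 is already full).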

Next up is Ramp-down. This is where we start deallocating hosts at the end of the working day, and as you can see, the target is to get back down to 20% of the hosts. The important point to make here is the “Force logoff users” option. If this is enabled, then the following applies:
- This will choose the session host with the lowest number of user sessions to shut down. Autoscale will put the session host in drain mode, send all active user sessions a notification telling them they’ll be signed out, and then sign out all users after the specified wait time is over. After autoscale signs out all user sessions, it then deallocates the VM.
- During ramp-down, autoscale will only shut down VMs if all existing user sessions in the host pool can be consolidated to fewer VMs without exceeding the capacity threshold.
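The ramp-down rules above boil down to two decisions: whether the current sessions would fit on one fewer host without exceeding the capacity threshold, and which host to drain first. A minimal Python sketch, with hypothetical function names and example numbers (the 80% threshold and the 2-host minimum are assumptions for illustration, not values from the schedule):

```python
from typing import Dict


def can_shut_down_one_host(active_sessions: int, hosts_on: int,
                           max_sessions_per_host: int, threshold_pct: int,
                           minimum_hosts: int) -> bool:
    """True if all current sessions would fit on one fewer host without
    exceeding the capacity threshold, and the minimum host count is kept."""
    remaining = hosts_on - 1
    if remaining < minimum_hosts:
        return False  # never drop below the minimum percentage of hosts
    remaining_capacity = remaining * max_sessions_per_host
    return active_sessions * 100 <= threshold_pct * remaining_capacity


def host_to_drain(sessions_per_host: Dict[str, int]) -> str:
    """Pick the running host with the fewest user sessions (ties: first listed)."""
    return min(sessions_per_host, key=lambda name: sessions_per_host[name])


if __name__ == "__main__":
    running = {"avd-0": 6, "avd-1": 2, "avd-2": 4}  # hypothetical hosts, 12 sessions
    if can_shut_down_one_host(sum(running.values()), len(running), 10, 80, 2):
        print("Drain and deallocate:", host_to_drain(running))
```

Here 12 sessions fit within 80% of two hosts’ capacity (16 sessions), so the host with the fewest sessions, avd-1, would be drained; with 18 sessions the consolidation check fails and nothing is shut down.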

Finally, we get to “Off-peak hours” which is the end of the “Ramp-down” period.

And that’s our weekday schedule created. You can also go back in and create a weekend schedule, where you can bring the number of hosts down to 10% and have a higher capacity threshold:

Once the schedules are created, we assign the Scaling Plan to our Host pool and click on “Enable autoscale”:

And now we can validate our options and click on “Review and create”:

Give all of this about an hour to kick in and you will see your Azure Virtual Desktop session hosts automatically deallocated as per your schedules if not in use!

Money money money ….

Earlier in this post, I gave a yearly figure of approx $35,000 to run our 10 Session Host VMs. However, that figure is based on full consumption. So let’s do some very quick calculations to see how our scaling plan affects that figure:

As we said, a single VM running at full consumption (or the full 730 hours) will cost us $290 per month.
Based on the schedules created above, we’re going to have 1 VM running full time on both weekdays and weekends. So that’s $290 per month, or $3,480 per year.
We’re then guaranteed to have a second VM running from Monday to Friday for 24 hours a day, and at weekends for 12 hours each day (depending on how the schedule is created). That’s effectively 6 days a week instead of 7, so over a year we need 6/7ths of our full price figure. That comes in at $2,983 per year for that VM.
Now it’s back to the other 8 VMs and the 100 users who are using them. “If” all 100 users are logged on, the other 8 VMs will be up for 12 hours a day from Monday to Friday only, as per our schedule. So for those, we need 5/7ths of our full price figure (which is $2,486) and then halve it because we’re only using them for 12 hours a day (which comes in at $1,243 per VM).

In summary, what we’ve got is:

$3,480 – 1 VM at full consumption
$2,983 – 1 VM at slightly reduced consumption for weekdays and weekends
$9,944 – 8 VMs running for 12 hours a day from Monday to Friday

Add those figures up and you get a total of $16,407. And remember, that figure doesn’t include available cost reductions like Reserved Instances or Hybrid Benefit.
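The sums above can be reproduced with a short Python sketch. It assumes the same $290 per month figure and rounds each per-VM figure to whole dollars, as the post does; the function name is hypothetical and this is a back-of-the-envelope check, not a pricing tool.

```python
# Reproduces the yearly estimate above. Assumptions: $290/month per host,
# 10 hosts, and each per-host figure rounded to whole dollars as in the post.


def yearly_estimate(monthly_price: int = 290, total_hosts: int = 10):
    full_year = monthly_price * 12                 # one host, 24x7: $3,480
    always_on = full_year                          # 1 host on all the time
    six_of_seven_days = round(full_year * 6 / 7)   # 1 host, effectively 6 days a week
    weekday_12h = round(full_year * 5 / 7 / 2)     # per host: 12 h a day, Mon-Fri
    scaled = always_on + six_of_seven_days + (total_hosts - 2) * weekday_12h
    baseline = full_year * total_hosts             # every host always on
    return baseline, scaled


if __name__ == "__main__":
    baseline, scaled = yearly_estimate()
    print(f"Always on:    ${baseline:,}")
    print(f"With scaling: ${scaled:,}")
    print(f"Difference:   ${baseline - scaled:,}")
```

Run as-is, it gives $34,800 always on versus $16,407 with scaling, a difference of $18,393 a year.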

Conclusion

So by implementing a Scaling Plan for the Host pool above, we’ve saved ourselves over $18,000 a year ($34,800 minus $16,407 = $18,393), more than half of the original cost. Again, I’m going to stress that the figures I’m quoting here are approximate, may not represent what you see in your own personal or enterprise subscriptions, and should not be taken as exact savings. Make sure to speak to your Microsoft TAM or Cloud Service Provider for more details. You can find out more about scaling plans here.

Hope you enjoyed this post, until next time!

Azure Networking Zero to Hero – Network Security Groups

In this post, I’m going to stay within the boundaries of our Virtual Network and briefly talk about Network Security Groups, which filter network traffic between Azure resources in an Azure virtual network.

Overview

So, it’s a Firewall, right?

NOOOOOOOOOO!!!!!!!!

While a Network Security Group (or NSG for short) contains Security Rules to allow or deny inbound/outbound traffic to/from several types of Azure Resources, it is not a Firewall (it may be what a Firewall looked like 25-30 years ago, but not now). NSGs can be used in conjunction with Azure Firewall and other network security services in Azure to help secure and shape how your traffic flows between subnets and resources.

Default Rules

When you create a subnet in your Virtual Network, you have the option to create an NSG which will be automatically associated with the subnet. However, you can also create an NSG and manually associate it with either a subnet, or directly to a Network Interface in a Virtual Machine.

When an NSG is created, it always has a default set of Security Rules that look like this:

The default Inbound rules allow the following:

65000 — All Hosts/Resources inside the Virtual Network to Communicate with each other
65001 — Allows Azure Load Balancer to communicate with the Hosts/resources
65500 — Deny all other Inbound traffic

The default Outbound rules allow the following:

65000 — All Hosts/Resources inside the Virtual Network to Communicate with each other
65001 — Allows all Internet Traffic outbound
65500 — Deny all other Outbound traffic

The default rules cannot be edited or removed. NSGs are created initially using a Zero-Trust model. The rules are processed in order of priority (the lowest-numbered rule is processed first), so you need to build your rules on top of the default ones, using lower priority numbers (for example, RDP and SSH access if not already in place).
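To illustrate how priority ordering works, here is a simplified Python model of inbound NSG rule evaluation: rules are checked from the lowest priority number upwards, and the first match decides. It only looks at source address and destination port, and the ranges standing in for the VirtualNetwork and AzureLoadBalancer service tags, plus the custom RDP rule, are hypothetical examples.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SecurityRule:
    priority: int             # lower number = evaluated first
    name: str
    source: str               # CIDR prefix, or "*" for any source
    dest_port: Optional[int]  # None means any port
    access: str               # "Allow" or "Deny"


def evaluate_inbound(rules: List[SecurityRule], source_ip: str,
                     dest_port: int) -> Tuple[str, str]:
    """Return (access, rule name) for the first rule, by priority, that matches."""
    for rule in sorted(rules, key=lambda r: r.priority):
        source_match = rule.source == "*" or ip_address(source_ip) in ip_network(rule.source)
        port_match = rule.dest_port is None or rule.dest_port == dest_port
        if source_match and port_match:
            return rule.access, rule.name  # first match wins; later rules are skipped
    return "Deny", "no rule matched"


# Default inbound rules (simplified): real service tags resolve to Microsoft-managed
# prefix lists, so the VNet range and load balancer address below are stand-ins.
DEFAULT_INBOUND = [
    SecurityRule(65000, "AllowVNetInBound", "10.0.0.0/16", None, "Allow"),
    SecurityRule(65001, "AllowAzureLoadBalancerInBound", "168.63.129.16/32", None, "Allow"),
    SecurityRule(65500, "DenyAllInBound", "*", None, "Deny"),
]

# Hypothetical custom rule built on top of the defaults: RDP from an admin range.
CUSTOM = [SecurityRule(100, "Allow-RDP-Admin", "203.0.113.0/24", 3389, "Allow")]

if __name__ == "__main__":
    rules = CUSTOM + DEFAULT_INBOUND
    for src, port in [("203.0.113.10", 3389), ("198.51.100.7", 3389), ("10.0.2.4", 443)]:
        print(src, port, "->", evaluate_inbound(rules, src, port))
```

RDP from the admin range hits priority 100 first, RDP from anywhere else falls all the way through to DenyAllInBound, and traffic from inside the VNet is caught by the 65000 rule.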

Configuration and Traffic Flow

Some important things to note:

The default “65000” rules for both Inbound and Outbound allow all virtual network traffic. This means that if we have 2 subnets that each have a virtual machine, those VMs can communicate with each other without adding any additional rules.
As well as IP addresses and address ranges, we can use Service Tags, which represent groups of IP address prefixes for a range of Azure services. These are managed and updated by Microsoft, so you can use them instead of having to create and maintain lists of Public IPs for each service. You can find a full list of available Service Tags that can be used with NSGs at this link. In the default rules, “VirtualNetwork” and “AzureLoadBalancer” are Service Tags.
A virtual network subnet or interface can only have one NSG, but an NSG can be assigned to many subnets or interfaces. Tip from experience: sharing one NSG across many subnets is not a good idea. If you have an application design that uses multiple Azure Services, split these services into dedicated subnets and apply an NSG to each subnet.
When an NSG is associated with a subnet and a dedicated NSG is associated with a network interface, the NSG associated with the subnet is always evaluated first for Inbound Traffic, before moving on to the NSG associated with the NIC. For Outbound Traffic, it’s the other way around: the NSG on the NIC is evaluated first, and then the NSG on the subnet. This process is explained in detail here.
If you don’t have a network security group associated to a subnet, all inbound traffic is blocked to the subnet/network interface. However, all outbound traffic is allowed.
An NSG can contain a maximum of 1,000 rules. Previously the limit was 200 and could be raised by logging a ticket with Microsoft, but at the time of writing the maximum is 1,000 and this cannot be increased. There is also a limit of 5,000 NSGs per subscription.
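The subnet/NIC ordering can be sketched as follows, assuming both a subnet NSG and a NIC NSG exist and that each NSG’s own rules have already produced an Allow or Deny verdict for the flow. The function names are hypothetical; the point is that traffic must be allowed by both NSGs, and the first Deny stops it before the second NSG is consulted.

```python
from typing import Dict, List, Tuple


def evaluation_order(direction: str) -> List[str]:
    """Order in which the subnet NSG and the NIC NSG are evaluated."""
    if direction == "inbound":
        return ["subnet", "nic"]   # subnet NSG first, then the NIC NSG
    if direction == "outbound":
        return ["nic", "subnet"]   # NIC NSG first, then the subnet NSG
    raise ValueError(f"unknown direction: {direction!r}")


def effective_access(direction: str, verdicts: Dict[str, str]) -> Tuple[str, str]:
    """verdicts maps "subnet" and "nic" to the "Allow"/"Deny" result each NSG's
    own rules give for a flow. Returns the outcome and where it was decided."""
    for nsg in evaluation_order(direction):
        if verdicts[nsg] == "Deny":
            return "Deny", nsg     # stopped here; the other NSG is never consulted
    return "Allow", "subnet and nic"


if __name__ == "__main__":
    both_deny = {"subnet": "Deny", "nic": "Deny"}
    print("inbound  ->", effective_access("inbound", both_deny))
    print("outbound ->", effective_access("outbound", both_deny))
```

When both NSGs would deny a flow, inbound traffic is stopped at the subnet NSG while outbound traffic is stopped at the NIC NSG, which is the ordering described above.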

Logging and Visibility

Important – Turn on NSG Flow Logs. This is a feature of Azure Network Watcher that allows you to log information about IP traffic flowing through a network security group, including details on source and destination IP addresses, ports, protocols, and whether traffic was permitted or denied. You can find more in-depth details on flow logging here, and a tutorial on how to turn it on here.
To enhance this, you can use Traffic Analytics, which analyzes Azure Network Watcher flow logs to provide insights into traffic flow in your Azure cloud.

Conclusion

NSGs are fundamental to securing inbound and outbound traffic for subnets within an Azure Virtual Network, and form one of the first layers of defense to protect application integrity and reduce the risk of data loss.

However, as I said at the start of this post, an NSG is not a Firewall. The layer 3 and layer 4 port-based protection that NSGs provide has significant limitations: it cannot detect malicious activity carried over allowed protocols such as SSH and HTTPS, so those attacks can go undetected.

And that’s one of the biggest mistakes I see people make – they assume that NSGs will do the job because Firewalls and other network security services are too expensive.

Therefore, NSGs should be used in conjunction with other network security tools, such as Azure Firewall and Web Application Firewall (WAF), for any devices presented externally to the internet or other private networks. I’ll cover these in detail in later posts.

Hope you enjoyed this post, until next time!!

Azure Networking Zero to Hero – Routing in Azure

In this post, I’m going to try and explain Routing in Azure. This is a topic that grows in complexity the more you expand your footprint in Azure in terms of both Virtual Networks, and also the services you use to both create your route tables and route your traffic.

Understanding Azure’s Default Routing

As we saw in the previous post, when a virtual network is created this also creates a route table. This contains a default set of routes known as System Routes, which are shown here:

Source	Address prefixes	Next hop type
Default	Virtual Network Address Space	Virtual network
Default	0.0.0.0/0	Internet
Default	10.0.0.0/8	None (Dropped)
Default	172.16.0.0/12	None (Dropped)
Default	192.168.0.0/16	None (Dropped)

Let’s explain the “Next hop types” in a bit more detail:

Virtual network: Routes traffic between address ranges within the address space of a virtual network. So let’s say I have a Virtual Network with the 10.0.0.0/16 address space defined. I then have VM1 in a subnet with the 10.0.1.0/24 address range trying to reach VM2 in a subnet with the 10.0.2.0/24 address range. Azure knows to keep this traffic within the Virtual Network and routes it successfully.
Internet: Routes traffic specified by the address prefix to the Internet. If the destination address range is not part of a Virtual Network address space, it gets routed to the Internet. The only exception to this rule is traffic to an Azure Service – this goes across the Azure Backbone network no matter which region the service sits in.
None: Traffic routed to the None next hop type is dropped. This automatically includes all Private IP Addresses as defined by RFC1918, but the exception to this is your Virtual Network address space.

Simple, right? Well, it’s about to get more complicated …..

Additional Default Routes

Azure adds more default system routes for different Azure capabilities, but only if you enable the capabilities:

Source	Address prefixes	Next hop type
Default	Peered Virtual Network Address Space	VNet peering
Virtual network gateway	Prefixes advertised from on-premises via BGP, or configured in the local network gateway	Virtual network gateway
Default	Multiple	VirtualNetworkServiceEndpoint

So let’s take a look at these:

Virtual network (VNet) peering: when a peering is created between 2 VNets, Azure adds the address space of each peered VNet to the route table of the other.
Virtual network gateway: this happens when S2S VPN or ExpressRoute connectivity is established, and adds address spaces that are advertised from either Local Network Gateways or On-Premises gateways via BGP (Border Gateway Protocol). These address spaces should be summarized into the largest address ranges possible coming from On-Premises, as there is a limit of 400 routes per route table.
VirtualNetworkServiceEndpoint: this happens when you create a service endpoint for an Azure Service, which enables private IP addresses in the VNet to reach the endpoint of that Azure service without needing a public IP address on the VNet.
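Python’s standard ipaddress module can show what summarization means in practice. This sketch uses hypothetical on-premises prefixes: collapse_addresses merges contiguous, aligned ranges (four /24s into one /22), but it won’t choose a broader summary such as a whole /16 for you; that remains a design decision.

```python
from ipaddress import collapse_addresses, ip_network

# Hypothetical on-premises prefixes that a local network gateway or BGP peer
# might otherwise advertise one by one.
ON_PREM_PREFIXES = [
    "192.168.0.0/24", "192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24",
    "172.20.8.0/24",
]


def summarize(prefixes):
    """Merge contiguous, aligned prefixes into the fewest equivalent networks."""
    return [str(net) for net in collapse_addresses(ip_network(p) for p in prefixes)]


if __name__ == "__main__":
    summary = summarize(ON_PREM_PREFIXES)
    print(f"{len(ON_PREM_PREFIXES)} prefixes -> {len(summary)}: {summary}")
```

Five advertised prefixes become two routes (192.168.0.0/22 and 172.20.8.0/24), which leaves more headroom under the route limit.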

Custom Routes

The limitation of sticking with System Routes is that everything is done for you in the background – there is no way to make changes.

This is why, if you need to change how your traffic gets routed, you should use Custom Routes, which you create using a Route Table. These are used to override Azure’s default system routes, or to add more routes to a subnet’s route table.

You can specify the following “next hop types” when creating user-defined routes:

Virtual Appliance: This is typically Azure Firewall, a Load Balancer or another virtual appliance from the Azure Marketplace. The appliance is typically deployed in a different subnet from the resources that you wish to route through it. You can define a route with 0.0.0.0/0 as the address prefix and a next hop type of virtual appliance, with the next hop address set to the internal IP Address of the virtual appliance. This is useful if you want all outbound traffic to be inspected by the appliance.

Virtual network gateway: used when you want traffic destined for specific address prefixes routed to a virtual network gateway. This is useful if you have an On-Premises device that inspects traffic and determines whether to forward or drop the traffic.
None: used when you want to drop traffic to an address prefix, rather than forwarding the traffic to a destination.
Virtual network: used when you want to override the default routing within a virtual network.
Internet: used when you want to explicitly route traffic destined to an address prefix to the Internet

You can also use Service Tags as the address prefix instead of an IP Range.

How does Azure select which route to use?

When outbound traffic is sent from a subnet, Azure selects a route based on the destination IP address, using the longest prefix match algorithm. So if routes exist for both 10.0.0.0/16 and 10.0.0.0/24, traffic destined for an address such as 10.0.0.5 will use the /24 route, as it has the longest (most specific) prefix.

If multiple routes contain the same address prefix, Azure selects the route based on its type, in the following order of priority:

User-defined route
BGP route
System route

So, the initial System Routes are always the last ones to be checked.
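Both selection rules can be captured in a small Python sketch: longest prefix match first, then user-defined over BGP over system routes when prefixes tie. The VNet address space and the firewall’s next hop address 10.0.100.4 are hypothetical, and the model ignores how Azure modifies some system routes when you add overlapping address spaces or routes.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional

# When prefixes are equally long, a user-defined route beats a BGP route,
# which beats a system (default) route.
SOURCE_PREFERENCE = {"User": 0, "BGP": 1, "Default": 2}


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop_type: str
    source: str                        # "User", "BGP" or "Default"
    next_hop_ip: Optional[str] = None  # only used for virtual appliances


def select_route(routes: List[Route], destination: str) -> Optional[Route]:
    """Longest prefix match first, then route source preference as a tie-breaker."""
    dest = ip_address(destination)
    matches = [r for r in routes if dest in ip_network(r.prefix)]
    if not matches:
        return None
    return min(matches, key=lambda r: (-ip_network(r.prefix).prefixlen,
                                       SOURCE_PREFERENCE[r.source]))


# System routes for a hypothetical VNet with the 10.0.0.0/16 address space.
SYSTEM_ROUTES = [
    Route("10.0.0.0/16", "Virtual network", "Default"),
    Route("0.0.0.0/0", "Internet", "Default"),
    Route("10.0.0.0/8", "None", "Default"),
    Route("172.16.0.0/12", "None", "Default"),
    Route("192.168.0.0/16", "None", "Default"),
]

# Hypothetical user-defined route sending everything else to a firewall at 10.0.100.4.
FORCE_TO_FIREWALL = Route("0.0.0.0/0", "Virtual appliance", "User", "10.0.100.4")

if __name__ == "__main__":
    for dest in ("10.0.2.4", "10.1.0.4", "8.8.8.8"):
        print(dest, "->", select_route(SYSTEM_ROUTES, dest).next_hop_type)
    print("8.8.8.8 with UDR ->",
          select_route(SYSTEM_ROUTES + [FORCE_TO_FIREWALL], "8.8.8.8").next_hop_type)
```

With only system routes, 10.0.2.4 stays in the VNet (/16 beats /8), 10.1.0.4 is dropped, and 8.8.8.8 goes to the Internet; adding the user-defined 0.0.0.0/0 route ties on prefix length with the system default route and wins on source, so Internet-bound traffic goes to the appliance.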

Conclusion and Resources

I’ve put in some links already in the article. The main place to go for a more in-depth deep dive on Routing is this MS Learn Article on Virtual Network Traffic Routing.

As regards people to follow, there’s no one better than my fellow MVP Aidan Finn who writes extensively about networking over at his blog. He also delivered this excellent session at the Limerick Dot Net Azure User Group last year which is well worth a watch for gaining a deep understanding of routing in Azure.

Hope you enjoyed this post, until next time!!

Azure Networking Zero to Hero – Intro and Azure Virtual Networks

Welcome to another blog series!

This time out, I’m going to focus on Azure Networking, which covers a wide range of topics and services that make up the various networking capabilities available within both Azure cloud and hybrid environments. Yes I could have done something about AI, but for those of you who know me, I’m a fan of the classics!

The intention is for this blog series to serve both as a starting point for anyone new to Azure Networking who is looking to start a learning journey towards that AZ-700 certification, and as an easy reference point for anyone looking for a list of blogs specific to the wide scope of services available in the Azure Networking family.

There isn’t going to be a set number of blog posts or “days” – I’m just going to run with this one and see what happens! So with that, let’s kick off with our first topic, which is Virtual Networks.

Azure Virtual Networks

So let’s start with the elephant in the room. Yes, I have written a blog post about Azure Virtual Networks before – 2 of them actually, as part of my “100 Days of Cloud” blog series. You’ll find Part 1 and Part 2 at these links.

Great, so that’s today’s blog post sorted!!! Until next ti …… OK, I’m joking – it’s always good to revise and revisit.

After a Resource Group, a virtual network is likely to be the first actual resource that you create. Create a VM, Database or Web App, and one of the first pieces of information it asks you for is which Virtual Network to place your resource in.

But of course if you’ve done it that way, you’ve done it backwards because you really should have planned your virtual network and what was going to be in it first! A virtual network acts as a private address space for a specific set of resource groups or resources in Azure. As a reminder, a virtual network contains:

Subnets, which allow you to break the virtual network into one or more dedicated address spaces or segments, which can be different sizes based on the requirements of the resource type you’ll be placing in that subnet.
Routing, which routes traffic and creates a routing table. This means data is delivered using the most suitable and shortest available path from source to destination.
Network Security Groups, which can be used to filter traffic to and from resources in an Azure Virtual Network. It’s not a Firewall, but it works like one in a more targeted sense, in that you can manage traffic flow for individual virtual networks, subnets, and network interfaces to refine traffic.

A lot of wordy goodness there, but the easiest way to illustrate this is using a good old diagram!

Let’s do a quick overview:

We have 2 Resource Groups using a typical Hub and Spoke model where the Hub contains our Application Gateway and Firewall, and our Spoke contains our Application components. The red lines indicate peering between the virtual networks so that they can communicate with each other.
Let’s focus on the Spoke resource group – the virtual network has an address space of 10.1.0.0/16 defined.
This is then split into different subnets where each of the components of the Application reside. Each subnet has an NSG attached, which can control traffic flow to and from different subnets. So in this example, the ingress traffic coming into the Application Gateway would then be allowed to pass into the API Management subnet by setting allow rules on the NSG.
The other thing we see attached to the virtual network is a Route Table – we can use this to define where traffic from specific sources is sent to. We can use System Routes which are automatically built into Azure, or Custom Routes which can be user defined or by using BGP routes across VPN or Express Route services. The idea in our diagram is that all traffic will be routed back to Azure Firewall for inspection before forwarding to the next destination, which can be another peered virtual network, across a VPN to an on-premises/hybrid location, or straight out to an internet destination.
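To make the planning side a bit more concrete, here’s a minimal Python sketch that carves the Spoke’s 10.1.0.0/16 address space into subnets using the standard library’s `ipaddress` module. The subnet names are hypothetical stand-ins for the components in the diagram, and the /24 size is an assumption for illustration only; in practice you’d size each subnet for the resource type going into it. It also accounts for the 5 addresses Azure reserves in every subnet.

```python
import ipaddress

# Spoke address space from the example above
spoke = ipaddress.ip_network("10.1.0.0/16")

# Hypothetical subnet names; /24 is an assumed size, not a recommendation
names = ["AppGateway", "APIM", "Web", "Data"]

# subnets(new_prefix=24) yields 10.1.0.0/24, 10.1.1.0/24, ... in order
subnet_iter = spoke.subnets(new_prefix=24)
plan = {name: next(subnet_iter) for name in names}

AZURE_RESERVED = 5  # Azure reserves 5 addresses in every subnet

for name, net in plan.items():
    usable = net.num_addresses - AZURE_RESERVED
    print(f"{name:<11} {net}  usable={usable}")

# Work out which subnet a given private IP belongs to
ip = ipaddress.ip_address("10.1.2.10")
owner = next(name for name, net in plan.items() if ip in net)
print(f"{ip} is in {owner}")
```

Leaving the rest of the /16 unallocated is deliberate here: it gives you room for the scaling and expansion mentioned below.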

Final thoughts

Some important things to note on Virtual Networks:

Planning is everything – before you even deploy your first resource group, make sure you have your virtual networks defined, sized and mapped out for what you’re going to use them for. Always include scaling, expansion and future planning in those decisions.
Virtual Networks reside in a single resource group, but you technically can assign addresses from subnets in your virtual network to resources that reside in different resource groups. Not really a good idea though – try to keep your networking and resources confined within resource group and location boundaries.
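NSGs are not a blank slate: each one comes with default rules that allow traffic within the virtual network and from Azure Load Balancer, allow outbound internet traffic, and deny all other inbound and outbound traffic. The rules are processed in order of priority (lowest numbered rule is processed first) and the defaults sit at the bottom (priority 65000 and above), so you build your own rules on top of the default ones (for example, RDP and SSH access if not already in place).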
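Here’s a rough Python model of how that priority processing plays out for inbound traffic: rules are sorted by priority and the first rule that matches wins. It’s a simplification, assuming rules match only on source address and destination port (real NSG rules also match on protocol, destination and direction), with the VirtualNetwork and AzureLoadBalancer service tags approximated as fixed prefixes. The SSH rule is a hypothetical custom rule.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Rule:
    name: str
    priority: int
    source: str   # CIDR prefix; service tags are approximated as prefixes
    port: str     # "*" or a single destination port as a string
    access: str   # "Allow" or "Deny"

    def matches(self, src_ip: str, dst_port: int) -> bool:
        in_source = ipaddress.ip_address(src_ip) in ipaddress.ip_network(self.source)
        port_ok = self.port == "*" or int(self.port) == dst_port
        return in_source and port_ok

# Default inbound rules (VirtualNetwork approximated as the spoke range,
# AzureLoadBalancer as 168.63.129.16/32)
DEFAULT_INBOUND = [
    Rule("AllowVnetInBound", 65000, "10.1.0.0/16", "*", "Allow"),
    Rule("AllowAzureLoadBalancerInBound", 65001, "168.63.129.16/32", "*", "Allow"),
    Rule("DenyAllInBound", 65500, "0.0.0.0/0", "*", "Deny"),
]

# Hypothetical custom rule: allow SSH from an admin range
CUSTOM = [Rule("Allow-SSH-Admin", 100, "203.0.113.0/24", "22", "Allow")]

def evaluate_nsg(rules, src_ip: str, dst_port: int):
    # Lowest priority number is processed first; the first match wins
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.matches(src_ip, dst_port):
            return rule.name, rule.access
    return None, "Deny"

rules = CUSTOM + DEFAULT_INBOUND
print(evaluate_nsg(rules, "203.0.113.7", 22))
print(evaluate_nsg(rules, "198.51.100.4", 22))
print(evaluate_nsg(rules, "10.1.2.10", 443))
```

The admin range gets SSH via the custom rule, any other internet source falls through to DenyAllInBound, and traffic from inside the VNet is allowed by AllowVnetInBound.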

Hope you enjoyed this post, until next time!!

The A-Z of Azure Policy

I’m delighted to be contributing to Azure Spring Clean for the first time. The annual event is organised by Azure MVPs Joe Carlyle and Thomas Thornton and encourages you to look at your Azure subscriptions and see how you could manage them better from a Cost Management, Governance, Monitoring and Security perspective. You can check out all of the posts in this year’s Azure Spring Clean here. For this year, my contribution is the A-Z of Azure Policy!

Azure Policy is one of the key pillars of a Well Architected Framework for Cloud Adoption. It enables you to enforce standards across either single or multiple subscriptions at different scope levels and allows you to bring both existing and new resources into compliance using bulk and automated remediation.

These policies enforce different rules and effects over your resources so that those resources stay compliant with your corporate standards and service level agreements. Azure Policy meets this need by evaluating your resources for noncompliance with assigned policies.

Image Credit: Microsoft

Policies define what you can and cannot do with your environment. They can be used individually or in conjunction with Locks to ensure granular control. Let’s look at some simple examples where Policies can be applied:

If you want to ensure resources are deployed only in a specific region.
If you want to use only specific Virtual Machine or Storage SKUs.
If you want to block any SQL installations.
If you want to enforce Tags consistently across your resources.

So that’s it – you can just apply a policy and it will do what you need it to do? The answer is both Yes and No:

Yes, in the sense that you can apply a policy to define a particular set of business rules to audit and remediate the compliance of existing resources against those rules.
No, in the sense that there is so much more to it than that.

There is much to understand about how Azure Policy can be used as part of your Cloud Adoption Framework toolbox. And because there is so much to learn, I’ve decided to do an “A-Z” of Azure Policy and show the different options and scenarios that are available.

Before we start on the A-Z, a quick disclaimer …. There’s going to be an entry for every letter of the alphabet, but you may have to forgive me if I use artistic license to squeeze a few in (Letters like Q, X and Z spring to mind!).

So, grab a coffee (or whatever drink takes your fancy) and let’s start on the Azure Policy alphabet!

Append is the first of our Policy Effects and is used to add extra fields to the requested resource during creation or update. It is only available for policies that use a Resource Manager mode. The example below sets IP rules on a Storage Account:

"then": {
    "effect": "append",
    "details": [{
        "field": "Microsoft.Storage/storageAccounts/networkAcls.ipRules",
        "value": [{
            "action": "Allow",
            "value": "134.5.0.0/21"
        }]
    }]
}

Assignment is the definition of what resources or scope your Policy is being applied to.

Audit is the Policy Effect that evaluates resources and reports non-compliance in the logs. It does not take any action; it is report-only.

"then": {
    "effect": "audit"
}

AuditIfNotExists is the Policy Effect that evaluates whether a property is missing. So for example, we can say if the type of Resource is a Virtual Machine and we want to know if that Virtual Machine has a particular tag or extension present. If yes, the resource will be returned as Compliant, if not, it will return a non-compliance. The example below evaluates Virtual Machines to determine whether the Antimalware extension exists then audits when missing:

{
    "if": {
        "field": "type",
        "equals": "Microsoft.Compute/virtualMachines"
    },
    "then": {
        "effect": "auditIfNotExists",
        "details": {
            "type": "Microsoft.Compute/virtualMachines/extensions",
            "existenceCondition": {
                "allOf": [{
                        "field": "Microsoft.Compute/virtualMachines/extensions/publisher",
                        "equals": "Microsoft.Azure.Security"
                    },
                    {
                        "field": "Microsoft.Compute/virtualMachines/extensions/type",
                        "equals": "IaaSAntimalware"
                    }
                ]
            }
        }
    }
}
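The logic of that definition can be sketched in Python: for each VM, look at its related extension resources and mark the VM non-compliant if none of them satisfies the `existenceCondition`. The VM names, the extension lists and the flattened `publisher`/`type` keys are simplified, hypothetical stand-ins, not the real Azure Resource Manager payload.

```python
# Hypothetical VMs mapped to their (simplified) extension resources
vms = {
    "vm-web01": [{"publisher": "Microsoft.Azure.Security", "type": "IaaSAntimalware"}],
    "vm-sql01": [{"publisher": "Microsoft.Azure.Monitor", "type": "AzureMonitorWindowsAgent"}],
    "vm-app01": [],
}

def existence_condition(ext: dict) -> bool:
    # allOf: both the publisher and the extension type must match
    return (ext.get("publisher") == "Microsoft.Azure.Security"
            and ext.get("type") == "IaaSAntimalware")

def audit_if_not_exists(extensions: list) -> str:
    # Compliant if at least one related resource satisfies the condition
    return "Compliant" if any(existence_condition(e) for e in extensions) else "NonCompliant"

results = {name: audit_if_not_exists(exts) for name, exts in vms.items()}
for name, state in results.items():
    print(name, state)
```

Note that nothing is changed on the VMs: like Audit, this effect only records the result.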

Blueprints – Instead of having to configure features like Azure Policy for each new subscription, with Azure Blueprints you can define a repeatable set of governance tools and standard Azure resources that your organization requires. This allows you to scale the configuration and organizational compliance across new and existing subscriptions with a set of built-in components that speed the development and deployment phases.

Built-In – Azure provides hundreds of built-in Policy and Initiative definitions for multiple resources to get you started. You can find them on both the Microsoft Learn site and GitHub.

Compliance State shows the state of the resource when compared to the policy that has been applied. The two states you’ll see most are Compliant and Non-Compliant, although others such as Exempt, Conflicting and Not Started also exist.

Costs – if you are running Azure Policy on Azure resources, then it’s free. However, you can also use Azure Policy to cover Azure Arc resources, and there are specific scenarios where you will be charged:

Azure Policy guest configuration (includes Azure Automation change tracking, inventory, state configuration): $6/Server/Month
Kubernetes Configuration: First 6 vCPUs are free, $2/vCPU/month
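To put those prices into context, here’s a quick back-of-the-envelope calculation in Python. It uses the figures quoted above and assumes the 6 free vCPUs are counted once across your whole Arc-enabled Kubernetes estate each month; check the current Azure Policy pricing page before relying on either assumption. The 10 servers and 16 vCPUs are made-up numbers.

```python
GUEST_CONFIG_PER_SERVER = 6.0  # $ per server per month (figure quoted above)
K8S_PER_VCPU = 2.0             # $ per vCPU per month (figure quoted above)
K8S_FREE_VCPUS = 6             # first 6 vCPUs free (assumed counted once per month)

def monthly_policy_cost(arc_servers: int, k8s_vcpus: int) -> float:
    guest = arc_servers * GUEST_CONFIG_PER_SERVER
    billable_vcpus = max(0, k8s_vcpus - K8S_FREE_VCPUS)
    return guest + billable_vcpus * K8S_PER_VCPU

# Hypothetical estate: 10 Arc-enabled servers and 16 vCPUs of Arc-enabled Kubernetes
cost = monthly_policy_cost(10, 16)
print(f"${cost:.2f} per month")  # $80.00 per month
```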

Custom Policy definitions are ones that you create yourself when a Built-In Policy doesn’t meet the requirements of what you are trying to achieve.

Dashboards in the Azure Portal give you a graphical overview of the compliance state of your Azure environments.

Definition Location is the scope where the Policy or Initiative definition is saved, and it determines where that definition can be assigned. This can be a Management Group or a Subscription.

Deny is the Policy Effect used to prevent a resource request or action that doesn’t match the defined standards.

"then": {
    "effect": "deny"
}

DeployIfNotExists is the Policy Effect used to apply the action defined in the Policy Template when a resource is found to be non-compliant. This is used as part of a remediation of non-compliant resources. Important point to note – policy assignments that use a DeployIfNotExists effect require a managed identity to perform remediation.

Docker Security Baseline is a set of default configuration settings which ensure that Docker Containers in Azure are running based on a recommended set of regulatory and security baselines.

Enforcement Mode is a property that allows you to enable/disable enforcement of policy effects while still evaluating compliance.

Evaluation is the process of scanning your environment to determine the applicability and compliance of assigned policies.

Fields are used in policy definitions to specify a property or alias. In the example below, the field property contains “location” and “type” at different stages of the evaluation:

"policyRule": {
    "if": {
        "allOf": [{
                "field": "location",
                "notIn": "[parameters('listOfAllowedLocations')]"
            },
            {
                "field": "location",
                "notEquals": "global"
            },
            {
                "field": "type",
                "notEquals": "Microsoft.AzureActiveDirectory/b2cDirectories"
            }
        ]
    },
    "then": {
        "effect": "Deny"
    }
}
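Here’s a Python sketch of what that `if` block is doing: the `listOfAllowedLocations` parameter is substituted in, each `field` is read from a simplified resource dictionary, and the rule only matches (so Deny fires) when all three conditions in the allOf are true. The resources and the allowed regions are made up for illustration, and values are lower-cased before comparing in this sketch.

```python
# Hypothetical parameter value supplied when the policy is assigned
parameters = {"listOfAllowedLocations": ["uksouth", "ukwest"]}

def rule_matches(resource: dict, params: dict) -> bool:
    location = resource["location"].lower()
    allowed = [loc.lower() for loc in params["listOfAllowedLocations"]]
    resource_type = resource["type"].lower()
    # allOf: every condition must be true for the "then" effect to apply
    return all([
        location not in allowed,   # field "location", notIn the parameter list
        location != "global",      # field "location", notEquals "global"
        resource_type != "microsoft.azureactivedirectory/b2cdirectories",  # field "type"
    ])

def effect_for(resource: dict) -> str:
    return "Deny" if rule_matches(resource, parameters) else "No effect"

sample_resources = [
    {"name": "st-uks", "type": "Microsoft.Storage/storageAccounts", "location": "uksouth"},
    {"name": "st-eus", "type": "Microsoft.Storage/storageAccounts", "location": "eastus"},
    {"name": "dns-zone", "type": "Microsoft.Network/dnszones", "location": "global"},
]
for r in sample_resources:
    print(r["name"], effect_for(r))
```

The storage account in East US is denied, while the UK South one and the "global" resource pass straight through.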

GitHub – you can use GitHub to build an “Azure Policy as Code” workflow to manage your policies as code, control the lifecycle of updating definitions, and automate the process of validating compliance results.

Governance Visualizer – I have to include this because I think it’s an awesome tool – Julian Hayward’s AzGovViz tool is a PowerShell script which captures Azure governance capabilities such as Azure Policy, RBAC and Blueprints and a lot more. If you’re not using it, now is the time to start.

Group – within an Initiative, you can group policy definitions for categorization. The Regulatory Compliance feature uses this to group definitions into controls and compliance domains.

Hierarchy – this sounds simple but is important. The location where you save the policy definition should contain all resources that you want to target under that resource hierarchy. If the definition location is a:

Subscription – Only resources within that subscription can be assigned the policy definition.
Management group – Only resources within child management groups and child subscriptions can be assigned the policy definition. If you plan to apply the policy definition to several subscriptions, the location must be a management group that contains each subscription.

Initiative (or Policy Set) is a set of Policies that have been grouped together with the aim of either targeting a specific set of resources, or to evaluate and remediate a specific set of definitions or parameters. For example, you could group several tagging policies into a single initiative that is targeted at a specific scope instead of applying multiple policies individually.

JSON – Policy definitions are written in JSON format. The policy definition contains elements for:

mode
parameters
display name
description
policy rule
- logical evaluation
- effect

An example of the “Allowed Locations” built-in policy is shown below:

{
  "properties": {
    "displayName": "Allowed locations",
    "policyType": "BuiltIn",
    "description": "This policy enables you to restrict the locations...",
    "mode": "Indexed",
    "parameters": {
      "listOfAllowedLocations": {
        "type": "Array",
        "metadata": {
          "description": "Locations that can be specified....",
          "strongType": "location",
          "displayName": "Allowed locations"
        }
      }
    },
    "policyRule": {
      "if": {
        "allOf": [
          {
            "field": "location",
            "notIn": "[parameters('listOfAllowedLocations')]"
          },
          {
            "field": "location",
            "notEquals": "global"
          },
          {
            "field": "type",
            "notEquals": "Microsoft.AzureActiveDirectory/b2cDirectories"
          }
        ]
      },
      "then": {
        "effect": "Deny"
      }
    }
  },
  "id": "/providers/Microsoft.Authorization/policyDefinitions/e56962a6-4747-49cd-b67b-bf8b01975c4c",
  "type": "Microsoft.Authorization/policyDefinitions",
  "name": "e56962a6-4747-49cd-b67b-bf8b01975c4c"
}

Key Vault – you can integrate Key Vault with Azure Policy to audit the key vault and its objects before enforcing a deny operation to prevent outages. Current built-ins for Azure Key Vault are categorized in four major groups: key vault, certificates, keys, and secrets management.

Kubernetes – Azure Policy uses Gatekeeper to apply enforcements and safeguards on your clusters (both Azure Kubernetes Service (AKS) and Azure Arc enabled Kubernetes), and reports the results back into your centralized Azure Policy Dashboard. The Azure Policy Add-on:

Checks with Azure Policy service for policy assignments to the cluster.
Deploys policy definitions into the cluster as constraint template and constraint custom resources.
Reports auditing and compliance details back to Azure Policy service.

After installing the Azure Policy Add-on for AKS, you can apply individual policy definitions or initiatives to your cluster.

Lighthouse – for Service Providers, you can use Azure Lighthouse to deploy and manage policies across multiple customer tenants.

Linux Security Baseline is a set of default configuration settings which ensure that Linux VMs in Azure are running based on a recommended set of regulatory and security baselines.

Logical Operators are optional condition statements that can be used to see if resources have certain configurations applied. There are 3 logical operators – not, allOf and anyOf.

Not means that the opposite of the condition should be true for the policy to be applied.
AllOf requires all the conditions defined to be true at the same time.
AnyOf requires any one of the conditions to be true for the policy to be applied.

"policyRule": {
  "if": {
    "allOf": [{
        "field": "type",
        "equals": "Microsoft.DocumentDB/databaseAccounts"
      },
      {
        "field": "Microsoft.DocumentDB/databaseAccounts/enableAutomaticFailover",
        "equals": "false"
      },
      {
        "field": "Microsoft.DocumentDB/databaseAccounts/enableMultipleWriteLocations",
        "equals": "false"
      }
    ]
  },
  "then": {
    "effect": "audit"
  }
}

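The way these operators nest can be sketched with a small recursive evaluator in Python. It’s a simplified model: it only understands `not`, `allOf`, `anyOf` and the `equals` condition, and it reads fields straight out of a resource dictionary keyed by alias. Real Azure Policy evaluation supports many more conditions and resolves aliases against the actual resource properties.

```python
def evaluate_condition(condition: dict, resource: dict) -> bool:
    """Recursively evaluate a simplified policy condition against a resource."""
    if "not" in condition:
        return not evaluate_condition(condition["not"], resource)
    if "allOf" in condition:
        return all(evaluate_condition(c, resource) for c in condition["allOf"])
    if "anyOf" in condition:
        return any(evaluate_condition(c, resource) for c in condition["anyOf"])
    # Leaf condition: only "equals" is supported in this sketch
    value = str(resource.get(condition["field"], "")).lower()
    return value == str(condition["equals"]).lower()

cosmos_rule = {
    "allOf": [
        {"field": "type", "equals": "Microsoft.DocumentDB/databaseAccounts"},
        {"field": "Microsoft.DocumentDB/databaseAccounts/enableAutomaticFailover", "equals": "false"},
        {"field": "Microsoft.DocumentDB/databaseAccounts/enableMultipleWriteLocations", "equals": "false"},
    ]
}

# Hypothetical Cosmos DB accounts
single_region = {
    "type": "Microsoft.DocumentDB/databaseAccounts",
    "Microsoft.DocumentDB/databaseAccounts/enableAutomaticFailover": False,
    "Microsoft.DocumentDB/databaseAccounts/enableMultipleWriteLocations": False,
}
with_failover = dict(single_region)
with_failover["Microsoft.DocumentDB/databaseAccounts/enableAutomaticFailover"] = True

print(evaluate_condition(cosmos_rule, single_region))  # True: the effect applies
print(evaluate_condition(cosmos_rule, with_failover))  # False: the effect does not apply

# anyOf and not built from the same conditions
either_disabled = {"anyOf": cosmos_rule["allOf"][1:]}
not_cosmos = {"not": cosmos_rule["allOf"][0]}
print(evaluate_condition(either_disabled, with_failover))  # True
print(evaluate_condition(not_cosmos, single_region))       # False
```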
Mode tells you the type of resources for which the policy will be applied. Allowed values are “All” (where all Resource Groups and Resources are evaluated) and “Indexed” (where the policy is evaluated only for resources which support tags and location).

Modify is a Policy Effect that is used to add, update, or remove properties or tags on a subscription or resource during creation or update. Important point to note – policy assignments that use a Modify effect require a managed identity to perform remediation. If you don’t have a managed identity, use Append instead. The example below adds an environment tag with a value of Test, replacing the existing value if the tag is already present:

"then": {
    "effect": "modify",
    "details": {
        "roleDefinitionIds": [
            "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
        ],
        "operations": [
            {
                "operation": "addOrReplace",
                "field": "tags['environment']",
                "value": "Test"
            }
        ]
    }
}
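A minimal Python sketch of what that `addOrReplace` operation does to a resource’s tags, assuming tags are a plain dictionary. Only `addOrReplace` and `remove` are modelled, and only for fields of the form `tags['name']`; the real Modify effect supports more operations and fields, and still needs the managed identity mentioned above to remediate existing resources.

```python
def apply_modify(tags: dict, operations: list) -> dict:
    """Return a new tag dictionary with simplified Modify operations applied."""
    result = dict(tags)
    for op in operations:
        # Only fields like "tags['environment']" are handled in this sketch
        name = op["field"][len("tags['"):-len("']")]
        if op["operation"] == "addOrReplace":
            result[name] = op["value"]   # add the tag, or overwrite its value
        elif op["operation"] == "remove":
            result.pop(name, None)       # remove the tag if it exists
    return result

ops = [{"operation": "addOrReplace", "field": "tags['environment']", "value": "Test"}]
print(apply_modify({"environment": "Prod", "costCenter": "42"}, ops))
print(apply_modify({"costCenter": "42"}, ops))
```

Either way the resource ends up with environment set to Test, and any other tags are left alone.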

Non-Compliant is the state which indicates that a resource did not conform to the policy rule in the policy definition.

OK, so this is my first failure. Surprising, but let’s keep going!

Parameters are used for providing inputs to the policy. They can be reused at multiple locations within the policy.

{
    "properties": {
        "displayName": "Require tag and its value",
        "policyType": "BuiltIn",
        "mode": "Indexed",
        "description": "Enforces a required tag and its value. Does not apply to resource groups.",
        "parameters": {
            "tagName": {
                "type": "String",
                "metadata": {
                    "description": "Name of the tag, such as costCenter"
                }
            },
            "tagValue": {
                "type": "String",
                "metadata": {
                    "description": "Value of the tag, such as headquarter"
                }
            }
        },
        "policyRule": {
            "if": {
                "not": {
                    "field": "[concat('tags[', parameters('tagName'), ']')]",
                    "equals": "[parameters('tagValue')]"
                }
            },
            "then": {
                "effect": "deny"
            }
        }
    }
}

Policy Rule is the part of a policy definition that describes the compliance requirements.

Policy State describes the compliance state of a policy assignment.

Query Compliance – While the Dashboards in the Azure Portal (see above) provide you with a visual method of checking your overall compliance, there are a number of command line and automation tools you can use to access the compliance information generated by your policy and initiative assignments:

GitHub Action using the Azure Policy Compliance Scan action to trigger an on-demand evaluation scan from your GitHub workflow.
Azure CLI using the following command:

az policy state trigger-scan --resource-group "MyRG"

Azure PowerShell using the following command:

Start-AzPolicyComplianceScan -ResourceGroupName 'MyRG'

Regulatory Compliance describes a specific type of initiative that allows grouping of policies into controls and categorization of policies into compliance domains based on responsibility (Customer, Microsoft, Shared). These are available as built-in initiatives (there are built-in initiatives from CIS, ISO, PCI DSS, NIST, and multiple Government standards), and you have the ability to create your own based on specific requirements.

Remediation is a way to handle non-compliant resources. You can create remediation tasks for resources to bring these to a desired state and into compliance. You use DeployIfNotExists or Modify effects to bring non-compliant resources into compliance.

Security Baseline for Azure Security Benchmark – this is a set of policies that comes from guidance from the Microsoft cloud security benchmark version 1.0. The full Azure Policy security baseline mapping file can be found here.

Scope is the location where the policy definition is being assigned to. This can be Management Group, Subscription, Resource Group or Resource.

Tag Governance is a crucial part of organizing your Azure resources into a taxonomy. Tags can be the basis for applying your business policies with Azure Policy or tracking costs with Cost Management. The template shown below shows how to enforce Tag values across your resources:

{
   "properties": {
      "displayName": "Require tag and its value",
      "policyType": "BuiltIn",
      "mode": "Indexed",
      "description": "Enforces a required tag and its value. Does not apply to resource groups.",
      "parameters": {
         "tagName": {
            "type": "String",
            "metadata": {
               "description": "Name of the tag, such as costCenter"
            }
         },
         "tagValue": {
            "type": "String",
            "metadata": {
               "description": "Value of the tag, such as headquarter"
            }
         }
      },
      "policyRule": {
         "if": {
            "not": {
               "field": "[concat('tags[', parameters('tagName'), ']')]",
               "equals": "[parameters('tagValue')]"
            }
         },
         "then": {
            "effect": "deny"
         }
      }
   },
   "id": "/providers/Microsoft.Authorization/policyDefinitions/1e30110a-5ceb-460c-a204-c1c3969c6d62",
   "type": "Microsoft.Authorization/policyDefinitions",
   "name": "1e30110a-5ceb-460c-a204-c1c3969c6d62"
}

Understanding how Effects work is key to understanding Azure Policy. We’ve covered most of the effects above. The key thing to remember is that each policy definition has a single effect, which determines what happens when an evaluation finds a match. There is an order in which the effects are evaluated:

Disabled is checked first to determine whether the policy rule should be evaluated.
Append and Modify are then evaluated. Since either could alter the request, a change made may prevent an audit or deny effect from triggering. These effects are only available with a Resource Manager mode.
Deny is then evaluated. By evaluating deny before audit, double logging of an undesired resource is prevented.
Audit is evaluated.
Manual is evaluated.
AuditIfNotExists is evaluated.
denyAction is evaluated last.

Once the resource provider returns a success code for a request in a Resource Manager mode, the following 2 effects are run to determine if additional logging or actions are required:

AuditIfNotExists
DeployIfNotExists
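Here’s a rough Python sketch of that ordering. It makes a big simplifying assumption: every effect passed in comes from an assignment whose rule already matched the request. It just shows how the stages line up, and why a Deny means no Audit entry is written for the same request.

```python
# Stage numbers follow the order listed above; Append and Modify share a stage.
STAGE = {
    "disabled": 0, "append": 1, "modify": 1, "deny": 2, "audit": 3,
    "manual": 4, "auditifnotexists": 5, "denyaction": 6,
}
# Effects that run after the resource provider reports success
AFTER_SUCCESS = {"auditifnotexists", "deployifnotexists"}

def request_phase_order(effects: list) -> list:
    """Order the effects of matching assignments for the request phase (simplified)."""
    # Disabled rules are never evaluated; effects without a stage are skipped here
    active = [e for e in effects if e.lower() in STAGE and e.lower() != "disabled"]
    # sorted() is stable, so effects in the same stage keep their input order
    return sorted(active, key=lambda e: STAGE[e.lower()])

def simulate(effects: list) -> tuple:
    steps = []
    for effect in request_phase_order(effects):
        steps.append(effect)
        if effect.lower() == "deny":
            # In this sketch a Deny rejects the request, so nothing later runs
            return steps, []
    post = [e for e in effects if e.lower() in AFTER_SUCCESS]
    return steps, post

print(simulate(["Audit", "Modify", "DeployIfNotExists", "Disabled"]))
# (['Modify', 'Audit'], ['DeployIfNotExists'])
print(simulate(["Audit", "Deny", "Modify"]))
# (['Modify', 'Deny'], [])
```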

Visual Studio Code has an Azure Policy extension which allows you to create and modify policy definitions, check resource compliance and evaluate your policies against a resource.

Web Application Firewall – Azure Web Application Firewall (WAF) combined with Azure Policy can help enforce organizational standards and assess compliance at-scale for WAF resources.

Windows Security Baseline is a set of default configuration settings which ensure that Windows VMs in Azure are running based on a recommended set of regulatory and security baselines.

X is for ….. ah come on, you’re having a laugh ….. fine, here you go (artistic license taken!):

Xclusion – this of course should read Exclusion ….. when assigned, the scope includes all child resource containers and child resources. If a child resource container or child resource shouldn’t have the definition applied, each can be excluded from evaluation by setting notScopes.

Xemption – this of course should read Exemption …. this is a feature used to exempt a resource hierarchy or individual resource from evaluation. These resources are therefore not evaluated and can have a temporary waiver (expiration) period where they are exempt from evaluation and remediation.
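The Exclusion and Exemption logic can be sketched in Python using resource IDs as scopes: a resource is in scope if its ID sits under the assignment scope, is skipped if it sits under any of the notScopes, and is treated as exempt while an exemption covering it hasn’t expired. The subscription and resource names are hypothetical, and in the real service exempt resources still appear in compliance results with an Exempt state rather than disappearing.

```python
from datetime import datetime, timezone

def in_scope(resource_id: str, scope: str) -> bool:
    # Case-insensitive prefix check on whole path segments
    rid, sc = resource_id.lower().rstrip("/"), scope.lower().rstrip("/")
    return rid == sc or rid.startswith(sc + "/")

def status(resource_id, assignment_scope, not_scopes, exemptions, now):
    if not in_scope(resource_id, assignment_scope):
        return "NotInScope"
    if any(in_scope(resource_id, ns) for ns in not_scopes):
        return "Excluded"   # removed via notScopes
    for exemption_scope, expires_on in exemptions:
        if in_scope(resource_id, exemption_scope) and (expires_on is None or now < expires_on):
            return "Exempt"  # waiver still active
    return "Evaluated"

# Hypothetical IDs: "sub-a" stands in for a real subscription GUID
SUB = "/subscriptions/sub-a"
assignment_scope = SUB
not_scopes = [f"{SUB}/resourceGroups/rg-sandbox"]
exemptions = [(f"{SUB}/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-legacy",
               datetime(2024, 6, 30, tzinfo=timezone.utc))]

resources = [
    f"{SUB}/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-web",
    f"{SUB}/resourceGroups/rg-sandbox/providers/Microsoft.Storage/storageAccounts/sttest",
    f"{SUB}/resourceGroups/rg-app/providers/Microsoft.Compute/virtualMachines/vm-legacy",
    "/subscriptions/sub-b/resourceGroups/rg-other",
]
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
for rid in resources:
    print(status(rid, assignment_scope, not_scopes, exemptions, now), rid)
```

Once the exemption’s expiry date passes, vm-legacy drops back into normal evaluation.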

YAML – You can use Azure DevOps to check Azure Policy compliance in YAML Pipelines by using the AzurePolicyCheckGate@0 task. The syntax is shown below:

# Check Azure Policy compliance v0
# Security and compliance assessment for Azure Policy.
- task: AzurePolicyCheckGate@0
  inputs:
    azureSubscription: # string. Alias: ConnectedServiceName. Required. Azure subscription. 
    #ResourceGroupName: # string. Resource group. 
    #Resources: # string. Resource name.

Zero Non-Compliant – which is exactly the position you want to get to!

Z is also for Zzzzzzzz, which may be the state you’re in if you’ve managed to get this far!

Summary

So that’s a lot to take in, but it gives you an insight into the different options that are available in Azure Policy to ensure that your Azure environments can meet both governance and cost management objectives for your organization.

In this post, I’ve stayed with the features of Azure Policy and apart from a few examples didn’t touch on the many different methods you can use to assign and manage policies which are:

Azure Portal
Azure CLI
Azure PowerShell
.NET
JavaScript
Python
REST
ARM Template
Bicep
Terraform

As always, check out the official Microsoft Learn documentation for a deeper dive on Azure Policy.

Hope you enjoyed this post! Be sure to check out the rest of the articles in this year’s Azure Spring Clean.

Can we prevent Cloud Repatriation in Azure?

I’ve seen a lot of articles in the last few months talking about Cloud Repatriation, so I’ve decided to look into this and find out more about:

What is Cloud Repatriation?
Why is it suddenly a topic?
Why it’s not as easy as it sounds?
How did this happen in the first place?
Why it should never have become an issue?

What is Cloud Repatriation?

Let’s start with the easy question and look for the definition of what it is. Repatriation is a term that has been around for a while and is defined in its simplest form as:

“the process of returning a thing or a person to its place of origin”

So if we take that definition and apply it to technology, Cloud Repatriation is the process of companies moving their services out of Microsoft Azure (or other Public Cloud providers such as AWS or GCP) and relocating those services back to the On-Premises or Private Cloud environments that they originated from.

Why is it suddenly a topic?

One word – cost. The cost of running a Cloud Computing environment isn’t the same as running an On-Premises environment.

In an On-Premises environment, we work with predictable cost models when it comes to Equipment, Licensing and Staffing costs. The only variable is Power which is in a constant state of flux and change. This leads us down the CapEx route which forces companies into predicting the costs involved over a 3-5 year period. Finance people love this as it means they can safely predict future costs and budgets, and not have to worry about unexpected charges affecting their balance sheets.

The first part of that previous paragraph is questionable. Unless your company is static with zero growth projections (and let’s be honest, no company is), it’s going to be difficult to predict costs over a period of years:

How many servers will you need to run your estate? If you order too few, you’ll need to buy more and your CFO won’t like that after you told them that these were the only costs needed for the next 3 years.
If you order too many, it’s overspend and equipment/license wastage, and you may not be approved for additional equipment in your next Budget cycle (which leads you to use unsupported and out of warranty equipment that may lead to more costs to keep that operational).
You may have also hired either too few staff (leading to overwork and burnout) or too many staff (which leads to idleness and ultimately reducing the workforce).

Cloud Computing environments use an OpEx model, which works differently in that you pay for what you use. You use a Cloud Service and are billed monthly for the cost of using it. You have options to scale the service up or down as required, and you can also purchase Reserved Instances or Savings Plans over a 1 or 3 year term in order to reduce the costs and have that “CapEx-feel” to Cloud Computing.

The problem is that there is no clearly defined way of keeping those costs consistent, and Microsoft’s recent announcement on price increases for European Customers (and depending on your currency, this was as much as 15%) has meant that CFOs and CTOs are scrambling to look at alternative solutions to the Cloud.

And in some cases, the word “Repatriation” has been thrown about and the question being asked is “were we wrong to move to Azure/AWS/GCP, and should we look to move our servers and data back?”

Why it’s not as easy as it sounds?

So you want to move back? It sounds easy, and if your Cloud Migration involved only a “Lift And Shift” or Rehost (where you migrated your VMs as-is and made no modifications to them), then fire away! Buy your equipment, install your favourite hypervisor and off you go! There are 3rd party products (such as Carbon) on the market that will bring your VMs back to either VMware or Hyper-V.

You can also migrate Office365 mailboxes back to On-Premises Exchange Servers by setting up a migration batch in EAC, so that process is simple.

But what if you did more than just Rehost? Let’s remind ourselves of the 5 R’s of Cloud Rationalization:

Rehost – also known as Lift and Shift.
Refactor – customizing your apps and infrastructure to align with the Cloud.
Rearchitect – divides your app into different parts or MicroServices.
Rebuild – completely rebuild and redevelop your app.
Replace – completely replace the app with a cloud-native SaaS application.

If you’ve done anything more than Rehost during your migration to Azure, then you have a bit of work on your hands getting it back. It’s not impossible by any means, but as with all Cloud Services, it’s a lot easier to get them into the Cloud than it is to get them out. If you’ve redesigned your app to make it Cloud-Native using any of the other 4 “R’s”, then you need to recreate that environment On-Premises, and that may not be easy and could cost a lot more than running the service in Azure in the first place!

How did this happen in the first place?

To work out why this should never have become an issue, we need to go back through the mists of time and work out why the migrations happened in the first place. It was most likely down to either:

Running old and unsupported hardware.
Complex systems that were difficult to manage and maintain.
Enhanced Security.
Easier Scalability of services.

And if you moved to Azure, it’s likely that you used either:

Azure Site Recovery (and were using Azure as a DR platform to initially test how your VMs would work).
Azure Migrate (where you ran a discovery assessment on the load of your VMs over a period of time up to 30 days, and used that assessment as a means of sizing your target Azure VMs).

The original version of Azure Migrate only supported migration of VMware VM workloads to Azure. The new version (released in November 2019) included Database and Web Server migration features, and Application Discovery.

In all likelihood, some companies went down the same route as the initial Office365 migrations (where they only migrated Email and never used any of the other underlying services included in their licenses), and in doing their Cloud Migrations to Azure decided to effectively “Rehost-only” and not use the additional benefits that were available. So instead of running Web Servers or Applications as part of an Azure App Service, they may have been left running on VMs with underlying Web or App Services.

Another good example here is the Finance or Warehouse Management Application that ran on a VM and also required a dedicated SQL backend (that also ran on a VM). Instead of refactoring that into an App Service or a Serverless SQL Database, it was left running on VMs in Azure. We all know that these VMs have spikes at certain times every month, so in that case the scalability that could have offered cost savings wasn’t implemented.

Why it should never have become an issue?

There are a number of contributing factors why Cloud Computing costs can spiral out of control. I’ve made the case for these below, and in some cases what can be done to address them:

Azure Reserved Instances – this is what Finance people love as they immediate savings and some semblance of how they can “CapEx their OpEx” costs over a longer period of time.
Azure Cost Management – Setting a budget or at least budget alerts on monthly spend can at least give you an indication of where you are each month. If you’re getting budget alerts emails on the 10th of each month, then you haven’t got either your budget or your Service SKU’s and Sizing right.
Azure Policy – have you set policies to say that you can only have certain VM SKUs, running on certain disk types, in certain regions?
RBAC Roles – this is the most important one and the biggest factor in “spend-creep”. Who can do what in your Azure Subscription? For example, have you granted developers Owner access in their own Resource Group so they can spin up what they want? Changing a SKU on a VM is single click operation, as is changing Disk type from HDD to SSD, redundancy from LRS to GRS etc. And do the policies you have set above apply across the subscription or have you exclusions set somewhere? Having control of your environemnt and assigning the correct roles.
Assessments – OK, this is a “after the horse has bolted” scenario, but its never too late to do it. Asking questions like why did you move in the first place, does it align with business goals, strategy and governance objectives.
Azure Advisor – its there, on every resource you are running in Azure and also as its own page in the portal, giving you recommendations based on over/under consumption and how you can address this.
Backup/DR – this has long been a bone of contention for some companies, and I’ve come across some who see cloud-based backup solutions as either unnecessary or too expensive (because being in the cloud means we don’t need Backup or DR, right?).
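To put some numbers on the “budget alerts on the 10th” point, here is a minimal sketch of the arithmetic behind it. It is not the Azure Cost Management API. The function and field names are hypothetical, the 80% alert threshold is an assumption, and the month-end projection is a simple linear one (average daily spend so far multiplied by the days in the month).

```python
import calendar
from datetime import date
from typing import NamedTuple


class BudgetStatus(NamedTuple):
    consumed_fraction: float   # share of the monthly budget already spent
    projected_spend: float     # linear projection of spend to month end
    threshold_alert: bool      # would a "% of budget" alert have fired by now?
    projected_overspend: bool  # is the month on course to exceed the budget?


def budget_status(spend_to_date: float, monthly_budget: float, today: date,
                  alert_threshold: float = 0.8) -> BudgetStatus:
    """Hypothetical helper: compare spend so far against a monthly budget."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_average = spend_to_date / today.day
    projected = daily_average * days_in_month
    return BudgetStatus(
        consumed_fraction=spend_to_date / monthly_budget,
        projected_spend=projected,
        threshold_alert=spend_to_date >= alert_threshold * monthly_budget,
        projected_overspend=projected > monthly_budget,
    )


if __name__ == "__main__":
    # 800 spent against a 1,000 budget by 10 June: the 80% alert has already
    # fired, and the same daily run rate would end the month near 2,400.
    print(budget_status(800, 1000, date(2023, 6, 10)))
```

If the alert is firing a third of the way into the month, the projection shows the gap is not a rounding error: it points straight back at the budget or at SKU sizing.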

Conclusion

I’ve based this article purely on costs and on how you can use the various policy and governance tools available in Azure to help you make the final decision on whether Cloud Repatriation is the right choice for your business.

Hope you enjoyed this post, until next time!

Is it the (long overdue) end of the road for on-premises Exchange Servers?

A few weeks ago, I posted a Wired.com article on my LinkedIn feed entitled “Your Microsoft Exchange Server Is a Security Liability” by Andy Greenberg.

It was a great article, released on the back of the most recent Exchange security vulnerability: this time the ProxyNotShell zero-day, which, oddly enough, took almost two months to patch correctly. The fix was released as part of the November Patch Tuesday updates, and there are a few prerequisites (basically, be on the latest CU version for your Exchange environment and then apply the patch).

It’s the latest in a long line of Exchange Server vulnerabilities, and it’s interesting to note this line in the Microsoft Tech Community article:

These vulnerabilities affect Exchange Server. Exchange Online customers are already protected from the vulnerabilities addressed in these SUs and do not need to take any action other than updating any Exchange servers in their environment.

Well, of course Exchange Online isn’t affected. And in his Wired article, Andy Greenberg makes the point that Microsoft are happy to put all of their security efforts into protecting their Exchange Online services and customers, as those make up the majority of their customer base.

A brief history of Exchange Online

If we look back on the history of Exchange Online, it all started with BPOS way back in 2008. At the time of release, Microsoft had been privately offering customers a hosted email service since early 2007. That was around the time Exchange Server 2007 was released, and it was also when Exchange started to get really complicated, both in the number of different server roles involved and in the overhead of maintaining them.

Now let’s just put one thing on record. I would never dream of believing that Microsoft would conspire to over-complicate an on-premises solution with the intention of pushing more customers towards a cloud offering. I mean, they wouldn’t, would they?

There was always the option of a separate front-end server, and the solution could sometimes be integrated with the long gone but not forgotten ISA Server.

The diagram below shows how Exchange roles have evolved since the 2000/2003 versions, and how they have pretty much been rolled back into less complicated instances with the release of the 2016/2019 versions:

Whether Microsoft intended to make Exchange Server more complicated or not, segregation of those roles was needed due to the evolution of security threats and the rate of attacks on Exchange Server installations. What it also did, though, was make Exchange a monster to manage from an administration perspective. It got almost to the point where it made the decision to migrate to Exchange Online easier, as for some organisations the migration offset the cost of hiring a full-time Exchange Administrator to manage that environment.

So should I migrate?

The easy answer to that is yes, you should migrate. There are a number of factors to take into consideration in answering that question:

As we saw in the recent ProxyNotShell Zero-Day and the length of time it took to remediate, Microsoft really doesn’t care about on-premises Exchange anymore. From Andy’s Wired article, the quote from Microsoft states that: "We strongly recommend customers migrate to the cloud to take advantage of real-time security and instant updates to help keep their systems protected from the latest threats".
The recent announcement that the next CU release (CU13) will only be for Exchange Server 2019. Because 2013 (which goes EOL in April 2023) and 2016 are now in Extended Support, they will only receive Security Updates as required (such as the patch for the zero-day). But in order to install those and to get support from Microsoft, you must be on the most recent (and last) CU version.
There hasn’t been an Exchange Server 2022 release yet. It was touted for release in late 2021, and early indications were that it would be a subscription-based service. The latest update on this came in this post in June 2022, where the updated roadmap is to release the next Exchange Server version in 2025. Are we really prepared to wait that long if the vulnerabilities continue at this rate? Again, the interesting quote to take out of this announcement is: The next version will require Server and CAL licenses and will be accessible only to customers with Software Assurance, similar to the SharePoint Server and Project Server Subscription Editions.
If you decide to migrate to Exchange Online, what does your business want to get out of the migration? It’s the question that’s rarely asked, but it’s the most important one for any migration scenario. Because unlike 15 years ago, when it was hosted Email and SharePoint with Live Meetings thrown in, Microsoft 365 is an extensive offering of Apps, Services and licensing options, and it can open a gateway to a full cloud migration if planned correctly.
You can go for the basic plans such as Business Basic or Office 365 E1 and “just” have Email, SharePoint and Teams if you want. But go a little further and you bring Office licensing into the equation, then maybe Defender, and then maybe Azure Virtual Desktop rights. The opportunities are there; it’s not just about lifting and shifting the tech anymore. You can check out my previous post on the different licensing options here.

Why can’t everyone just migrate to Exchange Online?

The majority of companies have already migrated to Exchange Online: nearly 350 million Office 365 users with over 7 billion (yes, billion) mailboxes, running on 300,000 Exchange Online instances on servers in Microsoft datacenters across the world.

There are those special cases who still need Exchange Servers On-Premises, and those servers need to be hardened or have specialist teams supporting them.

Then there are those companies that have specific Data Residency requirements. And that’s really all they say: "We're not moving our data into the Cloud". It shows a lack of understanding of how Data Residency in Exchange Online works. Depending on where you are in the world, you can find out on this site the different options for where your Microsoft 365 data would be stored post-migration, based on the options you select at tenant creation and on which datacenters the services are available in around the world (for example, Forms is not available in all datacenters, only some US ones).

Conclusion

Having your data secured by Microsoft is better than having it potentially exposed because of a mistrust or misunderstanding of what the cloud can offer in terms of data residency. Staying on-premises also means taking on the admin overhead of managing and securing your Exchange environment yourself.

I think it’s the end of the road for Exchange Server – while a migration may sound painful to some, a compromised server is much worse.

Hope you enjoyed this post, until next time!

Microsoft Ignite 2022 – Highlights of the Announcements (with a few personal opinions thrown in)!

For this year’s Microsoft Ignite, in-person conferences were held in cities around the world after two years of being online and I was fortunate enough to attend the Manchester Spotlight event last week.

At the conference Microsoft had their usual presentations, ‘Ask the Expert’ sessions, exhibition areas and a Cloud Skills Challenge. But of course it’s the announcements that everyone looks forward to the most, where improvements, changes and updates to the various technologies in the Microsoft product portfolio are revealed.

I’ve picked out my top highlights below!

Azure Stack HCI

I’m on both sides of the fence about the Azure Stack HCI announcements.

I love the Azure Stack HCI product and have been using it since the days when it was called Storage Spaces Direct and ran on Hyper-Converged Infrastructure in on-premises datacenters. As it has evolved, Microsoft has invested heavily in the Azure Stack HCI product, which allows you to run Azure Managed Infrastructure in your own datacentres and combine on-premises infrastructure with Azure Cloud Services.

One of the big announcements was around licensing: Enterprise Agreement customers with Software Assurance can now exchange their existing licensed cores of Windows Server Datacenter to get Azure Stack HCI at no additional cost. This includes the right to run unlimited Azure Kubernetes Service and unlimited Windows Server guest workloads on the Azure Stack HCI cluster.

Speaking of Kubernetes, support for Azure Kubernetes Service on Azure Stack HCI is now available, meaning you can deploy and manage containerised apps side-by-side with your VMs on the same physical server or cluster. You can also now provision hybrid AKS clusters directly from Azure onto your Azure Stack HCI using Azure Arc.

On the hardware side, you could previously purchase validated hardware from multiple vendors, but in early 2023 Microsoft will begin offering an Azure Stack HCI integrated system based on hardware that’s designed, shipped and supported by Microsoft (in partnership with Dell).

This will be available in several configurations:

I mentioned both sides of the fence above, and the licensing announcement is one of the worrying ones, because like the recent announcements that Defender for Servers requires an Azure Subscription (Microsoft Defender for Endpoint (Server Version) is no longer available on the EA price list), we’re now potentially going down the route of Microsoft only allowing Windows Server Datacenter to run on Azure Stack HCI accredited hardware. Or potentially getting rid of the Windows Server Datacenter SKU entirely and having it as a “cloud-connected only” product. Only time will tell.

Azure Savings Plan for Compute

Azure Savings Plan for Compute is based on consumption, and allows you to buy a one- or three-year savings plan in which you commit to a fixed hourly spend on compute (for example, $5 per hour) rather than to a specific virtual machine (VM). Purchase recommendations come from Azure Advisor in the Cost Management and Billing section of the Azure Portal.

Once purchased, the plan is applied on an hourly basis based on consumption: eligible usage up to your hourly commitment is billed at the lower savings plan rate, and any consumption above the $5 commitment is billed at Pay-As-You-Go rates.

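The main difference between this and Reserved Instances is flexibility. A Reserved Instance is a commitment to a specific VM size and region that you pay for whether the VM is powered on or not. Azure Savings Plan for Compute instead applies its lower rates to eligible compute consumption across VM sizes and regions (although, as with Reserved Instances, any hourly commitment you don’t use is lost).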
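To show how an hourly commitment plays out, here is a simplified sketch of the billing logic described above. It assumes a single flat savings plan discount against Pay-As-You-Go prices, which is a simplification (real discounts vary by VM series and region), and the function names are hypothetical rather than anything from the Azure billing APIs.

```python
def hourly_bill(payg_cost: float, commitment: float, discount: float) -> float:
    """Simplified charge for one hour under a compute savings plan.

    payg_cost:  what the hour's eligible usage would cost at Pay-As-You-Go rates
    commitment: hourly commitment in dollars (e.g. 5.0)
    discount:   ASSUMED flat savings plan discount vs PAYG, 0 < discount < 1
    """
    discounted_cost = payg_cost * (1 - discount)
    if discounted_cost <= commitment:
        # All usage fits inside the commitment; the unused remainder is lost.
        return commitment
    # The commitment covers this much usage when valued at PAYG prices...
    payg_covered = commitment / (1 - discount)
    # ...and everything above that is billed at PAYG rates.
    return commitment + (payg_cost - payg_covered)


def period_bill(hourly_payg_costs: list[float], commitment: float,
                discount: float) -> float:
    """Sum the hourly charges over a period (e.g. a day of hourly samples)."""
    return sum(hourly_bill(c, commitment, discount) for c in hourly_payg_costs)


if __name__ == "__main__":
    hours = [4.0, 20.0, 10.0]  # hypothetical PAYG cost of each hour's usage
    print("PAYG total:        ", sum(hours))
    print("Savings plan total:", period_bill(hours, commitment=5.0, discount=0.5))
```

The quiet first hour shows the catch: usage worth $4 at Pay-As-You-Go still costs the full $5 commitment, so the plan only pays off if the commitment is sized to your baseline consumption.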

You can find more details in this article on the Microsoft Community Hub.

Azure Virtual Machine Scale Sets – Mixing Standard and Spot instances

Staying on the Cost Savings topic, you can now specify a % of Spot Instance VMs that you wish to run in a VM Scale Set.

This feature (which is in Preview) allows you to reduce compute infrastructure costs by leveraging the deep discounts that Spot VMs can provide while maintaining the compute capacity your workload needs.

More information can be found here.
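As a rough illustration of what mixing Standard and Spot instances means for capacity and cost, the sketch below splits a scale set into regular and Spot VMs. It assumes the mix is configured as a base number of regular VMs plus a percentage of regular VMs above that base; the rounding (regular count rounded up) and the hourly prices are assumptions for illustration, not documented Azure behaviour or real rates.

```python
import math


def split_instances(total: int, base_regular: int,
                    regular_pct_above_base: float) -> tuple[int, int]:
    """Return (regular, spot) VM counts for a scale set of `total` instances.

    ASSUMPTION: the first `base_regular` VMs are always regular, and
    `regular_pct_above_base` percent of the rest are regular, rounded up.
    """
    regular_base = min(total, base_regular)
    above_base = total - regular_base
    regular_extra = math.ceil(above_base * regular_pct_above_base / 100)
    regular = regular_base + regular_extra
    return regular, total - regular


def blended_hourly_cost(total: int, base_regular: int, regular_pct: float,
                        regular_price: float, spot_price: float) -> float:
    """Hourly cost of the mix, using hypothetical per-VM hourly prices."""
    regular, spot = split_instances(total, base_regular, regular_pct)
    return regular * regular_price + spot * spot_price


if __name__ == "__main__":
    # 10 VMs, 2 always regular, 50% of the remaining 8 regular -> 6 regular, 4 Spot
    print(split_instances(10, 2, 50))
    print(blended_hourly_cost(10, 2, 50, regular_price=1.0, spot_price=0.25))
```

The base count is what protects the workload if Spot capacity is evicted; the percentage above it is where the discount comes from.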

Microsoft 365 updates

A huge number of announcements were made about Microsoft 365 at this year’s Ignite, most notably:

The release of the Microsoft 365 app, which will replace the Office mobile app and the Office app for Windows for all Microsoft 365 customers who use them as part of their subscriptions.
Teams Premium, which will be available to E5 subscriptions and will bring enhanced meeting features such as insights and live translation in more than 40 languages so that participants can read captions in their own language.
Microsoft Places, which will assist with the hybrid working model and let everyone know who will be in the office at what times, where colleagues are sitting, which meetings to attend in person, and how to book space on the days your team is planning to go into the office.

The Teams announcements are great, in particular the live translation option. For us as a multi-national and multi-language organisation, this is a massive step in fostering the inclusion of all users. There is an assumption in the world that spoken English is the native language of Tech, but it’s not everyone’s first language.

Microsoft Intune

Microsoft Endpoint Manager is being renamed to Microsoft Intune, which is what it was called before it was renamed to Endpoint Manager. This effectively bundles all Endpoint Management tools under a single brand, including Microsoft Configuration Manager. Some of the main features announced were:

ServiceNow Integration
Cloud LAPS for Azure Virtual Machines
Update policies for macOS, and Linux support
Endpoint Privilege Management – no more permanent admin permissions on devices!

For me, Endpoint Privilege Management is a huge addition, as it removes the need for any permanent administrative permissions on devices. Cloud LAPS is also a huge security step.

Security

Finally on to Security, which was a big focus this year. This year’s updates to the Microsoft Security portfolio coincided with the announcement that Microsoft is now recognised as a leader in the Gartner Magic Quadrant for Security Information and Event Management.

First and foremost is Microsoft’s announcement of a limited-time sale of 50% off Defender for Endpoint Plan 1 and Plan 2 licenses, allowing organisations to do more and spend less by modernising their security with a leading endpoint protection platform. The offer runs until June 2023.

Microsoft 365 Defender now automatically disrupts ransomware attacks. This is possible because Microsoft 365 Defender collects and correlates signals across endpoints, identities, emails, documents and cloud apps into unified incidents and uses the breadth of signal to identify attacks early with a high level of confidence. Microsoft 365 Defender can automatically contain affected assets, such as endpoints or user identities. This helps stop ransomware from spreading laterally.

A number of new capabilities have been announced for Defender for Cloud:

Microsoft Defender for DevOps: A new solution that will provide visibility across multiple DevOps environments to centrally manage DevOps security, strengthen cloud resource configurations in code and help prioritise remediation of critical issues in code across multi-pipeline and multicloud environments. With this preview, leading platforms like GitHub and Azure DevOps are supported and other major DevOps platforms will be supported shortly.
Microsoft Defender Cloud Security Posture Management (CSPM): This solution, available in preview, will build on existing capabilities to deliver integrated insights across cloud resources, including DevOps, runtime infrastructure and external attack surfaces, and will provide contextual risk-based information to security teams. Defender CSPM provides proactive attack path analysis, built on the new cloud security graph, to help identify the most exploitable resources across connected workloads to help reduce recommendation noise by 99%.
Microsoft cloud security benchmark: A comprehensive multicloud security framework is now generally available with Microsoft Defender for Cloud as part of the free Cloud Security Posture Management experience. This built-in benchmark maps best practices across clouds and industry frameworks, enabling security teams to drive multicloud security compliance.
Expanded workload protection capabilities: Microsoft Defender for Servers will support agentless scanning, in addition to an agent-based approach to VMs in Azure and AWS. Defender for Servers P2 will provide Microsoft Defender Vulnerability Management premium capabilities.

If you’d like to read more about Microsoft’s Ignite announcements from the conference, then go to Microsoft’s Book of News here.

Hope you enjoyed this post, until next time!

MFA and Conditional Access alone won’t save us from Threat Actors

At the end of a week in which there have been two very different incidents at high-profile organisations across the globe, it’s interesting to look at them and compare them from the perspective of incident response and the “What could we have done to prevent this from happening?” question.

Let’s analyze that very question. In the aftermath of most incidents, “What could we have done to prevent this from happening?” invariably leads into the next question: “What measures can we put in place to prevent this from happening in the future?”

The problem with these two questions is that they are reactive and only come about because the incident has happened. And it seems that in both incidents, the required security systems were in place.

Or were they?

A brief analysis of the attacks

Holiday Inn

If we take the Holiday Inn attack, the hackers (TeaPea) have said in a statement that:

"Our attack was originally planned to be a ransomware but the company's IT team kept isolating servers before we had a chance to deploy it, so we thought to have some funny [sic]. We did a wiper attack instead," one of the hackers said.

This is interesting because it suggests that the Holiday Inn IT team had a mechanism to isolate servers in an attempt to contain the attack. The problem was that once the attackers were inside the systems and realized that their original plan wasn’t going to work, their focus shifted from cybercrime for profit to something closer to terrorism: destroying as much data as they could.

Image Credit – Northern Ireland Cyber Security Centre

Essentially, the problem here is two-fold. Firstly, you can have a Data Loss Prevention system in place, but it’s not going to report on or block “Delete” actions until it’s too late, or in some cases not at all.

Secondly, they managed to access the systems using a weak password. So (and I’m making assumptions here), while the necessary defences and intrusion-detection technologies may have been in place, that single crack in the foundations was all it took.

So how did they get in? The second part of their statement, shown below, explains it all:

TeaPea say they gained access to IHG's internal IT network by tricking an employee into downloading a malicious piece of software through a booby-trapped email attachment.

The criminals then say they accessed the most sensitive parts of IHG's computer system after finding login details for the company's internal password vault. The password was Qwerty1234.

Ouch. So the attack started with Social Engineering.
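A password like Qwerty1234 fails even the most basic checks. The sketch below shows the kind of test a password policy could apply: a minimum length plus rejection of keyboard runs and common words. The 14-character minimum, the keyboard rows and the word list are my own assumptions for illustration; services such as Azure AD Password Protection handle banned passwords far more thoroughly, and this is not how they work internally.

```python
# Hypothetical, deliberately small lists for illustration only.
KEYBOARD_ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm", "1234567890")
COMMON_WORDS = ("password", "welcome", "letmein", "admin")


def looks_weak(password: str, min_length: int = 14) -> list[str]:
    """Return the reasons a password looks weak (an empty list means none found)."""
    reasons = []
    if len(password) < min_length:
        reasons.append(f"shorter than {min_length} characters")
    lowered = password.lower()
    # Flag any run of 4+ adjacent keys from a single keyboard row.
    for row in KEYBOARD_ROWS:
        for i in range(len(row) - 3):
            if row[i:i + 4] in lowered:
                reasons.append(f"contains keyboard sequence '{row[i:i + 4]}'")
                break  # one finding per row is enough
    for word in COMMON_WORDS:
        if word in lowered:
            reasons.append(f"contains common word '{word}'")
    return reasons


if __name__ == "__main__":
    # Too short, a 'qwer' run and a '1234' run.
    print(looks_weak("Qwerty1234"))
```

None of this would have stopped the phishing email, but it would have closed the door the attackers walked through once they found the password vault login.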

Uber

We know a lot more about the Uber hack and again this is a case of an attack that originated with Social Engineering. Here’s what we know at this point:

The attack started with a social engineering campaign on Uber employees, which yielded access to a VPN, in turn granting access to Uber’s internal network *.corp.uber.com.
Once on the network, the attacker found some PowerShell scripts, one of which contained hardcoded credentials for a domain admin account for Uber’s Privileged Access Management (PAM) solution.
Using admin access, the attacker was able to log in and take over multiple services and internal tools used at Uber: AWS, GCP, Google Drive, Slack workspace, SentinelOne, HackerOne admin console, Uber’s internal employee dashboards, and a few code repositories.

Again, we’re going to work off the assumption (and we need to make this assumption as Uber had been targeted in both 2014 and 2016) that the necessary defences and intrusion detection was in place.

Once the attackers gained access, the big problem was the one highlighted above: hardcoded domain admin credentials. With those, they could move across the network doing whatever they pleased, and undetected too, as it’s not unusual for a domain admin account to have wide access across the network. It also looks like Uber haven’t learned from their previous mistakes, because as Mackenzie Jackson of GitGuardian reported:

“There have been three reported breaches involving Uber in 2014, 2016, and now 2022. It appears that all three incidents critically involve hardcoded credentials (secrets) inside code and scripts”
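Hardcoded secrets in scripts are at least something you can go looking for. The sketch below is a deliberately naive scanner that flags lines where a variable whose name suggests a secret (password, token and so on) is assigned a quoted literal, as in a PowerShell script. The pattern and function name are hypothetical; dedicated tools such as GitGuardian go much further than a single regular expression.

```python
import re

# Hypothetical pattern: a secret-sounding variable name assigned a quoted literal.
SECRET_ASSIGNMENT = re.compile(
    r"""(?ix)                                                    # ignore case, verbose
    \$?\b(\w*(?:password|passwd|pwd|secret|apikey|token)\w*)\b   # variable name
    \s*=\s*
    (["'])(.+?)\2                                                # quoted literal value
    """
)


def find_hardcoded_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, variable name) for each suspicious assignment."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = SECRET_ASSIGNMENT.search(line)
        if match:
            findings.append((lineno, match.group(1)))
    return findings


if __name__ == "__main__":
    script = (
        '$User = "svc-admin"\n'
        '$Password = "Qwerty1234"\n'             # flagged: literal secret
        '$Token = Read-Host "Enter token"\n'     # not flagged: prompted at runtime
    )
    print(find_hardcoded_secrets(script))
```

Finding the secret is only half the job. The real fix is to take it out of the script altogether and retrieve it at runtime from something like Azure Key Vault, and to rotate any credential that has already been committed.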

So what can we learn?

What these attacks teach us is that we can put as much technology, intrusion and anomaly detection into our ecosystem as we like, but the human element is always going to be the one that fails us. Because as humans, we are fallible. It’s not a stick to beat us with (and like most, I have a lot of sympathy for those users at Uber, Holiday Inn and all of the other companies who have fallen victim to attacks that began with Social Engineering).

Do we need constant training and CyberSecurity programmes in our organisations to ensure that our users are aware of these sorts of attacks? Uber and Holiday Inn certainly do now, but as I said at the start of the article, for these companies it will be a reactive measure.

The thing is though, most of these programmes are put in as “one-offs” in response to an audit where a checkbox is required to say that such user training has been put in place. And once the box has been checked, they’re forgotten about until the next audit is needed.

We can also say that the privileged account management processes failed in both companies (a weak password in one, hardcoded credentials in the other).

Conclusion

Multi-Factor Authentication. Conditional Access. Microsoft Defender. Anomaly Detection. EDR and XDR. Information Protection. SOC. SIEM. Privileged Identity Management. Strong Password Policies.

We can tech the absolute sh*t out of our systems and processes, but don’t forget to train and protect the humans in the chain. Because ultimately when they break, the whole system breaks down.

And the Threat Actors out there know this all too well. They know the systems are there, but they need a human to get them past those walls. MFA and Conditional Access can only save us for so long.