Three Azure Networking Assumptions That Will Burn You in Production

Azure networking documentation covers a lot of ground. What it is less good at is surfacing the assumptions embedded in common configurations — the things that appear safe on paper but create real risk in production environments.

This post is about three of those assumptions:

  • NSG service tags and when to use them instead of IP ranges
  • The impact of default routes on Azure service connectivity
  • The behaviour of Private Endpoints in relation to NSG enforcement

These are not edge cases — they appear in standard Azure architectures and are the source of a disproportionate number of production networking incidents.

NSG Service Tags vs IP Ranges and CIDRs — Getting the Choice Right

Network Security Groups are the primary mechanism for controlling traffic in Azure virtual networks. When writing NSG rules, you have two options for specifying the source or destination of traffic: you can use explicit IP addresses and CIDR ranges, or you can use service tags.

Both approaches have a place. The mistake is using one when you should be using the other.

What Service Tags Actually Are

A service tag is a named group of IP address prefixes associated with a specific Azure service. When you reference a service tag in an NSG rule, Azure resolves it to the underlying IP ranges automatically — and manages and updates those ranges on a weekly basis. When you use a service tag, you are delegating IP list management to Microsoft. For Azure-managed services like Storage, Key Vault, and Azure Monitor, that is exactly the right trade-off. You should not be manually maintaining and updating those ranges.
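To make the "named group of prefixes" idea concrete, here is a minimal Python sketch that checks whether an address falls inside a tag's prefix list. The `SAMPLE_TAGS` data and the tag name `Example.Service` are hypothetical stand-ins built from documentation-reserved ranges, not real Azure prefixes; in practice the prefixes would come from the published service tag listing.

```python
import ipaddress

# Hypothetical sample data: documentation-reserved ranges, NOT real Azure prefixes.
SAMPLE_TAGS = {
    "Example.Service": ["192.0.2.0/24", "198.51.100.0/25"],
}

def ip_in_tag(ip, tag, tags=SAMPLE_TAGS):
    """Return True if ip falls inside any prefix listed for the tag."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(prefix) for prefix in tags.get(tag, []))

print(ip_in_tag("192.0.2.17", "Example.Service"))      # True
print(ip_in_tag("198.51.100.200", "Example.Service"))  # False
```

The same check is a handy way to answer "would this rule actually match that address?" when reviewing a tag-based rule.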

When to Use Service Tags

Service tags belong in NSG rules wherever you are controlling traffic to or from an Azure-managed service whose IP ranges change over time: Azure Monitor, Key Vault, Container Registry, SQL, Service Bus, Event Hub. They are also the right choice for platform-level rules — using the AzureLoadBalancer tag for health probe allowances, for example, is far more reliable than trying to maintain a list of probe IPs.

When IP Ranges and CIDRs Are the Right Choice

Explicit CIDRs belong in NSG rules when you are controlling traffic between resources you own — on-premises ranges, partner network CIDRs, specific application subnets within your own VNet, or third-party services with stable published IP ranges. When your security team needs to audit exactly which addresses a rule permits, a CIDR answers that definitively. A service tag defers the answer to a Microsoft-managed list that changes weekly.

The Service Tag Scope Problem

The most common service tag mistake is using broad global tags when regional variants are available and appropriate.

Consider AzureCloud. Using this tag in an NSG rule opens access to the IP ranges associated with all Azure services globally — and critically, that includes IP addresses used by other Azure customers, not just Microsoft’s own infrastructure. This means AzureCloud is a much broader permission than most engineers assume. Microsoft’s own documentation explicitly warns that in most scenarios, allowing traffic from all Azure IPs via this tag is not recommended. If your workload only needs to communicate with services in West Europe, using AzureCloud.WestEurope instead gives you the same coverage for your actual traffic pattern while dramatically reducing the permitted address space.

Tag | Scope | Recommendation
AzureCloud | All Azure IP ranges globally (very broad) | Avoid. Use regional variant or specific service tag.
AzureCloud.WestEurope | Azure IP ranges for West Europe only | Use when regional scoping is sufficient.
Storage | All Azure Storage endpoints globally | Prefer Storage.<region> where possible.
Storage.WestEurope | Azure Storage in West Europe only | Preferred for regionally scoped workloads.
AzureMonitor | Azure Monitor endpoints | Appropriate for monitoring agent outbound rules.
AzureLoadBalancer | Azure Load Balancer probe IPs | Always use for health probe allow rules.

The practical enforcement approach is to use Azure Policy to flag or deny NSG rules that reference broad global tags where regional equivalents exist. This moves the governance left — catching overly permissive rules before they reach production rather than after.
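Azure Policy is the real enforcement point, but the same idea can be sketched as a local pre-check, for example in a pipeline step that reviews rules before they are deployed. This is an illustration only: the rule dictionaries use a simplified, hypothetical shape (`name`, `source`, `destination`), and the set of tags treated as "too broad" is an assumption taken from the table above.

```python
# Tags treated as "too broad" here are an assumption for illustration.
BROAD_TAGS = {"AzureCloud", "Storage"}

def find_broad_rules(rules):
    """Return (rule name, field, tag) for every rule that uses a broad global tag."""
    flagged = []
    for rule in rules:
        for field in ("source", "destination"):
            if rule.get(field) in BROAD_TAGS:
                flagged.append((rule["name"], field, rule[field]))
    return flagged

# Hypothetical rules in a simplified shape, not the ARM schema.
rules = [
    {"name": "allow-azure-out", "source": "VirtualNetwork", "destination": "AzureCloud"},
    {"name": "allow-storage-weu", "source": "VirtualNetwork", "destination": "Storage.WestEurope"},
]

for name, field, tag in find_broad_rules(rules):
    print(f"{name}: {field} uses broad tag {tag}")
```

Only the first rule is reported; the regional `Storage.WestEurope` tag passes because the check is an exact match on the global tag names.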

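# Verify current service tag ranges for a region.
# The JMESPath name match is case-sensitive, so adjust the tag name's casing
# to match what the API returns if this query comes back empty.
az network list-service-tags \
  --location westeurope \
  --output json \
  --query "values[?name=='AzureCloud.WestEurope']"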

Default Routes, UDRs, and What Force Tunnelling Actually Breaks

Routing all outbound traffic through a central firewall via a 0.0.0.0/0 UDR is a standard hub-and-spoke pattern. Security teams require it, and it works — but it consistently catches engineers out in one area the documentation does not make obvious enough.

The problem is not that a default route intercepts too much traffic at the network layer. The problem is that most force-tunnelling configurations are deployed without firewall rules to permit the Azure service traffic that workloads silently depend on, and the symptoms that follow are rarely traced back to the routing change quickly.

168.63.129.16 — The Platform IP You Need to Understand

Before going further, it is worth being precise about 168.63.129.16. Microsoft documents this as a virtual public IP address — not a link-local address, but a special public IP owned by Microsoft and used across all Azure regions and national clouds. It provides DNS name resolution, Load Balancer health probe responses, DHCP, VM Agent communication, and Guest Agent heartbeat for PaaS roles.

The important thing to know about 168.63.129.16 in the context of UDRs is this: Microsoft Learn explicitly states that this address is a virtual IP of the host node and as such is not subject to user defined routes. Azure’s DHCP system injects a specific classless static route for 168.63.129.16/32 via the subnet gateway, ensuring platform traffic bypasses UDRs at the platform level. A 0.0.0.0/0 default route does not intercept traffic to this address.

What a 0.0.0.0/0 UDR does intercept is everything else: general internet-bound traffic, and outbound traffic to Azure service public endpoints — including Azure Monitor, Azure Key Vault, Microsoft Entra ID, Azure Container Registry, and any other PaaS service your workloads communicate with. These services are reachable via public IPs, and those IPs are subject to UDRs.
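The route selection behaviour described above can be sketched in a few lines of Python. This is a simplified model, not Azure's implementation: it assumes longest-prefix match with a user-defined route winning over a system default route of the same length, ignores BGP routes entirely, and treats 168.63.129.16 as a special case handled by the platform. The route list is hypothetical.

```python
import ipaddress

PLATFORM_IP = ipaddress.ip_address("168.63.129.16")

# Hypothetical effective routes: (prefix, source, next hop type).
ROUTES = [
    ("10.0.0.0/16", "Default", "VnetLocal"),
    ("0.0.0.0/0", "Default", "Internet"),
    ("0.0.0.0/0", "User", "VirtualAppliance"),
]

# On an equal prefix length, a user-defined route beats a system default route.
SOURCE_RANK = {"User": 0, "Default": 1}

def next_hop(destination):
    """Pick the next hop for an IPv4 destination under the simplified model."""
    addr = ipaddress.ip_address(destination)
    if addr == PLATFORM_IP:
        return "Platform (not subject to UDRs)"
    candidates = [
        (ipaddress.ip_network(prefix), source, hop)
        for prefix, source, hop in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    # Longest prefix first, then preferred route source.
    _, _, hop = min(candidates, key=lambda r: (-r[0].prefixlen, SOURCE_RANK[r[1]]))
    return hop

print(next_hop("10.0.1.4"))       # VnetLocal
print(next_hop("203.0.113.10"))   # VirtualAppliance
print(next_hop("168.63.129.16"))  # Platform (not subject to UDRs)
```

The second result is the one that matters: any public address, including the public endpoint of an Azure service, now lands on the firewall.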

⚠️ What actually breaks when you deploy a 0.0.0.0/0 UDR without firewall rules

Workloads that depend on Azure Monitor for diagnostics stop sending telemetry. Token acquisition fails for any workload that calls the Microsoft Entra ID endpoints directly if the firewall does not permit that traffic. (IMDS at 169.254.169.254 is a non-routable address served from the host, so requests to it never reach the firewall.) Applications pulling images from Azure Container Registry, secrets from Key Vault, or messages from Service Bus will lose connectivity. The common thread is that your firewall is now in the path of traffic it has not been told to permit. The workloads fail; the routing change is often not the first place anyone looks.

The Right Approach to Force Tunnelling

Deploying a 0.0.0.0/0 UDR and a corresponding firewall must be treated as a single unit of change, not two separate steps. The firewall rules need to be in place before the UDR is applied, not after symptoms appear.

Before any UDR deployment, inventory the Azure service dependencies of every workload on the affected subnet and verify that the firewall policy explicitly permits outbound traffic to the corresponding service tags. AzureMonitor, AzureActiveDirectory, AzureKeyVault, Storage, AzureContainerRegistry — each service your workloads depend on must have a corresponding firewall application or network rule. Then verify Effective Routes on the affected subnets and NICs to confirm what will happen before it happens.
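The inventory step is essentially a set comparison, and even a rough version of it catches gaps before the route table does. In the sketch below, the workload names and the `FIREWALL_ALLOWED` set are hypothetical; in a real environment both would be derived from your own dependency inventory and firewall policy export.

```python
# Hypothetical inventory: service tags each workload depends on.
REQUIRED = {
    "web-api": {"AzureMonitor", "AzureKeyVault", "AzureActiveDirectory"},
    "batch-worker": {"Storage", "AzureContainerRegistry", "AzureMonitor"},
}

# Hypothetical set of service tags the firewall policy currently permits outbound.
FIREWALL_ALLOWED = {"AzureMonitor", "Storage", "AzureKeyVault"}

def missing_rules(required, allowed):
    """Return the tags each workload needs that the firewall does not yet permit."""
    return {
        workload: sorted(tags - allowed)
        for workload, tags in required.items()
        if tags - allowed
    }

for workload, tags in missing_rules(REQUIRED, FIREWALL_ALLOWED).items():
    print(f"{workload}: no firewall rule for {', '.join(tags)}")
```

An empty result is the signal that the UDR is safe to apply from a firewall-rule point of view; anything else is a workload that would break.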

# Review effective routes before any UDR change
az network nic show-effective-route-table \
--resource-group <rg-name> \
--name <nic-name> \
--output table

A route table change that appears clean from a routing perspective can still break applications if the firewall has gaps. Effective Routes verification and firewall rule review should both be mandatory steps in any network change process that involves UDRs.

Private Endpoints and the NSG Enforcement Gap

Private Endpoints are one of the most effective controls for locking down access to Azure PaaS services. When you deploy a Private Endpoint for a storage account, Key Vault, or SQL database, that service gets a private IP on your VNet subnet and traffic travels within the Azure backbone. The assumption that naturally follows — that an NSG on the Private Endpoint’s subnet controls access to it — is incorrect by default.

How NSG Evaluation Works for Private Endpoints

By default, network policies are disabled for a subnet in a virtual network — which means NSG rules are not evaluated for traffic destined for Private Endpoints on that subnet. This is a platform default, not a misconfiguration. A deny rule you expect to block access from a specific source will be silently ignored.

⚠️ Why this matters

An application or workload with network connectivity to the Private Endpoint’s subnet can reach the private IP of the endpoint regardless of the NSG rules you have defined. There is no error, no alert, and no indication in the portal that the NSG is not being evaluated. The exposure is silent.

Enabling NSG Enforcement on Private Endpoint Subnets

Microsoft added the ability to restore NSG evaluation for Private Endpoint traffic through a subnet property called privateEndpointNetworkPolicies. Setting this property to Enabled causes NSG rules to be evaluated for traffic destined for Private Endpoints on that subnet, in the same way they would be for any other resource.

Azure CLI Method:

# Enable NSG enforcement for Private Endpoints on a subnet
# (older Azure CLI versions used --disable-private-endpoint-network-policies false)
az network vnet subnet update \
  --resource-group <rg-name> \
  --vnet-name <vnet-name> \
  --name <subnet-name> \
  --private-endpoint-network-policies Enabled

This change should be applied to all subnets where Private Endpoints are deployed, and it should be part of your standard subnet configuration in IaC rather than something applied reactively after deployment. In Terraform, the equivalent property is private_endpoint_network_policies = "Enabled" on the subnet resource.
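Because the gap is silent, it is worth auditing for it rather than trusting that every subnet was configured correctly. The sketch below works on a simplified, hypothetical subnet shape: `hasPrivateEndpoints` is an invented field for illustration, and treating `NetworkSecurityGroupEnabled` as an enforcing value alongside `Enabled` is an assumption you should confirm against the current API documentation.

```python
# Assumption: these property values mean NSG rules are evaluated for PE traffic.
ENFORCING_VALUES = {"Enabled", "NetworkSecurityGroupEnabled"}

def subnets_without_nsg_enforcement(subnets):
    """Return names of subnets that host Private Endpoints but do not enforce NSGs."""
    return [
        subnet["name"]
        for subnet in subnets
        if subnet.get("hasPrivateEndpoints")
        and subnet.get("privateEndpointNetworkPolicies", "Disabled") not in ENFORCING_VALUES
    ]

# Hypothetical subnets in a simplified shape, not the ARM schema.
subnets = [
    {"name": "snet-pe-data", "hasPrivateEndpoints": True, "privateEndpointNetworkPolicies": "Disabled"},
    {"name": "snet-pe-secrets", "hasPrivateEndpoints": True, "privateEndpointNetworkPolicies": "Enabled"},
    {"name": "snet-app", "hasPrivateEndpoints": False, "privateEndpointNetworkPolicies": "Disabled"},
]

print(subnets_without_nsg_enforcement(subnets))  # ['snet-pe-data']
```

A missing property is treated as `Disabled` here, which matches the platform default described above.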

NSG Rules Are Not Enough on Their Own

Enabling NSG enforcement is necessary, but it is not sufficient as the only access control for sensitive data services. Network controls restrict which sources can reach an endpoint — they cannot govern what those sources do once connected. Managed Identity with scoped RBAC assignments should be the minimum access model for any workload reaching Azure data services through a Private Endpoint.

✅ The defence-in-depth model for Private Endpoints

  • Enable privateEndpointNetworkPolicies on the subnet so that NSG rules are enforced.
  • Write NSG rules that restrict inbound access to the Private Endpoint’s private IP to only the sources that need it.
  • Require Managed Identity authentication with scoped RBAC assignments for all service access.
  • Disable public network access on the backing service entirely.

These controls work together: removing any one of them weakens the posture.

Summary: Three Controls, Three Gaps

Area | Common Assumption | The Reality | The Fix
NSG Service Tags | AzureCloud is a safe, conservative choice | AzureCloud is a broad, dynamic tag covering all Azure IPs globally | Use regional tags (AzureCloud.WestEurope). Enforce with Azure Policy.
Default Routes / UDRs | A 0.0.0.0/0 UDR and a firewall are all you need to control outbound traffic | 168.63.129.16 is not subject to UDRs, but all Azure service endpoint traffic is; if the firewall has no rules for it, workloads break silently | Define firewall rules for all Azure service dependencies before applying the UDR. Check Effective Routes first.
Private Endpoints | An NSG on the PE subnet controls access to the endpoint | NSG rules are not evaluated for PE traffic by default | Enable privateEndpointNetworkPolicies on the subnet. Require Managed Identity + RBAC.

Conclusion

The three patterns in this post — service tag scoping, force-tunnelling configuration, and Private Endpoint NSG enforcement — share a common characteristic: they are not wrong configurations. They are default behaviours, or natural-seeming choices, whose consequences the documentation does not surface at the point where those choices are made.

The goal is to move the point of discovery earlier. Understanding that AzureCloud covers other customers’ IP ranges before writing that NSG rule. Knowing that a 0.0.0.0/0 UDR puts your firewall in the path of all Azure service traffic before applying that route table. Checking Private Endpoint network policies before writing the first NSG rule for a PE subnet. These are the things that turn reactive incident investigation into proactive design decisions.

Azure networking is not inherently complex. But it rewards engineers who take the time to understand what the defaults are doing, and why.

Azure Networking Zero to Hero – Network Security Groups

In this post, I’m going to stay within the boundaries of our Virtual Network and briefly talk about Network Security Groups, which filter network traffic between Azure resources in an Azure virtual network.

Overview

So, it’s a Firewall, right?

NOOOOOOOOOO!!!!!!!!

While a Network Security Group (or NSG for short) contains Security Rules to allow or deny inbound/outbound traffic to/from several types of Azure Resources, it is not a Firewall (it may be what a Firewall looked like 25-30 years ago, but not now). NSGs can be used in conjunction with Azure Firewall and other network security services in Azure to help secure and shape how your traffic flows between subnets and resources.

Default Rules

When you create a subnet in your Virtual Network, you have the option to create an NSG which will be automatically associated with the subnet. However, you can also create an NSG and manually associate it with either a subnet, or directly to a Network Interface in a Virtual Machine.

When an NSG is created, it always has a default set of Security Rules that look like this:

The default Inbound rules allow the following:

  • 65000 — All Hosts/Resources inside the Virtual Network to Communicate with each other
  • 65001 — Allows Azure Load Balancer to communicate with the Hosts/resources
  • 65500 — Deny all other Inbound traffic

The default Outbound rules allow the following:

  • 65000 — All Hosts/Resources inside the Virtual Network to Communicate with each other
  • 65001 — Allows all Internet Traffic outbound
  • 65500 — Deny all other Outbound traffic

The default rules cannot be edited or removed. NSGs are created initially using a Zero-Trust model. The rules are processed in order of priority (lowest numbered rule is processed first), and processing stops at the first rule that matches. So you would need to build your rules on top of the default ones (for example, RDP and SSH access if not already in place).
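If the priority ordering is hard to picture, a small Python model helps. This is a deliberately simplified sketch: sources are plain labels rather than real address matching, only a single port is checked, and the custom `AllowRdpFromInternet` rule is a hypothetical example layered on top of the three default inbound rules.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    priority: int
    name: str
    source: str           # simplified label: "VirtualNetwork", "AzureLoadBalancer", "Internet" or "*"
    port: Optional[int]   # None matches any port
    access: str           # "Allow" or "Deny"

DEFAULT_INBOUND = [
    Rule(65000, "AllowVnetInBound", "VirtualNetwork", None, "Allow"),
    Rule(65001, "AllowAzureLoadBalancerInBound", "AzureLoadBalancer", None, "Allow"),
    Rule(65500, "DenyAllInBound", "*", None, "Deny"),
]

def evaluate(rules, source, port):
    """Return the first matching rule, checking the lowest priority number first."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.source in ("*", source) and rule.port in (None, port):
            return rule
    raise LookupError("no rule matched")  # not reachable while DenyAllInBound is present

# Hypothetical custom rule built on top of the defaults.
rules = [Rule(300, "AllowRdpFromInternet", "Internet", 3389, "Allow")] + DEFAULT_INBOUND

print(evaluate(rules, "Internet", 3389).name)      # AllowRdpFromInternet
print(evaluate(rules, "Internet", 22).name)        # DenyAllInBound
print(evaluate(rules, "VirtualNetwork", 22).name)  # AllowVnetInBound
```

SSH from the internet falls all the way through to the 65500 deny, which is exactly why you add your own rules above the defaults.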

Configuration and Traffic Flow

Some important things to note:

  • The default “65000” rules for both Inbound and Outbound – this allows all virtual network traffic. It means that if we have 2 subnets which each have a virtual machine, these would be able to communicate with each other without adding any additional rules.
  • As well as IP addresses and address ranges, we can use Service Tags, which represent groups of IP address prefixes for Azure services. These are managed and updated by Microsoft, so you can use them instead of maintaining lists of public IP ranges for each service yourself. You can find a full list of available Service Tags that can be used with NSGs at this link. In the image above, “VirtualNetwork” and “AzureLoadBalancer” are Service Tags.
  • A virtual network subnet or interface can only have one NSG, but an NSG can be assigned to many subnets or interfaces. Tip from experience: sharing one NSG like this is not a good idea. If you have an application design that uses multiple Azure Services, split these services into dedicated subnets and apply a separate NSG to each subnet.
  • When using a NSG associated with a subnet and a dedicated NSG associated with a network interface, the NSG associated with the Subnet is always evaluated first for Inbound Traffic, before then moving on to the NSG associated with the NIC. For Outbound Traffic, it’s the other way around — the NSG on the NIC is evaluated first, and then the NSG on the Subnet is evaluated. This process is explained in detail here.
  • If you don’t have a network security group associated to a subnet, all inbound traffic is blocked to the subnet/network interface. However, all outbound traffic is allowed.
  • An NSG can hold a maximum of 1000 Rules. Previously the limit was 200 and could be raised by logging a ticket with Microsoft, but the max (at time of writing) is 1000 and cannot be increased. Also, there is a max limit of 5000 NSGs per subscription.
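The subnet and NIC evaluation order from the list above is easier to reason about when written down. The sketch below reduces each NSG to a single yes/no verdict, which is a simplification; the function name and return shape are made up for illustration.

```python
def traffic_allowed(direction, subnet_nsg_allows, nic_nsg_allows):
    """Return (allowed, denying_layer) for a flow that passes both NSGs in order.

    Inbound traffic meets the subnet NSG first, then the NIC NSG.
    Outbound traffic meets the NIC NSG first, then the subnet NSG.
    """
    order = ["subnet", "nic"] if direction == "inbound" else ["nic", "subnet"]
    verdicts = {"subnet": subnet_nsg_allows, "nic": nic_nsg_allows}
    for layer in order:
        if not verdicts[layer]:
            return False, layer  # denied here; the next layer is never evaluated
    return True, None

print(traffic_allowed("inbound", subnet_nsg_allows=True, nic_nsg_allows=False))   # (False, 'nic')
print(traffic_allowed("outbound", subnet_nsg_allows=False, nic_nsg_allows=False)) # (False, 'nic')
print(traffic_allowed("inbound", subnet_nsg_allows=True, nic_nsg_allows=True))    # (True, None)
```

The practical takeaway is that traffic must be allowed by both NSGs, and the layer reported in a deny depends on the direction of the flow.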

Logging and Visibility

  • Important – Turn on NSG Flow Logs. This is a feature of Azure Network Watcher that allows you to log information about IP traffic flowing through a network security group, including details on source and destination IP addresses, ports, protocols, and whether traffic was permitted or denied. You can find more in-depth details on flow logging here, and a tutorial on how to turn it on here.
  • To enhance this, you can use Traffic Analytics, which analyzes Azure Network Watcher flow logs to provide insights into traffic flow in your Azure cloud.

Conclusion

NSGs are fundamental to securing inbound and outbound traffic for subnets within an Azure Virtual Network, and form one of the first layers of defense to protect application integrity and reduce the risk of data loss.

However, as I said at the start of this post, an NSG is not a Firewall. The layer 3 and layer 4 port-based protection that NSGs provide has significant limitations: it does not inspect the content of the traffic, so malicious activity carried over permitted protocols such as SSH and HTTPS goes undetected.

And that’s one of the biggest mistakes I see people make – they assume that NSGs will do the job because Firewalls and other network security services are too expensive.

Therefore, NSGs should be used in conjunction with other network security tools, such as Azure Firewall and Web Application Firewall (WAF), for any devices presented externally to the internet or other private networks. I’ll cover these in detail in later posts.

Hope you enjoyed this post, until next time!!

100 Days of Cloud — Day 9: Azure Network Security Groups (NSG)

It’s Day 9, and today I’m delving into NSGs, or Network Security Groups.

During previous posts when I was deploying Virtual Machines, you would have noticed that the deployment created a number of resources in the Resource Groups:

  • Virtual Network
  • Subnet
  • Public IP Address
  • Interface
  • Virtual Machine
  • NSG or Network Security Group

I’ve pretty much flogged Virtual Machines to death at this stage (I can hear the screams, NOOOOOO PLEASE, NO MORE VIRTUAL MACHINES!!!!). Fear not, I’m not going to return to Virtual Machines …. just yet. I’ll deal with Virtual Networks and Subnets in my next post, but today I want to give an overview of NSG’s, how important they are and how useful they can be.

Overview

Network Security Groups in Azure can be used to filter traffic to and from resources in an Azure Virtual Network. An NSG contains Security Rules to allow or deny inbound/outbound traffic to/from several types of Azure Resources. NSGs can be applied to either Subnets within a Virtual Network, or else directly to a Network Interface in a Virtual Machine.

When an NSG is created, it always has a default set of Security Rules that look like this:

The default Inbound rules allow the following:

  • 65000 — All Hosts/Resources inside the Virtual Network to Communicate with each other
  • 65001 — Allows Azure Load Balancer to communicate with the Hosts/resources
  • 65500 — Deny all other Inbound traffic

The default Outbound rules allow the following:

  • 65000 — All Hosts/Resources inside the Virtual Network to Communicate with each other
  • 65001 — Allows all Internet Traffic outbound
  • 65500 — Deny all other Outbound traffic

It’s pretty restrictive. This is because Azure NSGs are created initially using a Zero-Trust model. The rules are processed in order of priority (lowest numbered rule is processed first). So you would need to build your rules on top of the default ones (for example, RDP and SSH access if not already in place).

Also, an important thing to remember: I mentioned that you can have an NSG associated with a Subnet or a NIC. You can also have both. A Subnet NSG will always be created automatically with the first Subnet that is created in a Resource Group, and you can also create a dedicated NSG for a NIC in a VM that’s sitting in that subnet. In this instance, the NSG associated with the Subnet is always evaluated first for Inbound Traffic, before then moving on to the NSG associated with the NIC. For Outbound Traffic, it’s the other way around: the NSG on the NIC is evaluated first, and then the NSG on the Subnet is evaluated.

Example of an NSG in Action

I’ve created a VM in Azure (as promised, I won’t torture you with this process again 😉).

I click into the VM to look at the settings:

Let’s click on the “Connect” button — this will give us the option to use RDP, SSH or Bastion. I’ll choose RDP:

And this will give us a link to download an RDP File:

Click Connect:

I get prompted for credentials:

And I’m in!!

Now, let’s take a look “under the hood”. Back in the Portal, and on the side menu, I click “Networking”. This brings me into the Network Security Group for the VM:

I can see that RDP is set to Allow, so I’m going to click on “Allow” in the Action Column, and set the RDP policy to “Deny”:

Now, I’ll try to connect to the VM again:

Exactly what I wanted to see. That shows an NSG in action and how you can allow or deny traffic with its rules.

Some Important Considerations

There are a few things you need to be aware of when using Network Security Groups:

  • You can use the same NSG on multiple Subnets and NICs
  • An NSG can hold a maximum of 1000 Rules. Previously the limit was 200 and could be raised by logging a ticket with Microsoft, but the max (at time of writing) is 1000 and cannot be increased.
  • Security Rules can affect Traffic between Resources in the same Subnet. Recall that the first default rules for both Inbound and Outbound are “AllowVnetInBound” and “AllowVnetOutBound”, which allow traffic within the Virtual Network, including intra-subnet traffic. If you create a “Deny” Rule with a lower priority number than these (which means it is processed first), it can cause communication issues. Of course, there may be a good reason to do this, but just be careful and understand the implications. The default rules exist for a reason!
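That last point is worth a tiny worked example. The sketch below assumes both rules match the same VM-to-VM traffic and only models the priority comparison; `DenyIntraVnet` and its priority of 4000 are hypothetical.

```python
# (priority, name, access) for rules that all match the same VM-to-VM flow.
matching_rules = [
    (65000, "AllowVnetInBound", "Allow"),  # default rule
    (4000, "DenyIntraVnet", "Deny"),       # hypothetical custom rule
]

def winning_rule(rules):
    """The matching rule with the lowest priority number is the one applied."""
    return min(rules, key=lambda rule: rule[0])

priority, name, access = winning_rule(matching_rules)
print(f"{name} (priority {priority}) wins: {access}")
```

Remove the custom rule from the list and the default allow applies again, which is the behaviour most people expect inside a Virtual Network.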

Conclusion

Now you can use Network Security Groups to filter and manage incoming and outgoing traffic for your virtual network. Network Security Groups provide a simple and effective way to manage network traffic.

Hope you enjoyed this post, until next time!!