Once we start running workloads in Azure, the question of security is never far behind. Azure gives customers several ways to manage security on VNets and other resources. By default, VMs are created with a network security group whose default rules block inbound traffic from the internet, unless you choose to open specific ports (such as RDP or SSH) during creation. Let's look at how this is done.
Network security groups
Virtual networks (VNets) are the foundation of the Azure networking model and provide isolation and protection. Network security groups (NSGs) are the primary tool you use to enforce and control network traffic rules at the networking level. NSGs are an optional security layer that acts as a software firewall, filtering inbound and outbound traffic on the VNet. NSGs also matter for visibility: flow logs, and the traffic analytics built on them, are captured through the NSG, so if you do not have an NSG you do not get NSG flow logs.
Security groups can be associated with a network interface (for per-host rules), a subnet in the virtual network (to apply to multiple resources), or both.
When you create a VM, a network security group is created by default. You can also select none, or an existing NSG. While you can apply an NSG to both the NIC and the subnet, this can be very cumbersome to administer.
NSGs use rules to allow or deny traffic moving through the network. Each rule identifies the source and destination address (or range), protocol, port (or range), direction (inbound or outbound), a numeric priority, and whether to allow or deny the traffic that matches the rule.
Each security group has a set of default security rules. These default rules cannot be deleted or modified, but they can be overridden by rules you create with a higher priority (that is, a lower priority number).
How NSG rules get used
For inbound traffic, Azure processes the security group associated with the subnet, and then the security group associated with the network interface. Outbound traffic is handled in the opposite order (the network interface first, followed by the subnet).
Keep in mind that security groups are optional at both levels. If no security group is applied, then all traffic is allowed by Azure. If the VM has a public IP, this could be a serious risk, particularly if the OS doesn’t provide a built-in firewall.
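The processing order can be sketched in Python. This is a hypothetical model, not anything Azure exposes: each NSG is reduced to a function that returns True (allow) or False (deny) for a flow, and a missing NSG is represented as None. It illustrates two points from the text: the order flips between inbound and outbound, and a flow only gets through if every NSG that is present allows it.

```python
from typing import Callable, Optional

# Hypothetical model: an NSG is a function returning True (allow) or
# False (deny) for a flow; None means no NSG is associated at that level.
Nsg = Optional[Callable[[dict], bool]]

def nsg_order(direction: str, subnet_nsg: Nsg, nic_nsg: Nsg) -> list:
    """Return the NSGs in the order Azure processes them for this direction."""
    if direction == "inbound":
        return [subnet_nsg, nic_nsg]  # subnet first, then the network interface
    if direction == "outbound":
        return [nic_nsg, subnet_nsg]  # network interface first, then the subnet
    raise ValueError(f"unknown direction: {direction}")

def is_allowed(flow: dict, subnet_nsg: Nsg, nic_nsg: Nsg) -> bool:
    """A flow gets through only if every NSG that is present allows it."""
    for nsg in nsg_order(flow["direction"], subnet_nsg, nic_nsg):
        if nsg is None:
            continue  # no NSG at this level, so nothing is filtered here
        if not nsg(flow):
            return False  # dropped; the next level never sees the flow
    return True

def https_only(flow: dict) -> bool:
    return flow["port"] == 443

ssh = {"direction": "inbound", "port": 22}
print(is_allowed(ssh, https_only, None))  # False: the subnet NSG drops SSH
print(is_allowed(ssh, None, None))        # True: no NSG at either level
```

The second call is the risky case from the text: with no NSG at either level, nothing is filtered by Azure.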
Rules are evaluated in priority order, starting with the lowest priority number (which is the highest priority). As soon as traffic matches a rule, processing stops, whether that rule allows or denies the traffic.
The last rule is always a Deny All rule. This default rule (DenyAllInBound or DenyAllOutBound) is added to every security group for both inbound and outbound traffic with a priority of 65500. That means for traffic to pass through the security group, it must first match an allow rule, either one you create or one of the default allow rules (such as AllowVnetInBound), or the final default rule will block it.
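To make the first-match behaviour concrete, here is a minimal Python sketch of how an NSG walks its inbound rules. It is a simplified model, not Azure's implementation: only two of the default inbound rules are modelled, service tags are not supported, and the hypothetical prefix 10.0.0.0/16 stands in for the VirtualNetwork service tag used by AllowVnetInBound. The custom rule names (Allow-HTTPS, Deny-SSH-from-VNet) and all IP addresses are made up for illustration.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    priority: int              # lower number = evaluated first
    direction: str             # "Inbound" or "Outbound"
    access: str                # "Allow" or "Deny"
    protocol: str = "*"        # "Tcp", "Udp", or "*" for any
    source: str = "*"          # CIDR prefix, or "*" for any
    destination: str = "*"     # CIDR prefix, or "*" for any
    ports: tuple = (0, 65535)  # inclusive destination port range

def in_prefix(ip: str, prefix: str) -> bool:
    return prefix == "*" or ipaddress.ip_address(ip) in ipaddress.ip_network(prefix)

def matches(rule: Rule, direction: str, protocol: str, src: str, dst: str, port: int) -> bool:
    return (rule.direction == direction
            and rule.protocol in ("*", protocol)
            and in_prefix(src, rule.source)
            and in_prefix(dst, rule.destination)
            and rule.ports[0] <= port <= rule.ports[1])

# Simplified inbound defaults (assumption: 10.0.0.0/16 stands in for the
# VirtualNetwork service tag; AllowAzureLoadBalancerInBound is omitted).
DEFAULT_INBOUND = [
    Rule("AllowVnetInBound", 65000, "Inbound", "Allow",
         source="10.0.0.0/16", destination="10.0.0.0/16"),
    Rule("DenyAllInBound", 65500, "Inbound", "Deny"),
]

def evaluate(custom_rules: list, protocol: str, src: str, dst: str, port: int) -> tuple:
    """Inbound only: return (access, rule name) of the first matching rule."""
    for rule in sorted(custom_rules + DEFAULT_INBOUND, key=lambda r: r.priority):
        if matches(rule, "Inbound", protocol, src, dst, port):
            return rule.access, rule.name  # first match wins, Allow or Deny
    return "Deny", "no match"  # not reached: DenyAllInBound matches everything

rules = [Rule("Allow-HTTPS", 100, "Inbound", "Allow", protocol="Tcp", ports=(443, 443))]
print(evaluate(rules, "Tcp", "203.0.113.10", "10.0.1.4", 443))  # ('Allow', 'Allow-HTTPS')
print(evaluate(rules, "Tcp", "203.0.113.10", "10.0.1.4", 22))   # ('Deny', 'DenyAllInBound')
print(evaluate(rules, "Tcp", "10.0.2.5", "10.0.1.4", 22))       # ('Allow', 'AllowVnetInBound')

# A custom rule with a lower number overrides the default AllowVnetInBound.
rules.append(Rule("Deny-SSH-from-VNet", 200, "Inbound", "Deny",
                  protocol="Tcp", source="10.0.0.0/16", ports=(22, 22)))
print(evaluate(rules, "Tcp", "10.0.2.5", "10.0.1.4", 22))       # ('Deny', 'Deny-SSH-from-VNet')
```

The last two calls show how a default rule is overridden rather than edited: the priority-200 deny rule is simply evaluated before AllowVnetInBound at 65000.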
SMTP (port 25) is a special case. Depending on your subscription level and when your account was created, outbound SMTP traffic may be blocked. You can request to remove this restriction with business justification.
NSGs are just one way of controlling traffic flow; they act as the inspection point and control for east/west traffic. The idea behind east/west and north/south traffic is that traffic already inside the environment is considered east/west, while traffic coming from outside the network is considered north/south, meaning it needs to come through a firewall.
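The east/west versus north/south distinction comes down to whether both ends of a flow sit inside your own address space. A minimal Python sketch, assuming the environment is made up of the hypothetical ranges 10.0.0.0/16 and 10.1.0.0/16:

```python
import ipaddress

# Assumption: the environment's internal address space (hypothetical ranges).
INTERNAL_RANGES = [ipaddress.ip_network("10.0.0.0/16"),
                   ipaddress.ip_network("10.1.0.0/16")]

def is_internal(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_RANGES)

def classify(src: str, dst: str) -> str:
    """East/west if both ends are already inside the environment, else north/south."""
    if is_internal(src) and is_internal(dst):
        return "east/west"    # NSGs are the inspection point here
    return "north/south"      # should come through a firewall

print(classify("10.0.1.4", "10.1.2.8"))      # east/west
print(classify("203.0.113.10", "10.0.1.4"))  # north/south
```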
Security is handled by adding layers that inspect the traffic. So let's look at how traffic would be inspected.
Microsoft maintains a global backbone network that connects its regions and datacenters. While regions are configured in pairs to allow disaster recovery, they can also connect to other regions, although not all regions have routes to connect to each other.
The backbone network is how Azure services route traffic. Where possible, traffic is kept on the backbone network, which means it does not go out on the public internet. This is where service endpoints come in. When you create a subnet, you can specify which service endpoints the subnet can use, and each service endpoint corresponds to an Azure service (for example, Microsoft.Storage or Microsoft.Sql). If no service endpoint is enabled for a service, traffic from the subnet to that service is sent to the service's public endpoint instead. Private endpoints can also be used when you need a private path to a service: a private endpoint gives the service a private IP address inside your VNet. These are typically seen when you are doing things like AKS private clusters or private ACRs.
In Microsoft documentation, you will often see the backbone network labelled as the "Microsoft Wide Area Network". The regional gateways provide the path for traffic to reach the different availability zones. Availability zones are collections of datacenters.
The datacenters house clusters of physical blades (nodes), with roughly 1,000 blades in each cluster. Each cluster is managed by a fabric controller. Each availability zone is made up of one or more datacenters, and zone-redundant services replicate across the availability zones in a region. The number of zones in a region can vary, but is usually three.
A regional pair consists of two regions within the same geography. Azure serializes platform updates (planned maintenance) across regional pairs, ensuring that only one region in each pair updates at a time. If an outage affects multiple regions, at least one region in each pair will be prioritized for recovery.
So, if you are setting up disaster recovery between regions, make sure you check which region is paired with your primary region and what the latency between those regions is.
And on to traffic flow!
When traffic is coming from the public internet to a VM in Azure, it is routed to the closest point of entry to the backbone network. Microsoft maintains firewalls and routers with ACLs to control traffic and filter unwanted network traffic.
Internet traffic for Azure is routed to the nearest datacenter, where a connection is established to the access routers. These access routers isolate traffic between Azure nodes and customer-instantiated VMs. Network infrastructure devices at the access and edge locations are the boundary points where ingress and egress filters are applied. These routers are configured with a tiered access control list (ACL) to filter unwanted network traffic and apply traffic rate limits, if necessary. Traffic that is allowed by the ACL is routed to the load balancers. Distribution routers are designed to allow only Microsoft-approved IP addresses, provide anti-spoofing, and establish TCP connections that use ACLs. External load-balancing devices sit behind the access routers and perform network address translation (NAT) from internet-routable IPs to Azure internal IPs. These devices also route packets to valid production internal IPs and ports, and they act as a protection mechanism that limits exposure of the internal production network address space.
Once we enter the customer side of the tenant, VNets come into play. Each virtual network (VNet) can have its own firewall, and each subnet can have its own route table containing custom routes, also called user-defined routes (UDRs). This can be useful for routing traffic to a third-party firewall.
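How a subnet's route table sends traffic to a firewall can be sketched with longest-prefix matching: the most specific matching prefix wins, and when prefixes are equally specific a user-defined route takes precedence over a BGP route, which takes precedence over a system route. The next-hop type names below ("VnetLocal", "Internet", "VirtualAppliance") follow Azure's naming, but the route table itself and the firewall address 10.0.100.4 are hypothetical, and this is a simplified model rather than Azure's routing code.

```python
import ipaddress
from dataclasses import dataclass

@dataclass
class Route:
    prefix: str
    next_hop_type: str       # e.g. "VnetLocal", "Internet", "VirtualAppliance"
    next_hop_ip: str = ""    # only meaningful for "VirtualAppliance"
    source: str = "System"   # "User", "BGP", or "System"

# When prefixes are equally specific, a user-defined route wins over BGP,
# and BGP wins over a system route.
SOURCE_RANK = {"User": 0, "BGP": 1, "System": 2}

def select_route(routes: list, dst: str):
    """Pick the route for dst: longest matching prefix first, then route source."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return min(candidates,
               key=lambda r: (-ipaddress.ip_network(r.prefix).prefixlen,
                              SOURCE_RANK[r.source]))

# Hypothetical route table: 10.0.100.4 is an assumed third-party firewall.
routes = [
    Route("10.0.0.0/16", "VnetLocal"),     # system route for the VNet
    Route("0.0.0.0/0", "Internet"),        # system default route
    Route("0.0.0.0/0", "VirtualAppliance", "10.0.100.4", source="User"),  # UDR
]
print(select_route(routes, "10.0.5.6").next_hop_type)    # VnetLocal
print(select_route(routes, "198.51.100.7").next_hop_ip)  # 10.0.100.4
```

Traffic inside the VNet still uses the more specific /16 system route, while everything bound for the internet is steered to the firewall by the UDR.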
Traffic leaves the public internet and enters the backbone network (1st layer of security), then travels along the backbone to the regional gateway (2nd layer of security). The traffic is then sent on to the nearest datacenter (3rd layer of security), where it is passed to the customer-managed VNet (optional 4th layer of security), which could be protected by a firewall. The traffic is then sent on to the subnet (optional 5th layer of security), where an NSG could inspect the traffic and apply NSG rules before sending it on to the network interface of the VM (optional 6th layer of security), where another NSG could apply. If the traffic is for an application, application security groups (ASGs) can also be used in NSG rules to group VMs by application and filter traffic to them.
So, we have potentially six layers of protection between internet traffic and the VM. Three of these layers are up to the customer to implement.
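The layered path can be modelled as an ordered list of checks, where a layer that is not deployed simply passes traffic through. This is a toy Python sketch, not how Azure is built: the three Microsoft-managed layers are placeholders that allow everything (their real filtering is not something we can model), and the NSG port rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Layer:
    name: str
    customer_managed: bool
    check: Optional[Callable[[dict], bool]] = None  # None = layer not deployed

def traverse(layers: list, packet: dict) -> tuple:
    """Walk the layers in order; return (delivered, layer that dropped it or None)."""
    for layer in layers:
        if layer.check is not None and not layer.check(packet):
            return False, layer.name
    return True, None

def passes(packet: dict) -> bool:
    return True  # placeholder for Microsoft-managed filtering we can't model

path = [
    Layer("Backbone network", False, passes),
    Layer("Regional gateway", False, passes),
    Layer("Datacenter", False, passes),
    Layer("VNet firewall", True),                                # not deployed
    Layer("Subnet NSG", True, lambda p: p["port"] in (80, 443)),
    Layer("NIC NSG", True, lambda p: p["port"] == 443),
]

print(traverse(path, {"port": 443}))                          # (True, None)
print(traverse(path, {"port": 80}))                           # (False, 'NIC NSG')
print(traverse(path, {"port": 22}))                           # (False, 'Subnet NSG')
print(sum(layer.customer_managed for layer in path))          # 3
```

Leaving the VNet firewall out of the path (check=None) mirrors the text: the customer-managed layers are optional, and skipping one just moves the inspection to the next layer down.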
Hopefully this helps to explain things a bit, but what questions are still lurking in your mind?