Service endpoints and private endpoints hands-on: including Azure Backbone, storage account firewall, DNS, VNET and NSGs

Towards Data Science
Connected Network — image by Nastya Dulhiier on Unsplash

Storage accounts play a vital role in a medallion architecture for establishing an enterprise data lake. They act as a centralized repository, enabling seamless data exchange between producers and consumers. This setup empowers consumers to perform data science tasks and build machine learning (ML) models. Furthermore, consumers can use the data for Retrieval Augmented Generation (RAG), facilitating interaction with company data through Large Language Models (LLMs) like ChatGPT.

Highly sensitive data is typically stored in the storage account. Defense-in-depth measures must be in place before data scientists and ML pipelines can access the data. To achieve defense in depth, multiple measures should be in place, such as 1) advanced threat protection to detect malware, 2) authentication using Microsoft Entra ID, 3) authorization for fine-grained access control, 4) an audit trail to monitor access, 5) data exfiltration prevention, 6) encryption, and last but not least 7) network access control using service endpoints or private endpoints.

This article focuses on network access control for the storage account. The next chapter demystifies the concepts behind storage account network access. After that, service endpoints and private endpoints are compared hands-on. Finally, a conclusion is drawn.

A typical scenario is that a virtual machine needs to have network access to a storage account. This virtual machine often acts as a Spark cluster to analyze data from the storage account. The image below provides an overview of the available network access controls.

2.1 Overview of networking between virtual machine and storage account — image by author

The components in the image can be described as follows:

Azure global network — backbone: Traffic between two Azure regions always travels over the Azure backbone (unless the customer explicitly configures otherwise); see also Microsoft global network — Azure | Microsoft Learn. This holds regardless of which firewall rules are configured on the storage account and regardless of whether service endpoints or private endpoints are used.

Azure storage firewalls: Firewall rules can restrict or disable public access. Common rules include whitelisting VNETs/subnets and public IP addresses, allowing specific resource instances (identified by their system-assigned managed identity), or allowing trusted Azure services. When a VNET/subnet is whitelisted, the storage account can identify where the traffic originates, including the caller's private IP address. However, the storage account itself is not integrated into the VNET/subnet; private endpoints are needed for that purpose.

Public DNS storage account: A storage account always has a public DNS entry that can be resolved with network tooling; see also Azure Storage Account — Public Access Disabled — but still some level of connectivity — Microsoft Q&A. That is, even when public access is disabled in the storage account firewall, the public DNS entry remains.

Virtual Network (VNET): The network in which virtual machines are deployed. Although a storage account is never deployed within a VNET, the VNET can be whitelisted in the Azure storage firewall. Alternatively, a private endpoint for the storage account can be created in the VNET for secure, private connectivity.

Service endpoints: When a VNET/subnet is whitelisted in the storage account firewall, the service endpoint must be enabled on that VNET/subnet. The service endpoint should be Microsoft.Storage when the VNET and storage account are in the same region, or Microsoft.Storage.Global when they are in different regions. Note that "service endpoints" is also used as an umbrella term, covering both the whitelisting of a VNET/subnet in the Azure storage firewall and the enabling of the service endpoint on the VNET/subnet.

Private endpoints: A private endpoint places a network interface card (NIC) for the storage account inside the VNET where the virtual machine runs. This assigns the storage account a private IP address from that VNET, making it part of the VNET.

Private DNS storage account: A private DNS zone can be linked to the VNET so that the storage account's hostname resolves to the private endpoint. This ensures that the virtual machine can keep using the storage account's usual URL, while that URL resolves to a private IP address rather than a public one.

Network Security Group (NSG): Deploy an NSG to limit inbound and outbound traffic of the VNET where the virtual machine runs. This can help prevent data exfiltration. However, an NSG works only with IP addresses or service tags, not with URLs. For more advanced data exfiltration protection, use an Azure Firewall. For simplicity, this article omits Azure Firewall and uses an NSG to block outbound traffic.

In the next chapter, service endpoints and private endpoints are discussed.

The chapter begins by exploring the scenario of unrestricted network access. Then the details of service endpoints and private endpoints are discussed with practical examples.

3.1 Not limiting network access — public access enabled

Consider a scenario in which a virtual machine and a storage account are created, and the storage account firewall has public access enabled; see the image below.

3.1.1 virtual machine and storage account with public access created

With this configuration, the virtual machine can access the storage account over the network. Since the virtual machine is also deployed in Azure, the traffic goes over the Azure backbone and is accepted; see the image below.

3.1.2 Traffic not blocked — public network access enabled

Enterprises typically establish firewall rules to limit network access. This involves disabling public access or allowing only selected networks and whitelisting specific ones. The image below illustrates public access being disabled and traffic being blocked by the firewall.

3.1.3 Traffic blocked — blocking traffic in storage account firewall
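The firewall behaviour in scenario 3.1 can be captured in a few lines of Python. The sketch below is a simplified model, not the Azure SDK: the names `PublicAccess`, `FirewallConfig`, `Request` and `firewall_allows` are hypothetical, the IP addresses are made up, and only the settings discussed in this article are modelled (enabled for all networks, selected networks with whitelisted subnets or public IP ranges, and disabled). It also reflects that the storage firewall only governs the public endpoint, so traffic arriving through a private endpoint is not evaluated against these rules.

```python
import ipaddress
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class PublicAccess(Enum):
    ENABLED_ALL_NETWORKS = "enabled_all_networks"
    SELECTED_NETWORKS = "selected_networks"
    DISABLED = "disabled"


@dataclass
class FirewallConfig:
    public_access: PublicAccess
    # Subnets whitelisted via service endpoints (hypothetical IDs such as "vnet-spark/default")
    allowed_subnets: set = field(default_factory=set)
    # Whitelisted public IP ranges in CIDR notation
    allowed_ip_ranges: list = field(default_factory=list)


@dataclass
class Request:
    source_ip: str
    # Only set when the caller's subnet has the storage service endpoint enabled
    source_subnet: Optional[str] = None
    via_private_endpoint: bool = False


def firewall_allows(cfg: FirewallConfig, req: Request) -> bool:
    """Simplified model of the storage account firewall decision."""
    # The firewall governs the public endpoint only; private endpoint
    # traffic is not evaluated against these network rules.
    if req.via_private_endpoint:
        return True
    if cfg.public_access is PublicAccess.ENABLED_ALL_NETWORKS:
        return True
    if cfg.public_access is PublicAccess.DISABLED:
        return False
    # Selected networks: allow whitelisted subnets or whitelisted public IP ranges.
    if req.source_subnet is not None and req.source_subnet in cfg.allowed_subnets:
        return True
    ip = ipaddress.ip_address(req.source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in cfg.allowed_ip_ranges)


vm = Request(source_ip="40.0.0.20")  # hypothetical outbound public IP of the VM
print(firewall_allows(FirewallConfig(PublicAccess.ENABLED_ALL_NETWORKS), vm))  # True: scenario 3.1.2
print(firewall_allows(FirewallConfig(PublicAccess.DISABLED), vm))              # False: scenario 3.1.3
```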

In the next section, service endpoints and "selected networks" firewall rules are used to grant network access to the storage account again.

3.2 Limiting network access via Service endpoints

To give the virtual machine's VNET access to the storage account, enable the service endpoint on the VNET/subnet. Use Microsoft.Storage within the same region or Microsoft.Storage.Global across regions. Next, whitelist the VNET/subnet in the storage account firewall. Traffic is then accepted again; see the image below.

3.2.1 Traffic not blocked — service endpoint enabled and VNET/subnet added to storage account firewall

Traffic is now accepted. When the VNET/subnet is removed from the storage account firewall, or public access is disabled entirely, traffic is blocked again.
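Both halves of the service endpoint setup from section 3.2 can be sketched as follows, assuming hypothetical `Subnet` and `StorageAccount` records and a made-up subnet name `vnet-spark/default`. The region check simply follows the rule stated above (Microsoft.Storage within a region, Microsoft.Storage.Global across regions); it illustrates the logic and does not call into Azure.

```python
from dataclasses import dataclass, field


def required_service_endpoint(vnet_region: str, storage_region: str) -> str:
    """Service endpoint to enable on the subnet, following the rule in this article."""
    if vnet_region.lower() == storage_region.lower():
        return "Microsoft.Storage"
    return "Microsoft.Storage.Global"


@dataclass
class Subnet:
    name: str  # hypothetical identifier, e.g. "vnet-spark/default"
    region: str
    service_endpoints: set = field(default_factory=set)


@dataclass
class StorageAccount:
    region: str
    public_access_disabled: bool = False
    whitelisted_subnets: set = field(default_factory=set)


def subnet_can_reach(subnet: Subnet, account: StorageAccount) -> bool:
    """Both halves of 'service endpoints' must be in place, and public access must not be disabled."""
    endpoint_ok = required_service_endpoint(subnet.region, account.region) in subnet.service_endpoints
    whitelisted = subnet.name in account.whitelisted_subnets
    return endpoint_ok and whitelisted and not account.public_access_disabled


subnet = Subnet("vnet-spark/default", "westeurope")
account = StorageAccount("westeurope")
print(subnet_can_reach(subnet, account))  # False: nothing configured yet

subnet.service_endpoints.add(required_service_endpoint(subnet.region, account.region))
account.whitelisted_subnets.add(subnet.name)
print(subnet_can_reach(subnet, account))  # True: scenario 3.2.1

account.whitelisted_subnets.discard(subnet.name)
print(subnet_can_reach(subnet, account))  # False: subnet removed from the firewall
```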

If an NSG in the virtual machine's VNET blocks outbound traffic to public IP addresses, traffic is blocked as well. This is because the virtual machine still reaches the storage account through its public DNS entry, which resolves to a public IP address; see the image below.

3.2.2 Traffic blocked — NSG of virtual machine blocking public outbound traffic

In that case, private endpoints should be used to ensure that traffic does not leave the VNET. This is discussed in the next section.

3.3 Limiting access via Private endpoints

To reestablish network access for the virtual machine to the storage account, use a private endpoint. This action creates a network interface card (NIC) for the storage account within the VNET of the virtual machine, ensuring that traffic remains within the VNET. The image below provides further illustration.

3.3.1 Traffic not blocked — Private endpoint created to Storage account, public access disabled
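A private endpoint only keeps traffic inside the VNET if the storage account URL resolves to the endpoint's private IP address, which is the job of the private DNS zone described earlier. The sketch below models that lookup with two dictionaries, assuming a hypothetical account `mydatalake`, a hypothetical VNET address space `10.0.0.0/16`, a hypothetical private endpoint IP `10.0.1.4` and a made-up public IP `52.0.0.10`. It is simplified: the real public CNAME chain has more hops, and it does not attempt to model what Azure DNS returns for names missing from a linked zone.

```python
import ipaddress

VNET_SPACE = ipaddress.ip_network("10.0.0.0/16")  # hypothetical VNET address space

# Public DNS (simplified): once a private endpoint exists, the account name
# is a CNAME to its privatelink name.
PUBLIC_DNS = {
    "mydatalake.blob.core.windows.net": ("CNAME", "mydatalake.privatelink.blob.core.windows.net"),
    "mydatalake.privatelink.blob.core.windows.net": ("A", "52.0.0.10"),
}

# Private DNS zone "privatelink.blob.core.windows.net", linked to the VM's VNET.
PRIVATE_ZONE = {
    "mydatalake.privatelink.blob.core.windows.net": ("A", "10.0.1.4"),
}


def resolve(name: str, zone_linked: bool) -> str:
    """Follow CNAMEs; records in the linked private zone win over public DNS."""
    for _ in range(10):  # guard against CNAME loops
        if zone_linked and name in PRIVATE_ZONE:
            rtype, value = PRIVATE_ZONE[name]
        else:
            rtype, value = PUBLIC_DNS[name]
        if rtype == "A":
            return value
        name = value  # follow the CNAME to the next name
    raise RuntimeError("too many CNAME hops")


def stays_in_vnet(ip: str) -> bool:
    return ipaddress.ip_address(ip) in VNET_SPACE


host = "mydatalake.blob.core.windows.net"
print(resolve(host, zone_linked=True))   # 10.0.1.4: private endpoint NIC inside the VNET
print(resolve(host, zone_linked=False))  # 52.0.0.10: public endpoint
```

Without the linked zone, the same URL leads to the public endpoint, which is rejected once public access is disabled in the storage firewall.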

Again, an NSG can be used to block all outbound traffic; see the image below.

3.3.2 Traffic blocked — NSG of virtual machine blocking all outbound traffic

This is, however, counterintuitive: a private endpoint is first created in the VNET, and then traffic to it is blocked by an NSG in the same VNET.
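The NSG behaviour in scenarios 3.2.2 and 3.3.2 can be reproduced with a small model of outbound rule evaluation: rules are processed in priority order (lowest number first) and the first match decides, with Azure's default outbound rules at the end. This is a simplified sketch, assuming a hypothetical VNET address space `10.0.0.0/16`, a made-up storage public IP `52.0.0.10` and a hypothetical private endpoint IP `10.0.1.4`; it ignores ports, protocols and source addresses and treats everything outside the VNET as the `Internet` tag.

```python
import ipaddress
from dataclasses import dataclass

VNET_SPACE = ipaddress.ip_network("10.0.0.0/16")  # hypothetical VNET address space


@dataclass
class OutboundRule:
    name: str
    priority: int      # lower number = evaluated first
    destination: str   # "VirtualNetwork", "Internet", "*" or a CIDR prefix
    allow: bool


# Azure's default outbound rules (they can be overridden, not deleted)
DEFAULT_RULES = [
    OutboundRule("AllowVnetOutBound", 65000, "VirtualNetwork", True),
    OutboundRule("AllowInternetOutBound", 65001, "Internet", True),
    OutboundRule("DenyAllOutBound", 65500, "*", False),
]


def matches(rule: OutboundRule, dest) -> bool:
    if rule.destination == "*":
        return True
    if rule.destination == "VirtualNetwork":
        return dest in VNET_SPACE
    if rule.destination == "Internet":
        return dest not in VNET_SPACE  # simplification: outside the VNET = Internet
    return dest in ipaddress.ip_network(rule.destination)


def nsg_allows(custom_rules: list, dest_ip: str) -> bool:
    dest = ipaddress.ip_address(dest_ip)
    for rule in sorted(custom_rules + DEFAULT_RULES, key=lambda r: r.priority):
        if matches(rule, dest):
            return rule.allow  # first match wins
    return False


STORAGE_PUBLIC_IP = "52.0.0.10"   # hypothetical public endpoint IP
PRIVATE_ENDPOINT_IP = "10.0.1.4"  # hypothetical private endpoint NIC IP

deny_internet = [OutboundRule("DenyInternetOutBound", 100, "Internet", False)]
deny_all = [OutboundRule("DenyAllCustom", 100, "*", False)]

print(nsg_allows(deny_internet, STORAGE_PUBLIC_IP))    # False: scenario 3.2.2
print(nsg_allows(deny_internet, PRIVATE_ENDPOINT_IP))  # True: AllowVnetOutBound still applies
print(nsg_allows(deny_all, PRIVATE_ENDPOINT_IP))       # False: scenario 3.3.2
```

The last case shows why scenario 3.3.2 feels counterintuitive: a custom deny-all rule with a low priority number is evaluated before the default AllowVnetOutBound rule, so even traffic to a NIC inside the same VNET is dropped.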

Enterprises always require network rules to limit network access to their storage accounts. In this blog post, both service endpoints and private endpoints were considered for limiting access.

The main differences between service endpoints and private endpoints are as follows.

For service endpoints, the following hold:

  • Requires enabling the service endpoint on the VNET/subnet and whitelisting the VNET/subnet in the Azure storage account firewall.
  • Traffic leaves the VNET of the virtual machine that connects to the storage account, although, as explained above, it stays on the Azure backbone.

For private endpoints, the following hold:

  • Public access can be disabled in the Azure storage firewall, although, as explained above, the storage account's public DNS entry remains.
  • Traffic does not leave the VNET in which the virtual machine also runs.

Many other factors influence the choice between service endpoints and private endpoints: costs, migration effort (service endpoints have been around longer than private endpoints), the networking complexity of private endpoints, limited service endpoint support in newer Azure services, and the hard limit of 200 private endpoints per storage account.

However, if it is a hard requirement (“must have”) that 1) traffic never leaves the VNET/subnet of the virtual machine, or 2) no firewall rules may be created in the Azure storage firewall and the account must be fully locked down, then service endpoints are not feasible.
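The decision rule above can be expressed as a small function. This is a sketch of the reasoning in this article only: `Requirements` and `feasible_options` are hypothetical names, and the other factors listed above (costs, migration effort, networking complexity, service support, endpoint limits) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    traffic_must_stay_in_vnet: bool
    storage_firewall_rules_forbidden: bool  # account must be fully locked down


def feasible_options(req: Requirements) -> list:
    """Return the network access options that satisfy the hard requirements."""
    # Private endpoints keep traffic in the VNET and allow public access to be
    # disabled without any firewall rule, so they satisfy both requirements.
    options = ["private endpoint"]
    # Service endpoints need a firewall rule and route traffic out of the VNET
    # (over the Azure backbone), so either hard requirement rules them out.
    if not (req.traffic_must_stay_in_vnet or req.storage_firewall_rules_forbidden):
        options.insert(0, "service endpoint")
    return options


print(feasible_options(Requirements(True, False)))   # ['private endpoint']
print(feasible_options(Requirements(False, False)))  # ['service endpoint', 'private endpoint']
```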

In other scenarios, it’s possible to consider both solutions, and the best fit should be determined based on the specific requirements of each scenario.