Exploring – Azure Firewall Analytics

Azure Firewall is ever growing in popularity as a choice when it comes to perimeter protection for Azure networking. The introduction of additional SKUs (Premium and Basic) since its launch have made it both more functional while also increasing its appeal to a broader environment footprint.

For anyone who has used Azure Firewall since the beginning, troubleshooting and analysis of your logs has always had a steep-ish learning curve. On one hand, the logs are stored in Log Analytics and you can query them using Kusto, so there is familiarity. However, without context, their formatting can be challenging. The good news is, this is being improved with the introduction of a new format.

Previously logs we stored using the Azure Diagnostics mode, with this update, we will now see the use of Resource-Specific mode. This is something that will become more common across many Azure resources, and you should see it appear for several in the Portal already.

Destination table in the Portal

What difference will this make for Azure Firewall? This will mean individual tables in the selected workspace are created for each category selected in the diagnostic setting. This offers the following improvements:

  • Makes it much easier to work with the data in log queries
  • Makes it easier to discover schemas and their structure
  • Improves performance across both ingestion latency and query times
  • Allows you to grant Azure RBAC rights on a specific table

For Azure Firewall, the new resource specific tables are below:

  • Network rule log – Contains all Network Rule log data. Each match between data plane and network rule creates a log entry with the data plane packet and the matched rule’s attributes.
  • NAT rule log – Contains all DNAT (Destination Network Address Translation) events log data. Each match between data plane and DNAT rule creates a log entry with the data plane packet and the matched rule’s attributes.
  • Application rule log – Contains all Application rule log data. Each match between data plane and Application rule creates a log entry with the data plane packet and the matched rule’s attributes.
  • Threat Intelligence log – Contains all Threat Intelligence events.
  • IDPS log – Contains all data plane packets that were matched with one or more IDPS signatures.
  • DNS proxy log – Contains all DNS Proxy events log data.
  • Internal FQDN resolve failure log – Contains all internal Firewall FQDN resolution requests that resulted in failure.
  • Application rule aggregation log – Contains aggregated Application rule log data for Policy Analytics.
  • Network rule aggregation log – Contains aggregated Network rule log data for Policy Analytics.
  • NAT rule aggregation log – Contains aggregated NAT rule log data for Policy Analytics.

So, let’s start with getting logs enabled on your Azure Firewall. You can’t query your logs if there are none! And Azure Firewall does not enable this by default. I’d generally recommend enabling logs as part of your build process and I have an example of that using Bicep over on Github, (note this is Diagnostics mode, I will update it for Resource mode soon!) However, if already built, let’s look at simply doing this via the Portal.

So on our Azure Firewall blade, head to the Monitoring section and choose “Diagnostic settings”

Azure Firewall blade

We’re then going to choose all our new resource specific log options

New resource specific log categories in Portal

Next, we choose to send to a workspace, and make sure to switch to Resource specific.

Workspace option with Resource option chosen in Portal

Finally, give your settings a name, I generally use my resource convention here, and click Save.

It takes a couple of minutes for logs to stream through, so while that happens, let’s look at what is available for analysis on Azure Firewall out-of-the-box – Metrics.

While there are not many entries available, what is there can be quite useful to see what sort of strain your Firewall is under.

Metrics dropdown for Azure Firewall

Hit counts are straight forward, they can give you an insight into how busy the service is. Data Processed and Throughput are also somewhat interesting from an analytics perspective. However, it is Health State and SNAT that are most useful in my opinion. These are metrics you should enable alerts against.

For example, an alert rule for SNAT utilisation reaching an average of NN% can be very useful to ensure scale is working and within limits for your service and configuration of IPs.

Ok, back to our newly enabled Resource logs. When you open the logs tab on your Firewall, if you haven’t disabled it, you should see a queries screen pop-up as below:

Sample log queries

You can see there are now two sections, one specifically for Resource Specific tables. If I simply run the following query:

AZFWNetworkRule

I get a structured and clear output:

NerworkRule query output

To get a comparative output using Diagnostics Table, I need to run a query similar to the below:

// Network rule log data 
// Parses the network rule log data. 
AzureDiagnostics
| where Category == "AzureFirewallNetworkRule"
| where OperationName == "AzureFirewallNatRuleLog" or OperationName == "AzureFirewallNetworkRuleLog"
//case 1: for records that look like this:
//PROTO request from IP:PORT to IP:PORT.
| parse msg_s with Protocol " request from " SourceIP ":" SourcePortInt:int " to " TargetIP ":" TargetPortInt:int *
//case 1a: for regular network rules
| parse kind=regex flags=U msg_s with * ". Action\\: " Action1a "\\."
//case 1b: for NAT rules
//TCP request from IP:PORT to IP:PORT was DNAT'ed to IP:PORT
| parse msg_s with * " was " Action1b:string " to " TranslatedDestination:string ":" TranslatedPort:int *
//Parse rule data if present
| parse msg_s with * ". Policy: " Policy ". Rule Collection Group: " RuleCollectionGroup "." *
| parse msg_s with * " Rule Collection: "  RuleCollection ". Rule: " Rule 
//case 2: for ICMP records
//ICMP request from 10.0.2.4 to 10.0.3.4. Action: Allow
| parse msg_s with Protocol2 " request from " SourceIP2 " to " TargetIP2 ". Action: " Action2
| extend
SourcePort = tostring(SourcePortInt),
TargetPort = tostring(TargetPortInt)
| extend 
    Action = case(Action1a == "", case(Action1b == "",Action2,Action1b), split(Action1a,".")[0]),
    Protocol = case(Protocol == "", Protocol2, Protocol),
    SourceIP = case(SourceIP == "", SourceIP2, SourceIP),
    TargetIP = case(TargetIP == "", TargetIP2, TargetIP),
    //ICMP records don't have port information
    SourcePort = case(SourcePort == "", "N/A", SourcePort),
    TargetPort = case(TargetPort == "", "N/A", TargetPort),
    //Regular network rules don't have a DNAT destination
    TranslatedDestination = case(TranslatedDestination == "", "N/A", TranslatedDestination), 
    TranslatedPort = case(isnull(TranslatedPort), "N/A", tostring(TranslatedPort)),
    //Rule information
    Policy = case(Policy == "", "N/A", Policy),
    RuleCollectionGroup = case(RuleCollectionGroup == "", "N/A", RuleCollectionGroup ),
    RuleCollection = case(RuleCollection == "", "N/A", RuleCollection ),
    Rule = case(Rule == "", "N/A", Rule)
| project TimeGenerated, msg_s, Protocol, SourceIP,SourcePort,TargetIP,TargetPort,Action, TranslatedDestination, TranslatedPort, Policy, RuleCollectionGroup, RuleCollection, Rule

Obviously, there is a large visual difference in complexity! But there are also all of the benefits as described earlier for Resource Specific. I really like the simplicity of the queries. I also like the more structured approach. For example, take a look at the set columns that are supplied on the Application Rule table. You can now predict, understand, and manipulate queries with more detail than ever before. You can check out all the new tables by searching “AZFW” on this page.

Finally, a nice sample query to get you started. One that I use quite often when checking on new services added, or if there are reports of access issues. The below gives you a quick glance into web traffic being blocked and can allow you to spot immediate issues.

AZFWApplicationRule
| where Action == "Deny"
| distinct Fqdn
| sort by Fqdn asc

As usual, if there are any questions, get in touch!

How To – Enable Azure Firewall Resource Specific Diagnostics

There is a new format of logs coming to Azure resources. Currently most people are familiar with what is called Diagnostics Table logs. The resource log for each Azure service has a unique set of columns. The AzureDiagnostics table includes the most common columns used by Azure services. If a resource log includes a column that doesn’t already exist in the AzureDiagnostics table, that column is added the first time that data is collected. If the maximum number of 500 columns is reached, data for any additional columns is added to a dynamic column.

Resource Specific logs however are platform logs that provide insight into operations that were performed within an Azure resource. The content of resource logs varies by the Azure service and resource type. Resource logs aren’t collected by default.

So onto enabling them. Via the Portal, this is straight forward in terms of choice and is well documented here. However, when I went to include this enablement in a Bicep build that I have, I noticed there wasn’t anything clearly documented. So, here is an example using Azure Firewall.

Normally, my diagnostics resource looks like the below and this enables Diagnostics table logs:

resource azfwDiags 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: '${afwName}-diags'
  scope: azFW
  properties: {
    logs: [
      {
        category: 'AzureFirewallApplicationRule'
        enabled: true
        retentionPolicy: {
          days: 90
          enabled: true
        }
      }
      {
        category: 'AzureFirewallNetworkRule'
        enabled: true
        retentionPolicy: {
          days: 90
          enabled: true
        }
      }
    ]
    workspaceId: log
  }
}

However, to enable Resource Specific, a few changes are required. Obviously the category names are different however you also need to include the Property – logAnalyticsDestinationType as you see below on line 5.

resource azfwDiags 'Microsoft.Insights/diagnosticSettings@2021-05-01-preview' = {
  name: '${afwName}-diags'
  scope: azFW
  properties: {
    logAnalyticsDestinationType: 'Dedicated'
    logs: [
      {
        category: 'AZFWApplicationRule'
        enabled: true
        retentionPolicy: {
          days: 90
          enabled: true
        }
      }
      {
        category: 'AZFWNetworkRule'
        enabled: true
        retentionPolicy: {
          days: 90
          enabled: true
        }
      }
    ]
    workspaceId: log
  }
}

Using the resource above within your Bicep code will allow you to deploy Resource Specific diagnostics settings as needed.

As usual, if there are any questions get in touch!

How to – Deploy Azure Firewall Premium using Bicep

With the launch of a premium SKU for Azure Firewall, many people became interested in both testing and migrating to the SKU for new features. However, the migration path requires that you stop/deallocate your Azure Firewall (AFW), modify the SKU, then start up again.

With the outage required, it’s nice to have an infrastructure-as-code solution to allow for quick testing in advance/parallel. It’s also a worthwhile piece of effort to have your rules and config documented, as these can be kept somewhat separate to your SKU requirements, especially rules.

This post will go through what is required to have Azure Firewall Premium ready to deploy using Bicep. The repo includes some bare minimums to complete testing, but you can modify as required by simply modifying and reusing the AFW module for your own environment. An example would be adding your own certificate for TLS inspection testing.

All of the code discussed in this post is located on Github in this repo.

So I have split out the test resources into several modules, which allows for better organisation, but also allows me to reuse code blocks when I want to. As I go, I am building up a set of bicep files I can reuse as needed on other environment with minimal changes.

Code directory structure

As this post is about Azure Firewall, lets’ start there – afw.bicep

afw.bicep in visualiser

As you can see, the Azure Firewall module is quite simple. The rest of the core network resources required to actually build an Azure Firewall are in network.bicep. This module can be viewed as a grouping for your AFW settings.

The Firewall Policy resource – azFWPol – is where some complexity comes into play, especially the differences for Premium. You may need to consider conditional deployments here, if you wanted your code to be flexible depending on your tier. For example, if Dev, deploy Standard SKU etc. Although SKU is set within the Firewall resource – azFW.

Now, you might be asking (if you have looked at the code!) but where are my rules? I have moved these out to a separate module to allow for simplicity of changes. Which means we will rarely have to modify the AFW module, and our edit risks are reduced.

The module – rules.bicep – simply contains a single rule collection, with a single rule. But the premise is this, using Bicep, you can control and document all rules as code, making operation much easier. Where it can become quite complex is where you have complex, large scale rule collection groups. If this is the case, you may consider splitting these out into their own individual modules. This will depend on your environment.

However, the beauty of having this setup, and one of the reasons behind this post, is so that you can quickly test things if/when required. This whole build consistently takes under ten minutes deploying to North Europe using Microsoft hosted agents

Pipeline deploy runtime example

And that is it, the repo contains all you need to deploy Azure Firewall Premium, and edit to your specific requirements. Good luck with your testing, and as always if there are any questions – just ask!

How to – Control your Network with Azure Policy

In many Azure environments, Network is core to the solution. As such, it is one of the most critical components to design correctly and with the required configuration relative to posture and security.

For example, you create a well architected network, with a focus on security and performance. It works well, but then changes occur outside of your original design – and it starts to come apart.

Azure Policy is one solution to ensure that this does not happen. For this specific topic, Azure Policy will focus on two areas:

  1. Enforce standards
  2. Systematically assess compliance

It can also action remediation using DINE or deploy-if-not-exists policies. That may be your preference, but for me that’s not something I like at the network layer 🙂

So first, my preference is to create a core policy set, derived from your core network design. On Azure Policy, this will be policy definitions grouped as an initiative. Where possible, it is efficient to use existing built-in definitions. However, if required you can create custom definitions and also include them in your initiative.

For this post, I have created a simple example initiative that includes several built-in policies based on some network and security requirements for an app architecture.

Within this set, you can see “Not allowed resource types”, in here I have included several public or connection focussed resources such as:

  • Public IP addresses
  • VNET Peerings
  • Local Network Gateways

The idea behind the above aligns with the purpose of Azure Policy, restrict and define the platform. This core initiative obviously has a focus on network type resources, but the same basic principal applies. Similar again in approach is the ability to allow exclusions where necessary. For example, allowing VNET peering in a network resource group. However, don’t forget/ignore RBAC, this should at minimum compliment your policy requirements.

While this example is quite simple, as I mentioned, you can layer complexity in via custom policies (creation tutorial here). One I have always liked is a policy to ensure that your subnets have the appropriate Route Table applied to ensure traffic is being directed appropriately, for example, to a firewall. The linked example is just an audit for subnets without ANY route table, but you can expand on this to include a specific route table.

{
  "properties": {
    "displayName": "AINE Route Table for subnets in vnets",
    "policyType": "Custom",
    "mode": "All",
    "description": "Custom policy to audit if a route table exists on appropriate subnets in all vnets",
    "metadata": {
      "version": "1.1.0",
      "category": "Network"
    },

    "policyRule": {
      "if": {
        "allOf": [
          {
            "field": "type",
            "equals": "Microsoft.Network/virtualNetworks/subnets"
          },
          {
            "field": "name",
            "notEquals": "AzureBastionSubnet"
          },
          {
            "field": "Microsoft.Network/virtualNetworks/subnets/routeTable.id",
            "exists": false
          }
        ]
      },
      "then": {
        "effect": "audit"
      }
    }
  }
}

The above is stored on a repo in Github for you to re-use if needed, and don’t forget if you are using VS code to install the Azure Policy extension to make life a bit simpler!

How to – Create an Outbound Static IP for App Services

Azure App Services are an excellent way to run all sorts of applications within Azure. Multiple tiers offer several sets of performance and functionality, generally meaning there is something to suit all requirements and budgets.

As these are PaaS there are a few specific network elements to consider and be aware of. One that I have seen become more common is the request to safelist an App Service. This request regularly requires IPs and this is when we start to run into an issue.

First up, your App Service will be assigned quite a few IPs. When you deploy an App Service Plan, it’s placed in a deployment unit, or you might see this referred to as a webspace. Each one of these is assigned a set of outbound IPs.

An example is below:

Many Outbound IPs assigned to an App Service

This generally causes concern with network/security teams as they are hoping to simply allow a small range, maybe a second for DR. Whereas this outbound range is actually the range for all App Service Plans deployed in that deployment unit, not just your App Service.

Next, you might make a change to your App Service that could invoke a change to these IPs. While you cannot simply change the deployment unit you are in, and therefore the IPs we discussed above, there are cases that it can happen

  • Delete an app and recreate it in a different resource group (deployment unit may change).
  • Delete the last app in a resource group and region combination and recreate it (deployment unit may change).
  • Scale your app between the lower tiers (Basic, Standard, and Premium), the PremiumV2, and the PremiumV3 tier (IP addresses may be added to or subtracted from the set).

Obviously the most common here is most likely scaling. And either this, or simply the requirement to avoid sharing a large, shared set justifies the ask.

Thankfully, the answer is relatively straight forward, but there are a few prerequisites. The two elements of configuration required are:

  • Regional Vnet Integration
  • NAT Gateway

However, vnet integration requires the use of one of the following App Service Plans – Standard, Premium, PremiumV2 and PremiumV3. If you do have one of those plans, or can justify the cost to add the feature, implementation is quite simple and well documented here.

Once integrated, our App Service now has the ability to reach resources within the vnet. However, what we want is to now manipulate the routing and the integration subnet to force traffic outbound via a NAT Gateway. There are a couple of steps, and one item to note before pulling the trigger.

First, the item of note, once this is done, access to Storage Accounts for the app will have to be via Private Endpoint or Service Endpoint, not difficult, just be aware of it.

In terms of configuration, we need to modify routing and enable a NAT Gateway. Modifying the routing is straight forward, it’s a tick box! However, I like to reverse the order here, build the NAT, ensure any ACLs, or allow lists are updated, then tick the box. Don’t forget any Storage Accounts!

For our NAT Gateway, when building, it should be the same region as your app and vnet. Choose either a Public IP or a Prefix, the choice is yours, and when creating, associate it to the subnet our app is integrated with, like the below:

For options on building a NAT Gateway, here is Portal driven and here is a Bicep option.

Now that we have our NAT Gateway and it is associated to our subnet, let’s modify the integration routing. As I said, it is simply a tick box, but it does change the entire flow of traffic, so this is your impact point. On the vnet integration blade, tick “Enabled” for Route All as below:

And that’s it. Your app will now use the outbound IP(s) configured on your NAT Gateway. The additional benefit is that this remains through regardless of what operations you perform on your App Service. Also, NAT Gateway can scale and handle multiple subnets within the same vnet, therefore allowing multiple integrations depending on your specific use case.