Understanding Azure Reserved Virtual Machine Instances

One of the main benefits of Azure’s billing model is per-minute billing. This means that if you have an application, service, or environment that isn’t required 24/7, you can reduce your costs by using Automation so that you only pay for what you consume.

However, if your environment requires you to run a VM constantly, the cost can start to mount up. To help alleviate this, Microsoft offer a solution in the form of long-term, fixed-price Virtual Machine instances.

These Reserved Instances (RIs) help save money by allowing you to pre-pay for a VM size for a one-year or three-year term. Paying up front allows you to make significant savings on the Pay-As-You-Go pricing.


The most common subscription offers have the ability to purchase RIs, but there are some restrictions in how each is approached. The options are:

  • Enterprise Agreement subscriptions. To purchase reservations in an enterprise enrollment, the enterprise administrator must enable reservation purchases in the EA portal.
  • Pay-As-You-Go subscriptions. You must have the “Owner” role on the subscription to buy a reservation.
  • Cloud Solution Provider subscriptions. The providing partner must make the purchase on behalf of the customer.

Once purchased, the discount is applied to resource usage that matches the RI capacity purchased. For example, if you purchase a one-year RI for a D4s_v3 size VM and you are running a D4s_v3, the discount will apply against that usage.

A good strategy is to determine your sizing before purchasing the RI. My advice would be to run your VMs without an RI for a few months to confirm your sizing is correct. However, if that is proving difficult, there is a degree of flexibility offered within your RI scope.

With instance size flexibility, you don’t have to deploy the exact same VM size to get the benefit of your purchased RI, as other VM sizes within the same VM group also receive the discount. As a rough example, see the table below from the Microsoft announcement.

VM name            VM group      Ratio
Standard_D2s_v3    DSv3 Series   1
Standard_D4s_v3    DSv3 Series   2
Standard_D8s_v3    DSv3 Series   4
Standard_D16s_v3   DSv3 Series   8
Standard_D32s_v3   DSv3 Series   16
Standard_D64s_v3   DSv3 Series   32

This means that if you buy an RI for a D2s_v3, it would cover half of a D4s_v3 instance, and so on. More on how this is applied, and the options available to you, can be found here.
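The ratio table above can be turned into a quick coverage calculation. This is a rough sketch of the arithmetic only; the size names and ratios come from the table, and nothing here is an Azure API:

```python
# Illustrative sketch of instance size flexibility: how many "ratio units"
# you bought versus how many the running VM consumes.
RATIOS = {
    "Standard_D2s_v3": 1,
    "Standard_D4s_v3": 2,
    "Standard_D8s_v3": 4,
    "Standard_D16s_v3": 8,
    "Standard_D32s_v3": 16,
    "Standard_D64s_v3": 32,
}

def ri_coverage(purchased_size: str, purchased_qty: int, running_size: str) -> float:
    """Fraction of the running VM's usage covered by the reservation (capped at 1.0)."""
    covered = RATIOS[purchased_size] * purchased_qty
    needed = RATIOS[running_size]
    return min(covered / needed, 1.0)

# A D2s_v3 reservation covers half of a D4s_v3 instance:
print(ri_coverage("Standard_D2s_v3", 1, "Standard_D4s_v3"))  # 0.5
```

Run it against a few combinations and the logic of the ratio column becomes obvious: four D2s_v3 reservations fully cover one D8s_v3.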

In general, I think an RI purchase is something that most deployments should be taking advantage of. Once sized correctly and with the ability to leverage flexibility, there are huge savings to be made with relatively low amounts of administrative effort.

More on how to buy an RI here

More on how the discount is applied here

 

What is Azure B-Series Compute?


If you’ve provisioned a Virtual Machine (VM) via the Azure Portal in recent months, I am sure you have noticed a new series of VM that sits at a cheaper price point than some of the regular alternatives. This is the B-series, and it has some special features.

At first glance, the specifications presented suggest this series is a relative bargain in comparison to the D-series. Take a look at the table below for a comparison of prices for B and Dv3 series VMs (NOTE: these are from an EA subscription; prices are relative):

[Image: price comparison, B-Series vs Dv3 Series VMs]

Comparing a B2ms and a D2s_v3, there is a saving of approximately ten euro per month. You can see they have the same amount of vCPU and RAM, which are the most common deciding factors when sizing a VM. However, if you look closer you’ll notice the B-series actually has a higher Max IOPS figure. How is that possible? (Read a previous post on IOPS etc here.)

The B-series VMs are designed to offer “burstable” performance. They leverage flexible CPU usage, suitable for workloads that will run for a long time using a small fraction of the CPU performance possible and then spike to needing the full power of the CPU due to incoming traffic or required work.

This burst isn’t unlimited though. While a B-series VM is running below the baseline performance of its vCPU, the instance builds up credits. Once the VM has accumulated enough credit, you can burst your usage, up to 100% of the vCPU, for the period of time when your application requires the higher CPU performance.

Here is a great example from Microsoft Docs of how credits are accumulated and spent.

“I deploy a VM using the B1ms size for my application. This size allows my application to use up to 20% of a vCPU as my baseline, which is 0.2 credits per minute that I can use or bank.

My application is busy at the beginning and end of my employees’ work day, between 7:00 – 9:00 AM and 4:00 – 6:00 PM. During the other 20 hours of the day, my application is typically at idle, only using 10% of the vCPU. For the non-peak hours I earn 0.2 credits per minute but only consume 0.1 credits per minute, so my VM will bank 0.1 x 60 = 6 credits per hour. For the 20 hours that I am off-peak, I will bank 120 credits.

During peak hours my application averages 60% vCPU utilization. I still earn 0.2 credits per minute but I consume 0.6 credits per minute, for a net cost of 0.4 credits a minute, or 0.4 x 60 = 24 credits per hour. I have 4 hours per day of peak usage, so it costs 4 x 24 = 96 credits for my peak usage.

If I take the 120 credits I earned off-peak and subtract the 96 credits I used for my peak times, I bank an additional 24 credits per day that I can use for other bursts of activity.”
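The worked example above can be expressed as a short calculation. This sketch just reproduces the quoted arithmetic; the baseline, idle, and peak percentages are the example’s numbers, not an Azure API:

```python
# Daily B-series credit balance: credits accrue while actual usage sits
# below the baseline, and are spent while usage exceeds it.
def daily_credit_balance(baseline_pct, idle_pct, peak_pct, peak_hours):
    earn_rate = baseline_pct / 100       # credits earned per minute
    idle_spend = idle_pct / 100          # credits spent per minute at idle
    peak_spend = peak_pct / 100          # credits spent per minute at peak
    off_peak_hours = 24 - peak_hours

    banked_off_peak = (earn_rate - idle_spend) * 60 * off_peak_hours
    cost_of_peak = (peak_spend - earn_rate) * 60 * peak_hours
    return banked_off_peak - cost_of_peak

# B1ms: 20% baseline, 10% idle for 20 hours, 60% busy for 4 hours
print(round(daily_credit_balance(20, 10, 60, 4), 1))  # 24.0
```

The function makes the sizing question concrete: if the result goes negative, your peak usage is draining the bank faster than off-peak hours can refill it, and you should size up.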

So, that was quite a bit of maths. What are the important points?

  • Baseline vCPU performance. This dictates your earn/spend threshold. There is a chart here.
  • Peak utilisation consumption. If this is not allowing you to bank credits, you will eventually end up in a situation where you cannot burst. Size up your VM.
  • Automation doesn’t help here; you only earn credits while the VM is allocated.

Overall I think the B-series is a good option, but only in specific scenarios. A different way of thinking is required to size and manage your application. If you get it right, you can make some savings on running a standard VM.

There is a Q&A on some common topics here.

Azure App Service and Windows Containers

Containerisation of applications is becoming more and more common. Allowing developers to “wrap” all requirements into an individual element, which the infrastructure team can then deploy wherever resources are available, opens the door to the most modern options in application deployment and management.

Enter Azure App Service, which for years now has been removing the need for an infrastructure management layer and allowing teams to focus on deployment and performance. Traditionally, you had to deploy your apps within the allowed parameters of your App Service Plan (ASP). However, you can now run containers as part of this platform.

Combine this with a Container Registry, such as Azure Container Registry and you can deploy images within minutes. These images can then be scaled within your ASP to meet demand and can be updated as required using your current CI/CD processes.

This had previously been limited to Linux-based containers, but Microsoft have recently announced a public preview of the ability to run Windows containers within your ASP. This is targeted at customers interested in migrating .NET applications to Azure and hoping to avail of a PaaS service for its many productivity benefits, such as high availability within and across Azure regions. It can also increase application redundancy options via the integrated backup/restore and app cloning features.

[Image: example deployment scenario]

The preview capabilities are appropriate for testing and POC environments, but there are of course some limitations and preview deployments are not recommended for production workloads in any scenario.

Within the preview the following is supported:

  • Deploy containerized applications using Docker Hub, Azure Container Registry, or private registries.
  • Incrementally deploy apps into production with deployment slots and slot swaps.
  • Scale out automatically with auto-scale.
  • Enable application logs and use the App Service Log Streaming feature to see logs from your application.
  • Use PowerShell and WinRM to remotely connect directly into your containers.

For a quick start/how-to see the following link.

First Impressions – Azure Firewall Preview

Recently Microsoft announced that a new Azure Firewall service was entering a managed public preview. Azure Firewall is a managed, network security service that protects your Azure Virtual Network resources. It is a fully stateful firewall as a service with built-in high availability and scalability.

[Image: Azure Firewall overview]

The service uses a static public IP, meaning that your outbound traffic can be identified by third-party services as/if required. Worth noting: only outbound rules are active within this preview. Inbound filtering will hopefully be available by GA.

The following capabilities are all available as part of the preview:

  • Stateful firewall as a Service
  • Built-in high availability with unrestricted cloud scalability
  • FQDN filtering
  • Network traffic filtering rules
  • Outbound SNAT support
  • Centrally create, enforce, and log application and network connectivity policies across Azure subscriptions and VNETs
  • Fully integrated with Azure Monitor for logging and analytics

As with all previews it should not be used for production environments, but for testing purposes this is how to register your tenant for deployment.

To enable the Azure Firewall public preview follow the guide here: Enabling the preview

Once enabled, follow this tutorial for a sample implementation: Deployment Tutorial

Now that you’re familiar with the deployment, you should apply it to your specific test scenarios. Be wary of some operations that could be limited by applying a default route to your VM. There is an updated FAQ for the service here: Azure Firewall FAQ

Overall, this is a welcome addition to Azure networking. As the preview progresses and more service options are added, especially inbound options, I see this being as common as deploying an NSG in your environment. Combining it with peering and the right set of rule collections for your environment allows for an easily managed, scalable, and most importantly, secure environment within Azure with minimal cost and infrastructure footprint.

New Azure Certifications

As many of you are probably aware, there is already a Microsoft certification path for those looking for Azure skill set recognition. Recently at the Microsoft Inspire partner conference, Microsoft announced that this would be replaced.

There will now be three roles defined, each with a new qualification to earn.

  1. Azure Administrator
  2. Azure Developer
  3. Azure Solutions Architect


These new roles and certifications are an attempt to better line-up with industry demands and standards.


As with most current MCSA qualifications, it will take two exams to earn a certification. The Beta exams for Azure Administrator are live right now: AZ-100 and AZ-101 test a broad range of skills across the Azure sphere, including Compute, Networking, Identity, and Storage, among others.

If you already hold the previous exam, 70-533, you can take a transition exam, AZ-102, to earn the Azure Administrator certification. Again, this is in Beta, so places are limited.

[Image: certification roll-out timeline]

The time frame for the remaining roles and certifications was also shown, so expect a follow-up post on this later this year.

Also, thanks to Microsoft MVP Thomas Maurer, here is a link to discount codes should you wish to book any of the Beta exams. The link is the Microsoft Learning blog, so I will paste it here in full in case anyone is worried about dodgy code pages! 🙂

https://www.microsoft.com/en-us/learning/community-blog-post.aspx?BlogId=8&Id=375147

If you would like to watch the announcement in full, you can do so here – https://myinspire.microsoft.com/videos/fb7c3db2-1c65-4a69-aceb-fd06c19bf971

Securing Azure PaaS

When considering Azure as a platform, part of the conversation should revolve around transformation. That is, how do we transform our approach from what is viewed as traditional to something more modern. Often this could lead to redesigning how your application/service is deployed, but with some workflows, a simple change from IaaS to PaaS is viewed as a quick win.

This change isn’t suitable in all scenarios, but depending on your specific requirement it could allow for greater resiliency, a reduction in costs, and a simpler administration requirement. One service that is often considered is SQL. Azure has its own PaaS SQL offering which removes the need for you to manage the underlying infrastructure. That alone makes the transformation a worthy consideration.

However, what isn’t often immediately apparent to some administrators is that PaaS offerings are, by their nature, public facing. For Azure SQL to be as resilient as possible and scale responsively, it sits behind a public FQDN. Therefore, how this FQDN is secured must be taken into consideration as a priority to ensure your data is protected appropriately.

Thankfully, Azure SQL comes with a built-in firewall service. Initially, all Transact-SQL access to your Azure SQL server is blocked by the firewall. To allow traffic, you must specify one or more server-level firewall rules that enable access. The firewall rules specify which IP address ranges from the internet are allowed. You can also choose whether Azure applications can connect to your Azure SQL server.

You can also grant access to just one of the databases within your Azure SQL server by creating a database-level rule for the required database. However, while this limits the traffic to specific IP ranges, the traffic still flows via the internet.
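As a toy illustration of the gating logic (not the actual Azure SQL implementation, and the IP ranges here are invented documentation examples), server-level rules admit a client to every database, while database-level rules admit it only to the named one:

```python
import ipaddress

# Hypothetical rule sets: a server-level range and one database-level rule.
server_rules = [ipaddress.ip_network("203.0.113.0/24")]            # e.g. office egress range
database_rules = {"reporting": [ipaddress.ip_network("198.51.100.10/32")]}

def is_allowed(source_ip: str, database: str) -> bool:
    """Toy model: allowed if the source IP matches a server-level rule,
    or a database-level rule for the target database."""
    ip = ipaddress.ip_address(source_ip)
    if any(ip in net for net in server_rules):
        return True  # server-level rules grant access to all databases
    return any(ip in net for net in database_rules.get(database, []))

print(is_allowed("203.0.113.42", "sales"))       # True  (server-level rule)
print(is_allowed("198.51.100.10", "reporting"))  # True  (database-level rule)
print(is_allowed("198.51.100.10", "sales"))      # False
```

The asymmetry is the point: database-level rules are the narrower grant, which is why they are preferable when a client only needs one database.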

To communicate with Azure SQL privately, you will first need an Azure VNet. Once it is in place, you must enable the service endpoint for Azure SQL (see here). This allows communication directly between the listed subnets within your VNet and Azure SQL via the Azure backbone. This traffic is more secure, and possibly faster, than via the internet.

Once your endpoint is enabled, you can then create a VNet firewall rule on Azure SQL for the subnet which had a service endpoint enabled. All endpoints within the subnet will have access to all databases. You can repeat these steps to add additional subnets. If your VNet rules replace the previous IP-based rules, remember to remove the old entries from your Azure SQL firewall.

Also worth noting is the option to “Allow all Azure services”. The presumption here is that this somehow only allows access from Azure services within your subscription, but this is not the case. It means every single Azure service in all subscriptions, even mine! My recommendation is to avoid this whenever possible; however, there are some cases where it is required, and this access should be noted as a risk.

More on Azure SQL Firewall – https://docs.microsoft.com/en-us/azure/sql-database/sql-database-firewall-configure

More on Azure SQL with V-Nets – https://docs.microsoft.com/en-us/azure/sql-database/sql-database-vnet-service-endpoint-rule-overview

 

Azure IaaS Disaster Recovery

The ability to recover your IaaS VMs in Azure to a different region has been a logical requirement within Azure for quite some time. Microsoft made the feature available in preview last year and this week made it generally available.

Azure DR allows you to recover your IaaS VMs in a different Azure region should their initial region become unavailable. For example, if you run your workloads in North Europe and the region experiences significant downtime, you are now able to recover your workloads in West Europe.

In this post I will go through setting up an individual VM to replicate from North Europe to West Europe. However, it’s worth pointing out that DR should be a business discussion, not just a technical one. All scenarios that could occur, within reason, should be discussed to decide whether DR is warranted. For example, if your business relies entirely on your premises for production, then losing the premises means there is no production capability regardless of system recovery. The idea is to scope what DR actually means for your business, and remember: DR is only valid if it is tested!

Enabling DR for a VM is straightforward. Open your VM blade and scroll down to Operations; you will see an option for Disaster Recovery.

[Screenshot: Disaster Recovery option in the VM blade]

Select a Target Region, which must be different from your current region; you can then choose the default settings for a POC. In my screenshot, I have already created a Resource Group and Recovery Services Vault in West Europe, so I will use those. Once submitted, replication for your VM will be enabled. You can then view the configured options:

[Screenshot: configured replication settings]

And that’s it! Once synchronisation completes, your VM is protected in a different region. However, for it to be valid, you need to design and confirm your Recovery Plan, then complete both a Test Failover and a complete Failover and Failback.

More reading on the overall concept and Azure-Azure DR specifics here.

Optimising Azure Disk Performance

When deploying a VM, there are several aspects of configuration to consider to ensure you are achieving the best possible performance for your application. The most common are vCPU and RAM; however, I recommend giving equal consideration to your disks.

In Azure, there are multiple options available when provisioning a disk for a VM. The recommendation from Microsoft is to use Managed Disks and, depending on the performance required, choose either the Standard or Premium tier. As this post is about performance, I am going to discuss Premium tier Managed Disks (PMD).

For most applications/services, if you choose the closest sizing for the VM and provision the disk as PMD, this will more than meet your requirements. However, should you have heavy read/write requirements, you may need to maximise your disk performance to ensure you are also maximising your other resources.

When talking about disk performance there will be references to disk IOPS and throughput, so it is important to understand both concepts.

IOPS is the number of requests that your application sends to the storage disks in one second. An input/output operation could be a read or a write, sequential or random.

Throughput, or bandwidth, is the amount of data that your application sends to the storage disks in a specified interval. If your application performs input/output operations with large IO unit sizes, it requires high throughput.
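The two metrics are linked by the IO unit size: throughput is roughly IOPS multiplied by the size of each operation. A small sketch of that relationship, with made-up workload numbers for illustration:

```python
# throughput (MB/s) = IOPS x IO unit size (KB) / 1024
def throughput_mbps(iops: int, io_size_kb: int) -> float:
    return iops * io_size_kb / 1024

# Many small operations: IOPS-bound, modest throughput.
print(round(throughput_mbps(5000, 8), 1))  # 39.1
# Few large operations: throughput-bound despite low IOPS.
print(throughput_mbps(500, 1024))          # 500.0
```

This is why an OLTP database with 8 KB pages cares about the IOPS column of a disk's spec sheet, while a backup or analytics workload streaming large blocks cares about the throughput column.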

How actual disk performance is calculated for your VM is slightly complex. There are several variables that affect what performance is available and achievable, and when you should expect throttling to occur. The key aspects, however, are:

  1. Virtual Machine Size
  2. Managed Disk Size

As you move up VM sizes, you don’t just increase the amount of vCPU and RAM available; you also increase the allocated IOPS and throughput. In the following examples I’ll be comparing a DS12_v2 and a DS13_v2; below are their listed specifications:

 

Size              vCPU  Memory (GiB)  Temp storage (SSD) GiB  Max data disks  Max cached and temp storage throughput (IOPS / MBps)  Max uncached disk throughput (IOPS / MBps)
Standard_DS12_v2  4     28            56                      16              16,000 / 128 (144)                                    12,800 / 192
Standard_DS13_v2  8     56            112                     32              32,000 / 256 (288)                                    25,600 / 384

As you can see, there are significant differences across the board, but we’ll focus on the final two columns, which are disk-related. There are two channels of performance for disks attached to VMs, relating to whether you choose to enable caching on your disks. For the OS disk, Microsoft recommends read/write cache and enables it by default; for data disks, the choice is up to you. IOPS are higher with caching but throughput is lower, so the right choice is application/service dependent.

Similarly, as you move up PMD sizes, you increase the IOPS and throughput available. See the table below highlighting this:

Disk type  Disk size         IOPS per disk  Throughput per disk
P4         32 GB             120            25 MB per second
P6         64 GB             240            50 MB per second
P10        128 GB            500            100 MB per second
P20        512 GB            2,300          150 MB per second
P30        1,024 GB (1 TB)   5,000          200 MB per second
P40        2,048 GB (2 TB)   7,500          250 MB per second
P50        4,095 GB (4 TB)   7,500          250 MB per second

You can see that throughput maxes out at 250 MB per second per disk; however, a DS13_v2 can reach 384 MB per second uncached. A similar restriction applies to IOPS. To achieve performance above the limit of a single PMD, and in line with your VM size, you need to combine disks.

For example, two P20s will give you 1 TB of storage and roughly 300 MB per second of throughput, in comparison to a single P30, which gives the same storage but only 200 MB per second. Obviously there is a cost consideration, but performance may justify it. To achieve this disk combination, it is best to use Storage Spaces in Windows to perform disk striping. More on that here – (Server 2016 – https://docs.microsoft.com/en-us/windows-server/storage/storage-spaces/deploy-storage-spaces-direct).
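The striping arithmetic can be sketched as follows, using the P20/P30 figures from the table above and the DS13_v2 uncached limits. This is illustration only, not an Azure tool; aggregate disk performance is the sum across striped disks, capped by the VM's own limits:

```python
# Per-disk figures from the premium disk table above.
DISKS = {
    "P20": {"gb": 512,  "iops": 2300, "mbps": 150},
    "P30": {"gb": 1024, "iops": 5000, "mbps": 200},
}

def striped(disk: str, count: int, vm_iops_cap: int, vm_mbps_cap: int) -> dict:
    """Aggregate capacity/performance of `count` striped disks, capped by VM limits."""
    d = DISKS[disk]
    return {
        "gb": d["gb"] * count,
        "iops": min(d["iops"] * count, vm_iops_cap),
        "mbps": min(d["mbps"] * count, vm_mbps_cap),
    }

# DS13_v2 uncached limits: 25,600 IOPS / 384 MBps
print(striped("P20", 2, 25600, 384))  # {'gb': 1024, 'iops': 4600, 'mbps': 300}
print(striped("P30", 1, 25600, 384))  # {'gb': 1024, 'iops': 5000, 'mbps': 200}
```

Same 1 TB of storage either way, but the striped pair delivers 300 MBps against the single P30's 200 MBps, which is exactly the 50% gain demonstrated below.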

As a demo, I completed the above; below are captures of throughput testing for the same scenario on a DS13_v2. First, our striped virtual disk: same storage, maximum throughput.

[Screenshot: striped virtual disk throughput test]

On the same VM, a single disk: same storage available, lower throughput.

[Screenshot: single disk throughput test]

A simple change to how your storage is attached produces a 50% increase in performance.

While not always applicable, changes such as the above could prove vital when sizing a VM for a database while ensuring you are maximising that cost/performance ratio.

More links on sizing etc.

Premium Storage Performance
VM Sizes

 

Resource Locks and Policies

When considering production workloads for your Azure environment, there are some simple features that ensure the safety of your workloads but are often overlooked. The features I’m referring to are Resource Locks and Resource Manager Policies (RMPs).

Both features allow you greater control over your environment with minimal administrative effort. In my opinion, regardless of whether you are running production workloads or not, you should at the very least be using Locks and RMPs as a preventative method of control over your deployments.

Locks are a very simple and quick tool that can prevent changes to your environment in an instant. They can be applied at different tiers of your environment: depending on your governance model, you might want to apply them at the subscription, resource group, or resource level; all are possible. Locks have only two basic operations:

  • CanNotDelete means authorised users can still read and modify a resource, but they can’t delete the resource.
  • ReadOnly means authorised users can read a resource, but they can’t delete or update the resource. Applying this lock is similar to restricting all authorised users to the permissions granted by the Reader role.

Locks obey inheritance, so if you apply at resource group level, all resources contained within will receive the applied lock, the same is true for subscription level assignments.

Of the built-in roles, only Owner and User Access Administrator are granted the ability to apply and remove locks. In general, my recommendation is that all production resources are assigned a CanNotDelete lock. Environments such as UAT where performance etc is being monitored are more suited to a ReadOnly lock to ensure consistent environment results.

RMPs can be used individually or in conjunction with Locks to ensure even more granular control of your environment. RMPs define what you can and cannot do with your environment. For example, all resources created must be located in the European datacentres, or, all resources created must have a defined set of tags applied.

In terms of scope, RMPs can be applied exactly the same as Locks and also obey inheritance. A common scenario here is to apply a policy at subscription level to specify your allowed datacentres, then if you have a traditional IT Resource Group design, specify policies at RG level allowing only specific VM sizes for dev/test to manage cost.
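As a toy sketch of the evaluation logic described above (not the actual Azure Policy engine; the policy sets are invented for illustration), checking a deployment request against an allowed-locations policy and an allowed-VM-sizes policy might look like:

```python
# Hypothetical policy assignments: locations at subscription scope,
# VM sizes at a dev/test resource group scope.
POLICIES = {
    "allowed_locations": {"northeurope", "westeurope"},
    "allowed_vm_sizes": {"Standard_B2ms", "Standard_D2s_v3"},
}

def evaluate(resource: dict) -> list:
    """Return the list of policy violations for a proposed resource."""
    violations = []
    if resource["location"] not in POLICIES["allowed_locations"]:
        violations.append("location not permitted")
    if resource.get("vm_size") and resource["vm_size"] not in POLICIES["allowed_vm_sizes"]:
        violations.append("VM size not permitted")
    return violations

print(evaluate({"location": "eastus", "vm_size": "Standard_D64s_v3"}))
# ['location not permitted', 'VM size not permitted']
```

A compliant request returns an empty list and the deployment proceeds; any violation blocks it. The real service evaluates policies at deployment time in just this deny-on-violation spirit.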

There are many combinations that can be put to use to allow you greater control of your environment. At the end of the day, Azure allows for huge flexibility by design, but it is important for many companies for both security and cost management reasons to be able to exercise a degree of control over that flexibility.

A little tip if you are using both features, make sure you apply a CanNotDelete Lock to your important RMPs!

Windows Update Management

Update management is a necessary evil in the IT world. Some admins enjoy “Patch Tuesday” and for some it’s the most dreaded day of the month. Microsoft have made strides in relieving the stress that can be associated with patching certain core VMs but good management still requires a lot of administration.

Within Azure, every time you deploy a Windows server VM from a Marketplace image, you are getting the latest available patches, but what options do you have should those VMs need to run for an extended period of time? How do you keep them patched so they adhere to company security policies?

Traditionally, linking Azure to your existing on-premises solution, or building a WSUS or SCCM implementation, were the options. Both work well, but for smaller sites they could be considered cumbersome. Now, within Azure itself, making use of some platform objects that you may already be using, you can get a central console view of all of your machine updates.

The two requirements, outside of a VM to manage, are:

  1. Automation Account
  2. Log Analytics Workspace

Both of these are essentially free (see the latest pricing details for limits etc.) and relatively easy to set up separately. However, the process for enabling update management can also set them up for you, should that be required.

To enable update management for a single VM, open the VM blade and choose the Update Management button from the left action menu, it is part of the Operations section. This will run a validation operation to see if the feature is enabled and assess whether there are automation accounts and log analytics workspaces available. The validation process also checks to see if the VM is provisioned with the Microsoft Monitoring Agent (MMA) and Automation hybrid runbook worker. This agent is used to communicate with the VM and obtain information about the update status. This information is stored in the log analytics workspace.

Once you choose your current available options, or request to have new ones created, the solution takes roughly 15 minutes to enable. Once enabled, you will now see a management page, it will take some time for the live data to be collected from the server, but once that completes, this page will display information regarding the status of updates available/missing. You can click on individual updates for more information. You can also analyse the log search queries that run for checking updates, these can be modified to suit your environment if/as required.

Now that your management pane is displaying which updates are missing, you need to install them. You can schedule the installation of the updates you require from the same management pane. To install updates, schedule a deployment that follows your release schedule and service window. You can choose which update types to include in the deployment; for example, you can include critical or security updates and exclude update rollups. One important thing to note: if an update requires a restart, the VM will reboot automatically.

The scheduling process is very simple. You choose a name for the deployment, the classification of updates you would like to install, your scheduled time to begin the process of installation and finally a maintenance window to ensure compliance with your defined service windows.

Once the scheduled deployment runs, you can then view its status, again via the Update Management blade. This reports on all stages of the deployment, from “In Progress” to “Partially Failed” etc. You can then troubleshoot any issues should they arise.

Overall, I really like this solution. It also scales: you can add several machines using the same automation, and from the Automation Account you can access the Update Management blade and manage multiple enabled VMs at once, including scheduling multi-VM deployments of updates.

While I haven’t covered it here, this solution also works with Linux distributions and can be integrated with SCCM.

More here:

Update Management Overview

Patching Linux

Manage multiple VMs

SCCM Integration