Options for Azure Migrations
There are two deployment models in Azure, the older being the Service Management model, aka classic mode. The newer model is Azure Resource Manager (ARM). For reasons that extend beyond this post, Microsoft is moving away from the classic mode and adopting ARM wherever possible. Up until a few months ago, the two models had not yet reached feature parity and so classic was still required for some deployments. At this point the two models are at feature parity, and in fact ARM has pulled ahead. That gap is only going to widen as Microsoft continues to pour investment into ARM and leave classic to die on the vine.
If you are looking into migrating your Azure classic virtual machines to ARM, you might be wondering what your options are. There are several potential solutions, Microsoft supported and otherwise. Each has a set of limitations and gotchas, and in this post I intend to review them and provide a guide for using Azure Site Recovery to get the job done.
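The solutions I have considered are the following:
- The Microsoft supported migration process
- The community supported scripts
- Azure Site Recovery (ASR)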
The Microsoft supported migration process allows you to migrate either an Azure VM that is not associated with a Vnet or an entire classic Vnet, including all the VMs on it. The nice part about the service is that it manipulates the management plane of Azure without impacting the data plane. The data plane includes the storage, network, and actual running VMs, and because the migration is non-disruptive to the data plane, all of your VMs stay up during the process. Of course there are a bunch of caveats. For instance, you cannot migrate your network gateway, so if you have site-to-site VPNs connecting to that Vnet, they are all going to be disconnected until you provision a new gateway in Resource Manager. Since the data plane is not changing, you cannot move those VMs to an existing Vnet or select a subset of VMs to move. This is an all-or-nothing migration. The first time I tried to use the service, the migration failed with a mysterious error. I posted in the forums, and Microsoft responded that the bug had been addressed. I retried and was successful, but it’s obvious this is very new code that still has some bugs to shake out. I think that this solution works best if you have not yet deployed anything into ARM and you do not want to make any significant changes when moving over.
The community supported scripts break down into a PowerShell module for migration (asm2arm) and a GUI tool called MigAz. Aidan Finn has done an excellent write-up on MigAz, so please check out that post; I won’t rehash it here. Either tool will clone the VM by creating a JSON deployment template and a script to copy the VHDs from a source storage account to a target storage account. To execute the full migration, the source VM needs to be stopped, then the VHDs are copied over, and the new VM is spun up. The VHD copy time is going to be pretty minimal, since you are copying within the same datacenter, but there will be downtime for the VM. The PowerShell module has some limitations as well. You cannot select a target virtual network, select a target storage account, or place the storage account and network in separate resource groups. Again, that’s okay if you have nothing in ARM already, but if you are trying to migrate to an existing virtual network, then you’ve got some problems. I cloned the GitHub repo and updated the script to take arguments for all those items, so you can use my version of the script to migrate a classic VM to an existing virtual network and storage account.
Azure Site Recovery (ASR) is a service in Azure available in both classic and ARM mode. In classic, it targets Backup Vaults and in ARM it targets the Recovery Services vault. ASR uses scheduled replication to protect virtual and physical machines both on premises and in Azure. Each machine has an agent installed that runs the replication process and sends the data to a process server, which in turn sends the rolled up replication to the vault. From the vault you can initiate an unplanned failover of the machine. In the plan for the unplanned failover, you specify the target virtual network, the target machine name, and the target machine size. Before running the failover, the source machine needs to be powered down, so just like the community scripts, the migration process will require downtime. The ASR option natively supports more than just the basic classic to ARM migration. You can also do the following:
- Migrate to new Azure subscription
- Migrate to a different Azure region
- Migrate within an Azure region
- Migrate from on premises to Azure
In my opinion, if you are going to be migrating a significant number of VMs and some downtime is acceptable, then ASR is your best bet. The process is straightforward, supported by Microsoft, and has a lot more flexibility. If you would like to perform the migration, then the steps below will get you there.
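The trade-offs between the options can be boiled down to a small decision helper. This is only a sketch of how I read the comparison above, not an official rule set; the function name `suggest_option` and its three parameters are made up for illustration and are not part of any Azure tooling.

```python
def suggest_option(downtime_ok: bool,
                   target_existing_vnet: bool,
                   move_whole_vnet: bool) -> str:
    """Return a rough recommendation based on the constraints in this post."""
    # The platform migration keeps the VMs running, but it moves a classic
    # Vnet as-is: no subset of VMs and no existing ARM Vnet as the target.
    if move_whole_vnet and not target_existing_vnet:
        return "Microsoft supported migration"
    # ASR (like the community scripts) requires the source VM to be powered
    # down, but it lets you pick the target network, name, and size.
    if downtime_ok:
        return "Azure Site Recovery"
    # Nothing discussed here moves a subset of VMs without downtime.
    return "no option fits, relax a constraint"


print(suggest_option(downtime_ok=True, target_existing_vnet=True, move_whole_vnet=False))
```

The helper deliberately leaves out the community scripts as a separate answer, since they carry the same downtime requirement as ASR without the Microsoft support.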
Prior to performing the migration, you will need to provision a Recovery Services vault in the region the VMs will be migrating to. Then you will need to stand up a management server in the same region and virtual network as the source VMs to be migrated.
Once the basic ASR infrastructure is in place, you can follow the steps below to migrate a VM.
Azure virtual machines are treated as physical machines because there is no access to the hypervisor. Prior to enabling replication, the source machine (the machine being protected) should have the Windows firewall configured to allow File and Printer Sharing and WMI management.
The applet for these settings can be found by opening the Control Panel and selecting Windows Firewall, not the Advanced Firewall. Make sure to select each feature for all three profiles: Domain, Private, and Public. Additionally, the account that was added to the local configuration server for client installation should be added as a local administrator on the source machine. This will enable the automatic push of the replication agent. The source machine should also not have any pending reboots. If a pending reboot exists, the agent installation will fail. If a manual installation is desired, it should occur prior to attempting to enable replication. Instructions for manual installation can be found here.
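The prerequisites above are easy to turn into a pre-flight checklist. The sketch below does not query Windows or Azure; it assumes you have already collected the facts about the machine being protected by some other means, and the `ProtectedMachine` class and its field names are hypothetical.

```python
from dataclasses import dataclass, field

PROFILES = ("Domain", "Private", "Public")
FEATURES = ("File and Printer Sharing", "Windows Management Instrumentation (WMI)")


@dataclass
class ProtectedMachine:
    name: str
    # Maps a firewall feature name to the set of profiles it is allowed on.
    firewall: dict = field(default_factory=dict)
    install_account_is_local_admin: bool = False
    pending_reboot: bool = False


def preflight_problems(machine: ProtectedMachine) -> list:
    """Return a list of human-readable problems; empty means ready for push install."""
    problems = []
    for feature in FEATURES:
        allowed = machine.firewall.get(feature, set())
        missing = [p for p in PROFILES if p not in allowed]
        if missing:
            problems.append(f"{feature} not allowed for: {', '.join(missing)}")
    if not machine.install_account_is_local_admin:
        problems.append("install account is not a local administrator")
    if machine.pending_reboot:
        problems.append("pending reboot will cause the agent install to fail")
    return problems


vm = ProtectedMachine("vm1", {"File and Printer Sharing": {"Domain"}}, True, True)
for problem in preflight_problems(vm):
    print(problem)
```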
In the Recovery Services vault under Settings select Protected Items -> Replicated Items. Then click on the + Replicate button. This will start the Enable replication wizard.
Select the configuration server, a Machine type of Physical Machines, and a Process server, which is typically collocated on the Management Server. If the site has more than one process server, select the one with the least load or the one on the same LAN as the physical machine.
Fill out the values for the target virtual machine including the subscription, storage account, network, and subnet. The deployment model should be Resource Manager.
Note: The storage account selected will be used when the virtual machine is restored. It must be a general-purpose storage account and not a Blob-specific account, and it must be located in the same region as the vault.
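The two storage account constraints in the note can be checked up front. The following is a minimal sketch, assuming the account `kind` strings `Storage` and `StorageV2` denote general-purpose accounts and `BlobStorage` denotes a Blob-specific one; the function name is hypothetical, and whether ASR accepts every general-purpose kind is not something this post establishes.

```python
# Assumption: these kind values identify general-purpose storage accounts.
GENERAL_PURPOSE_KINDS = {"Storage", "StorageV2"}


def _normalize_region(region: str) -> str:
    # Treat "East US" and "eastus" as the same region.
    return region.replace(" ", "").lower()


def storage_account_ok(kind: str, account_region: str, vault_region: str) -> bool:
    """True if the account is general-purpose and in the vault's region."""
    same_region = _normalize_region(account_region) == _normalize_region(vault_region)
    return kind in GENERAL_PURPOSE_KINDS and same_region


print(storage_account_ok("BlobStorage", "East US", "eastus"))
```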
Select an existing physical machine or click the + Physical Machine button to add a new one. When adding a new physical machine, provide the Name, IP Address and OS Type.
Select which account should be used to replicate the machine. The account should have local administrator permissions to install the replication agent and perform the replication back to the process server.
Select which replication policy should be used to protect the machine.
Once all settings are complete, click on Enable replication to start the replication process.
The machine will now begin the replication process. Remember that a pending reboot on the source machine will cause the agent installation to fail. Once the replication process has completed, the next step is to perform the actual migration.
Shut down the source machine and note the time; it will be used to compare against the latest synchronization time in the vault. In the Recovery Services vault under Settings, select Replicated items in the Protected Items section.
Confirm that the last data sync corresponds with the time the VM was shut down.
Note: It may take up to 15 minutes for the last data sync time to change.
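The comparison between the shutdown time and the last data sync can be written down as a small check. This sketch assumes that "corresponds with" means the last sync happened at or after the shutdown, and it uses the 15 minute window from the note as the point at which waiting stops being the right answer. The function name and the three status strings are made up for illustration.

```python
from datetime import datetime, timedelta

# From the note above: the sync time may take up to 15 minutes to update.
SYNC_LAG = timedelta(minutes=15)


def sync_status(shutdown_at: datetime, last_sync_at: datetime, now: datetime) -> str:
    """Classify whether the vault has caught up with the powered-down source VM."""
    if last_sync_at >= shutdown_at:
        return "ready"        # the vault holds data from after the shutdown
    if now - shutdown_at < SYNC_LAG:
        return "wait"         # still inside the expected update window
    return "investigate"      # the sync time should have moved by now


shutdown = datetime(2000, 1, 1, 12, 0)
print(sync_status(shutdown, shutdown - timedelta(minutes=10), shutdown + timedelta(minutes=5)))
```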
In the Computer and Network section of the replicated item Settings, verify that the Name, Size, and Network properties are correct.
On the replicated item blade select Unplanned failover to initiate the migration. Choose a Recovery Point and uncheck the Shut down machine option. Physical machines and Azure virtual machines cannot be shut down by the configuration server.
Monitor the status of the Unplanned failover task until it completes.
Verify that the target virtual machine is running.
If desired, allocate a public IP address for the target virtual machine. In the settings for the virtual machine navigate to the network interface blade and then the IP addresses blade. Enable the Public IP address setting and then allocate a public IP address either by selecting an existing public IP address object or by creating a new one. When finished click the Save button to complete the operation.
Associate a Network security group with the target virtual machine. NSG settings do not migrate with the virtual machine. In the settings for the network interface, click on Network security group and select an existing NSG or create a new one.
Log into the target virtual machine and validate that applications and services are working as expected.
Complete the migration. In the Recovery Services vault open the replicated item that is being migrated. Click on the Complete Migration button and then confirm on the pop-up blade. This will remove the item from replication protection and commit the point-in-time recovery point on the target virtual machine.
Once the failover is completed, the source Azure virtual machine is removed from protection. That virtual machine can be deleted once the target virtual machine is confirmed and validated. If backup protection was configured for the source virtual machine, it will need to be removed and re-enabled on the migrated VM. The migrated VM can also be enabled for protection for the purposes of disaster recovery.
Although this procedure only covers a single virtual machine, recovery plans can be created to orchestrate bigger migrations. Within the recovery plan it is possible to add pre- and post-scripts to perform actions like setting up availability sets, allocating public IP addresses, or configuring load balancing.
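To make the ordering of a recovery plan concrete, here is a toy model of it. This is not ASR's API and it does not call Azure; `run_recovery_plan` and the lambdas passed to it are hypothetical stand-ins that only show the sequence of pre-actions, per-machine failover, and post-actions.

```python
def run_recovery_plan(machines, failover, pre_actions=(), post_actions=()):
    """Run pre-actions, fail over each machine in order, then run post-actions.

    Every callable returns a short description of what it did, and the
    collected descriptions are returned as a log of the plan's execution.
    """
    log = []
    for action in pre_actions:
        log.append(action())
    for name in machines:
        log.append(failover(name))
    for action in post_actions:
        log.append(action())
    return log


log = run_recovery_plan(
    ["web01", "web02"],
    failover=lambda vm: f"failover {vm}",
    pre_actions=[lambda: "create availability set"],
    post_actions=[lambda: "allocate public IP", lambda: "configure load balancer"],
)
for step in log:
    print(step)
```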
The Bottom Line
Before taking my leave, a few words about cost. The cost of Azure Site Recovery by itself is based on the number of protected instances. For the first 31 days each protected instance is free. For the purposes of migrating a machine, the cost for ASR is negligible since the machine will not be protected long enough to incur charges. After the initial 31 days there is a standard cost of $54 per month for each protected instance. Actual pricing may vary.
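The pricing rule above is simple enough to estimate in a few lines. The sketch below uses the 31 free days and the $54 monthly rate quoted in this post, and it assumes that any started 30-day period after the free window is billed as a full month. That rounding is my assumption, not Microsoft's published billing logic, so treat the result as a rough upper bound.

```python
import math

FREE_DAYS = 31            # from the post: first 31 days are free per instance
MONTHLY_RATE_USD = 54.0   # from the post: standard rate, actual pricing may vary
DAYS_PER_MONTH = 30       # assumption: a billing month is 30 days


def asr_protection_cost(days_protected: int, instances: int = 1) -> float:
    """Estimate the ASR protection charge for a number of protected instances."""
    billable_days = max(0, days_protected - FREE_DAYS)
    # Assumption: a partially used month is charged as a whole month.
    billable_months = math.ceil(billable_days / DAYS_PER_MONTH)
    return billable_months * MONTHLY_RATE_USD * instances


print(asr_protection_cost(days_protected=10, instances=25))
```

For a migration that completes within the free window, the estimate is zero regardless of the number of instances, which matches the point that the ASR charge is negligible for this use case.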
In addition to the costs of protecting each instance in ASR, the cost of storage, storage transactions, and outbound data transfer must be taken into account. The vault storage is considered blob type and follows the pricing model depending on the storage replication type. The recommended replication type is LRS, which has the lowest cost per GB. The replication traffic from on-site to ASR would be considered all ingress traffic, and thus not charged. Replication traffic between Azure sites is charged based on the cost structure detailed here. Replication traffic within the same Vnet is not charged.
During disaster recovery testing, new virtual machines will be created. The charges associated with the test virtual machines follow the standard cost for Azure virtual machines. This includes the cost of additional storage for the test virtual machine VHD files. If the test virtual machines are not being used, they should be shut off to avoid incurring charges.