## Introduction

It is good to be grounded in how to onboard individual servers ad hoc, but most organisations will want to be able to onboard VMs at scale.

The portal can generate scripts to onboard multiple VMs using a service principal. You can then upload the generated scripts to the target machines and execute them with elevated privileges. That is the aim of this challenge.
## Terraform and Ansible

“Why are we having to learn Terraform and Ansible?” 😡

OK, yes, there is a little more Terraform in this lab - to spin up more “on prem” VMs for you to onboard. And we will be using Ansible as the mechanism to automate the script copy and execution.

But don’t get hung up on the choice of tooling. The key thing to remember is that for you to experience onboarding at scale we obviously need something for you to onboard. If you had VMs on a test ESXi cluster somewhere then you could have used those instead. And you need something to copy those scripts up and execute them. Ansible is just a means to an end in this challenge; you could just as easily be using something else, such as VMware PowerCLI.

The Azure Arc jumpstart site has information on a few alternative scale onboarding scenarios, such as VMware ESXi hosted VMs or VMs in other clouds.
So in summary we will:

- prepare
  - use Terraform to provision additional VMs
  - check Ansible is working OK
- scale onboarding challenge
  - create a service principal
  - create a role assignment
  - generate the scale onboarding scripts
  - copy the scripts to the VMs
  - execute the scripts on the VMs
  - remediate any non-compliant resources
Again, this isn’t a test of your Terraform and Ansible skills so we’ll just give you those commands.
Let’s start!
## Create more VMs with Terraform

Go back to the root module of your cloned Terraform repo.

- Edit the terraform.tfvars to create more VMs

  ```hcl
  resource_group_name  = "arc-hack"
  linux_count          = 5
  linux_prefix         = "ubuntu"
  windows_count        = 5
  windows_prefix       = "win"
  create_ansible_hosts = true
  ```

  The last variable will generate an Ansible-compliant hosts file.

- Apply the new config

  Hint: If the apply fails then rerun `terraform apply` and Terraform should retry creating the failed resources.

If you `cat hosts` then you should see a file similar to the one below.
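The exact contents depend on the Terraform template, so treat this purely as an illustrative sketch - the hostnames, the `abcd1234` hash and the WinRM connection variables below are all placeholder assumptions:

```ini
# illustrative sketch only - your FQDNs and connection variables will differ
[arc_hack_linux_vms]
ubuntu-01-abcd1234.westeurope.cloudapp.azure.com
ubuntu-02-abcd1234.westeurope.cloudapp.azure.com

[arc_hack_windows_vms]
win-01-abcd1234.westeurope.cloudapp.azure.com
win-02-abcd1234.westeurope.cloudapp.azure.com

[arc_hack_windows_vms:vars]
ansible_connection=winrm
ansible_port=5986
ansible_winrm_server_cert_validation=ignore
```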
## Test Ansible

Why are we using Ansible? As per the introduction, it is simply a method to automate the upload and execution of scripts on multiple servers, and it is quick and easy to use on Azure. This challenge is not a test of your Ansible skills, so if you get stuck then reach out to your proctor.

If you are using Cloud Shell then you will already have the ansible binary available. If you are using Windows Subsystem for Linux (WSL), macOS or Linux then you should have installed and configured Ansible as per the prereqs.

- Export environment variables

  ```bash
  export ANSIBLE_HOST_KEY_CHECKING=false
  export ANSIBLE_INVENTORY="~/arc-onprem-servers/hosts"
  ```

  The first saves you from having to confirm the SSH keys for each of the linux VMs. The second uses the newly created inventory file, so you don’t have to use the `-i` switch with ansible all the time.

- Check Ansible can contact the VMs

  ```bash
  ansible arc_hack_linux_vms -m ping
  ansible arc_hack_windows_vms -m win_ping
  ```

  Both commands should show green success output for each VM, and the ping command should return a pong. If so then you’re now set for the scale onboarding challenge itself.
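For reference, a successful response for each host looks roughly like this (the hostname is a placeholder):

```
ubuntu-01-abcd1234.westeurope.cloudapp.azure.com | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
```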
## Checkpoint

Let’s check in and make sure you’re at the right point before you continue:
- you have ten VMs in your arc-hack-resources resource group
- only two have been onboarded into the arc-hack resource group
- Ansible is working and you can run modules on the VMs
If you have ticked all of the boxes then it is time to start on the scale onboarding challenge itself.
## Service principal

- Create a service principal

  We’ll use the Azure CLI to create one called `http://arc-<uniq>`, where `<uniq>` is the eight character hash used in your FQDNs to make them globally unique.

  ```bash
  role="Azure Connected Machine Onboarding"
  scope=$(az group show --name arc-hack --query id --output tsv)
  uniq=$(cd ~/arc-onprem-servers; terraform output --raw uniq)
  name="http://arc-$uniq"
  az ad sp create-for-rbac --name $name --role "$role" --scopes $scope
  ```

  The Azure Connected Machine Onboarding role is all you need to onboard VMs to Azure Arc.

- Copy the value of the password field

  You’ll need this later when editing the scripts.
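For reference, the create-for-rbac command returns the credentials as JSON along these lines (all of the values below are placeholders):

```json
{
  "appId": "00000000-0000-0000-0000-000000000000",
  "displayName": "arc-abcd1234",
  "name": "http://arc-abcd1234",
  "password": "<generated-password>",
  "tenant": "00000000-0000-0000-0000-000000000000"
}
```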
## Scale onboarding scripts

- Use the portal to generate scripts to onboard multiple servers

  - for both Windows and Linux
  - include an additional tag - `application: arc hack`
  - select the service principal you’ve just created

  Note that the portal only shows service principals with the right role.

- Move the scripts to your bash home directory

  - Cloud Shell: use the upload icon in the browser
  - Windows Subsystem for Linux: copy the files from Downloads, e.g.

    ```bash
    cp /mnt/c/Users/username/Downloads/filename ~
    ```

  The tilde (`~`) symbol is expanded to your home directory in Bash.

- Edit the files

  Paste in the service principal’s password.
## Copy and execute

Here are the Ansible commands to copy and execute the scripts.

- Export environment variables

  You should have these from earlier in this challenge.

  ```bash
  export ANSIBLE_HOST_KEY_CHECKING=false
  export ANSIBLE_INVENTORY="~/arc-onprem-servers/hosts"
  ```

- Copy the scripts up to the servers

  ```bash
  ansible arc_hack_linux_vms -m copy -a 'src=OnboardingScript.sh dest=/tmp/ owner=root mode=0755' --become
  ansible arc_hack_windows_vms -m win_copy -a 'src=OnboardingScript.ps1 dest=C:\\'
  ```

  You should see yellow output as Ansible is making a change.

- Execute the scripts as an elevated user

  ```bash
  ansible arc_hack_linux_vms -a '/bin/bash /tmp/OnboardingScript.sh' --become
  ansible arc_hack_windows_vms -m win_shell -a 'C:\\OnboardingScript.ps1' --become --become-method runas --become-user System
  ```

  More yellow output, but expect a slight pause this time as each VM initially downloads the azcmagent.
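As an optional check (this step isn’t part of the challenge instructions), you can confirm that the machines have been onboarded by listing the hybrid compute resources in the target resource group:

```bash
# List the Azure Arc connected machine resources
az resource list --resource-group arc-hack \
  --resource-type "Microsoft.HybridCompute/machines" --output table
```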
## Policy

Check the activity log for the recent management plane activity. The "Write Azure Arc machines" entries should show:

- the main Azure Arc entry
- a modify for the two tags
- a deployIfNotExists policy action

Check the arc-hack resource group:

- modify the columns to include the individual tags
- check the compliance status for the tags and for the extensions

Note: Ignore the untagged VM extensions. These are mistakenly set as indexed and will be corrected.

The new VMs you onboarded at scale should show as non-compliant once they have been processed by the policy engine. The extensions won’t be remediated automatically, although there are plans for this to be introduced.
## Remediation

In the last challenge you triggered a policy evaluation. An evaluation normally takes around 30 minutes. The existing resources - the first two VMs - that do not have extensions should now show as non-compliant. And the new VMs are now also showing as non-compliant for the extensions.

For safety, existing non-compliant resources will not be automatically remediated.

You can trigger remediations via the Policy area in the portal, or you can use Azure CLI commands. For example:

```bash
policyAssignmentName=$(az policy assignment list --resource-group arc-hack --query "[?displayName == 'Enable Azure Monitor for VMs'].name" --output tsv)

az policy remediation create --name loglin --policy-assignment $policyAssignmentName --definition-reference-id LogAnalyticsExtension_Linux_HybridVM_Deploy --resource-group arc-hack --resource-type Microsoft.HybridCompute/machines
az policy remediation create --name logwin --policy-assignment $policyAssignmentName --definition-reference-id LogAnalyticsExtension_Windows_HybridVM_Deploy --resource-group arc-hack --resource-type Microsoft.HybridCompute/machines
az policy remediation create --name deplin --policy-assignment $policyAssignmentName --definition-reference-id DependencyAgentExtension_Linux_HybridVM_Deploy --resource-group arc-hack --resource-type Microsoft.HybridCompute/machines
az policy remediation create --name depwin --policy-assignment $policyAssignmentName --definition-reference-id DependencyAgentExtension_Windows_HybridVM_Deploy --resource-group arc-hack --resource-type Microsoft.HybridCompute/machines
```

The default resourceDiscoveryMode for these commands is `ExistingNonCompliant`, but you can also use the `--resource-discovery-mode ReEvaluateCompliance` switch to force a policy evaluation.
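You can then keep an eye on the remediation tasks with an optional check like this, which shows each task and its current provisioning state:

```bash
# Show remediation tasks in the resource group and their state
az policy remediation list --resource-group arc-hack --output table
```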
## Resource Graph query

Upgrade the Resource Graph query:

- include each of the tags as individual fields, with the field names renamed to something shorter
- add a filter to only show the linux servers

Run the Resource Graph query as a CLI command to generate JSON.
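If you need a starting point, the sketch below shows the general shape. It assumes the `resource-graph` CLI extension is installed, and the property and tag names used here (`properties.osName`, the `application` tag) are assumptions to adjust for your environment:

```bash
# One-off: add the Resource Graph extension if you don't have it
az extension add --name resource-graph

# Query Azure Arc machines, surface a tag as a field, filter to linux
az graph query -q "
resources
| where type == 'microsoft.hybridcompute/machines'
| where tolower(tostring(properties.osName)) == 'linux'
| extend application = tostring(tags['application'])
| project name, resourceGroup, application
" --output json
```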
## Stretch Targets

If you found that too simple then here are some trickier objectives.

Use the CLI to do the following (a couple of hints follow the list):
- list the policy assignments at a scope
- list the policy definitions in an initiative
- show only the policy definition IDs and policy definition reference IDs
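If you want a nudge rather than an answer, commands along these lines are one way in. The `<initiativeName>` placeholder is hypothetical - look up the real initiative name from the assignment first:

```bash
# Policy assignments at the resource group scope
az policy assignment list --resource-group arc-hack --output table

# Definitions within an initiative (placeholder initiative name)
az policy set-definition show --name <initiativeName> \
  --query "policyDefinitions[].{definitionId:policyDefinitionId, referenceId:policyDefinitionReferenceId}" \
  --output table
```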
A tougher one combining filter and query (a hedged sketch follows the list):
- list the policy state for non-compliant resources
- only for the VM Monitor Agents initiative
- Azure Arc VMs only
- output a tsv of the policy definition reference ID and policy assignment name
(This could be used to automate the remediation task deployment.)
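Here is one possible shape, under stated assumptions: `complianceState`, `resourceType`, `policyDefinitionReferenceId` and `policyAssignmentName` are believed to be valid policy state fields, but narrowing to the VM Monitor Agents initiative (for example via an additional filter on the initiative) is left for you to verify:

```bash
# Non-compliant states for Azure Arc machines, as a TSV of
# policy definition reference ID and policy assignment name
az policy state list --resource-group arc-hack \
  --filter "complianceState eq 'NonCompliant' and resourceType eq 'Microsoft.HybridCompute/machines'" \
  --query "[].[policyDefinitionReferenceId, policyAssignmentName]" \
  --output tsv
```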
## Success criteria

Screen share with your proctor to show that you achieved:
- Onboarding all servers
- CLI commands to initiate the remediation tasks
- Resource Graph Explorer report
- JSON query output