Fix Extension Certificate Errors on Your Azure Windows VM: A Troubleshooting Guide
Extension errors on Azure Windows Virtual Machines often stem from underlying issues related to certificates or cryptographic operations. This guide provides steps to identify and resolve common certificate-related problems encountered when using VM extensions. Understanding the role of certificates in securing extension configuration is crucial for effective troubleshooting.
Troubleshooting Checklist
When diagnosing extension certificate errors, examining the logs on the guest operating system is the primary step. These logs provide detailed information about the extension’s lifecycle and any failures encountered during processing, including decryption errors. Focusing on specific log files can quickly narrow down the potential causes of the issue.
View the Guest Logs
The following table highlights the most important log files on the Windows VM for troubleshooting extension certificate errors. These files contain crucial information about the Guest Agent’s operations and the execution status of individual extensions. Checking these logs should be the first action taken when an extension fails with a certificate-related error.
| Log | Description |
|---|---|
| `C:\WindowsAzure\Logs\WaAppAgent.log` | The primary guest agent log. It records details about the guest agent's activities, including the state and results of extension operations like download, install, enable, and disable. |
| Log files in the `C:\WindowsAzure\Logs\Plugins\<ExtensionName>` folder | Specific logs generated by individual extensions. While logs vary by extension, common files include `CommandExecution.log`, timestamped execution logs, and handler-specific logs like `CustomScriptHandler.log`. |
Note: This table lists the most frequently used logs for this type of issue. Additional log files might exist depending on the specific extension and its configuration.
For comprehensive log collection, use the CollectGuestLogs.exe tool. This utility gathers all relevant guest logs into a single .zip archive, which simplifies reviewing them. You can find CollectGuestLogs.exe in one of these directories on the Windows VM: `C:\WindowsAzure\Packages` or `C:\WindowsAzure\GuestAgent_<VersionNumber>_<Timestamp>`. The tool is particularly helpful when you need to analyze logs offline or share them with support.
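Because the Guest Agent folder name includes a version and timestamp, the exact path of CollectGuestLogs.exe differs between VMs. The sketch below searches both documented locations. The function name `Find-CollectGuestLogs` is hypothetical, and the recursive search is an assumption in case the tool sits in a subfolder rather than directly in one of those directories.

```powershell
function Find-CollectGuestLogs {
    param(
        # Parent of the Packages and GuestAgent_<VersionNumber>_<Timestamp> folders
        [string]$AzureRoot = 'C:\WindowsAzure'
    )
    # Collect the candidate folders; missing folders are skipped silently
    $searchRoots = @()
    $searchRoots += @(Get-Item -Path (Join-Path $AzureRoot 'Packages') -ErrorAction SilentlyContinue)
    $searchRoots += @(Get-ChildItem -Path $AzureRoot -Directory -Filter 'GuestAgent_*' -ErrorAction SilentlyContinue)

    foreach ($dir in $searchRoots) {
        # Emit every CollectGuestLogs.exe found under each candidate folder
        Get-ChildItem -Path $dir.FullName -Filter 'CollectGuestLogs.exe' -File -Recurse -ErrorAction SilentlyContinue
    }
}
```

If several Guest Agent versions are present, you can pick the most recent copy with `Find-CollectGuestLogs | Sort-Object LastWriteTime -Descending | Select-Object -First 1`, and then run it from an elevated console with `& $tool.FullName`.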
Symptoms
Several distinct error messages and behaviors indicate certificate-related issues with VM extensions. Recognizing these symptoms is key to applying the correct troubleshooting steps. These errors typically point to a problem with the transport certificate used by Azure to encrypt sensitive configuration settings passed to the extension.
One of the most frequent errors is the FailedToDecryptProtectedSettings exception. This error signifies that the necessary transport certificate, required by the Guest Agent to decrypt sensitive configuration data for the extension, is either missing from the VM’s certificate store or inaccessible. This prevents the extension from receiving its secure settings and initializing correctly.
A common variant of the FailedToDecryptProtectedSettings exception arises from incorrect permissions on the Crypto\RSA\MachineKeys folder. This folder stores the private keys associated with machine certificates, including the transport certificate. If the Guest Agent or the extension handler lacks the necessary permissions to access the private key, decryption will fail, resulting in errors such as:
```
System.Security.Cryptography.CryptographicException: Keyset does not exist
at System.Security.Cryptography.Pkcs.EnvelopedCms.DecryptContent(RecipientInfoCollection recipientInfos, X509Certificate2Collection extraStore)
at Microsoft.Azure.Plugins.Diagnostics.dll.PluginConfigurationSettingsProvider.DecryptPrivateConfig(String encryptedConfig)
```
Other variations of this permission issue might present similar messages indicating a problem accessing the key container or provider:
- `Failed to decode, decrypt, and deserialize the protected settings string. Error Message: Keyset does not exist`
- `Decrypting Protected Settings - Invalid provider type specified`
Another related error points to issues during the generation or retrieval of the certificate itself:
```
[ERROR] Failed to get TransportCertificate. Error: Microsoft.WindowsAzure.GuestAgent.CertificateManager.CryptographyNative+PInvokeException: Self-signed Certificate Generation failed. Error Code: -2146893808.
```
Other symptoms include a generic "Unable to retrieve the certificate" error message, which is less specific but still points to a problem with certificate presence or certificate store access. In addition, the VM diagnostics settings might show a CryptographicException with the message "The enveloped-data message does not contain the specified recipient." This error typically occurs when the certificate that the Azure platform used to encrypt the data does not match any certificate available on the VM for decryption, which indicates a missing or mismatched certificate.
An example of this specific CryptographicException in the logs might look like this, showing repeated attempts to find a certificate by thumbprint before ultimately failing:
```
DiagnosticsPluginLauncher.exe Information: 0 : [6/29/2020 1:32:20 PM] Decrypting private configuration
DiagnosticsPluginLauncher.exe Warning: 0 : [6/29/2020 1:32:20 PM] No certficate with given thumbprint found in the certificate store. Thumbprint:34C8CDC747693E0E33A9648703E3990EC4F2C484
DiagnosticsPluginLauncher.exe Information: 0 : [6/29/2020 1:32:20 PM] Retrying after 30 seconds. Retry attempt 1
DiagnosticsPluginLauncher.exe Warning: 0 : [6/29/2020 1:32:50 PM] No certficate with given thumbprint found in the certificate store. Thumbprint:34C8CDC747693E0E33A9648703E3990EC4F2C484
DiagnosticsPluginLauncher.exe Information: 0 : [6/29/2020 1:32:50 PM] Retrying after 30 seconds. Retry attempt 2
DiagnosticsPluginLauncher.exe Warning: 0 : [6/29/2020 1:33:20 PM] No certficate with given thumbprint found in the certificate store. Thumbprint:34C8CDC747693E0E33A9648703E3990EC4F2C484
DiagnosticsPluginLauncher.exe Information: 0 : [6/29/2020 1:33:20 PM] Retrying after 30 seconds. Retry attempt 3
DiagnosticsPluginLauncher.exe Error: 0 : [6/29/2020 1:33:50 PM] System.Security.Cryptography.CryptographicException: The enveloped-data message does not contain the specified recipient.
at System.Security.Cryptography.Pkcs.EnvelopedCms.DecryptContent(RecipientInfoCollection recipientInfos, X509Certificate2Collection extraStore)
at Microsoft.Azure.Plugins.Diagnostics.dll.PluginConfigurationSettingsProvider.DecryptPrivateConfig(String encryptedConfig)
```
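When the log shows repeated "No certficate with given thumbprint" warnings, you can check whether that thumbprint exists on the VM. The sketch below extracts the distinct thumbprints from such log lines and tests for each one in the Local Computer Personal store. It assumes the extension looks in that store, which is where the transport certificate discussed in this guide lives. Both function names are hypothetical.

```powershell
function Get-LoggedThumbprint {
    param(
        # Raw log lines, for example the output of Get-Content
        [Parameter(Mandatory)][AllowEmptyString()][string[]]$LogLine
    )
    # A SHA-1 thumbprint is 40 hex characters following "Thumbprint:"
    $LogLine |
        Select-String -Pattern 'Thumbprint:\s*([0-9A-Fa-f]{40})' -AllMatches |
        ForEach-Object { $_.Matches } |
        ForEach-Object { $_.Groups[1].Value.ToUpperInvariant() } |
        Sort-Object -Unique
}

function Test-ThumbprintInMachineStore {
    param([Parameter(Mandatory)][string]$Thumbprint)
    # Windows only: the Cert: drive is provided by the certificate provider on Windows
    Test-Path -Path "Cert:\LocalMachine\My\$Thumbprint"
}
```

For example, pipe `Get-LoggedThumbprint -LogLine (Get-Content <path-to-extension-log>)` into `ForEach-Object { Test-ThumbprintInMachineStore -Thumbprint $_ }`. Here `<path-to-extension-log>` is a placeholder for the relevant file under `C:\WindowsAzure\Logs\Plugins\<ExtensionName>`. A result of `False` matches the "does not contain the specified recipient" failure described above.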
Finally, sometimes the issue is not the absence of a certificate but the presence of a new, potentially interfering certificate pushed to the VM. While less common, a newly introduced certificate could disrupt the expected workflow or certificate selection process used by the Guest Agent and extensions.
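To see which of these symptoms applies on a given VM, you can search the guest logs for the messages quoted in this section. The following is a minimal sketch; the function name is hypothetical. It assumes the relevant files under `C:\WindowsAzure\Logs` use the `.log` extension, so widen the filter if an extension writes its logs differently.

```powershell
function Find-ExtensionCertificateError {
    param(
        # Guest log root described in "View the Guest Logs"
        [string]$LogRoot = 'C:\WindowsAzure\Logs'
    )
    # Message fragments from the Symptoms section, matched as plain text (case-insensitive).
    # 'certficate' is spelled as it appears in the diagnostics log.
    $patterns = @(
        'FailedToDecryptProtectedSettings',
        'Keyset does not exist',
        'Invalid provider type specified',
        'Failed to get TransportCertificate',
        'Unable to retrieve the certificate',
        'No certficate with given thumbprint',
        'The enveloped-data message does not contain the specified recipient'
    )
    # Assumption: the relevant files use the .log extension
    Get-ChildItem -Path $LogRoot -Filter '*.log' -File -Recurse -ErrorAction SilentlyContinue |
        Select-String -Pattern $patterns -SimpleMatch |
        Select-Object -Property Path, LineNumber, Line
}
```

Each result identifies the file and line, which tells you whether to start with the missing-certificate path (Solution 1) or the permissions path (Solution 2).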
Cause: Workflow and Dependency Code Changes
The root cause of many of these certificate issues lies in architectural changes made to the Azure platform, particularly around May 2020. These changes refined the workflow for VM extensions, reducing their dependencies on certain legacy Azure components and improving overall reliability and speed. The new architecture requires closer cooperation between the Azure Compute Resource Provider (CRP), the Guest Agent running inside the VM, and the extensions themselves.
In this updated workflow, the Azure platform encrypts sensitive extension settings (referred to as "protected settings") with a specific transport certificate. That certificate, including its private key, must be present on the VM and accessible to the Guest Agent for decryption to succeed. The Guest Agent fetches the "goal state" from the Azure fabric, which includes the extension configuration, and decrypts the protected settings using the provisioned transport certificate. Minor bugs or misconfigurations in the rollout of, or interaction between, these components have occasionally led to scenarios where the certificate is not provisioned correctly or its private key is not accessible to the Guest Agent. The result is the decryption failures described above.
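One way to test the "present and accessible" condition from inside the VM is to try opening the transport certificate's private key. The sketch below does that with the .NET `RSACertificateExtensions.GetRSAPrivateKey` method, which throws when the key cannot be opened. The function name and output shape are hypothetical. Run it from an elevated console. Because the Guest Agent often runs under the SYSTEM account, a result obtained under an administrator account only approximates what the agent sees.

```powershell
function Get-TransportCertificateStatus {
    param(
        # Matched as a substring so both DC= and CN= style subjects are found
        [string]$CertName = 'Windows Azure CRP Certificate Generator',
        # Defaults to the Local Computer Personal store (Windows only)
        [System.Security.Cryptography.X509Certificates.X509Certificate2[]]$Certificate
    )
    if (-not $PSBoundParameters.ContainsKey('Certificate')) {
        $Certificate = @(Get-ChildItem -Path Cert:\LocalMachine\My)
    }
    $matching = @($Certificate | Where-Object { $_.Subject -like "*$CertName*" })
    foreach ($cert in $matching) {
        $readable = $false
        $problem = $null
        try {
            # Fails (for example with "Keyset does not exist") when the key file is
            # missing or its ACL denies the current account
            $key = [System.Security.Cryptography.X509Certificates.RSACertificateExtensions]::GetRSAPrivateKey($cert)
            $readable = $null -ne $key
        }
        catch {
            $problem = $_.Exception.Message
        }
        [pscustomobject]@{
            Subject            = $cert.Subject
            Thumbprint         = $cert.Thumbprint
            NotAfter           = $cert.NotAfter
            HasPrivateKey      = $cert.HasPrivateKey
            PrivateKeyReadable = $readable
            Error              = $problem
        }
    }
}
```

No output means the certificate is absent, which points to Solution 1. `HasPrivateKey` set to `True` combined with `PrivateKeyReadable` set to `False` points to the permission problem addressed in Solution 2.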
Solution 1: Update the Certificate
One of the primary solutions involves ensuring the correct transport certificate is present and recognized by the Guest Agent. This can often be achieved by forcing the Azure platform to re-provision or regenerate the necessary certificate on the VM. The specific certificate involved in the newer workflow is typically named “Windows Azure CRP Certificate Generator.”
Follow these steps to update or regenerate the required certificate:
1. Check for the presence of the "Windows Azure CRP Certificate Generator" certificate on the VM by using the Certificates snap-in in the Microsoft Management Console (MMC). Open `certlm.msc`, which loads the snap-in for the local computer (`certmgr.msc` shows only the current user's certificates), and navigate to `Certificates (Local Computer)` -> `Personal` -> `Certificates`. Look for a certificate with the subject `DC=Windows Azure CRP Certificate Generator`.
2. If the "Windows Azure CRP Certificate Generator" certificate is found, delete it. In the Certificates snap-in, select the certificate and click the Delete icon or press the Delete key, then confirm the deletion when prompted. Deleting this certificate might seem counterintuitive, but the Guest Agent and the Azure platform are designed to regenerate it when it is missing and needed for the current goal state.
3. Trigger a new goal state fetch for the Guest Agent. This prompts the Azure platform to evaluate the VM's configuration, including extensions, and to provision or re-provision necessary components such as the transport certificate. Use one of the following methods:
   - Using Azure PowerShell: Run the following commands, replacing the placeholders with your resource group and VM names. The `Update-AzVM` command pushes the current configuration to the VM, forcing the Guest Agent to check for an updated goal state.

     ```powershell
     $rg = "<name-of-the-resource-group-containing-the-virtual-machine>"
     $vmName = "<name-of-the-virtual-machine>"
     $vm = Get-AzVM -ResourceGroupName $rg -Name $vmName
     Update-AzVM -ResourceGroupName $rg -VM $vm
     ```

   - Performing a Reapply operation: The Reapply operation, available in the Azure portal and through the Azure CLI, has a similar effect to `Update-AzVM`: it forces the VM configuration to be re-applied, which typically includes pushing the latest goal state to the Guest Agent. Consult the Azure documentation for the exact steps.
4. After triggering the new goal state, retry the extension operation that was failing. Monitor the Guest Agent log (`WaAppAgent.log`) and the extension's own logs (`C:\WindowsAzure\Logs\Plugins\<ExtensionName>`) to confirm that the certificate is regenerated and the extension proceeds. The Guest Agent should acquire the new certificate and successfully decrypt the protected settings.
5. If updating the certificate and triggering a new goal state do not resolve the issue, the VM might be in a state where a simple re-application is not sufficient. In that case, stop or deallocate the VM and then start it again. This forces a full re-initialization of the Guest Agent and its communication with the Azure fabric, which often resolves transient certificate provisioning issues.
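The procedure above can also be scripted. The following is a sketch under stated assumptions: the function names are hypothetical, the first function runs inside the VM from an elevated console, and the second runs from a workstation with the Az PowerShell module installed after `Connect-AzAccount`. `Set-AzVM -Reapply` is used here as the PowerShell counterpart of the portal's Reapply operation. Confirm that your Az.Compute version supports it before relying on it.

```powershell
# Part 1: run inside the VM from an elevated PowerShell console.
function Remove-CrpTransportCertificate {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        # Subject of the transport certificate on Resource Manager VMs
        [string]$Subject = 'DC=Windows Azure CRP Certificate Generator'
    )
    $certs = @(Get-ChildItem -Path Cert:\LocalMachine\My |
        Where-Object { $_.Subject -eq $Subject })
    if ($certs.Count -eq 0) {
        Write-Warning "No certificate with subject '$Subject' in Cert:\LocalMachine\My."
        return
    }
    foreach ($cert in $certs) {
        if ($PSCmdlet.ShouldProcess("$($cert.Subject) ($($cert.Thumbprint))", 'Delete certificate')) {
            # Deletes the certificate from the Local Computer Personal store
            Remove-Item -Path $cert.PSPath
        }
    }
}

# Part 2: run from a workstation with the Az PowerShell module, after Connect-AzAccount.
function Invoke-GoalStateRefresh {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName,
        [Parameter(Mandatory)][string]$VMName,
        # Update: push the current model; Reapply: re-apply it; Restart: deallocate, then start
        [ValidateSet('Update', 'Reapply', 'Restart')][string]$Mode = 'Update'
    )
    switch ($Mode) {
        'Update' {
            $vm = Get-AzVM -ResourceGroupName $ResourceGroupName -Name $VMName
            Update-AzVM -ResourceGroupName $ResourceGroupName -VM $vm
        }
        'Reapply' {
            Set-AzVM -ResourceGroupName $ResourceGroupName -Name $VMName -Reapply
        }
        'Restart' {
            # Stop-AzVM deallocates unless -StayProvisioned is given; -Force skips the prompt
            Stop-AzVM -ResourceGroupName $ResourceGroupName -Name $VMName -Force
            Start-AzVM -ResourceGroupName $ResourceGroupName -Name $VMName
        }
    }
}
```

A typical sequence is as follows:

- Inside the VM, run `Remove-CrpTransportCertificate -WhatIf` to preview which certificate would be deleted, then run it again without `-WhatIf`.
- From your workstation, run `Invoke-GoalStateRefresh -ResourceGroupName <rg> -VMName <vm>`.
- Back in the VM, watch the Guest Agent with `Get-Content C:\WindowsAzure\Logs\WaAppAgent.log -Tail 50 -Wait`.
- Use `-Mode Restart` only if the lighter options do not help.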
Solution 2: Fix the Access Control List (ACL) in the MachineKeys or SystemKeys Folders
If the FailedToDecryptProtectedSettings error persists and specifically points to the "Keyset does not exist" issue (as described in the Symptoms section), the problem is likely incorrect file system permissions on the directory that stores the private keys. The private keys for machine certificates are typically stored as individual key container files in `C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys`. The Guest Agent and extension handlers need specific permissions to access the private key associated with the transport certificate in order to perform decryption.
Follow these steps to fix the Access Control List (ACL) on the relevant key container file:
1. Identify the unique key container name associated with the tenant certificate. This name corresponds to a file within the `MachineKeys` folder. Open an administrative PowerShell console and run the following commands. Specify the certificate name that matches your VM's deployment model: classic (RDFE) or Azure Resource Manager (ARM). Keep only the `$certName` definition that applies to your VM active, and comment out the other.

   ```powershell
   # Keep exactly one of the following certificate name definitions active, based on your VM type.
   # $certName = "Windows Azure Service Management for Extensions"  # Classic RDFE VM
   $certName = "Windows Azure CRP Certificate Generator"  # Azure Resource Manager VM

   # Get the unique key container name (the key file name in MachineKeys) for the specified certificate
   $fileName = (Get-ChildItem Cert:\LocalMachine\My | Where-Object {$_.Subject -eq "DC=$certName"}).PrivateKey.CspKeyContainerInfo.UniqueKeyContainerName

   # Output the file name to confirm it was found
   Write-Host "Identified key container file: $fileName"
   ```

   Note: The `Where-Object` filter uses `DC=`, matching the certificate subject shown in `certlm.msc` (`DC=Windows Azure CRP Certificate Generator`). If the subject on your VM uses a different prefix, such as `CN=`, adjust the filter accordingly. If `$fileName` is empty, either no certificate matched the filter or its key container information could not be read. Resolve this before continuing, because the later commands would otherwise target the `MachineKeys` folder itself.

2. Before making any changes, back up the current ACL permissions for the `MachineKeys` folder so that you can restore them if necessary. Run the `icacls` command:

   ```powershell
   icacls C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys /save machinekeys_permissions_before.aclfile /t
   Write-Host "Backed up MachineKeys ACLs to machinekeys_permissions_before.aclfile"
   ```

3. Apply the correct permissions to the key container file identified in step 1 with the `icacls` command. The `SYSTEM` account (which the Guest Agent often runs under) needs Full Control (`(F)`), and the `Administrators` group needs Read and Execute (`(RX)`).

   ```powershell
   # Quote the grant strings: unquoted parentheses are interpreted by PowerShell instead of being passed to icacls
   icacls "C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys\$fileName" /grant "SYSTEM:(F)"
   icacls "C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys\$fileName" /grant "Administrators:(RX)"
   Write-Host "Applied SYSTEM and Administrators permissions to $fileName"
   ```

   These permissions ensure that the necessary system processes and administrative tools can access the private key file.

4. After applying the permissions, redirect the updated `MachineKeys` ACLs to a text file to verify the changes:

   ```powershell
   icacls C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys /t > machinekeys_permissions_after.txt
   Write-Host "Saved updated MachineKeys ACLs to machinekeys_permissions_after.txt"
   ```

5. Open `machinekeys_permissions_after.txt` in a text editor (such as Notepad) and review the permissions on the file identified in step 1 (`$fileName`). Verify that `SYSTEM` has `(F)` and `Administrators` has `(RX)` on that file.

6. Retry the extension operation. If the extension is managed by the Guest Agent, you might need to restart the Guest Agent service so that it picks up the corrected permissions. You can restart the "Windows Azure Guest Agent" service from the Services console (`services.msc`) or from an elevated PowerShell console with `Restart-Service -Name WindowsAzureGuestAgent`.
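Instead of reading the whole report by eye, you can check the grants on the single key file programmatically. The sketch below parses `icacls` output lines. The function name is hypothetical. It also assumes that `icacls` prints each entry as `ACCOUNT:(flag)(flag)`, for example `NT AUTHORITY\SYSTEM:(F)`, which is the usual format but worth confirming on your VM.

```powershell
function Get-IcaclsRight {
    param(
        # Lines printed by icacls (blank lines are allowed)
        [Parameter(Mandatory)][AllowEmptyString()][string[]]$IcaclsOutput,
        # Account name or suffix as icacls prints it, e.g. 'SYSTEM' or 'NT AUTHORITY\SYSTEM'
        [Parameter(Mandatory)][string]$Principal
    )
    # Capture the run of parenthesized flags after "<Principal>:", e.g. "(I)(F)"
    $pattern = [regex]::Escape($Principal) + ':((?:\([^)]+\))+)'
    foreach ($line in $IcaclsOutput) {
        $m = [regex]::Match($line, $pattern, 'IgnoreCase')
        if ($m.Success) {
            # "(I)(F)" -> "I", "F"
            $m.Groups[1].Value.Trim([char[]]'()') -split '\)\('
        }
    }
}
```

For example, pass the output of `icacls "C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys\$fileName"` as `-IcaclsOutput`. Then check that `-Principal 'SYSTEM'` returns `F` and that `-Principal 'Administrators'` returns `RX`. An `I` flag in the result indicates an inherited entry.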
If applying permissions to the specific file in the MachineKeys folder does not resolve the "Keyset does not exist" error, the key might be stored elsewhere, or there might be a related permission issue in the SystemKeys folder. In that case, try running the icacls commands from steps 2 through 4 (skipping step 1, which identifies a single file name) against `C:\ProgramData\Microsoft\Crypto\SystemKeys\*` instead of a single file in MachineKeys. Be cautious when modifying permissions on these critical system folders.
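The SystemKeys fallback can be wrapped in a function that always takes the backup first and supports `-WhatIf`, so that you can preview the change before touching the folder. This is a hypothetical helper that replays the step 2 through 4 commands against `SystemKeys\*`. The backup and report file names are assumptions modeled on the MachineKeys ones. The helper does not determine which file actually holds the key.

```powershell
function Set-SystemKeysAcl {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [string]$SystemKeys = 'C:\ProgramData\Microsoft\Crypto\SystemKeys',
        [string]$BackupFile = 'systemkeys_permissions_before.aclfile',
        [string]$ReportFile = 'systemkeys_permissions_after.txt'
    )
    if (-not $PSCmdlet.ShouldProcess("$SystemKeys\*", 'Grant SYSTEM:(F) and Administrators:(RX)')) {
        return
    }
    # Step 2 equivalent: back up the current ACLs before changing anything
    icacls $SystemKeys /save $BackupFile /t
    if ($LASTEXITCODE -ne 0) { throw "icacls /save failed with exit code $LASTEXITCODE" }

    # Step 3 equivalent: the same grants as for the MachineKeys file, applied to every item
    foreach ($grant in 'SYSTEM:(F)', 'Administrators:(RX)') {
        icacls "$SystemKeys\*" /grant $grant
        if ($LASTEXITCODE -ne 0) { Write-Warning "icacls /grant $grant returned exit code $LASTEXITCODE" }
    }

    # Step 4 equivalent: write the resulting ACLs to a report for review
    icacls $SystemKeys /t > $ReportFile
}
```

Run `Set-SystemKeysAcl -WhatIf` first from an elevated console, then run it without `-WhatIf` once the target looks right.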
```mermaid
graph TD
    A[Azure Platform] --> B{Send Goal State<br>with Encrypted Settings};
    B --> C["Azure Guest Agent (WindowsAzureGuestAgent.exe)<br>on VM"];
    C --> D[Receive Goal State];
    D --> E{Decrypt Protected Settings};
    E -- Requires --> F[Transport Certificate<br>+ Private Key];
    F -- Stored in --> G[MachineKeys/SystemKeys Folder];
    G -- Access Controlled by --> H[File System ACLs];
    E -- Success --> I[Process Extension Configuration];
    E -->|"Failure (Missing Key/Permissions)"| J["Extension Error<br>e.g., FailedToDecryptProtectedSettings"];
    I --> K[Extension Operations];
    K --> L{Success or Failure};
    J --> M["Troubleshooting Steps<br>(Update Cert/Fix ACL)"];
    M --> C;
```
Diagram: Simplified workflow illustrating the decryption process and potential failure points.
More Information
For additional context on how Azure VM extensions handle certificates and potential issues with multiple certificates, refer to official Azure documentation. Understanding the role of different certificates on a VM can help diagnose related issues.
Let us know in the comments if you have encountered similar issues or have alternative troubleshooting steps that have worked for you. Sharing your experiences can help others facing these challenges.