Troubleshooting SQL Server Always On Failover: Error 0x80990F75 in Data Protection Manager

SQL Server Always On Availability Groups (AGs) provide a robust high-availability and disaster-recovery solution by ensuring that a set of user databases, known as availability databases, fail over together. System Center Data Protection Manager (DPM) is a powerful enterprise backup system designed to protect these critical SQL Server workloads. However, integrating these complex systems can sometimes lead to unexpected errors, such as the 0x80990F75 error, which indicates a connection refusal during backup operations.

This article delves into the specifics of error 0x80990F75 encountered when DPM attempts to protect a SQL Server Always On Availability Group. We will explore the common symptoms, dissect the underlying causes, and provide comprehensive resolution steps to ensure your SQL Server Always On backups succeed without interruption. Understanding the intricate relationship between DPM and SQL Server’s Always On configuration is crucial for maintaining data integrity and business continuity.

Understanding the Problem: Error 0x80990F75

When attempting to back up a SQL Server Always On Availability Group using System Center Data Protection Manager, an administrator might encounter a failed job accompanied by the error code 0x80990F75. This error fundamentally signifies that the SQL Server instance hosting the availability group is refusing a connection from the DPM protection agent. Such connection failures can be perplexing, especially when other network communications appear to be functional.

The DPM agent relies on specific configurations and permissions within the SQL Server environment to perform its backup tasks. If these prerequisites are not met, the agent cannot connect to the instance and the backup job fails. Diagnosing this error requires a methodical approach that examines both the DPM side and the SQL Server Always On configuration.

Symptoms of the 0x80990F75 Error

The primary symptom of this issue is a failed backup job within the System Center Data Protection Manager console. When an administrator initiates a protection job for a SQL Server Always On Availability Group, the job will eventually terminate with an error message detailing the connection refusal. This failure impacts the recoverability of the availability databases, jeopardizing the organization’s data protection strategy.

A typical error message observed in the DPM console might look like this:

The DPM job failed for SQL Server 2012 database <DBname> on <serverName> because the SQL Server instance refused a connection to the protection agent. (ID 30172 Details: Internal error code: 0x80990F75)
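When many job alerts need to be triaged, it can help to pull the alert ID and the internal error code out of the message text. The following is a minimal sketch that assumes the message layout quoted above; the function names and the regular expression are illustrative and not part of DPM.

```python
import re

# Matches the trailing "(ID <n> Details: Internal error code: 0x........)" part
# of the message quoted above. The layout is taken from that one example.
DPM_ERROR_RE = re.compile(
    r"\(ID (?P<alert_id>\d+) Details: Internal error code: "
    r"(?P<code>0x[0-9A-Fa-f]{8})\)"
)

CONNECTION_REFUSED = "0x80990F75"


def parse_dpm_error(message):
    """Return (alert_id, error_code) from a DPM job message, or None."""
    match = DPM_ERROR_RE.search(message)
    if match is None:
        return None
    code = match.group("code")
    # Normalise the hex digits to upper case but keep the lowercase "0x" prefix.
    return int(match.group("alert_id")), "0x" + code[2:].upper()


def is_connection_refused(message):
    """True when the message carries internal error code 0x80990F75."""
    parsed = parse_dpm_error(message)
    return parsed is not None and parsed[1] == CONNECTION_REFUSED
```

A message without the trailing details block yields `None`, so the helper can be run over mixed alert text without raising.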

Beyond the direct error message, further investigation into the SQL Server Always On configuration often reveals specific settings that contribute to this problem. These settings, while seemingly logical in other contexts, conflict with DPM’s operational requirements for successful backups. Scrutinizing the Always On group’s properties is a critical step in pinpointing the exact cause of the connection refusal.

Always On Availability Group Configuration Details

When troubleshooting this error, it’s common to find the SQL Server Always On Availability Group configured with the following characteristics:

  • Availability Mode: Synchronous Commit. This mode ensures that transactions are committed on both the primary and secondary replicas before acknowledging the commit to the client, providing maximum data protection.
  • Failover Mode: Automatic. With automatic failover, the system can automatically transition the primary role to a secondary replica if the primary becomes unavailable, enhancing high availability.
  • Connections in Primary Role: Allow all connections. This setting ensures that applications can connect to the primary replica without restriction, which is standard practice for operational databases.
  • Readable Secondary: No. This particular setting is frequently identified as a key contributor to the 0x80990F75 error, as it dictates how secondary replicas handle read-only connection requests.
  • Backup Preferences:
    • Prefer Secondary: This preference indicates that backup operations should ideally occur on a secondary replica to offload the primary and minimize its operational impact.
    • Priority: 50 (for each node). Higher priority values suggest a more preferred replica for backups.
    • Exclude Replica: False (for each node). This means all replicas are eligible for backups based on preference and priority.

The “Prefer Secondary” backup preference generally directs backup operations to a secondary replica unless the primary is the only replica online. In scenarios with multiple secondary replicas, the one with the highest backup priority is chosen. If only the primary replica is available, the backup should, by design, occur on the primary replica. The interaction of these backup preferences with the Readable Secondary: No setting is where the conflict often arises, preventing DPM from establishing the necessary connection.
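The selection rule described above can be modelled in a few lines. This is only a sketch of the rule as stated in this article, not SQL Server's implementation (on a live instance, `sys.fn_hadr_backup_is_preferred_replica` reports the actual decision). The `Replica` class, its field names, and the tie-break on server name are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    is_primary: bool
    online: bool = True
    backup_priority: int = 50
    excluded: bool = False


def pick_backup_replica(replicas: List[Replica]) -> Optional[Replica]:
    """Model the 'Prefer Secondary' rule: best online secondary, else primary."""
    candidates = [r for r in replicas if r.online and not r.excluded]
    secondaries = [r for r in candidates if not r.is_primary]
    if secondaries:
        # Highest backup priority wins. Equal priorities are resolved by name
        # here, which is an assumption of this sketch.
        return min(secondaries, key=lambda r: (-r.backup_priority, r.name))
    primaries = [r for r in candidates if r.is_primary]
    return primaries[0] if primaries else None
```

With every node at priority 50, as in the configuration listed above, any online secondary is a valid target, which is why the Readable Secondary setting matters on each of them.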

The Root Cause: Incorrect Always On Settings for DPM Backups

The primary cause of the 0x80990F75 error in this specific scenario lies in the configuration of the SQL Server Always On Availability Group, particularly the “Make Readable Secondary” option. When this setting is configured to “No,” secondary replicas are explicitly prevented from accepting read-only connections. This seemingly benign setting has significant implications for DPM backup operations.

Data Protection Manager leverages the Volume Shadow Copy Service (VSS) architecture to perform backups of SQL Server databases. For Always On Availability Groups, DPM typically attempts to connect to a secondary replica (as per the “Prefer Secondary” backup preference) to perform a VSS snapshot backup. If the secondary replica is configured with Readable Secondary: No, DPM’s protection agent cannot establish the connection it needs to initiate the VSS snapshot process. The SQL Server instance, adhering to its configuration, refuses the connection attempt from the DPM agent, resulting in the 0x80990F75 error.

The DPM agent needs to query the state of the database and its log files on the chosen replica to prepare for the backup. Without a connection to that replica, this preliminary step cannot be completed. The refusal is an access control mechanism within SQL Server: Readable Secondary: No keeps client connections away from the databases on the secondary replica, and in doing so it also blocks the DPM agent’s legitimate backup requests.
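To see which replicas are configured this way, the settings can be read from the `sys.availability_groups` and `sys.availability_replicas` catalog views. The sketch below keeps the T-SQL as a string and adds a small helper that flags the replicas refusing connections in the secondary role. How the query is executed (for example through an ODBC driver) is left out, and the helper assumes the rows are supplied as dictionaries keyed by column name.

```python
# T-SQL to run on any replica of the availability group.
REPLICA_SETTINGS_QUERY = """
SELECT ag.name AS ag_name,
       ag.automated_backup_preference_desc,
       ar.replica_server_name,
       ar.secondary_role_allow_connections_desc,
       ar.backup_priority
FROM sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
  ON ar.group_id = ag.group_id;
"""


def replicas_refusing_connections(rows):
    """Return server names whose secondary role allows no connections.

    `rows` is an iterable of dicts as they might come back from
    REPLICA_SETTINGS_QUERY; the dict shape is an assumption of this sketch.
    """
    return [
        row["replica_server_name"]
        for row in rows
        if row["secondary_role_allow_connections_desc"] == "NO"
    ]
```

The view reports the setting as `NO`, `READ_ONLY`, or `ALL`, which correspond to the No, Read-intent only, and Yes choices in the replica properties.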

Comprehensive Resolution Steps

Resolving error 0x80990F75 involves addressing the SQL Server Always On configuration and ensuring proper permissions for the DPM protection agent. The solutions are straightforward but require careful execution on all nodes participating in the Availability Group.

Resolution 1: Enabling Readable Secondary Replicas

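The most direct solution to the 0x80990F75 error is to make the secondary replicas readable, that is, to change the Readable Secondary setting so that they accept read connections. (This is separate from read-only routing, which does not need to be configured here.) This allows DPM to connect to the secondary replicas and perform backups as intended.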

Step-by-Step Guide:

  1. Open SQL Server Management Studio (SSMS): Connect to the SQL Server instance that hosts the primary replica of the Availability Group. Replica properties can only be changed from the primary.
  2. Navigate to the Availability Group: In Object Explorer, expand Always On High Availability, then Availability Groups, and then expand the specific Availability Group that is encountering the backup failure.
  3. Access Availability Replica Properties: Expand Availability Replicas. Right-click a replica and select Properties.
  4. Configure Readable Secondary: In the Availability Replica Properties dialog box, locate the Readable Secondary dropdown menu. Change its value from No to Yes or Read-intent only.
    • Yes: Allows all connections to the secondary replica for read access.
    • Read-intent only: Allows only connections specified with Application Intent=ReadOnly in their connection string. For DPM, Yes is generally recommended for simplicity and compatibility.
  5. Repeat for All Replicas: Apply this change to every replica in the Availability Group, including the current primary. The setting governs a replica whenever it holds the secondary role, roles swap on failover, and DPM might select any secondary based on backup preferences.
  6. Apply and Test: Click OK to apply the changes. After the setting is updated on all relevant replicas, attempt the DPM protection job again.
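The same change can be scripted instead of clicked through. The sketch below only builds the `ALTER AVAILABILITY GROUP ... MODIFY REPLICA` statements; the group name `AG1` and the server names in the usage note are placeholders, and the statements would need to be run on the primary replica by whatever SQL client you use.

```python
ALLOWED_VALUES = {"NO", "READ_ONLY", "ALL"}  # SSMS: No, Read-intent only, Yes


def quote_ident(name):
    """Bracket-quote a T-SQL identifier."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value):
    """Quote a T-SQL Unicode string literal."""
    return "N'" + value.replace("'", "''") + "'"


def readable_secondary_statement(ag_name, replica_server, allow="ALL"):
    """Build the statement that sets Readable Secondary for one replica."""
    if allow not in ALLOWED_VALUES:
        raise ValueError("allow must be one of %s" % sorted(ALLOWED_VALUES))
    return (
        "ALTER AVAILABILITY GROUP " + quote_ident(ag_name) + " "
        "MODIFY REPLICA ON " + quote_literal(replica_server) + " "
        "WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = " + allow + "));"
    )


def statements_for_all(ag_name, replica_servers, allow="ALL"):
    """One statement per replica, in the order given."""
    return [readable_secondary_statement(ag_name, s, allow) for s in replica_servers]
```

For example, `statements_for_all("AG1", ["SQLNODE1", "SQLNODE2"])` produces one statement per node, each setting `ALLOW_CONNECTIONS = ALL`, the scripted equivalent of choosing Yes in step 4.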

Impact of Enabling Readable Secondary:
Enabling Readable Secondary means that the secondary replicas can now accept read-only connections. While this is necessary for DPM backups, it also means that applications can be configured to offload read-only queries to these replicas, potentially reducing the load on your primary. This is often a desirable configuration for many production environments. However, ensure that your infrastructure can handle potential additional connections if you have applications leveraging this capability.

Resolution 2: Verifying DPMRA Service Account and Permissions

If enabling Readable Secondary does not immediately resolve the issue, the problem might stem from insufficient permissions for the DPM protection agent (DPMRA) service account on the SQL Server instance. The DPMRA service requires specific elevated privileges to interact with SQL Server and perform VSS operations.

Part A: Verify DPMRA Service Account

The DPMRA service, which is the core component of the DPM protection agent on the SQL Server, should typically run under the Local System account for full functionality.

Step-by-Step Guide:

  1. Open Services Manager: On each SQL Server instance that is part of the Always On Availability Group, open Computer Management. This can be accessed by right-clicking This PC or My Computer, selecting Manage, then navigating to Services and Applications > Services.
  2. Locate DPMRA Service: Scroll down to find the DPMRA service.
  3. Verify Log On As: Right-click on the DPMRA service and select Properties. Go to the Log On tab. Ensure that Local System account is selected. If it’s not, select it and click Apply, then OK.
  4. Restart Service: After making any changes, restart the DPMRA service for the new settings to take effect.
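On servers where clicking through the Services console is impractical, the logon account can also be read from the output of `sc qc DPMRA`, which prints a `SERVICE_START_NAME` line. The following sketch parses that output; the helper names are illustrative, and `query_service_config` only works on Windows.

```python
import subprocess


def query_service_config(service="DPMRA"):
    """Run `sc qc <service>` and return its text output (Windows only)."""
    result = subprocess.run(
        ["sc", "qc", service], capture_output=True, text=True, check=False
    )
    return result.stdout


def service_account(sc_output):
    """Extract the SERVICE_START_NAME value from `sc qc` output, or None."""
    for line in sc_output.splitlines():
        if "SERVICE_START_NAME" in line and ":" in line:
            return line.split(":", 1)[1].strip()
    return None


def runs_as_local_system(sc_output):
    """True when the service is configured to log on as Local System."""
    account = service_account(sc_output)
    # `sc qc` reports the Local System account as "LocalSystem".
    return account is not None and account.lower() == "localsystem"
```

On each node, `runs_as_local_system(query_service_config())` returning `False` means step 3 above still needs to be applied there.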

Part B: Granting Sysadmin Role to NT AUTHORITY\SYSTEM

The Local System account maps to the NT AUTHORITY\SYSTEM login within SQL Server. Therefore, NT AUTHORITY\SYSTEM must be a member of the sysadmin server role on the SQL Server instance to allow the DPMRA service to perform its full range of operations, including VSS snapshots and interacting with Always On.

Step-by-Step Guide:

  1. Open SQL Server Management Studio (SSMS): Connect to the SQL Server 2012 instance (or relevant version) where the Always On Availability Group is hosted.
  2. Navigate to Security Logins: In Object Explorer, expand Security and then Logins.
  3. Locate NT AUTHORITY\SYSTEM: Right-click on NT AUTHORITY\SYSTEM and select Properties.
  4. Assign Server Roles: In the Login Properties dialog box, navigate to the Server Roles page.
  5. Check Sysadmin: Ensure that the sysadmin checkbox is selected. If it’s not, check it.
  6. Apply Changes: Click OK to apply the changes.
  7. Repeat for All Nodes: This step must be performed on all SQL Server instances that are part of the Always On Availability Group, as any replica might be selected for backup by DPM.
  8. Retest DPM Job: After confirming these settings on all nodes, attempt the DPM protection job again.
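The role check and the grant can also be expressed in T-SQL, which is convenient when several nodes have to be brought in line. The sketch below keeps the statements as strings and picks the ones to run from the result of `IS_SRVROLEMEMBER`. Treating a NULL result as "login missing" is an assumption of this sketch, since NULL can also indicate that the caller lacks permission to view the membership.

```python
SYSTEM_LOGIN = "NT AUTHORITY\\SYSTEM"

# Returns 1 (member), 0 (not a member) or NULL (login or role not valid).
CHECK_SYSADMIN = (
    "SELECT IS_SRVROLEMEMBER('sysadmin', '" + SYSTEM_LOGIN + "') AS is_sysadmin;"
)
CREATE_LOGIN = "CREATE LOGIN [" + SYSTEM_LOGIN + "] FROM WINDOWS;"
GRANT_SYSADMIN = "ALTER SERVER ROLE [sysadmin] ADD MEMBER [" + SYSTEM_LOGIN + "];"


def remediation_statements(is_sysadmin):
    """Return the statements to run for one instance.

    `is_sysadmin` is the value returned by CHECK_SYSADMIN: 1, 0 or None.
    """
    if is_sysadmin == 1:
        return []
    if is_sysadmin == 0:
        return [GRANT_SYSADMIN]
    # None: assume the login does not exist yet on this instance.
    return [CREATE_LOGIN, GRANT_SYSADMIN]
```

As with the SSMS steps, the check has to be repeated on every instance in the Availability Group, because server roles are not replicated between replicas.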

Why Sysadmin is Required:
The sysadmin role grants the NT AUTHORITY\SYSTEM account comprehensive control over the SQL Server instance. While granting such broad permissions might raise security concerns in other contexts, for backup agents like DPM operating on the same server, it’s often a necessary evil due to the deep integration required for VSS and database consistency checks. Without these permissions, the DPMRA service might not be able to interact correctly with the SQL Server VSS writer or query critical Always On metadata.

Troubleshooting Flowchart (Conceptual)

To summarize the troubleshooting process, consider this simplified flowchart:

```mermaid
graph TD
    A[DPM Backup Job Fails for AG] --> B{Error 0x80990F75?}
    B -- Yes --> C{Is "Make Readable Secondary" set to "No" on any replica?}
    C -- Yes --> D[Set "Make Readable Secondary" to "Yes" on all replicas]
    D --> E[Retry DPM Job]
    E -- Success --> F[Issue Resolved]
    E -- Failure --> G{Is DPMRA service running as Local System?}
    G -- No --> H[Change DPMRA service to run as Local System; Restart service]
    H --> I{Does NT AUTHORITY\SYSTEM have sysadmin role in SQL Server?}
    G -- Yes --> I
    I -- No --> J[Grant sysadmin role to NT AUTHORITY\SYSTEM on all SQL instances]
    J --> K[Retry DPM Job]
    K -- Success --> F
    K -- Failure --> L[Consult DPM/SQL Server logs for further errors or Microsoft Support]
    C -- No --> G
```

This flowchart provides a logical progression to diagnose and resolve the error, ensuring each potential cause is systematically addressed.
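For runbooks or automated checks, the decision order in the flowchart can be written as a small function that returns the first outstanding action. This is a sketch of the flowchart only: the three boolean inputs are assumed to come from checks you perform yourself, and falling through to the logs when all three checks pass is an assumption, because the flowchart does not define that branch.

```python
FIX_READABLE_SECONDARY = "Set Readable Secondary to Yes on all replicas, then retry the DPM job"
FIX_SERVICE_ACCOUNT = "Change the DPMRA service to run as Local System and restart it"
FIX_SYSADMIN = "Grant sysadmin to NT AUTHORITY\\SYSTEM on all SQL instances, then retry the DPM job"
ESCALATE = "Consult DPM/SQL Server logs for further errors or Microsoft Support"


def next_action(readable_secondary_is_no, dpmra_is_local_system, system_has_sysadmin):
    """Return the first outstanding step for a job failing with 0x80990F75."""
    if readable_secondary_is_no:
        return FIX_READABLE_SECONDARY
    if not dpmra_is_local_system:
        return FIX_SERVICE_ACCOUNT
    if not system_has_sysadmin:
        return FIX_SYSADMIN
    return ESCALATE
```

The function is meant to be called again after each fix, so a node with several problems is walked through them in the same order as the flowchart.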

Prevention and Best Practices

To prevent similar issues and maintain a healthy DPM-SQL Server Always On environment, consider implementing the following best practices:

  • Standardized Configuration: Establish a standard configuration for all SQL Server Always On Availability Groups, particularly concerning Readable Secondary settings, when integrating with DPM.
  • Regular Audits: Periodically audit the service accounts and permissions for the DPMRA agent on all protected SQL Servers. This ensures that no unintended changes have compromised the backup environment.
  • Monitor DPM and SQL Logs: Actively monitor DPM job status and review SQL Server error logs and DPM agent logs for any warnings or errors that might indicate an impending issue. Early detection can prevent major backup failures.
  • Test Backup and Recovery: Regularly perform test backup and recovery operations for your Availability Groups. This not only validates your DPM configuration but also ensures the recoverability of your critical databases.
  • Understand Always On Backup Preferences: Be fully aware of how “Backup Preferences” within Always On Availability Groups interact with DPM. Ensure your DPM protection groups are configured to align with these preferences and expected backup behaviors.
  • Stay Updated: Keep both SQL Server and System Center Data Protection Manager updated with the latest service packs and cumulative updates. Microsoft frequently releases fixes and improvements that can address compatibility issues and enhance stability.
  • Network Connectivity Checks: Periodically verify network connectivity and firewall rules between DPM servers and SQL Server Always On nodes. Connection issues, though often distinct from 0x80990F75, can also cause backup failures.

Adhering to these best practices will significantly improve the reliability of your DPM backups for SQL Server Always On Availability Groups, minimizing the occurrence of errors like 0x80990F75 and safeguarding your data.

Conclusion

The 0x80990F75 error in Data Protection Manager, while disruptive, is typically a symptom of specific misconfigurations within SQL Server Always On Availability Groups or insufficient permissions for the DPM protection agent. By systematically checking the Readable Secondary setting on all availability replicas and verifying that the DPMRA service runs as Local System with NT AUTHORITY\SYSTEM holding the sysadmin role in SQL Server, administrators can effectively resolve this issue.

Understanding the interplay between DPM’s backup mechanisms and SQL Server’s high-availability features is paramount for a robust data protection strategy. Implementing the resolutions and adhering to the recommended best practices will not only fix the immediate problem but also contribute to a more resilient and manageable backup environment for your critical SQL Server workloads. Proactive configuration management and vigilant monitoring are key to ensuring seamless data protection operations.


