Saturday 12 May 2018

Active Directory is unavailable after disaster recovery fail-over

Active Directory is unavailable after a fail-over.

Customer has two domain controllers that are replicated to a recovery site using Veeam Backup & Replication.

During a DR test fail-over, Active Directory on both DCs was available for only a few minutes before it stopped working.

Tests such as NETDOM QUERY FSMO and NLTEST report that the domain is unavailable. NET SHARE shows that the SYSVOL and NETLOGON shares are missing.
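The share check above can be scripted. This is a minimal sketch in Python, assuming the usual NET SHARE layout where the share name is the first column of each line. The function names missing_shares and check_local_shares are my own and not part of any Windows tool.

```python
import subprocess

REQUIRED_SHARES = ("SYSVOL", "NETLOGON")


def missing_shares(net_share_output, required=REQUIRED_SHARES):
    """Return the required share names that do not appear in NET SHARE output."""
    present = set()
    for line in net_share_output.splitlines():
        fields = line.split()
        if fields:
            # Assumption: the share name is the first whitespace-separated field.
            present.add(fields[0].upper())
    return [name for name in required if name.upper() not in present]


def check_local_shares():
    """Run NET SHARE on the local machine (Windows only) and list missing shares."""
    result = subprocess.run(["net", "share"], capture_output=True, text=True)
    return missing_shares(result.stdout)
```

On a DC in the broken state described here, check_local_shares() would be expected to return both SYSVOL and NETLOGON. An empty list means both shares are present.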

After a restore or replication, Active Directory detects that this has happened and attempts to protect itself, effectively going into a 'safe mode', so to speak.

The steps below outline what needs to be done to recover from this. These steps apply to domain controllers using the legacy NTFRS replication and not to DCs using DFSR. You can use dfsrmig.exe /getglobalstate to see whether you are using NTFRS or DFSR.
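As a rough illustration of reading the dfsrmig result, here is a Python sketch. It assumes the command prints a line of the form "Current DFSR global state: 'Eliminated'", and that no reported state means SYSVOL migration was never started. Check both assumptions against the real output on your own DC. The function name sysvol_replication_engine is hypothetical.

```python
import re

# The four dfsrmig global states, with a simplified reading of each one.
STATE_MEANING = {
    "start": "NTFRS",
    "prepared": "migrating (SYSVOL still served by NTFRS)",
    "redirected": "migrating (SYSVOL served by DFSR)",
    "eliminated": "DFSR",
}


def sysvol_replication_engine(dfsrmig_output):
    """Guess the SYSVOL replication engine from dfsrmig /getglobalstate output."""
    match = re.search(r"global state:\s*'?(\w+)'?", dfsrmig_output, re.IGNORECASE)
    if match is None:
        # Assumption: no reported state means migration was never initialized,
        # so the DC is still on NTFRS.
        return "NTFRS"
    return STATE_MEANING.get(match.group(1).lower(), "unknown")
```

Only a result of "NTFRS" means the steps in this post apply as written.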


Step 1 - Power on both DCs and wait for the automatic reboot. If you do not wait, you cannot log in and get the error "No domain controllers available".
Step 2 - On DC1, or whichever DC holds the FSMO roles, type NET SHARE and confirm that the SYSVOL and NETLOGON shares are missing. Also run NETDOM QUERY FSMO to confirm that the domain is unavailable.
Step 3 - On DC1, run "Start SYSVOL" from CMD to open the SYSVOL folder. Make a backup of C:\windows\sysvol\domain\policies and C:\windows\sysvol\domain\scripts.
Step 4 - Run NET STOP NTFRS on both DCs.
Step 5 - On DC1, set BurFlags to D4 (hexadecimal, authoritative restore) under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets\GUID
Step 6 - On DC1, run NET START NTFRS.
Step 7 - On DC1, check Event Viewer for event ID 13516 in the File Replication Service log, stating that the server is a DC.
Step 8 - On DC1, run "Start SYSVOL" from CMD (the folder should be empty).
Step 9 - Copy the backups of the Scripts and Policies folders to C:\windows\sysvol\domain on DC1.
Step 12 - On DC1, run "Start SYSVOL" from CMD and check that Scripts and Policies exist with recent time stamps.
Step 13 - On DC0, check that NTFRS is stopped.
Step 14 - On DC0, set BurFlags to D2 (hexadecimal, non-authoritative restore) under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Cumulative Replica Sets\GUID
Step 15 - On DC0, run NET START NTFRS.
Step 16 - On DC0, open Event Viewer and check for event ID 13516 in the File Replication Service event log.
Step 17 - Type NET SHARE on both DCs and check that SYSVOL and NETLOGON exist. Restart NETLOGON if the NETLOGON share is missing.

Step 18 - Type NETDOM QUERY FSMO and make sure that both DCs report the same FSMO role holders.
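The service and registry parts of the steps above can be written out as explicit commands. The Python sketch below only builds the command lines and does not run anything. It assumes the registry key is named "Cumulative Replica Sets", that reg.exe is used to set BurFlags, and that you substitute the real replica set GUID from your own registry. The names burflags_command and recovery_plan are hypothetical. The SYSVOL backup and copy steps and the event log checks are not covered.

```python
NTFRS_PARAMETERS = r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"

# BurFlags values: D4 = authoritative restore, D2 = non-authoritative restore.
BURFLAGS = {"authoritative": 0xD4, "non-authoritative": 0xD2}


def burflags_command(replica_set_guid, mode):
    """Build a reg.exe command that sets BurFlags for one replica set."""
    key = "\\".join([NTFRS_PARAMETERS, "Cumulative Replica Sets", replica_set_guid])
    # The value is passed in decimal: 0xD4 is 212 and 0xD2 is 210.
    return ["reg", "add", key, "/v", "BurFlags",
            "/t", "REG_DWORD", "/d", str(BURFLAGS[mode]), "/f"]


def recovery_plan(replica_set_guid):
    """Return (where to run, command) pairs in the order used in this post."""
    return [
        ("both DCs", ["net", "stop", "ntfrs"]),
        ("DC1", burflags_command(replica_set_guid, "authoritative")),
        ("DC1", ["net", "start", "ntfrs"]),
        ("DC0", burflags_command(replica_set_guid, "non-authoritative")),
        ("DC0", ["net", "start", "ntfrs"]),
    ]


def print_plan(replica_set_guid):
    """Print the plan so each command can be reviewed before running it by hand."""
    for where, command in recovery_plan(replica_set_guid):
        print(f"{where}: {' '.join(command)}")
```

Printing the plan rather than executing it is deliberate: DC0 should only be started once event ID 13516 has appeared on DC1 and the Scripts and Policies folders have been copied back.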

Note that these steps differ from the ones detailed in the Microsoft KB article below, which describes setting BurFlags under the Backup/Restore key. In my steps, BurFlags is set under Cumulative Replica Sets.

https://support.microsoft.com/en-gb/help/290762/using-the-burflags-registry-key-to-reinitialize-file-replication-servi