Monthly Archives: January 2019

A NIC device is tied to a disallowed network

I recently received a call from a former colleague who was unable to update a machine catalog. They stated nothing had changed in vCenter, Citrix, or the master image. The error they were receiving was:

Error Id: XDDS:919D761E

Exception: Citrix.Console.Models.Exceptions.ProvisioningTaskException Create Catalog failed with an unknown reason, see terminating error for more details. at Citrix.Console.PowerShellSdk.ProvisioningSchemeService.BackgroundTasks.ProvisioningSchemeTask.CheckForTerminatingError(SdkProvisioningSchemeAction sdkProvisioningSchemeAction) at Citrix.Console.PowerShellSdk.ProvisioningSchemeService.BackgroundTasks.ProvisioningSchemeTask.WaitForProvisioningSchemeActionCompletion(Guid taskId, Action`1 actionResultsObtained) at Citrix.Console.PowerShellSdk.ProvisioningSchemeService.BackgroundTasks.ProvisioningSchemeCreationTask.StartProvisioningAction() at Citrix.Console.PowerShellSdk.ProvisioningSchemeService.BackgroundTasks.ProvisioningSchemeCreationTask.RunTask() at Citrix.Console.PowerShellSdk.BackgroundTaskService.BackgroundTask.Task.Run()


DesktopStudio_ErrorId : ProvisioningTaskError
ErrorCategory : NotSpecified
ErrorID : NetworkNotPermitted
TaskErrorInformation : Terminated
InternalErrorMessage : A NIC device is tied to a disallowed network.
DesktopStudio_PowerShellHistory : Create Machine Catalog 'XenApp - WSRV12 - DAPPS - DR'
11/25/2018 7:09:42 AM

The key error here is ‘A NIC device is tied to a disallowed network’. A quick search turns up a Citrix article referencing this error: CTX139460. It points to a change in the vCenter networking config, but supposedly there weren’t any changes. Time to do some digging. I asked them to get networking info from both vCenter and Citrix using PowerShell.

To get the hypervisor networking I asked him to log in to one of the delivery controllers, launch PowerShell as administrator, and run the following:

Add-PSSnapin Citrix*
dir XDHyp:\HostingUnits | Select PSPath,HostingUnit*,*Network* | Format-List

The output of this was:

PSPath : Citrix.Host.Admin.V1\Citrix.Hypervisor::XDHyp:\HostingUnits\DR_Cluster-vm_dr
HostingUnitName : DR_Cluster-vm_dr
HostingUnitUid : bddd641a-a55c-4f0e-bd62-9331502fd908
NetworkId : Network:network-641
NetworkPath : XDHyp:\Connections\PRDVCENTER01\DR.datacenter\DR Cluster.cluster\VM Network 201.network
...

The thing to take note of is the ‘NetworkId’ for the DR hosting connection. This ID is the vCenter MoRef (Managed Object Reference) for the VM network. I then had him pull the VM networks from vCenter using PowerCLI.

To get the VM networks (with MoRef) from vCenter I asked him to launch VMware PowerCLI as administrator and run the following:

Connect-VIServer prdvcenter01.domain.com
Get-View -ViewType Network | Select Name,MoRef

The output of this was:

Name            MoRef
----            -----
VM Network 201  Network-network-4790

...
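The two outputs format the reference differently (‘Network:network-641’ on the Citrix side, ‘Network-network-4790’ from PowerCLI), so a small helper can make the comparison less error-prone. This is only a sketch: the function names Get-MoRefValue and Test-MoRefMatch are made up for illustration, and it assumes both values have been converted to plain strings first.

```powershell
# Hypothetical helper: strips a leading type prefix such as 'Network:' or
# 'Network-' so both formats reduce to the bare value (e.g. 'network-641').
function Get-MoRefValue {
    param([Parameter(Mandatory)][string]$MoRef)
    # Only strip the prefix when a '<type>-<number>' value follows it,
    # so an already-bare value like 'network-641' is returned unchanged.
    return ($MoRef -replace '^[A-Za-z]+[:-](?=[A-Za-z]+-\d+$)', '')
}

# Hypothetical helper: compares the Citrix NetworkId with the vCenter MoRef.
function Test-MoRefMatch {
    param(
        [Parameter(Mandatory)][string]$CitrixNetworkId,
        [Parameter(Mandatory)][string]$VCenterMoRef
    )
    return (Get-MoRefValue $CitrixNetworkId) -eq (Get-MoRefValue $VCenterMoRef)
}

# Values taken from the two outputs above
Test-MoRefMatch 'Network:network-641' 'Network-network-4790'   # False
```

A result of False for a network whose name is identical on both sides is the symptom described in this post.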

The MoRef was network-641 in MCS but network-4790 in vCenter, even though the VM network names were the same. This made it clear that a networking change had been made on the vCenter side at some point. After I pointed this out, it was revealed that port groups had been deleted and recreated in this DR cluster, which generated new MoRef IDs. At this point the hosting connection networking has to be reconfigured with the new MoRef, and this cannot be done in Citrix Studio. To do this we set the ‘NetworkPath’ on the hosting connection again, using the same ‘NetworkPath’ value since the network name did not change. Doing so forces the network MoRef to be queried again and updated in the MCS hosting connection.

To reset (or change if needed) the ‘NetworkPath’ in the hosting connection, take the ‘PSPath’ from the first command and copy everything starting with ‘XDHyp’. I took that path and provided them with this command to run:

Set-Item -Path 'XDHyp:\HostingUnits\DR_Cluster-vm_dr' -NetworkPath 'XDHyp:\Connections\PRDVCENTER01\DR.datacenter\DR Cluster.cluster\VM Network 201.network'
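The "copy everything starting with XDHyp" step can also be done with a string operation instead of by hand. A minimal sketch, assuming the PSPath looks like the one in the output above; Get-XDHypPath is a made-up name for illustration.

```powershell
# Hypothetical helper: returns the provider-relative path from a full PSPath.
function Get-XDHypPath {
    param([Parameter(Mandatory)][string]$PSPath)
    $start = $PSPath.IndexOf('XDHyp:')
    if ($start -lt 0) { throw "No 'XDHyp:' segment found in '$PSPath'." }
    # Keep everything from 'XDHyp:' to the end of the string
    return $PSPath.Substring($start)
}

Get-XDHypPath 'Citrix.Host.Admin.V1\Citrix.Hypervisor::XDHyp:\HostingUnits\DR_Cluster-vm_dr'
# XDHyp:\HostingUnits\DR_Cluster-vm_dr
```

On a delivery controller, the returned path is what goes into the -Path argument of the Set-Item command, together with the unchanged NetworkPath value.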

Finally, I asked them to re-run the first ‘dir’ command to verify that the network MoRef had updated. After doing this they were able to successfully update the machine catalog.

The replication operation failed because of a schema mismatch between the servers involved

Last week I had to deploy a new domain controller to the root domain in a forest (it happened to be an RODC for a unique use case, but that is irrelevant). The domain only partially replicated before failing and showing errors on the new DC.

The errors were:

Log Name:      Directory Service
Source: Microsoft-Windows-ActiveDirectory_DomainService
Date: 1/4/2019 11:19:18 AM
Event ID: 1791
Task Category: Replication
Level: Error
Keywords: Classic
User: ANONYMOUS LOGON
Computer: rodc1.domain.com
Description:
Replication of application directory partition DC=domain,DC=com from source 24c77a2c-6da0-41a1-95cf-e0542bca5b89 (dc1.domain.com) has been aborted. Replication requires consistent schema but last attempt to synchronize the schema had failed. It is crucial that schema replication functions properly. See previous errors for more diagnostics. If this issue persists, please contact Microsoft Product Support Services for assistance. Error 8418: The replication operation failed because of a schema mismatch between the servers involved..

Log Name:      Directory Service
Source: Microsoft-Windows-ActiveDirectory_DomainService
Date: 1/4/2019 11:19:31 AM
Event ID: 1203
Task Category: Replication
Level: Warning
Keywords: Classic
User: ANONYMOUS LOGON
Computer: rodc1.domain.com
Description:
The directory service could not replicate the following object from the source directory service at the following network address because of an Active Directory Domain Services schema mismatch.
Object:
CN=Bob Smith,OU=Users,OU=All Users,DC=domain,DC=com
Network address:
24c77a2c-6da0-41a1-95cf-e0542bca5b89._msdcs.domain.com

It was obvious that the object referenced in the second event was causing the issue, but this object was in use and I couldn’t just remove it. When looking for related errors on the source DC I found this:

Log Name:      Directory Service
Source: Microsoft-Windows-ActiveDirectory_DomainService
Date: 1/4/2019 11:04:33 AM
Event ID: 1450
Task Category: Internal Processing
Level: Error
Keywords: Classic
User: ANONYMOUS LOGON
Computer: dc1.domain.com
Description:
The security descriptor propagation task could not calculate a new security descriptor for the following object.
Object:
CN=Bob Smith,OU=Users,OU=All Users,DC=domain,DC=com
This operation will be tried again later.
User Action
If this condition continues, attempt to view the status of this object and manually change the security descriptor.

Additional Data
Error value:
1340 The inherited access control list (ACL) or access control entry (ACE) could not be built.

This was much more specific and showed there was an issue with the ACL of the object. I tried making one small change to the security ACL on the object to verify there was an issue and received an error. This ACL was either corrupt or too large. I decided to try repairing the ACL on the object by using ADSI Edit (adsiedit.msc) to remove everything from the ACL, add only ‘Domain Admins’ and ‘SYSTEM’ with Full Control, and then reset it using DSACLS. I also had to do this for the ‘ExchangeActiveSyncDevices’ child object and the leaf objects under it, since this user had Exchange ActiveSync devices. I verified the child and leaf objects were inheriting from the user object and proceeded to reset the ACL to the schema default (/S) across the whole subtree (/T) using DSACLS:

dsacls "CN=Bob Smith,OU=Users,OU=All Users,DC=domain,DC=com" /S /T
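To get a rough idea of whether an object's ACL has grown unusually large, the number of ACEs can be counted before and after the reset. The following is a sketch, not something from the original troubleshooting session: it assumes the ActiveDirectory module (RSAT) is installed so that the AD: drive is available, and Get-AdAceCount is a made-up name.

```powershell
# Hypothetical helper: counts the ACEs on an AD object via the AD: drive.
function Get-AdAceCount {
    param([Parameter(Mandatory)][string]$DistinguishedName)
    # Assumption: the ActiveDirectory module is installed; it provides AD:
    Import-Module ActiveDirectory -ErrorAction Stop
    $acl = Get-Acl -Path "AD:\$DistinguishedName"
    [pscustomobject]@{
        DistinguishedName = $DistinguishedName
        AceCount          = @($acl.Access).Count
        InheritedAceCount = @($acl.Access | Where-Object IsInherited).Count
    }
}

# Example call, using the object from the events above:
# Get-AdAceCount 'CN=Bob Smith,OU=Users,OU=All Users,DC=domain,DC=com'
```

If the ACL is badly corrupted, Get-Acl itself may fail, which is still a useful signal that the problem lies in the security descriptor.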

After resetting the ACL, replication to this domain controller completed with this event:

Log Name:      Directory Service
Source: Microsoft-Windows-ActiveDirectory_DomainService
Date: 1/4/2019 11:49:51 AM
Event ID: 1394
Task Category: Service Control
Level: Information
Keywords: Classic
User: ANONYMOUS LOGON
Computer: rodc1.domain.com
Description:
All problems preventing updates to the Active Directory Domain Services database have been cleared. New updates to the Active Directory Domain Services database are succeeding. The Net Logon service has restarted.
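The events quoted in this post (1791, 1203, and 1450, plus the 1394 all-clear) can also be pulled from both domain controllers in one pass rather than by clicking through Event Viewer. A minimal sketch, assuming remote event log access is allowed; Get-ReplicationEvent is a made-up name and the default computer names are the ones used in this post.

```powershell
# Hypothetical helper: collects the replication-related events from each DC.
function Get-ReplicationEvent {
    param(
        [string[]]$ComputerName = @('dc1.domain.com', 'rodc1.domain.com'),
        [int[]]$Id = @(1791, 1203, 1450, 1394),
        [datetime]$After = (Get-Date).AddDays(-1)
    )
    foreach ($computer in $ComputerName) {
        # 'No events found' is reported as an error, so it is silenced here
        Get-WinEvent -ComputerName $computer -ErrorAction SilentlyContinue -FilterHashtable @{
            LogName   = 'Directory Service'
            Id        = $Id
            StartTime = $After
        } | Select-Object MachineName, TimeCreated, Id, LevelDisplayName, Message
    }
}

# Example call:
# Get-ReplicationEvent | Sort-Object TimeCreated | Format-List
```

Sorting the combined output by time makes it easier to see that the 1450 error on the source DC precedes the 1203 and 1791 events on the new DC.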

Storage vMotion operations timing out

I recently ran across an issue during an ESXi cluster/SAN migration where we were down to a handful of VMs that were failing when we tried to move them to the new cluster/SAN (using simultaneous compute/storage vMotion operations). I’d like to note that this was on vCenter 6.7 and ESXi 6.5.

The errors were:

  • Timed out waiting for migration data. The source detected that the destination failed to resume.
  • Operation timed out.
  • Timed out waiting for migration data. vMotion migration {#######################} failed to read stream keepalive: connection closed by remote host, possibly due to timeout.

I looked at all the standard issues (storage issues, vMotion connectivity issues, etc.). When looking at the VMs, the only thing that made them different compared to others in the cluster was the number of virtual disks attached to them. All four of the VMs were SQL Server Availability Group members and had a larger number of disks (5+). When looking into timeouts related to the number of disks I came across this VMware article: Using Storage vMotion to migrate a virtual machine with many disks timeout (1010045). The errors in the article were not the same, but it aligned with my suspicion about the number of disks. I couldn’t look at the kernel vpxd logs because they had already rolled over, but I decided to give it a shot. I shut down the problem VMs, set the fsr.maxSwitchoverSeconds configuration parameter to 900 for each one, powered them on, and retried the compute/storage vMotion operations. All vMotion operations completed successfully after this change.

I would like to note that there is a separate configuration parameter called vmotion.maxSwitchoverSeconds which controls the compute side of things. You can try adjusting this as well when having vMotion timeout issues.
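With more than a few VMs, the parameter change can be scripted in PowerCLI rather than set by hand on each one. This is a sketch under a few assumptions: an existing Connect-VIServer session, VMs that are already shut down, and a made-up function name (Set-SwitchoverTimeout). The setting names and the value of 900 come from the text above.

```powershell
# Hypothetical helper: sets a switchover timeout on powered-off VMs.
function Set-SwitchoverTimeout {
    param(
        [Parameter(Mandatory)][string[]]$VMName,
        # Use 'vmotion.maxSwitchoverSeconds' for the compute side instead
        [string]$SettingName = 'fsr.maxSwitchoverSeconds',
        [int]$Seconds = 900
    )
    foreach ($name in $VMName) {
        $vm = Get-VM -Name $name
        if ($vm.PowerState -ne 'PoweredOff') {
            Write-Warning "$name is not powered off; skipping."
            continue
        }
        # -Force overwrites the setting if it already exists on the VM
        New-AdvancedSetting -Entity $vm -Name $SettingName -Value $Seconds -Force -Confirm:$false
    }
}

# Example call (VM names are placeholders):
# Set-SwitchoverTimeout -VMName 'sql-ag-01', 'sql-ag-02'
```

Skipping powered-on VMs mirrors the shut down, change, power on sequence used above.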