Category Archives: Windows Server

Windows Defender Causing iSCSI and Citrix VDA Issues

Over the last few months we have gone through a Defender pilot and rollout. I'll admit I was surprised how relatively painless the rollout has been. Major problems were scarce and we have been able to enable most features for the general landscape. One feature that is not turned on by default that we decided to enable was Network Protection, which provides filtering and detection around web traffic.

When we first enabled this in our pilot, a number of us were on persistent Citrix VDIs and would randomly experience disconnects and reconnects to our machines. Sometimes this would go away for days or weeks and then randomly come back. Another issue appeared when we tried to roll out Defender to some of our backup servers. These are physical servers with a number of iSCSI volumes mounted on them (using the Microsoft iSCSI Initiator). When these servers booted they would hang at logon and the system event log would report a ton of iScsiPrt/Event ID: 9 errors about timeouts while trying to mount the storage.

Both of these issues were immediately resolved simply by enabling asynchronous inspection. I have found almost nothing on the internet about this setting, but it resolved two major issues for us. It doesn't seem to be exposed in GPO or the other management methods and needs to be set manually on each machine. It can be enabled in PowerShell using this command:

Set-MpPreference -AllowSwitchToAsyncInspection $true
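
If you want to confirm the setting took effect, the current value can be read back with Get-MpPreference (a quick check of mine, not part of any official guidance on the issue):

Get-MpPreference | Select-Object -Property AllowSwitchToAsyncInspection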

Note: In addition to turning on asynchronous inspection, you can obviously also disable Network Protection altogether while troubleshooting an issue.
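
For example, assuming the preference is managed locally rather than through policy, Network Protection can be temporarily turned off and later re-enabled with:

Set-MpPreference -EnableNetworkProtection Disabled
Set-MpPreference -EnableNetworkProtection Enabled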

The Microsoft documentation now briefly mentions this functionality, but states that it is on by default. We hit the issue within the last two weeks and still needed to enable this on our backup servers. If I find any more information on this I will post it here.

Exchange FIP-FS Scan Engine Update Issues: How to roll-back the update

UPDATE #3: There is the potential that after running Rollback-FIPFSEngine.ps1 you may see the following error in the Application event log:

Log Name:      Application
Source:        Microsoft-Filtering-FIPFS
Date:          01/04/2022 00:00:00
Event ID:      6027
Task Category: None
Level:         Error
Keywords:      
User:          NETWORK SERVICE
Computer:      exchange-server.domain.com
Description:
MS Filtering Engine Update process was unsuccessful to download the engine update for Microsoft from Custom Update Path.
Update Path:http://amupdatedl.microsoft.com/server/amupdate
UpdateVersion:0
Reason:"There was a catastrophic error while attempting to update the engine. Error: DownloadEngine failed and there are no further update paths available.Engine Id: 1 Engine Name: Microsoft"

If you are seeing the error above this could be an indication that there is an improper ACL on the ‘FIP-FS\Data\Engines’ directory. The Rollback-FIPFSEngine.ps1 script has been updated to set the proper ACL, but if you’ve previously run the script and need to resolve this issue you can run the following on your affected Exchange server(s) from an elevated PowerShell session:

# Path to the FIP-FS directory under the Exchange install path
$InstallFIPFSPath = "$($env:ExchangeInstallPath)FIP-FS"
# SDDL granting Full Control to SYSTEM, Local Service, Network Service, and Administrators
$Sddl = 'O:SYG:SYD:PAI(A;OICI;FA;;;SY)(A;OICI;FA;;;LS)(A;OICI;FA;;;NS)(A;OICI;FA;;;BA)'
# Load the current ACL, replace its security descriptor with the SDDL above, and apply it
$NewSddl = Get-Acl -Path "$InstallFIPFSPath\Data\Engines"
$NewSddl.SetSecurityDescriptorSddlForm($Sddl)
Set-Acl -Path "$InstallFIPFSPath\Data\Engines" -AclObject $NewSddl
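
To confirm the change applied, you can read the SDDL back afterwards (a quick check of mine, not part of the original fix):

(Get-Acl -Path "$InstallFIPFSPath\Data\Engines").Sddl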

UPDATE #2: I have still been recommending this procedure since it doesn’t rely on deleting definitions and then waiting for the definition download from MS. With this workaround you are immediately in a fully working state.

If you have already used the Microsoft reset method on one or more servers, you can use Rollback-FIPFSEngine.ps1 to copy definitions directly from a known-good server. This will speed up resolution time as the other servers will not have to wait for a definition download. Simply change the $BackupPathFIPFSPath variable to something like the example below and run the script on the other servers:

$BackupPathFIPFSPath = '\\GOODEXSERVER.domain.com\C$\Program Files\Microsoft\Exchange Server\V15\FIP-FS'
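
Before running the script it may also be worth confirming the admin share is reachable from the server you are fixing (my own sanity check, not something the script requires):

Test-Path -Path '\\GOODEXSERVER.domain.com\C$\Program Files\Microsoft\Exchange Server\V15\FIP-FS'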

Note: Just make sure to re-enable auto-updates by running: Set-EngineUpdateCommonSettings -EnableUpdates $true

UPDATE: Microsoft has released a new engine update with a version number that resolves the issue. If you have already performed the roll-back fix described below there is *no need* to perform the procedures in their article as you are already in a functional state. All you need to do is run Set-EngineUpdateCommonSettings -EnableUpdates $true to re-enable updates and your server(s) will download the latest update at the next check interval. Their script/procedure is for customers whose servers are still in a broken state.


How to roll-back…

As almost anyone with an on-prem Exchange implementation probably knows, an update released on 12/31/21 caused email delivery issues globally. The core issue is that the new engine version number is too large to fit in a long variable. Microsoft has acknowledged the issue, but the only current workaround is to disable the FIP-FS engine. The problem with that is that certain transport rules also use this service, so if you have such rules you may still have email delays even after disabling the engine. A better option is to roll back to the engine version that did not trigger the bug while Microsoft works this out.

I created a script (Rollback-FIPFSEngine.ps1) that makes the roll-back quick and easy. The one pre-requisite is that you need a restore of your FIP-FS directory (ex. C:\Program Files\Microsoft\Exchange Server\V15\FIP-FS) from some point before the bad engine update. You can use the same restore for all Exchange servers assuming they are all running the same version of Exchange and have the same architecture (ex. amd64).

To perform the roll-back you must do the following:

  • Obtain a restore of the FIP-FS directory
  • Copy the Rollback-FIPFSEngine.ps1 script to the Exchange server(s) that need the rollback
  • Edit the $BackupPathFIPFSPath variable of the script to reflect the restored FIP-FS directory path
  • Open a new elevated (as Administrator) Exchange Management Shell PowerShell session on the server. Do not run this from a regular PowerShell session; it must be an Exchange session
  • Execute the script

The script should take a few minutes to execute and will do the following:

  • Re-enable FIP-FS. It doesn’t matter if you previously disabled FIP-FS using the Disable-Antimalwarescanning.ps1 script or Set-MalwareFilteringServer cmdlet. It will re-enable both
  • Globally disable FIP-FS engine updates (so that the problem updates do not return)
  • Stop necessary services (BITS, MSExchangeTransport, and MSExchangeAntispamUpdate)
  • Rename current engine files/directories
  • Copy engine files/directories from restore path
  • Start stopped services

After the script is run everything should be functioning normally. We’ve already run this in our environment and mail flow is now back to normal.
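
For anyone curious what the engine swap itself looks like, here is a minimal sketch of the stop/rename/copy/start portion of the process. This is my illustration of the steps listed above, not the actual Rollback-FIPFSEngine.ps1 script, which also handles re-enabling FIP-FS and disabling engine updates:

# Sketch only: the engine swap portion, not the full rollback script
$InstallFIPFSPath    = "$($env:ExchangeInstallPath)FIP-FS"
$BackupPathFIPFSPath = 'D:\Restore\FIP-FS'   # hypothetical restore location; point this at your restored copy

# Stop the services that hold the engine files open
Stop-Service -Name BITS, MSExchangeTransport, MSExchangeAntispamUpdate

# Keep the bad engine around (renamed) and copy the restored engine into place
Rename-Item -Path "$InstallFIPFSPath\Data\Engines" -NewName 'Engines.bad'
Copy-Item -Path "$BackupPathFIPFSPath\Data\Engines" -Destination "$InstallFIPFSPath\Data" -Recurse

# Bring the services back up
Start-Service -Name MSExchangeAntispamUpdate, MSExchangeTransport, BITS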

NOTE: After MS releases a fix you will need to re-enable updates by running: Set-EngineUpdateCommonSettings -EnableUpdates $true

FilteringServiceFailureException Error: Microsoft.Exchange.MessagingPolicies.Rules.FilteringServiceFailureException: FIPS text extraction failed with error: ‘WSM_Error: Scanning Process caught exception: (0x00000005) Access is denied

For some time we had been seeing the events below in the event logs of our Exchange mailbox servers, and the ‘UnifiedContent‘ directory (related to the Hub Transport role) had been growing:

Log Name:      Application
Source:        MSExchange Messaging Policies
Date:          10/26/2021 8:08:10 AM
Event ID:      4010
Task Category: Rules
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      mbx1.domain.com
Description:
Transport engine failed to evaluate condition due to Filtering Service error. The rule is configured to ignore errors. Details: 'Organization: '' Message ID '<1ea41f5d-64ec-424a-b863-19d7fc2cf7d0@journal.report.generator>' Rule ID 'bcdf1c32-0249-4149-a91b-85ecabaeb695' Predicate '' Action ''. FilteringServiceFailureException Error: Microsoft.Exchange.MessagingPolicies.Rules.FilteringServiceFailureException: FIPS text extraction failed with error: 'WSM_Error: Scanning Process caught exception: 
Stream ID: <1ea41f5d-64ec-424a-b863-19d7fc2cf7d0@journal.report.generator>
ScanID: {E44453FB-B127-44F8-BEF0-357252C6DAA3}
(0x00000005) Access is denied.  Failed to open file: T:\TransportRoles\data\Temp\UnifiedContent\8bedad9e-130a-490e-be7a-af8a58758231'. See inner exception for details ---> Microsoft.Filtering.FilteringException: WSM_Error: Scanning Process caught exception: 
Stream ID: <1ea41f5d-64ec-424a-b863-19d7fc2cf7d0@journal.report.generator>
ScanID: {E44453FB-B127-44F8-BEF0-357252C6DAA3}
(0x00000005) Access is denied.  Failed to open file: T:\TransportRoles\data\Temp\UnifiedContent\8bedad9e-130a-490e-be7a-af8a58758231
   at Microsoft.Filtering.InteropUtils.ThrowPostScanErrorAsFilteringException(WSM_ReturnCode code, String message)
   at Microsoft.Filtering.FilteringService.EndScan(IAsyncResult ar)
   at Microsoft.Filtering.FipsDataStreamFilteringService.EndScan(IAsyncResult ar)
   at Microsoft.Exchange.MessagingPolicies.Rules.UnifiedContentServiceInvoker.TextExtractionComplete(IFipsDataStreamFilteringService textExtractionService, TextExtractionCompleteCallback textExtractionCompleteCallback, IAsyncResult asyncResult)
   --- End of inner exception stack trace ---
   at Microsoft.Exchange.MessagingPolicies.Rules.UnifiedContentServiceInvoker.GetUnifiedContentResults(FilteringServiceInvokerRequest filteringServiceInvokerRequest)
   at Microsoft.Exchange.MessagingPolicies.Rules.MailMessage.GetUnifiedContentResults()
   at Microsoft.Exchange.MessagingPolicies.Rules.MailMessage.GetAttachmentStreamIdentities()
   at Microsoft.Exchange.MessagingPolicies.Rules.MailMessage.GetAttachmentInfos()
   at Microsoft.Exchange.MessagingPolicies.Rules.MailMessage.get_AttachmentNames()
   at Microsoft.Exchange.MessagingPolicies.Rules.MessageProperty.OnGetValue(RulesEvaluationContext baseContext)
   at Microsoft.Exchange.MessagingPolicies.Rules.Property.GetValue(RulesEvaluationContext context)
   at Microsoft.Exchange.MessagingPolicies.Rules.TextMatchingPredicate.OnEvaluate(RulesEvaluationContext context)
   at Microsoft.Exchange.MessagingPolicies.Rules.PredicateCondition.Evaluate(RulesEvaluationContext context)
   at Microsoft.Exchange.MessagingPolicies.Rules.AndCondition.Evaluate(RulesEvaluationContext context)
   at Microsoft.Exchange.MessagingPolicies.Rules.RulesEvaluator.EvaluateCondition(Condition condition, RulesEvaluationContext evaluationContext)
   at Microsoft.Exchange.MessagingPolicies.Rules.TransportRulesEvaluator.EvaluateCondition(Condition condition, RulesEvaluationContext evaluationContext). Message-Id:<1ea41f5d-64ec-424a-b863-19d7fc2cf7d0@journal.report.generator>'

You may notice the ‘T:\TransportRoles\data‘ path above; this is because we have our transport queue database path set to an alternate location. It is clear from the error that there is an access issue, as it states ‘(0x00000005) Access is denied. Failed to open file: T:\TransportRoles\data\Temp\UnifiedContent\8bedad9e-130a-490e-be7a-af8a58758231‘ as the core problem. Looking at the ‘Temp‘ directory ACL we saw the current permissions were:

  • LocalSystem – Full Control
  • Administrators – Full Control
  • NetworkService – Full Control

These permissions seem correct at face value, but when we looked at the ACL of one of the files we actually found:

  • LocalSystem – Full Control
  • Administrators – Full Control
  • NetworkService – Full Control
  • LocalService – Full Control

If you look at a default Exchange installation you will see that the ACL above is how it is set there as well. It seems that when using a non-default queue database location you are required to set the ACL yourself, as it won't be set automatically. After fixing the ACL we simply stopped the transport service, cleared the directory, and restarted the transport service:

Stop-Service MSExchangeTransport
Remove-Item -Path "T:\TransportRoles\data\Temp\UnifiedContent\*"
Start-Service MSExchangeTransport
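
The ACL fix itself is not shown above, so here is one way to add the missing LocalService entry before clearing the directory. This is a sketch assuming our ‘T:\TransportRoles\data\Temp‘ path and simply mirrors the ACL a default install gets, so adjust the path for your environment:

# Sketch: grant Local Service full control on the non-default Temp directory (adjust the path)
$TempPath = 'T:\TransportRoles\data\Temp'
$Acl = Get-Acl -Path $TempPath
$Rule = [System.Security.AccessControl.FileSystemAccessRule]::new(
    'NT AUTHORITY\LOCAL SERVICE', 'FullControl', 'ContainerInherit,ObjectInherit', 'None', 'Allow')
$Acl.AddAccessRule($Rule)
Set-Acl -Path $TempPath -AclObject $Acl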

After this change the ‘UnifiedContent‘ directories are no longer growing and the error we started with is no longer appearing in the event log.
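
If you want to keep an eye on the directory afterwards, a quick way to check the file count and total size (my own check, using our path) is:

Get-ChildItem -Path 'T:\TransportRoles\data\Temp\UnifiedContent' | Measure-Object -Property Length -Sum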

Modifying Windows Server Failover Cluster (WSFC) node subnets the quick and dirty way

We were in the middle of expanding a server subnet and ran into complications. If you do not bring down the entire cluster and try to make subnet mask changes to a NIC, you run into all kinds of problems which can take a while to fix. We wound up evicting nodes, removing/recreating SQL AG listeners (for SQL AlwaysOn nodes), rebooting nodes, etc. Basically, WSFC pulls subnet configuration directly from the NICs used in the cluster, and you have to bring cluster services down, make changes, and bring them back up in a specific order to avoid issues. I decided to find a faster way to accomplish this. It involves modifying the CLUSDB cluster configuration registry hive (here is a good page on this database). When cluster services are running this registry hive is mounted as ‘HKLM\Cluster‘, but we will be taking down the cluster and manually mounting the hives for offline modification.

  1. Stop other cluster-related services (like SQL Server) on all nodes
  2. Shut down the entire cluster. You can do this from the ‘Failover Cluster Manager’, PowerShell, or the old ‘cluster.exe’ utility
  3. Verify the ‘Cluster Service‘ service is stopped on all nodes
  4. Update the subnet mask(s) on the appropriate NICs on all nodes
  5. Update the CLUSDB registry hives on ALL nodes (see the sketch after this list). Make sure you are updating the correct registry values with the correct values. There are both regular subnet mask values AND subnet CIDR values to update
    • BACKUP THE ‘C:\Windows\Cluster\CLUSDB’ files on all nodes BEFORE making any changes. You may want to back up or snapshot the server as well
    • Load the ‘C:\Windows\Cluster\CLUSDB’ registry hive (REG LOAD HKLM\CLUSDB "C:\Windows\Cluster\CLUSDB")
    • Edit HKLM\CLUSDB\NetworkInterfaces\OBJECTGUID\AddressPrefixList\0000  values
    • Edit HKLM\CLUSDB\Networks\OBJECTGUID  values
    • Edit HKLM\CLUSDB\Networks\OBJECTGUID\PrefixList\0000  values
    • Edit HKLM\CLUSDB\Resources\OBJECTGUID\Parameters values (for IP address resources)
    • UNLOAD HIVES ON ALL NODES (REG UNLOAD HKLM\CLUSDB)
  6. Start the cluster again
  7. Verify all networks and IP address cluster resources are displaying the correct subnet values
  8. Start any other cluster-related services you may have stopped in step #1
  9. Verify functionality of the cluster and all other services
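
Here is a minimal sketch of steps 2, 5, and 6 from an elevated session, assuming the FailoverClusters PowerShell module is installed on the nodes; the actual value edits still happen in regedit (or reg add) against the keys listed above:

# Step 2: stop the whole cluster (run once, from any node)
Stop-Cluster

# Step 5: on EACH node, load the offline hive, edit the values listed above, then unload it
reg load HKLM\CLUSDB "C:\Windows\Cluster\CLUSDB"
# ...make the edits with regedit or reg add while the hive is loaded...
reg unload HKLM\CLUSDB

# Step 6: bring the cluster back up (run once, from any node)
Start-Cluster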

This method could also be used to move the cluster to an entirely new subnet, but it might be easier just to create new IP resources (which will create new cluster networks) at that point.
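
For step 7, once the cluster is back up I find the quickest check is from PowerShell; AddressMask on the cluster networks is the value to watch (this check is mine, not part of the original procedure):

Get-ClusterNetwork | Format-Table -Property Name, Address, AddressMask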

Here are the first two examples of the registry changes above: