Category Archives: Scripting

The woes of implementing M365 Information Barriers – One BIG gotcha and how to undo unexpected Teams team/group chat/meeting membership removals using Graph API

For the better part of three years we had been working to implement Information Barriers for Microsoft Teams. All throughout that time we ran into countless issues, bugs, and back-end changes forcing us to open a number of tickets with Microsoft and do multiple rounds of extensive testing before we were comfortable implementing. The day finally came when we were ready to implement. Finally one evening we had everything staged and we were ready to flip the switch later that week. To our surprise a number of unexpected changes occurred a day before flipping the switch. A large set of users were removed from teams, group chats, and meetings. There were also reports of users not being able to communicate with each other via chat/calls/meetings. Some of the users affected were not even part of either of the two IB segments that we were implementing policies for. After a little digging it became clear that IB started to go haywire sometime after we activated the IB policies (we did this as part of staging), but before running Start-InformationBarrierPoliciesApplication (this was our final step in activation). We immediately disabled both of the IB policies and ran Start-InformationBarrierPoliciesApplication to force an immediate processing of the policies. After this was done processing, all users were able to communicate again, but we were still left with all of the team/group chat/meeting removals.

I decided to start with addressing the team removals. I decided the quickest way to get a list of these was to do an Azure AD audit search since all team member removals translate to member removals of the associated M365 group. I already had experience with how IB removals show from an Azure AD perspective, so I started by running an audit search for AAD group removals and then filtered this using the three principals that could perform these IB-related removals:

Information Barrier Processor
5214069a-b412-4856-b885-c1db77635b0a

Microsoft Teams Services
47ec8689-d241-46e8-adfa-c7d2144c9848

Microsoft Substrate Management
8f3d0e67-f567-4e23-8e81-2cd5ee520c4c

I then exported this to a CSV, spot checked it for any removals that seemed to not be IB-related, and ran a little PS loop to add the members back to the M365 groups:

$RemovalList = Import-Csv -Path "C:\Temp\TeamsUserReadds.csv"

ForEach ($Removal in $RemovalList) {
	Add-AzureADGroupMember -ObjectId $Removal.GroupId -RefObjectId $Removal.UserId
}

Now for the group chat/meetings removals; these unfortunately were not going to be easy. I started digging into how I would even detect these removals. I first tried M365 audit logs, but the IB-initiated removals did not show in any audit logs. The only place I knew of where you could see a removal was in the chat itself. Whenever a user is added or removed from a group chat/meeting there is an event message generated in the chat. I started digging into the Teams Graph APIs for ways to query these events. Eventually I wrote a pair of scripts to deal with these removals. The first script is GetGroupChatRemovals.ps1. This script crawls through all group chats (and optionally meetings) for a specified set of users over a specified date/time range for removals and creates a CSV of each unique removal. We were able to produce a set of users to scan against because every removal could be traced back to group chat/meeting where there was at least one member who was in one of the IB segments. Sometimes the user who was removed was not in either IB segment, but that didn’t matter because the script would be scanning the whole group chat/meeting for all removals. Since we knew the approximate start and end times of the removals, we had a fairly tight window of ~15 hours to scan. There were multiple challenges with the API, one of them being how it sorts chats when you request them and the inability to properly filter on the last event. I had to make the script walk backwards through chats until it was outside of the specified date/time range. After that list was created it would go through each individual message searching for the deleted system-generated member removed messages. Meetings were even worse. Not only are there many meetings, but for some reason meetings were not being returned in an order based on last event. Because of this you would have to just go through all meetings. For this reason I made scanning meetings optional. This is how the script is executed (I recommend using dot sourcing so you are able to retrieve the output variables after execution in case of a failure or issue):

."C:\Temp\GetGroupChatRemovals.ps1" -CsvPath 'C:\Temp\IBUsers.csv' -TenantId 'aa10484c-ea8b-4d62-a130-54b306384fcc' -AppId '9f3d6a8e-4e1f-436b-9972-a6918e07dbcb' -AppSecret '*PqbUTk,#d;&gK:c@[]4tRHE^TCQ<ZDQ8-NW6Q&9' -StartDateTimeString '2022-12-01T00:00:00.000Z' -EndDateTimeString '2022-12-01T15:00:00.000Z' -OutputCsvPath 'C:\Temp\ChatRemovalsOutput.csv' -ErrorTxtPath 'C:\Temp\ErrorUserUPNs.txt'

The parameters are as follows:

  • CsvPath – A path for a CSV containing all UPNs you are scanning (this will scan for any removals in group chats/meetings that the user was part of)
  • TenantId – Your tenant ID
  • AppId – Your app ID. (You will need to create an app registration with the following permissions: Chat.Read.All, ChatMember.ReadWrite.All. You will also need to request access to the Teams Protected APIs to read chats as an application)
  • AppSecret – The secret created for your application registration
  • StartDateTimeString – The start date/time in UTC ISO 8601 format
  • EndDateTimeString – The end date/time in UTC ISO 8601 format
  • OutputCsvPath – The path to write the output CSV
  • ErrorTxtPath – The path to write the output TXT of UPNs of users the script was unable to retrieve chats for
  • IncludeMeetings – Include this as a switch if you want to scan meetings in addition to group chats. This is optional

Once the script is completed you are left with output like this:

The results this script produced were surprising. IB removed many people (IB segmented and non-segmented) from many group chats. Now that we had this list we used the second script (UndoChatRemovals.ps1) to restore the memberships. This script is fairly simple and you can feed it the exact same CSV that was created from the first script. We actually decided to filter this down to a much smaller list based on the last chat activity column (by excluding stale chats). The four parameters in this script are the same as the first script. This is how the script is executed:

."C:\Temp\UndoChatRemovals.ps1" -CsvPath 'C:\Temp\ChatRemovalsOutput.csv' -TenantId 'aa10484c-ea8b-4d62-a130-54b306384fcc' -AppId '9f3d6a8e-4e1f-436b-9972-a6918e07dbcb' -AppSecret '*PqbUTk,#d;&gK:c@[]4tRHE^TCQ<ZDQ8-NW6Q&9'

Once this script is completed all removed members should have been re-added. There are a few things to note. One is that there will be system messages generated in each chat for each member add event. Another is that there is no way to know if the removed member had limited visibility (added with the ability to only see beyond a certain date) or had access to the entire chat, so we have to add the user back with full visibility.

In conclusion, I will say this all could have been avoided by coupling the two IB activation steps together. Do not activate an IB policy until you are ready to run Start-InformationBarrierPoliciesApplication. The documentation does not mention that any actions happen until you complete the last step and in testing we never saw anything happen until completing the last step. For some reason IB starts to malfunction (seems to take hours for this to happen) if a policy is left active without completing the last step. When running them together it behaves as expected, lesson learned.

Running Sync-ModernMailPublicFolders.ps1 with Modern Authentication

UPDATE: After years of Microsoft leaving users lost and confused, they finally decided to refactor and release a new version their script (only a few days after I published this article). Their new script is now based on the EXO PS v2 module which uses Graph API and no longer uses EXO PS Remoting.

As part of an Exchange Online migration I was re-running a Public Folder sync that I initially ran two years ago when we still had basic authentication enabled from internal networks for Exchange Online. When I went to re-run I realized it was not going to work because we completely disabled basic authentication. I had recently created an unattended script that connects to Exchange Online using MSAL and integrated Windows authentication (IWA) and was able to use this methodology in this script without making any modifications to the script.

To make this work we first need to install the MSAL.PS module:

Install-Module -Name 'MSAL.PS'

Next, we need to get an OAuth token for Exchange Online PowerShell, create a bearer token secure string, and create a basic credential from it along with our UPN. We will need to specify the client ID for Exchange Online PowerShell (612e1698-a44c-49a4-b66b-4188cc69cbaa) along with our tenant ID (I’m entering a fake one here). When prompted you will login with an account that has EXO administrative access:

$EXOToken = Get-MsalToken -ClientId 'a0c73c16-a7e3-4564-9a95-2bdf47383716' -TenantId '612e1698-a44c-49a4-b66b-4188cc69cbaa' -Scopes 'https://outlook.office365.com/.default'

$TokenSecureString = ConvertTo-SecureString "Bearer $($EXOToken.AccessToken)" -AsPlainText -Force

$Credential = New-Object System.Management.Automation.PSCredential($EXOToken.Account.Username, $TokenSecureString)

Now we can execute the script, but we will need to use an alternate ConnectionUri so that we can make EXO do the basic -> OAuth conversion:

.\Sync-ModernMailPublicFolders.ps1 -Credential $Credential -ConnectionUri 'https://outlook.office365.com/powershell-liveid?BasicAuthToOAuthConversion=true' -CsvSummaryFile:sync_summary.csv

The script should have executed without an issue. No need to re-enable basic authentication!

Exchange FIP-FS Scan Engine Update Issues: How to roll-back the update

UPDATE #3: There is the potential that after running Rollback-FIPFSEngine.ps1 that you may see the following error in the Application Event Logs:

Log Name:      Application
Source:        Microsoft-Filtering-FIPFS
Date:          01/04/2022 00:00:00
Event ID:      6027
Task Category: None
Level:         Error
Keywords:      
User:          NETWORK SERVICE
Computer:      exchange-server.domain.com
Description:
MS Filtering Engine Update process was unsuccessful to download the engine update for Microsoft from Custom Update Path.
Update Path:http://amupdatedl.microsoft.com/server/amupdate
UpdateVersion:0
Reason:"There was a catastrophic error while attempting to update the engine. Error: DownloadEngine failed and there are no further update paths available.Engine Id: 1 Engine Name: Microsoft"

If you are seeing the error above this could be an indication that there is an improper ACL on the ‘FIP-FS\Data\Engines’ directory. The Rollback-FIPFSEngine.ps1 script has been updated to set the proper ACL, but if you’ve previously run the script and need to resolve this issue you can run the following on your affected Exchange server(s) from an elevated PowerShell session:

$InstallFIPFSPath = "$($env:ExchangeInstallPath)FIP-FS"
$Sddl = 'O:SYG:SYD:PAI(A;OICI;FA;;;SY)(A;OICI;FA;;;LS)(A;OICI;FA;;;NS)(A;OICI;FA;;;BA)'
$NewSddl = Get-Acl -Path "$InstallFIPFSPath\Data\Engines"
$NewSddl.SetSecurityDescriptorSddlForm($Sddl)
Set-Acl -Path "$InstallFIPFSPath\Data\Engines" -AclObject $NewSddl

UPDATE #2: I have still been recommending this procedure since it doesn’t rely on deleting definitions and then waiting for the definition download from MS. With this workaround you are immediately in a fully working state.

If you have already used the Microsoft reset method on one or some server(s), you can use Rollback-FIPFSEngine.ps1 to copy definitions directly from the good server. This will speed up resolution time as the other servers will not have to wait for a definition download. Simply change the $BackupPathFIPFSPath variable to something like below and run on the other servers:

$BackupPathFIPFSPath = '\\GOODEXSERVER.domain.com\C$\Program Files\Microsoft\Exchange Server\V15\FIP-FS'

Note: Just make sure to re-enable auto-updates by running: Set-EngineUpdateCommonSettings -EnableUpdates $true

UPDATE: Microsoft has released a new engine update with a version number that resolves the issue. If you have already performed the fix above there is *no need* to perform the procedures in their article as you are already in a functional state. All you need to do is run Set-EngineUpdateCommonSettings -EnableUpdates $true to re-enable updates and your server(s) will download the latest update at the next check interval. Their script/procedure is for customers who are broken.


How to roll-back…

As almost anyone with an on-prem Exchange implementation probably knows there was an update released on 12/31/21 that caused email delivery issues globally. The core issue is the new version number is too large to fit in a long variable. Microsoft has acknowledged the issue, but the only current workaround is to disable the FIP-FS engine. The issue here is that certain transport rules also use this service, so if you have transport rules like these you may still have email delays even after disabling the engine. A better option would be to roll-back to the engine version that did not invoke the bug while Microsoft works this out.

I created a script (Rollback-FIPFSEngine.ps1) that makes the roll-back quick and easy. The one pre-requisite is that you need to restore your FIP-FS directory (ex. C:\Program Files\Microsoft\Exchange Server\V15\FIP-FS) from a restore. The restore should be from some point before the bad engine update. You can use the same restore for all Exchange servers assuming they are all running the same version of Exchange and have the same architecture (ex. amd64).

To preform the roll-back you must do the following:

  • Obtain a restore of the FIP-FS directory
  • Copy the Rollback-FIPFSEngine.ps1 script to the Exchange server(s) that need the rollback
  • Edit the $BackupPathFIPFSPath variable of the script to reflect the restored FIP-FS directory path
  • Open a new elevated (as Administrator) Exchange Management Shell PowerShell session on the server. Do not run this from a regular PowerShell session, it must be a Exchange session
  • Execute the script

The script should take a few minutes to execute and will do the following:

  • Re-enable FIP-FS. It doesn’t matter if you previously disabled FIP-FS using the Disable-Antimalwarescanning.ps1 script or Set-MalwareFilteringServer cmdlet. It will re-enable both
  • Globally disable FIP-FS engine updates (so that the problem updates do not return)
  • Stop necessary services (BITS, MSExchangeTransport, and MSExchangeAntispamUpdate)
  • Rename current engine files/directories
  • Copy engine files/directories from restore path
  • Start stopped services

After the script is run everything should be functioning normally. We’ve already run this in our environment and mail flow is now back to normal.

NOTE: After MS releases a fix you will need to re-enable updates by running: Set-EngineUpdateCommonSettings -EnableUpdates $true

FilteringServiceFailureException Error: Microsoft.Exchange.MessagingPolicies.Rules.FilteringServiceFailureException: FIPS text extraction failed with error: ‘WSM_Error: Scanning Process caught exception: (0x00000005) Access is denied

For some time we had been seeing these events in the event logs of our Exchange mailbox servers and the ‘UnifiedContent‘ directory (related to the Hub Transport role) has been growing:

Log Name:      Application
Source:        MSExchange Messaging Policies
Date:          10/26/2021 8:08:10 AM
Event ID:      4010
Task Category: Rules
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      mbx1.domain.com
Description:
Transport engine failed to evaluate condition due to Filtering Service error. The rule is configured to ignore errors. Details: 'Organization: '' Message ID '<1ea41f5d-64ec-424a-b863-19d7fc2cf7d0@journal.report.generator>' Rule ID 'bcdf1c32-0249-4149-a91b-85ecabaeb695' Predicate '' Action ''. FilteringServiceFailureException Error: Microsoft.Exchange.MessagingPolicies.Rules.FilteringServiceFailureException: FIPS text extraction failed with error: 'WSM_Error: Scanning Process caught exception: 
Stream ID: <1ea41f5d-64ec-424a-b863-19d7fc2cf7d0@journal.report.generator>
ScanID: {E44453FB-B127-44F8-BEF0-357252C6DAA3}
(0x00000005) Access is denied.  Failed to open file: T:\TransportRoles\data\Temp\UnifiedContent\8bedad9e-130a-490e-be7a-af8a58758231'. See inner exception for details ---> Microsoft.Filtering.FilteringException: WSM_Error: Scanning Process caught exception: 
Stream ID: <1ea41f5d-64ec-424a-b863-19d7fc2cf7d0@journal.report.generator>
ScanID: {E44453FB-B127-44F8-BEF0-357252C6DAA3}
(0x00000005) Access is denied.  Failed to open file: T:\TransportRoles\data\Temp\UnifiedContent\8bedad9e-130a-490e-be7a-af8a58758231
   at Microsoft.Filtering.InteropUtils.ThrowPostScanErrorAsFilteringException(WSM_ReturnCode code, String message)
   at Microsoft.Filtering.FilteringService.EndScan(IAsyncResult ar)
   at Microsoft.Filtering.FipsDataStreamFilteringService.EndScan(IAsyncResult ar)
   at Microsoft.Exchange.MessagingPolicies.Rules.UnifiedContentServiceInvoker.TextExtractionComplete(IFipsDataStreamFilteringService textExtractionService, TextExtractionCompleteCallback textExtractionCompleteCallback, IAsyncResult asyncResult)
   --- End of inner exception stack trace ---
   at Microsoft.Exchange.MessagingPolicies.Rules.UnifiedContentServiceInvoker.GetUnifiedContentResults(FilteringServiceInvokerRequest filteringServiceInvokerRequest)
   at Microsoft.Exchange.MessagingPolicies.Rules.MailMessage.GetUnifiedContentResults()
   at Microsoft.Exchange.MessagingPolicies.Rules.MailMessage.GetAttachmentStreamIdentities()
   at Microsoft.Exchange.MessagingPolicies.Rules.MailMessage.GetAttachmentInfos()
   at Microsoft.Exchange.MessagingPolicies.Rules.MailMessage.get_AttachmentNames()
   at Microsoft.Exchange.MessagingPolicies.Rules.MessageProperty.OnGetValue(RulesEvaluationContext baseContext)
   at Microsoft.Exchange.MessagingPolicies.Rules.Property.GetValue(RulesEvaluationContext context)
   at Microsoft.Exchange.MessagingPolicies.Rules.TextMatchingPredicate.OnEvaluate(RulesEvaluationContext context)
   at Microsoft.Exchange.MessagingPolicies.Rules.PredicateCondition.Evaluate(RulesEvaluationContext context)
   at Microsoft.Exchange.MessagingPolicies.Rules.AndCondition.Evaluate(RulesEvaluationContext context)
   at Microsoft.Exchange.MessagingPolicies.Rules.RulesEvaluator.EvaluateCondition(Condition condition, RulesEvaluationContext evaluationContext)
   at Microsoft.Exchange.MessagingPolicies.Rules.TransportRulesEvaluator.EvaluateCondition(Condition condition, RulesEvaluationContext evaluationContext). Message-Id:<1ea41f5d-64ec-424a-b863-19d7fc2cf7d0@journal.report.generator>'

You may notice the ‘T:\TransportRoles\data‘ path above and this is due to the fact that we have our transport queue database path set to an alternate location. It is clear in the error that there is an access issue as it is is stating ‘(0x00000005) Access is denied. Failed to open file: T:\TransportRoles\data\Temp\UnifiedContent\8bedad9e-130a-490e-be7a-af8a58758231‘ as the core issue. Looking at the ‘Temp‘ directory ACL we saw the current permissions were:

  • LocalSystem – Full Control
  • Administrators – Full Control
  • NetworkService – Full Control

These permissions seem correct at face value, but when we look at the ACL of one of the files we actually found:

  • LocalSystem – Full Control
  • Administrators – Full Control
  • NetworkService – Full Control
  • LocalService – Full Control

If you look at a default Exchange installation you will also see the ACL above is how it is set. It seems that when using a non-default queue database location you are required to set the ACL yourself as it won’t be set automatically. After fixing the ACL we simply shut down the transport service, cleared the directory, and restarted the transport service:

Stop-Service MSExchangeTransport
Remove-Item -Path "T:\TransportRoles\data\Temp\UnifiedContent\*"
Start-Service MSExchangeTransport

After this change the ‘UnifiedContent‘ directories are no longer growing and the error we started with is no longer appearing in the event log.

Windows Autopilot with User-Driven Hybrid Azure AD Domain Join using Palo Alto GlobalProtect VPN: Part 2, using GlobalProtect PLAP with Basic Credentials

I recently had a call with another company attempting to setup Autopilot following my previous post (Windows Autopilot with User-Driven Hybrid Azure AD Domain Join using Palo Alto GlobalProtect VPN). While speaking to them I learned that are currently using basic credentials (LDAP+RADIUS) with GlobalProtect and are only attempting to setup certificate authentication to get Autopilot working. They were still planning on having the user perform two-factor basic authentication after the Autopilot-based deployment. This configuration was the perfect use-case for GlobalProtect’s new “Use Connect Before Logon” functionality. This functionality was introduced version 5.2 and works by registering a Pre-Login Access Provider (PLAP). With PLAP you now have interactive access to the GlobalProtect client at the logon screen. A huge plus with this method is that it requires NO back-end changes to your existing GlobalProtect configuration. The functionality is completely client-side and only really requires an additional step during installation. This PLAP functionality works with basic credentials, certificates, and even SAML! I will be using basic two factor credentials below.

The first step will be to create a new GlobalProtect package in Intune. I am using the newest version below, 5.2.7. You can use the same steps for creating the package that I laid out in my first post, but we will be using an alternate wrapper script, InstallGlobalProtect_PLAP.ps1. InstallGlobalProtect_PLAP.ps1, will install GlobalProtect, set our default GlobalProtect portal, and register the Pre-Login Access Provider (PLAP). Everything else non-certificate related in my original post will still apply (ex. IntuneHybridJoinHelperInstaller.ps1).

Once the machine has been deployed you will notice an extra button in the lower right. This is the PLAP.

When clicked, GlobalProtect will attempt to connect to the portal configured in the wrapper script and you will be presented with a screen like the one below. The prompts here will vary based on your authentication method. Here I am being prompted for my LDAP credentials to authenticate to the portal.

Once I passed the correct credentials here (and the correct second set of credentials at a second screen for two-factor) I was connected.

At this point you can click the ‘Back’ button and continue to log in to the device. That’s all there is to it! This is a great option for those of you who are lacking the desire to use certificates in your existing GlobalProtect configuration, but want to start using Autopilot.

TeamsFirewall – The open-source ethical firewall for Microsoft Teams

Some months back I was faced with an interesting challenge in Microsoft Teams. How do we prevent chats for users that have access to an external tenant? The background here is that certain industries like mine require you to capture and archive electronic communications. Depending on how you interpret the requirement this could include chats taking place in a tenant other than your own. An administrator naturally has zero access to the data in a tenant other than their own. You can use tenant restrictions to prevent a user (who you have some network or proxy control over) from accessing a tenant other than your own, but there are cases where a user needs this access for external collaboration. The challenge here was finding a way to prevent certain actions while working in another tenant… and so came TeamsFirewall.

I started analyzing Teams traffic in my free time and got a feel for how it works. I quickly realized that I could approach the issue with a scriptable proxy server that supported HTTPS. I chose mitmproxy for this task. In the first iteration of the product I broke traffic down to actions (like sending a message or deleting a message) and location (internal tenant or external tenant). After I had that working I wanted to expand the functionality to control actions based on not only location, but on other things like conversation type and participants. After more development I came up with a process that learns the environment by looking at action -> conversation -> participants. The system caches API tokens from the users it supervises to make requests on their behalf in order to learn what it does not know about the environment. The product does not need any credentials or direct access to a tenant to function. This data is then saved in the cache database (teamsfirewall_cache.db). The cache is used for lookups so that faster decisions can be made on the fly. Cache lifetime of both the user table and the conversation table can be configured via the config file.

An example of console output with core level logging set to debug

The rule engine of the product allows for extremely granular rulesets. You can get as granular as saying user A cannot edit messages sent to user B or as broad as user A/B/C cannot communicate with anyone @companyA.com. I would like to note that M365 Information Barriers perform some basic ethical wall functions, but it does not have much granularity and does not address examples like the the one above with external tenants.

Some next steps are improving the scalability of the product and developing an easily deployable package. I am currently looking at adding an option to use a central database for the cache database and using Docker containers with a load balancer to add more workers. You can see some of this upcoming work in the TODO file within the project repository (below).

All development and documentation will be maintained at the TeamsFirewall GitHub here: https://github.com/markdepalma/TeamsFirewall. Please feel free to post issues/questions and contribute!

Using an alternative address for Microsoft Bookings companies

UPDATE (11/2/22): I just noticed a parameter in OWA policies in EXO PowerShell the other day that allows you to set the default domain for Bookings calendars. I was able to set the default domain to bookings.company.com and it is taking effect for all newly created Bookings calendars. You need to run this for all OWA mailbox policies that apply to users who have access to create calendars. The command I used was:

Set-OwaMailboxPolicy -Identity OwaMailboxPolicy-Default -BookingsMailboxDomain bookings.company.com

During testing of Microsoft Bookings we found there was no way to adjust the sender email address for a Bookings company. This was frustrating as our default address was our default tenant address (which ended in onmicrosoft.com). This led me to ask how does this system even work? What is generating the emails? What hosts the email address?

Let’s start by looking for the object(s) hosting this service from an email perspective. To do this we can issue the command below in an Exchange Online PowerShell session. For each ‘company’ created in Bookings there is an associated mailbox created with the ‘RecipientTypeDetails‘ of ‘SchedulingMailbox‘. These mailboxes are not visible in the Exchange Online admin console.

Get-Mailbox -RecipientTypeDetails SchedulingMailbox

When the company is created, the user(s) with permissions to manage are added with ‘FullAccess‘ to the mailbox permissions. This is how Bookings company permissions are managed.

The mailbox is also configured to forward to the email address configured under ‘Business Information’ -> ‘Send customer replies to’.

There is also a Azure AD user account created for each company.

To avoid email address conflicts we decided we would create a new sub-domain for Bookings mailboxes called bookings.domain.com. To do this you would use the Microsoft 365 admin center to add a new domain. After this domain is added and proofed it can be used in Azure AD/O365. Make sure you configure on-premises Exchange routing for this new sub-domain if you are in a hybrid configuration. You can use the instructions under ‘Configure a group domain’ here to setup this routing.

Now that we have the new sub-domain we can use this in our Bookings company mailbox. The easiest way to do this is to adjust the UPN of the user account associated with the mailbox. This will automatically update the address.

After the Exchange Online directory syncs up with Azure AD the email address should be updated.

Now we need to unpublish and re-publish the company booking page so that it starts using the new address.

After re-publishing the page we need to wait a little bit for it to pick up the new email address. Eventually Bookings will start sending emails from the new email address. In testing it took around 10-15 minutes to take effect.

One more thing to note is that Bookings does not give you a method to delete companies after creating them. If you ever need to delete the company you simply need to delete the mailbox/user using the Remove-Mailbox cmdlet.

Windows Autopilot with User-Driven Hybrid Azure AD Domain Join using Palo Alto GlobalProtect VPN

UPDATE: Please also see part two on implementing Autopilot/GlobalProtect without certificates here: https://blog.markdepalma.com/?p=763

Back in April, at the beginning of the pandemic, I started putting a lot of focus into getting Windows Autopilot to work with Hybrid Join clients and Microsoft Always On VPN. I was looking at both for different reasons but also looking at them as a combined solution. The issue with Autopilot was that technically you were still required to have line of sight to a domain controller even though the domain join happened via an offline blob using the on-prem Intune connector. Sifting through logs I could see the only thing holding back a successful enrollment was a little function at the end of enrollment that was simply looking for a domain controller. I was able to sometimes get an enrollment to work via device tunnel MS VPN policies, but success wasn’t consistent and relied on policies/certificates coming down in a timely manner. In the logging I also saw references to a configuration parameter that would disable the DC check. Soon after, I found a post from Microsoft saying that they had this setting in private beta and would be releasing it in the coming months. After this I decided to put everything on the backburner and abandon MS VPN (I found the MS VPN solution using RRAS to be clunky and inconsistent with a lot to be desired).

Fast forward a few months and Microsoft finally released the new ‘functionality‘. At its core it is really just a flag telling OOBE not to perform a DC connectivity check. After enrollment is completed you are on your own to establish pre-login connectivity to facilitate an initial logon to your domain as there are no cached credentials yet on the machine. We are already a Palo Alto GlobalProtect customer and have been happy with the solution, so getting the two to work together just made sense. At the same time there has also been a push to implement a proper Always On VPN configuration. I’ll be writing a post dedicated to the full technical and security architecture around a cert-based Palo Alto Always On VPN configuration, so I’ll only briefly touch on the relevant parts here. Please refer to Palo Alto documentation on the missing pieces. There are a number of security aspects that should be taken into account like revocation, key storage, etc., and you should already have a proper certificate authority. I am not going over those in this article. You should also already have configured your Autopilot profiles, Intune Connector for Active Directory, etc. as per this document.

In this configuration I use a certificate-only approach (only using certificate profiles and no other authentication methods) for both the portal and the gateway. Remember, our first GlobalProtect connection after an Autopilot enrollment will be a pre-logon connection via certificate. There are no other authentication methods available for this first connection and the portal -> gateway authentication flow needs to support this. Before configuration of the portal and gateway you need to configure zones, interfaces, policies, and a certificate profile. These steps are documented here (steps 1-3 and 5-6).

Below is the portal config. Notice how there are no client authentication methods present. If you were to add any method here, it would be layered on top of the certificate authentication and would prevent the pre-logon connection. PAN-OS 9.0 has implemented mixed authentication support so that you can implement an either/or type of configuration here. The other important note is that your user connection (post-logon) will be connecting to the same portal/gateway and because of this we will be using certificates for the user as well.

The next step is to configure the agent settings within the portal config. Our config is being configured as Always On, but this is not technically required for Autopilot to work. If you do not want an Always On user connection, set ‘Connect Method‘ to ‘Pre-logon then On-Demand‘. Some Palo-Alto documents mention using multiple agent configurations for pre-logon and post-logon that use different connect methods, but this is not necessary here (and will not always work as expected due to the order of operations). The other important thing is to set ‘Client Certificate Store Lookup‘ to ‘User and Machine‘ so that the client will be able to use user and device certificate. The client seems to do a good job at using the proper certificate depending on if the connection is pre-logon or post-logon.

After the portal you will configure the gateway. Authentication will be identical to the portal to allow for a seamless authentication flow.

To deliver a device certificate to the device we will use an Intune PKCS certificate profile. I won’t go into great detail here as Microsoft has done a good job of documenting the steps involved. The profile will need to be assigned to the device properly, and the easiest way to do this is by using an Azure AD dynamic group. I am using a custom EKU value in the screenshot below (Extended Key Usage) which you do not need to replicate. I had a specific reason for doing this.

You will also need to deliver a user certificate to the device. You can use another Intune PKCS certificate profile to do this or you can use GPO/User Certificate Autoenrollment. I chose the latter because I like the granular control it provides. If you use an Intune profile, but target the machine, every user that logs on to the machine will get a certificate and VPN access. If I target the user, every Intune-managed machine they log on to will get a user certificate. Both of these were undesirable. With autoenrollment I’m only enabling autoenrollment for the computers I want (using a user policy and loopback processing), and I’m controlling the users that can enroll/autoenroll via the ACL on the certificate template itself.

To get the GlobalProtect client deployed to our Autopilot device we will be using Intune to deploy it via a ‘Windows app (Win32)’ deployment. We need the Microsoft-Win32-Content-Prep-Tool utility, the GlobalProtect MSI (I am using version 5.1.5 at this time), and two wrapper scripts to complete the package.

The first wrapper script is InstallGlobalProtect.ps1. This is the one responsible for installing the MSI and pre-configuring some registry values. I thought this part would be very straightforward, but I had trouble getting the pre-logon credential provider to kick in initially when installing the client via Intune. If I manually installed the client, it worked the first time without an issue. It was only acting this way when being deployed via Intune during Autopilot. After a few hours of procmon traces and some reverse engineering of the client I figured out the issue. There is a post-setup process that runs that doesn’t process some registry changes correctly until the client is executed in the context of a user at least once. I use the wrapper to stage these two registry values (LogonFlag + LogonState) along with the others needed to make this configuration work. I also enable User-initiated Pre-Logon (via the ShowPrelogonButton value), so it gives the user a chance to verify they have internet connectivity and so that they can perform a retry of the pre-logon connection on demand. I later turn this off via GPO making pre-logon completely automatic after the first successful login. This value is totally optional.

The next wrapper script is a batch script that launches the script above. It is called InstallGlobalProtect.cmd. This script is needed because Intune will launch the installer in 32-bit mode and we want everything kicking off in 64-bit mode (mainly for the registry work above).

Once the three files are ready, we can create our package which generates a .intunewin file.

IntuneWinAppUtil.exe -c C:\Temp\GlobalProtectPackage\Install -s GlobalProtect64-5.1.5.msi -o C:\Temp\GlobalProtectPackage\Output

We then take this file and upload it to Intune to create our application. The application will need to be assigned to a group. Again, I am using a dynamic group that targets my Autopilot devices. As you can see below some of the MSI info is pulled in automatically because it was read by the Microsoft-Win32-Content-Prep-Tool utility. We must also change the ‘Install command‘ to point to the batch file we created earlier.

In testing I came across multiple issues due to machine GPO not being applied before the first login. One of these was loopback processing not applying which caused multiple user GPOs not to apply. Others were trusted root certs not installing (used for things like SSL decryption) and User Certificate Autoenrollment not working (I touched on this earlier). The trusted root issue actually caused my hybrid join to get stuck (SSL decryption is being used here). I decided to create IntuneHybridJoinHelperInstaller.ps1 to solve all of this.

IntuneHybridJoinHelperInstaller.ps1 does the following:

  • Creates a script directory (C:\Scripts)
  • Modifies the SACL of the directory to remove modify access from ‘Authenticated Users(someone could use this directory to execute malicious code in the context of ‘LOCAL SYSTEM‘ if you do not do this)
  • Create a script in the directory above called IntuneHybridJoinHelper.ps1 with an accompanying scheduled task that executes at any user logon in the context of ‘LOCAL SYSTEM

At the next logon, this newly deployed script is triggered by the scheduled task, checking to see if the computer group policy cache has ever been provisioned (has ever received computer policies) and if not it will do the following:

  • Perform a gpupdate for computer policies
  • Get the interactive logged on user
  • Create a task to run gpupdate as the currently logged on user which will perform a gupdate of their user policies
  • Re-run ‘Automatic-Device-Join‘ task to complete the device registration in case it failed at logon

It is best to deploy this as a Win32 app, like the GlobalProtect client, so that we can ensure it is on the machine before the first logon. Like GlobalProtect, we are using a batch wrapper (IntuneHybridJoinHelperInstaller.cmd) to launch the PowerShell script as a 64-bit process. I used the same dynamic group that I used for the GlobalProtect client as the target here. I also used a dummy uninstall command since we never need to ever uninstall this. For install detection I am just using the script path (C:\Scripts\IntuneHybridJoinHelperInstaller.ps1). We will build our package using the utility like we did for GlobalProtect.

IntuneWinAppUtil.exe -c C:\Temp\IntuneHybridJoinHelperInstaller\Install -s IntuneHybridJoinHelperInstaller.ps1 -o C:\Temp\IntuneHybridJoinHelperInstaller\Output

Now that we have everything in place we can test an enrollment. If everything is configured properly, you’ll be asked to sign-in to your corporate environment right after establishing network connectivity. After everything completes you should wind up at a logon screen. Because I am using User-initiated Pre-Logon I will need to switch to the GlobalProtect logon provider, click ‘Start GlobalProtect Connection’, and wait for the status to change to ‘Connected’.

After logging on you are presented with the User ESP (Enrollment Status Page). This is when our helper script kicks in to resolve GPO issues and moves our device registration along. This process can take a bit because after the ‘Automatic-Device-Join‘ completes you still have to wait for the on-prem computer object to sync up to Azure AD via AD Connect. Steve Prentice came up with a little script to help speed this up called SyncNewAutoPilotComputersandUsersToAAD.ps1. It just forces an AD Connect sync after computer object has its ‘userCertificate‘ attribute populated.

Once this is completed you should be left at a functioning desktop and GlobalProtect should have switched over to a full tunnel using the user certificate. At this point I would be using my primary endpoint management product, Ivanti Endpoint Manager, to perform any additional application installs/configurations. I have its agent being deployed via Win32 app as part of my Autopilot process.

Properly securing your on-prem Exchange 2016 environment when using Hybrid Modern Authentication

In the past many organizations completely blocked or limited external access to on-premises Exchange servers because of the lack of multi-factor authentication. Protocols like OutlookAnywhere (also known as RPC-over-HTTP, now MAPI-over-HTTP) and EWS had no native methods to accomplish multi-factor authentication. Failure to protect these protocols from external exposure has led to many breaches like FIN4 and London Blue.

HMA to the rescue… In 2017 Microsoft finally answered this deficiency with Hybrid Modern Authentication. I briefly touched on modern authentication in two previous articles (here and here). With Hybrid Modern Authentication Microsoft gave you the ability to use new technologies like modern authentication and conditional access for on-premises Exchange. Clients will connect using modern authentication by default once Exchange is on a supported version, supported clients are implemented, and the configuration is implemented. The issue here is that legacy Windows authentication is still available. You can simply disable modern authentication in the client or use a different client and you are now connected to on-premises Exchange with a simple username and password completely bypassing conditional access. Conditional access is only invoked when you are authenticating with modern authentication. Exchange 2019 implemented Authentication Policies which allow you turn off legacy authentication methods. If you are using Exchange 2019, you can use these to lock down your environment.

We were in the situation where we wanted to allow secure external access to Exchange (mainly for OutlookAnywhere, but also Outlook Mobile), but we couldn’t have any legacy authentication exposure. The solution we came up with was creating a set of externally facing Exchange 2016 mailbox servers (think Client Access Servers from the pre-Ex2016 days) that have all legacy authentication methods disabled (only OAuth available). These servers are the only ones exposed to the internet. The protocols we want to expose but lock down are ActiveSync (needed for Outlook Mobile), EWS (Exchange Web Services), MAPI, and OAB (Offline Address Book). To lock these down we ran the following against the externally facing servers:

$Servers = @(Get-MailboxServer excas01)
$Servers = $Servers + (Get-MailboxServer excas02)
$Servers | Get-ActiveSyncVirtualDirectory | Set-ActiveSyncVirtualDirectory -BasicAuthEnabled $false
$Servers | Get-WebServicesVirtualDirectory | Set-WebServicesVirtualDirectory -WindowsAuthentication $false
$Servers | Get-MapiVirtualDirectory | Set-MapiVirtualDirectory -IISAuthenticationMethods @('OAuth')
$Servers | Get-OabVirtualDirectory | Set-OabVirtualDirectory -WindowsAuthentication $false

After this is completed, Windows and basic authentication should now fail for these virtual directories.

IMPORTANT: It is VERY important to regularly check that these settings are still in place. You should always re-run these commands after any kind of Exchange update. If you do not do this, you could inadvertently expose your Exchange environment. A simple script could be run on a schedule to check and report on any changes to the authentication configuration of these virtual directories.

The second step is disabling or blocking the other virtual directories that do not need to be accessed externally. For us, these were ECP, OWA, PowerShell, and RPC. We have an on-premises load balancer with SSL bridging configured for our Exchange environment, so we used that to block access to these virtual directories. Another option is to use IP restrictions in IIS on these virtual directories. A third option is to disable the virtual directories via PowerShell. For those of you who want to allow secure access to OWA (Outlook Web Access) you can use Azure App Proxy to accomplish this or an ADC like NetScaler or F5 Big-IP.

The final step in this configuration is allowing the O365 servers to reach an unaltered version of EWS for the IntraOrganizationConnector used for Exchange Online to pull free/busy data (and other data like photos) from your on-premises environment. I found that for some reason the IntraOrganizationConnector fails to authenticate from EXO->on-premises when it uses the modified virtual directory even though all OAuth tests pass. I also use this configuration for my MRS endpoint when doing mailbox migrations since MRS wants to do traditional Windows authentication to EWS. If you are using the Microsoft Hybrid Agent, you shouldn’t have to do this since Azure App Proxy is taking care of the MRS and free/busy communication. I have still have an ongoing ticket open with Microsoft to understand the root cause of this. The workaround is fairly simple:

  • Create a namespace that can be used for EXO->on-premises communications. (Ex. exocomm.domain.com)
  • Configure this namespace to point to your regular INTERNAL and unaltered mailbox servers
  • Lock down this namespace in your firewall, so that ONLY Microsoft O365 servers can reach it. NOTE: This is very important and failure to do so will undermine all of the work done above and leave you exposed. We use a combination of PaloAlto firewalls and MineMeld to accomplish this, but this can be accomplished with a static/maintained ACL as well.
  • Configure the IntraOrganizationConnector in EXO to not use Autodiscover and to use this new namespace as its endpoint with the following commands:
Get-IntraOrganizationConnector | Set-IntraOrganizationConnector -TargetSharingEpr "https://exocomm.domain.com/ews/Exchange.asmx"
Get-IntraOrganizationConnector | Set-IntraOrganizationConnector -DiscoveryEndpoint $null

Outlook with ADAL + Hybrid Modern Authentication causing a white box and AADSTS500011 / 500011 errors in Azure AD

We are in the process of selectively turning on ADAL for Outlook clients. We have already gone through enabling Hybrid Modern Authentication for Exchange (https://docs.microsoft.com/en-us/exchange/configure-oauth-authentication-between-exchange-and-exchange-online-organizations-exchange-2013-help) a while back. We recently ran into an issue where specific users were getting a white box about a minute after launching Outlook. I have seen this issue where all of Outlook freezes, but this was not the same. They receive this error while Outlook continues to run in the background. The error is also accompanied by an Azure AD sign-in failure for the user. The error received is 500011. When looking this up in the documentation (https://login.microsoftonline.com/error?code=500011) you can see it is referring to the error ‘The resource principal named {name} was not found in the tenant named {tenant}‘.

I decided to do a Fiddler trace to get to the bottom of this and this is where the issue started becoming clearer. In the trace you see Outlook reaching out to autodiscover.domainname.com (which is on-prem), getting a 401 response, reaching out to login.windows.net/login.microsoftonline.com, and looping in this manner. This part of the capture aligned exactly with the mysterious white box.

In my case this specific set of users had a different primary SMTP address (and UPN) than the other users we had already enabled ADAL for and their autodiscover.domain.com URL was never added to our Azure AD service principals for the ‘Office 365 Exchange Online‘ application ID. Microsoft documentation talks about this in Step 5 of the link I added at the beginning of this post. Using the ‘MSOnline‘ PowerShell module I was able to add the URL to the service principal list.

$x = Get-MsolServicePrincipal -AppPrincipalId 00000002-0000-0ff1-ce00-000000000000
$x.ServicePrincipalnames.Add("https://autodiscover.domain.com/")
Set-MSOLServicePrincipal -AppPrincipalId 00000002-0000-0ff1-ce00-000000000000 -ServicePrincipalNames $x.ServicePrincipalNames

After adding the principal there were no more instances of the white box.