Building an Exchange Server CAL Inventory Connector for Snow License Manager

March 23, 2026 · 9 min read
See project S6

Standard vs Enterprise — Per Mailbox

An enterprise telecom client was running Microsoft Exchange Server across a four-server Database Availability Group (DAG) cluster. Thousands of mailboxes, multiple regions, a complex organizational structure. The licensing question they couldn't answer: which users required a Standard CAL under Microsoft's licensing model, and which required Enterprise?

Microsoft Exchange CAL licensing splits into two tiers. Standard covers basic email. Enterprise kicks in when specific features are enabled on a mailbox — in-place archiving, custom ActiveSync policies, Unified Messaging. The distinction matters: Enterprise CALs cost significantly more than Standard, and during an audit, Microsoft will check whether your CAL assignments match actual feature usage.
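
The tier rule reduces to a simple predicate: any Enterprise-triggering feature on a mailbox bumps its requirement. A minimal sketch of that rule in Python — the `MailboxFeatures` type and its field names are illustrative, not the connector's actual schema:

```python
from dataclasses import dataclass

@dataclass
class MailboxFeatures:
    # The three feature flags that trigger an Enterprise CAL requirement.
    in_place_archive: bool
    custom_activesync_policy: bool
    unified_messaging: bool

def required_cal_tier(features):
    """Enterprise if any triggering feature is enabled, else Standard."""
    if (features.in_place_archive
            or features.custom_activesync_policy
            or features.unified_messaging):
        return "Enterprise"
    return "Standard"
```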

Manual assessment means opening each mailbox in the Exchange admin console, checking which features are enabled, cross-referencing against license assignments, and repeating that across thousands of users on four servers. Nobody was doing this. Nobody could. The scale made manual assessment impractical, and the SAM team had no automated way to determine which CAL tier each user actually required.

Snow License Manager had no native Exchange CAL connector. The feature-level data that determines CAL requirements lives inside Exchange, not in inventory scans. I needed to extract it, transform it, and deliver it to Snow in a format the SAM team could use for compliance reporting.

But before I could build that pipeline, I needed to understand the data.

Research First: Power BI as a Prototyping Tool

Before writing a single line of PowerShell automation, I built a Power BI report. This wasn't a deliverable — it was a research tool.

The core question was deceptively complex: how do you determine which CAL tier a user actually needs? Exchange has its own license assignment data. Active Directory has assigned licenses. The mailbox feature flags tell you what should trigger an Enterprise CAL. These three data sources don't always agree, and I needed to understand the gaps before I could automate anything.

I pulled the raw data — mailbox attributes, Exchange license assignments, AD license records — into Power BI and started reconciling. The report let me visualize the relationships: which users had Enterprise features enabled but only Standard CALs assigned, which mailbox types existed across the environment, how organizational units mapped to license distributions, and where the discrepancies lived.

This analysis revealed the data structure I'd need to capture. It showed me which attributes actually mattered for CAL determination (archive GUID, ActiveSync policy, Unified Messaging) versus which ones were noise. It exposed the reconciliation logic between what Exchange reports and what AD assigns. And it gave me confidence in the architecture before I started building the automated pipeline.

The Power BI report also served as an immediate deliverable for the SAM team — they could see the compliance landscape while I built the long-term automation. Research that produces a working analytical tool along the way isn't overhead; it's the kind of upfront investment that prevents rework downstream.

The Architecture: PowerShell to XML to Python to CSV to Snow

With the data structure mapped out through the Power BI analysis, I built a multi-stage ETL pipeline with clear separation between collection, transformation, and delivery.

Stage 1 — PowerShell Collection on Each Exchange Server

The collection script runs inside the Exchange Management Shell directly on each server in the DAG. It extracts every mailbox recipient with the attributes that determine CAL requirements: archive status, ActiveSync policy customization, Unified Messaging enablement, last logon timestamp, mailbox size, and the full organizational unit path.

Archive detection is more subtle than it sounds. Exchange stores archive status as a GUID. A null GUID — all zeros — means no archive is configured. Any other GUID means archiving is enabled, and that alone triggers an Enterprise CAL requirement. The script checks each mailbox's archive GUID against the null value.
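
The null-GUID check can be sketched in Python (the actual script does this in PowerShell against each mailbox's `ArchiveGuid`; the helper name here is hypothetical):

```python
from uuid import UUID

NULL_GUID = UUID(int=0)  # 00000000-0000-0000-0000-000000000000

def archive_enabled(archive_guid):
    """True when the mailbox has an in-place archive configured."""
    # A missing or all-zero ArchiveGuid means no archive; anything else
    # means archiving is on, which alone requires an Enterprise CAL.
    if not archive_guid:
        return False
    return UUID(str(archive_guid)) != NULL_GUID
```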

Separately, the script enumerates Standard and Enterprise CAL assignments using Exchange's built-in license cmdlets. This is Exchange's own assessment of which tier each user belongs to — an independent data point alongside the feature flags.

The output per server: a timestamped XML file containing all recipient data, and a separate text file listing CAL assignments.

Stage 2 — Python XML-to-CSV Converter

A Python script reads all XML files from the collection output across all four servers. It discovers attributes dynamically — recipients can have varying fields depending on which server they live on and which features are configured. The converter collects all possible column names across every XML file, sorts them alphabetically for a consistent schema, and writes two CSVs: one for recipient details, one for CAL assignment mappings.

Dynamic schema discovery was essential. Hardcoding column expectations would have meant the converter breaks every time a new attribute appears on one server but not others. Instead, the converter adapts to whatever the collection scripts produce.
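
A minimal sketch of that union-of-keys approach, assuming one `<Recipient>` element per mailbox with one child element per attribute (the element names and function name are assumptions, not the real schema):

```python
import csv
import xml.etree.ElementTree as ET

def xml_to_csv(xml_paths, out_csv):
    """Merge per-server recipient XML files into one CSV."""
    rows = []
    for path in xml_paths:
        root = ET.parse(path).getroot()
        for recipient in root.iter("Recipient"):
            rows.append({child.tag: (child.text or "") for child in recipient})
    # Union of every attribute seen anywhere, sorted for a stable schema.
    columns = sorted({key for row in rows for key in row})
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Because `restval=""` fills missing columns, a recipient that lacks an attribute present on another server simply gets an empty cell instead of breaking the export.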

Stage 3 — Automated Transfer to Snow

A batch script delivers the CSVs to Snow License Manager's import staging area. Snow ingests the data, making it available for compliance reporting alongside every other software product the SAM team manages.

Four Versions to Get It Right

The collection script evolved through four versions. Each one was a response to a real problem encountered on the DAG cluster.

v1.1 — The Baseline

The first version extracted recipients and their compliance-relevant attributes to XML. Core data flowing, basic structure working. But license assignment data — which users Exchange itself classifies as Standard vs Enterprise — wasn't captured yet. The feature flags tell you what a user should need; the license assignments tell you what Exchange thinks they need. Both are necessary for defensible compliance.

v1.2 — Adding License Extraction

I added calls to Exchange's license enumeration cmdlets, pulling both Standard and Enterprise CAL user lists. This version also introduced comprehensive logging — essential when you're debugging collection issues across four servers and need to know exactly where a failure occurred and why.

v1.3 — Separating Concerns

I tried embedding the license data directly into the recipient XML. It didn't fit cleanly. License assignments have a different structure than recipient attributes — a user can hold multiple license types, and the metadata includes product edition, tabulation method, and unit labels. Cramming this into the recipient XML schema created parsing complexity downstream. v1.3 started pulling the license data out of the XML.

v1.4 — The Production Version

The breakthrough: license data belongs in its own file. v1.4 writes recipient data to XML and license assignments to a standalone text file per server. This clean separation made the Python converter simpler and the entire pipeline more reliable — each data stream has its own format optimized for its content.

v1.4 also added Exchange version detection via the Windows registry. The script reads the installed Exchange version and loads the correct management snap-in. This matters in environments that run mixed Exchange versions during migrations — loading the wrong snap-in produces silent failures or incomplete data.
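
The version-to-snap-in mapping at the heart of that detection can be sketched as follows. The snap-in IDs below are the commonly documented ones for Exchange 2010 and 2013+, and the lookup helper is illustrative, not the script's actual code:

```python
# Map Exchange major version (from the registry) to its management snap-in.
SNAPIN_BY_MAJOR_VERSION = {
    "14": "Microsoft.Exchange.Management.PowerShell.E2010",   # Exchange 2010
    "15": "Microsoft.Exchange.Management.PowerShell.SnapIn",  # Exchange 2013+
}

def snapin_for(version_string):
    """Pick the snap-in for a dotted version string like '15.1.2507.6'."""
    major = version_string.split(".")[0]
    try:
        return SNAPIN_BY_MAJOR_VERSION[major]
    except KeyError:
        # Fail loudly rather than load the wrong snap-in silently.
        raise ValueError(f"Unsupported Exchange version: {version_string}")
```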

The path from v1.1 to v1.4 is a pattern I see in every ETL project: the first version tries to be clever with a single output format, and production reality teaches you to separate your data streams.

The Edge Cases

OU hierarchy parsing. Organizational unit paths in Active Directory can be deeply nested — region, department, sub-department, team. The script parses the full OU path into indexed attributes, giving the SAM team the ability to filter and group compliance data by organizational structure.
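
The DN-to-indexed-attributes step might look like this — a sketch that ignores escaped commas and other DN edge cases, with attribute names like `OU1` chosen for illustration:

```python
def parse_ou_path(distinguished_name):
    """Split a DN's OU components into indexed attributes (OU1 = top level)."""
    # e.g. "CN=jdoe,OU=Team,OU=Dept,OU=Region,DC=corp,DC=example"
    parts = [p for p in distinguished_name.split(",") if p.startswith("OU=")]
    # DNs list the most specific OU first; reverse for top-down order.
    parts.reverse()
    return {f"OU{i}": p[3:] for i, p in enumerate(parts, start=1)}
```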

Mailbox statistics warnings. Some mailboxes throw warnings when you query their statistics — disconnected mailboxes, corrupted stores, mailboxes in transit during DAG rebalancing. The script suppresses these warnings to avoid halting the collection mid-run. A warning on one mailbox shouldn't prevent the other thousands from being processed.

DAG cluster awareness. The script detects whether it's running in a DAG environment via registry lookup, and captures the cluster name for reporting context. Each server in the DAG runs the collection independently — no single point of failure, and the Python converter merges the results.

Outcomes

The connector delivers automated Exchange CAL compliance inventory across the entire four-server DAG cluster. Thousands of mailboxes are assessed with defensible audit data showing exactly which users need Standard CALs, which need Enterprise, and which specific features triggered the Enterprise requirement.

Manual mailbox-by-mailbox assessment — which nobody was doing because the scale made it impractical — is replaced with automated collection and transformation. The data flows into Snow License Manager for compliance reporting alongside all other software, giving the SAM team a single pane of glass for their entire Microsoft licensing position.

Each Exchange server runs the collection independently. The Python converter merges the outputs. No single point of failure in the pipeline.

What I Learned

Separate your data streams. Trying to embed license data into the recipient XML created coupling that made both harder to process. Splitting into XML for recipients and a text file for license assignments was the v1.4 breakthrough that simplified everything downstream. When two data types have different structures, give them different files.

Exchange knows its own CAL assignments. The Get-ExchangeServerAccessLicenseUser cmdlet is Exchange's own opinion of which CAL tier each user needs. Using this alongside the feature flags — archiving, ActiveSync, UM — gives you two independent data points for compliance. If they disagree, that's a finding worth investigating.
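
That cross-check is a straightforward comparison of the two views; a sketch, with hypothetical function and tier names:

```python
def cal_discrepancies(required_by_user, assigned_by_user):
    """Users whose assigned CAL tier differs from the feature-derived one.

    required_by_user: user -> tier derived from feature flags.
    assigned_by_user: user -> tier per Exchange's license cmdlets.
    """
    findings = []
    for user, required in required_by_user.items():
        assigned = assigned_by_user.get(user, "Unassigned")
        if assigned != required:
            findings.append((user, assigned, required))
    return findings
```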

Version-aware scripts are worth the complexity. Exchange environments often run mixed versions during migrations. Registry-based version detection adds lines to the script, but it prevents silent failures when the wrong management snap-in loads. A few extra lines of detection logic is cheaper than debugging a collection that ran but produced incomplete data.

Research before automation pays for itself twice. The Power BI report took additional effort upfront, but it delivered value in two ways: the SAM team got immediate analytical capability while I built the long-term solution, and I understood the data well enough to design the pipeline correctly on the first pass. Every architectural decision in the automated connector — which attributes to extract, how to reconcile license sources, what the output schema should look like — was informed by that research. Skipping it would have meant discovering these requirements during automation, which is slower and more expensive.

Four versions isn't scope creep — it's discovery. Each version fixed a real production issue on the DAG cluster. The evolution was driven by the environment's reality, not by changing requirements. In ETL work, the first version gets the shape right; the production versions get the details right.