When Manual Classification Stops Scaling
The first Java compliance engagement I worked on was largely manual. Pull the Flexera evidence, walk through it row by row with a licensing specialist, classify each installation, write the report. It worked — but only because the effort was concentrated into a single deliverable and the customer didn't expect to repeat it.
That model breaks the moment the dataset gets large or the customer wants ongoing visibility.
A typical enterprise Java footprint is tens of thousands of evidence records spread across thousands of machines. Manual review at that scale is slow and error-prone. Worse, even after a clean classification today, the result decays — new Oracle JDK installs appear, NFTC windows expire, version updates shift binaries across boundaries — and within months the customer is back to "we don't know what we have."
This article is about the engine I built so manual classification doesn't have to happen twice.
The First-Pass Engine: Why "All Oracle = RED" Is Wrong
The earliest version of the classification engine took a blunt approach: anything Flexera recognised as an Oracle-published Java title got classified as RED. Anything matching one of the major free distributions (Temurin, Zulu, Liberica, Corretto) got classified as GREEN. Everything else fell to YELLOW by default.
This is the classification you get when you treat Oracle Java as a single entity. It is also wrong in three specific ways:
- Java SE 8 updates 1–202 are free under the Binary Code License, but the crude rule classifies them as RED alongside 8u211+ OTN installations. The customer is shown exposure they don't actually have.
- Embedded Java in Oracle Database, Oracle Client, and Oracle Hyperion is covered by the host-product licence, but the crude rule treats it as standalone Oracle Java. Legitimately-licensed installations get flagged RED.
- Java SE 17 and 21 sit on different sides of NFTC expiry. A timeless "all Oracle = RED" rule misses both the retroactive flip of Java 17 (NFTC expired September 2024 — every Java 17 install became licensable) and the planning horizon for Java 21 (NFTC expiring September 2026).
The result is a wildly inflated RED count. A customer staring at a four-figure RED list draws very different conclusions than one staring at a curated list of installations that genuinely require licensing. The negotiation posture, the migration urgency, and the financial story all change.
The engine had to get smarter.
A Version-Aware Classification Pipeline
The current pipeline treats each evidence record as a sequence of decisions, not a single keyword match. Order matters:
- Recognition rules first. If the file path matches a known pattern (Oracle DB Client, Hyperion, MATLAB, SAP, hardware-management agents), the rule decides the classification. This whitelists embedded Java in one pass.
- Version parsing. For everything else, parse the major version and update level out of the recognised title and any auxiliary version data. This is what unlocks BCL/OTN/NFTC logic downstream.
- Known free distributions → GREEN. Eclipse Temurin, Azul Zulu Community, BellSoft Liberica, IBM Semeru, Amazon Corretto, Microsoft OpenJDK, Red Hat OpenJDK, Oracle OpenJDK from jdk.java.net, GraalVM Community, SapMachine, Dragonwell. Each is identified by recognition title and, where helpful, by path.
- Sun Microsystems → RED. Legacy Java predating the Oracle acquisition is still Oracle property. Explicit handling avoids it falling through to YELLOW by default.
- Version-aware Oracle classification. This is the core licensing logic, encoded as a sequence of checks: Java 8 update ≤202 → YELLOW (BCL); Java 8 update ≥211 → RED (OTN); Java 11 commercial → RED; Java 17 → RED post-NFTC; Java 21 → YELLOW (NFTC active until September 2026).
- Default → YELLOW for everything else, with a populated BreachReason explaining why human review is needed.
- GLAS flag escalation. If Oracle's GLAS agent reports active commercial features on a machine that didn't already match a recognition rule, escalate to RED regardless of version.
- Computer-level risk rollup. Aggregate per-installation risk to a per-computer minimum-risk view so dashboards and migration plans can pivot at the host level.
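The final rollup step can be sketched in pandas. This is an illustration, not the production SQL; the column names are invented, and I'm assuming the "minimum-risk view" means an encoding where a lower rank number is a higher risk, so the host inherits its worst installation:

```python
import pandas as pd

# Lower rank = higher risk, so min() per computer yields the worst install.
RANK = {"RED": 0, "YELLOW": 1, "GREEN": 2}
LABEL = {v: k for k, v in RANK.items()}

installs = pd.DataFrame({
    "computer": ["srv01", "srv01", "srv02"],
    "classification": ["GREEN", "RED", "YELLOW"],
})
installs["rank"] = installs["classification"].map(RANK)

# Per-computer rollup: srv01 has one RED install, so srv01 is RED.
per_computer = installs.groupby("computer")["rank"].min().map(LABEL)
```

A dashboard pivoting on `per_computer` then shows host-level risk rather than raw install counts.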
Each step writes back enrichment data: a risk level (Critical / High / Medium / Low / None), a licence type (OTN / BCL / NFTC / GPL+CPE), a human-readable breach reason, and a recommended action. Every classification decision is auditable end-to-end.
The licensing rules behind every step — what counts as commercial, where the version boundaries fall, which embedded scenarios are covered — were defined by dedicated Oracle licensing specialists. My job was to translate their decisions into deterministic SQL that runs identically every time, against any data snapshot, with full traceability.
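The version-aware step (step 5) can be sketched in Python. This mirrors the rules listed above but is not the production SQL; the exact NFTC expiry days are my approximation of the September dates, and the unknown-version fallthrough matches the default-to-YELLOW behaviour:

```python
from datetime import date

# Encoding the expiry date rather than the verdict lets the rule flip
# automatically, as Java 17 did in September 2024.
NFTC_EXPIRY = {17: date(2024, 9, 1), 21: date(2026, 9, 1)}

def classify_oracle(major: int, update: int, today: date) -> tuple[str, str]:
    """Return (classification, licence type) for a standalone Oracle JDK."""
    if major == 8:
        if update <= 202:
            return ("YELLOW", "BCL")   # free under the Binary Code License
        if update >= 211:
            return ("RED", "OTN")      # OTN licence required
    if major == 11:
        return ("RED", "OTN")          # commercial builds only
    if major in NFTC_EXPIRY:
        expired = today >= NFTC_EXPIRY[major]
        return ("RED", "NFTC") if expired else ("YELLOW", "NFTC")
    return ("YELLOW", "UNKNOWN")       # default: human review needed
```

Because the function takes `today` as a parameter, the same logic is testable against past and future dates, which is how the retroactive Java 17 flip gets verified.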
Generic Rules Beat Machine-Specific Rules
The first iteration of recognition rules included a lot of machine-specific patterns — F:\oracle\121cclient\%, D:\oracle64\product\11.2.0\client_1\%, and so on. These rules worked, but only on the exact machines they were written for. Every new server, every renamed drive, every reorganised installation directory broke them.
The fix was to rewrite them as generic patterns: %\Oracle\product\%\client%, %\oracle64\product\%\client%, /%/bin/java%/19c/%. One generic rule replaces a whole bucket of machine-specific ones, and it survives infrastructure changes the original rules wouldn't have.
Two related disciplines came out of that rewrite:
- Path separators prevent over-matching. A pattern like %19c% will happily match unrelated paths that contain "19c" as a substring. Anchoring it as %\19c\% (Windows) or %/19c/% (Linux) eliminates false positives.
- Every rule has a documented rationale. A rule with no description is impossible to audit later — by me, by the licensing specialist, or by the customer's compliance team. Each rule's description spells out what it matches and why the resulting classification is correct.
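The anchoring discipline is easy to demonstrate with a small LIKE emulator. This helper is hypothetical, written only to show why the separator matters, not part of the engine:

```python
import re

def like_match(pattern: str, path: str) -> bool:
    """Approximate SQL LIKE: % matches any run of characters, case-insensitively."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, path, flags=re.IGNORECASE) is not None

like_match("%19c%", r"D:\apps\my19cache\java.exe")     # True: a false positive
like_match(r"%\19c\%", r"D:\apps\my19cache\java.exe")  # False: anchored pattern
like_match(r"%\19c\%", r"D:\oracle\19c\bin\java.exe")  # True: genuine match
```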
The Python Validation Script
Running the classification pipeline in two independent implementations turned out to be one of the better decisions on this project. Alongside the SQL engine, I maintain a Python script that loads the same recognition rules, applies the same version-aware logic, and produces the same classification output.
Two implementations that should agree, on the same data, catch each other's bugs:
- A SQL bug that classifies a rare update level wrong is caught when Python disagrees.
- A pandas operation that drops rows silently is caught when SQL has more.
- A new edge case found in production data can be tested in Python first, then promoted to SQL once it is verified.
The Python script also doubles as the engine I run on exported datasets when the SQL pipeline isn't deployed yet — even when a customer is still in the data-extraction phase, I can produce a comparable compliance report.
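The cross-check between the two implementations reduces to an outer join plus a disagreement filter. A minimal pandas sketch, with illustrative column names rather than the real schema:

```python
import pandas as pd

# Outputs from the two engines for the same snapshot (toy data).
sql_out = pd.DataFrame({"install_id": [1, 2, 3],
                        "classification": ["RED", "GREEN", "YELLOW"]})
py_out = pd.DataFrame({"install_id": [1, 2, 3],
                       "classification": ["RED", "GREEN", "RED"]})

merged = sql_out.merge(py_out, on="install_id", how="outer",
                       suffixes=("_sql", "_py"), indicator=True)

# Rows present in only one engine: catches silently dropped records.
missing = merged[merged["_merge"] != "both"]

# Rows where both engines ran but disagree on the verdict.
diff = merged[(merged["_merge"] == "both") &
              (merged["classification_sql"] != merged["classification_py"])]
```

An empty `missing` and an empty `diff` is the pass condition; anything else is a bug in one implementation or the other.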
GLAS Flag Escalation
Oracle's GLAS (Global Licensing and Advisory Services) agent isn't deployed everywhere, but where it is, it surfaces the single highest-confidence indicator of an Oracle licensing obligation: active use of commercial features.
Two GLAS flags drive the escalation logic:
- UnlockCommercialFeatures — Java Flight Recorder commercial mode, Application Class-Data Sharing, and other Oracle-specific commercial features are actively enabled on this machine.
- AMCAgent — Oracle's Advanced Management Console agent is running.
When either flag is active and the installation didn't already match an embedded-Java recognition rule, the classification escalates to RED (Critical). The guard against rule-matched installations matters: if a recognition rule has already correctly identified the install as embedded in Oracle DB or covered by another product's licence, we don't want a stray GLAS flag to override that. Escalation only fires when the version-based logic was about to make the call.
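The guard ordering can be made concrete in a few lines. A sketch with invented field names, assuming the recognition-rule match arrives as a boolean on the record:

```python
def apply_glas_escalation(classification: str, rule_matched: bool,
                          unlock_commercial_features: bool,
                          amc_agent: bool) -> str:
    """Escalate to RED on active GLAS flags, unless a recognition rule
    already identified the install as embedded or otherwise covered."""
    if rule_matched:
        # Recognition rules win: embedded Java stays covered by the host
        # product's licence even if a GLAS flag is present on the machine.
        return classification
    if unlock_commercial_features or amc_agent:
        return "RED"
    return classification
```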
GLAS data is also the most reliable signal for migration feasibility. Applications that don't use commercial features migrate cleanly to OpenJDK; applications that do, won't. When GLAS data exists, the migration candidate list practically writes itself.
Continuous Monitoring with ToolsHub24
A one-time classification produces a snapshot. The customer needs a stream.
I deployed the engine into ToolsHub24 so the same classification pipeline runs on a recurring schedule, weekly by default. Each run produces a fresh snapshot of every Java installation in the environment, classified RED/YELLOW/GREEN, with full enrichment intact.
The ToolsHub24 Java module surfaces this through dashboards built for the customer's compliance team:
- Risk overview with per-computer RED/YELLOW/GREEN counts, drillable down to specific installations
- Historical trends showing how the RED/YELLOW/GREEN distribution moves over time across snapshots — useful for tracking remediation progress
- Vendor breakdown showing the share of Temurin, Zulu, Oracle, OpenJDK, and others — both numerically and visually
- Cost projection combining the current RED count with the customer's employee headcount and Oracle list pricing to show annual exposure
- Filters by measurement date, exemption status, recognised yes/no, computer name, business unit, and risk category
The module is configurable for the customer's organisation: data centres, divisions, legal entities. The vendor pattern library can be extended for environments with unusual JDK distributions.
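The cost-projection arithmetic is simple once the employee metric is understood. A sketch of how I assume the dashboard combines the inputs; the tier boundaries and per-employee monthly prices are modelled on Oracle's published Java SE Universal Subscription list pricing, and should be verified against current figures before use:

```python
def annual_exposure_usd(employees: int) -> float:
    """Annual list-price exposure if any RED installation remains.

    Tiers are (upper bound on employees, USD per employee per month),
    per Oracle's published list pricing at time of writing."""
    tiers = [(999, 15.00), (2_999, 12.00), (9_999, 10.50),
             (19_999, 8.25), (29_999, 6.75), (39_999, 5.70), (49_999, 5.25)]
    for upper_bound, monthly_price in tiers:
        if employees <= upper_bound:
            return employees * monthly_price * 12
    return employees * 5.25 * 12  # 50k+ is negotiated; largest tier shown
```

Because the subscription is employee-metric, a single lingering RED installation drives exposure across the whole headcount, which is why the dashboard pairs the RED count with headcount rather than with install counts.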
Catching New Oracle Java Post-Remediation
Most Java compliance engagements end with a remediation push — Oracle JDK installs are replaced with OpenJDK builds, Java 17 is upgraded or downgraded, embedded Java is documented. The RED count drops. Everyone is happy.
Without continuous monitoring, that is also where the trouble starts. Three months later, somebody in a different division installs Oracle JDK on a new server because that is what they have always done. Six months later, a developer downloads Oracle Java 17 for a proof-of-concept. Twelve months later, the original problem is back, undetected.
With weekly snapshots feeding the same classification pipeline, every new Oracle Java installation lights up as RED in the next dashboard refresh. The compliance team sees it within days, not months — and the remediation conversation happens before audit pressure builds.
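Detecting the regression amounts to a set difference between consecutive snapshots. A minimal pandas sketch with illustrative columns, not the ToolsHub24 schema:

```python
import pandas as pd

# Two consecutive weekly snapshots (toy data).
prev = pd.DataFrame({"install_id": [1, 2],
                     "classification": ["GREEN", "RED"]})
curr = pd.DataFrame({"install_id": [1, 2, 3],
                     "classification": ["GREEN", "RED", "RED"]})

# Installs that are RED now but were not RED last week: either brand-new
# installs or existing installs whose classification regressed.
prev_red = set(prev.loc[prev["classification"] == "RED", "install_id"])
new_red = curr[(curr["classification"] == "RED") &
               (~curr["install_id"].isin(prev_red))]
```

Anything in `new_red` is what lights up on the next dashboard refresh.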
Outcomes
After the rules engine and continuous monitoring went live:
- RED installations decreased 44%. Embedded Java is now correctly classified as GREEN. Java 8 BCL installations are correctly classified as YELLOW rather than RED. The customer's licensable footprint shrank to what was actually licensable.
- Computer-level risk dropped 50%. Hosts that previously appeared in the RED bucket because of a single misclassified install — usually embedded — moved into YELLOW or GREEN.
- Ongoing compliance monitoring replaces periodic manual audits. New Oracle Java installations are flagged within days, not months.
Lessons Learned
Encode the rules, not the answers. Oracle changes Java licensing on a calendar — NFTC windows expire, new versions ship, pricing tiers shift. A pipeline built around version-aware rules adapts to those changes by editing logic; a pipeline built around hardcoded RED/GREEN lists has to be rewritten.
Two implementations beat one. Running SQL and Python against the same data finds bugs that neither would catch alone. The Python script also lets me deliver value before the full SQL pipeline is deployed.
Generic rules outlive their authors. Machine-specific patterns are tempting because they are easy to write from a single example, but they decay the moment infrastructure changes. Generic patterns survive renames, migrations, and reorganisations.
Continuous monitoring is what actually keeps compliance. A perfect classification today is worth less than a "good enough" classification that runs every week. The real long-term value isn't the first report; it is the eleventh.
