Purview auto-labelling policies
How auto-labelling applies sensitivity and retention labels automatically based on content match — and the operational realities.
User-applied sensitivity and retention labels depend on users picking the right label at the right time. That works for engaged users but leaves large volumes of content unlabelled, and a Copilot oversharing incident then comes down to "the file was never labelled, so Copilot treated it as Internal." Auto-labelling policies in Purview close this gap by applying labels based on what the content contains.
Two flavours: client-side and service-side
Client-side auto-labelling
Office apps (Word, Excel, PowerPoint, Outlook) recommend or apply a label as users work, based on what the content contains. Example: "this document contains 100+ customer SSNs; we recommend the Confidential label" — and the user can accept or override (depending on policy).
Configured as auto-labelling settings on each sensitivity label. Applies in real time at the user's device.
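The recommend-or-apply decision can be pictured with a small sketch. This is an illustration of the idea only, not how Office or Purview implements detection: the regex, the function names, and the threshold of 100 are assumptions chosen to mirror the SSN example above, and real sensitive information types also rely on supporting evidence and confidence levels.

```python
import re

# Hypothetical pattern for SSN-like strings written as 123-45-6789.
# Real sensitive information types use richer evidence than a bare regex.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def count_ssn_like(text: str) -> int:
    """Count distinct SSN-like strings found in the text."""
    return len(set(SSN_PATTERN.findall(text)))


def client_side_action(text: str, min_count: int = 100, auto_apply: bool = False) -> str:
    """Return 'none', 'recommend', or 'apply' for a hypothetical Confidential label.

    min_count and auto_apply stand in for the per-label settings an admin
    would choose; they are illustrative names, not Purview parameters.
    """
    if count_ssn_like(text) < min_count:
        return "none"
    return "apply" if auto_apply else "recommend"
```

With `auto_apply=False` the user sees a recommendation they can accept or dismiss; with `auto_apply=True` the label is applied and the user may still be allowed to change it, depending on policy.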
Service-side auto-labelling
The Purview service labels documents already stored in SharePoint and OneDrive (content at rest) and email as it is sent or received through Exchange Online (content in transit), independent of user action. Email already sitting in mailboxes is not labelled retroactively. Useful for labelling stored documents without waiting for users to do it.
Configured as auto-labelling policies in the Purview portal. Once turned on, a policy runs continuously in the background rather than on an administrator-defined schedule.
What can trigger labelling
Match conditions include:
- Sensitive information types — credit card patterns, SSNs, government IDs, custom regex.
- Trainable classifiers — ML-based content category detection (legal docs, source code, healthcare records).
- Keyword phrases.
- Content properties — SharePoint column values, document properties.
- Sender / recipient patterns (Exchange-specific).
Conditions can be combined with AND and OR in the Purview rule editor to build compound predicates.
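To make the AND/OR combination concrete, here is a minimal sketch of how such a rule could be modelled as composable predicates. Everything in it is hypothetical: `Item`, the helper names, the "Department" property, and the SSN-like regex are illustrative stand-ins, and Purview evaluates its own rule format rather than Python functions.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Item:
    """A simplified document or message: body text plus metadata properties."""
    text: str
    properties: Dict[str, str] = field(default_factory=dict)


Condition = Callable[[Item], bool]


def matches_regex(pattern: str, min_count: int = 1) -> Condition:
    """True when the pattern occurs at least min_count times in the text."""
    compiled = re.compile(pattern)
    return lambda item: len(compiled.findall(item.text)) >= min_count


def has_keyword(phrase: str) -> Condition:
    """True when the phrase appears in the text, ignoring case."""
    return lambda item: phrase.lower() in item.text.lower()


def property_equals(name: str, value: str) -> Condition:
    """True when a metadata property has exactly the given value."""
    return lambda item: item.properties.get(name) == value


def all_of(*conditions: Condition) -> Condition:
    """AND: every condition must hold."""
    return lambda item: all(c(item) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    """OR: at least one condition must hold."""
    return lambda item: any(c(item) for c in conditions)


# (SSN-like pattern OR keyword phrase) AND Department == "HR"
rule = all_of(
    any_of(matches_regex(r"\b\d{3}-\d{2}-\d{4}\b"), has_keyword("patient record")),
    property_equals("Department", "HR"),
)
```

Calling `rule(Item("SSN 123-45-6789", {"Department": "HR"}))` returns `True`, while the same text with `"Department": "Sales"` returns `False` because the AND branch fails.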
Where this matters most
Auto-labelling is particularly valuable for:
- Copilot readiness — labelled content gives Copilot signal about sensitivity, affecting what it includes in grounding and what it warns about in output.
- DLP enforcement — DLP policies keyed to labels work better when more content is labelled.
- Retention enforcement — auto-applied retention labels handle compliance retention without user action.
- Historical content cleanup — apply correct labels to years-old content that nobody is going to manually relabel.
Operational rollout
A common pattern:
- Start with simulation mode — service-side auto-labelling policies have a simulate option. Run a simulation, review what would be labelled, validate the matches.
- Tune the rules — false positives in classifier or SIT matching are common. Refine the conditions.
- Enable the policy: turn it on to start labelling, then let it run continuously over new and changed content.
- Monitor — Purview reports on labelled-vs-unlabelled volumes, classifier confidence, user overrides on client-side recommendations.
- Iterate — add rules as new sensitive data types emerge.
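The simulate, tune, enable loop above can be sketched as a simple review gate. This assumes a reviewer has manually checked a sample of simulated matches and marked each as a true or false positive; the `ReviewedMatch` type, the file paths, and the 0.95 precision threshold are hypothetical choices for illustration, not Purview features or recommended values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewedMatch:
    """One simulated match that a reviewer has checked by hand."""
    path: str
    is_true_positive: bool


def precision(sample: List[ReviewedMatch]) -> float:
    """Share of reviewed matches that really deserved the label."""
    if not sample:
        raise ValueError("review at least one simulated match first")
    hits = sum(1 for match in sample if match.is_true_positive)
    return hits / len(sample)


def next_step(sample: List[ReviewedMatch], min_precision: float = 0.95) -> str:
    """Return 'enable' if the sample is clean enough, otherwise 'tune'."""
    return "enable" if precision(sample) >= min_precision else "tune"


# Illustrative sample: 18 correct matches and 2 false positives.
sample = [ReviewedMatch(f"/sites/hr/doc{i}.docx", i >= 2) for i in range(20)]
print(next_step(sample))  # tune
```

A gate like this only measures false positives in what the simulation matched; content the rules missed entirely needs a separate check, for example by sampling unlabelled files.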
Limits and caveats
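- Service-side auto-labelling is throttled. Microsoft documents a daily cap on automatically labelled files that applies to the tenant as a whole rather than to each policy (25,000 files per day in earlier documentation, 100,000 more recently; check the current figure). Initial backlog processing can take weeks for large tenants.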
- Trainable classifiers require training and validation cycles — they're not magic.
- Client-side auto-labelling works only when users are signed into Office apps with the right licence.
- Encryption applied by auto-labelling can break legacy workflows — pilot carefully on labels that encrypt.
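The "weeks for large tenants" point is simple arithmetic, sketched below. The daily rate is an input, not a constant: the 25,000 used here is only an illustrative figure, and the real value should come from current Microsoft documentation or observed throughput. The function name is hypothetical.

```python
def backlog_days(unlabelled_files: int, files_per_day: int) -> int:
    """Whole days needed to work through a backlog at a fixed daily rate."""
    if unlabelled_files < 0:
        raise ValueError("unlabelled_files must not be negative")
    if files_per_day <= 0:
        raise ValueError("files_per_day must be positive")
    # Ceiling division: a partial day of work still takes a calendar day.
    return (unlabelled_files + files_per_day - 1) // files_per_day


# Illustrative inputs only: 2,000,000 matching files at 25,000 labelled per day.
print(backlog_days(2_000_000, 25_000))  # 80
```

An estimate like this is a lower bound, since it assumes the cap is reached every day and that no new matching content arrives during the backlog run.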
Licensing
Auto-labelling (both flavours) requires Microsoft 365 E5, Microsoft 365 E5 Compliance, or the Microsoft 365 E5 Information Protection and Governance add-on.
For tenants serious about information protection and Copilot readiness, auto-labelling is a fundamental investment. User-applied labelling alone leaves too much content unmanaged in any non-trivial tenant.