Purview auto-labelling policies

How auto-labelling applies sensitivity and retention labels automatically based on content match — and the operational realities.

User-applied sensitivity and retention labels rely on users picking the right label at the right time. That works for engaged users but leaves substantial volumes of content unlabelled — and oversharing in Copilot becomes "the file was never labelled, so Copilot treated it as Internal." Auto-labelling policies in Purview fill this gap by applying labels automatically based on content match.

Two flavours: client-side and service-side

Client-side auto-labelling

Office apps (Word, Excel, PowerPoint, Outlook) recommend or apply a label as users work, based on what the content contains. Example: "this document contains 100+ customer SSNs; we recommend the Confidential label" — and the user can accept or override (depending on policy).

Configured as auto-labelling settings on each sensitivity label. Applies in real time at the user's device.
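
The recommend-versus-apply behaviour can be sketched in a few lines of Python. This is a simplified illustration, not Purview's actual logic: the `Mode` enum, the `client_side_outcome` function, and the threshold of 100 are hypothetical names and values, and real policies may also let users change an automatically applied label.

```python
from enum import Enum

class Mode(Enum):
    RECOMMEND = "recommend"  # user sees a prompt and may decline
    APPLY = "apply"          # label is set without asking

def client_side_outcome(match_count: int, threshold: int, mode: Mode,
                        user_accepts: bool = True) -> str:
    """Illustrative decision for one document (hypothetical helper)."""
    if match_count < threshold:
        return "no action"
    if mode is Mode.APPLY:
        return "label applied"
    # Recommend mode: the outcome depends on the user's choice.
    return "label applied" if user_accepts else "recommendation dismissed"

print(client_side_outcome(120, 100, Mode.RECOMMEND, user_accepts=False))
# recommendation dismissed
print(client_side_outcome(120, 100, Mode.APPLY))
# label applied
```

Dismissed recommendations are worth tracking: a high dismissal rate usually means the condition is too broad.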

Service-side auto-labelling

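The Purview service scans existing content in SharePoint, OneDrive, and Exchange Online, applying labels to content matching the rules, independent of user action. Useful for labelling content at rest without waiting for users to do it. One distinction for Exchange: sensitivity auto-labelling policies label email as it is sent or received, not messages already sitting in mailboxes.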

Configured as auto-labelling policies in the Purview portal. Once turned on, a policy keeps running in the background rather than on demand.

What can trigger labelling

Match conditions include:

  • Sensitive information types — credit card patterns, SSNs, government IDs, custom regex.
  • Trainable classifiers — ML-based content category detection (legal docs, source code, healthcare records).
  • Keyword phrases.
  • Content properties — SharePoint column values, document properties.
  • Sender / recipient patterns (Exchange-specific).

Conditions can be combined with AND and OR, so the Purview rule editor can express fairly complex predicates.
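
How conditions combine can be illustrated with a small Python sketch. It does not reproduce how Purview evaluates rules internally; the helper functions, the SSN-like regex, and the sample text are all hypothetical and chosen only to show AND/OR composition over pattern and keyword matches.

```python
import re
from typing import Callable

# Hypothetical stand-ins for Purview conditions. Real sensitive information
# types use richer logic (checksums, proximity, confidence levels).
Condition = Callable[[str], bool]

def pattern_count(regex: str, min_count: int = 1) -> Condition:
    """True when the regex matches at least min_count times."""
    compiled = re.compile(regex)
    return lambda text: len(compiled.findall(text)) >= min_count

def keyword(phrase: str) -> Condition:
    """True when the phrase appears, ignoring case."""
    return lambda text: phrase.lower() in text.lower()

def all_of(*conds: Condition) -> Condition:
    return lambda text: all(c(text) for c in conds)

def any_of(*conds: Condition) -> Condition:
    return lambda text: any(c(text) for c in conds)

# Illustrative SSN-like pattern (not the pattern Purview ships).
SSN_LIKE = r"\b\d{3}-\d{2}-\d{4}\b"

# (two or more SSN-like values) AND (either keyword)
rule = all_of(
    pattern_count(SSN_LIKE, min_count=2),
    any_of(keyword("customer"), keyword("payroll")),
)

print(rule("Customer records: 123-45-6789 and 987-65-4321."))  # True
print(rule("Ticket 123-45-6789 reopened."))                    # False
```

Requiring a minimum count plus a supporting keyword is one common way to cut false positives from a bare pattern match.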

Where this matters most

Auto-labelling is particularly valuable for:

  • Copilot readiness — labelled content gives Copilot signal about sensitivity, affecting what it includes in grounding and what it warns about in output.
  • DLP enforcement — DLP policies keyed to labels work better when more content is labelled.
  • Retention enforcement — auto-applied retention labels handle compliance retention without user action.
  • Historical content cleanup — apply correct labels to years-old content that nobody is going to manually relabel.

Operational rollout

A common pattern:

  1. Start with simulation mode — service-side auto-labelling policies have a simulate option. Run a simulation, review what would be labelled, validate the matches.
  2. Tune the rules — false positives in classifier or SIT matching are common. Refine the conditions.
  3. Enable the policy: start labelling. New and modified content is picked up as the policy continues to run.
  4. Monitor — Purview reports on labelled-vs-unlabelled volumes, classifier confidence, user overrides on client-side recommendations.
  5. Iterate — add rules as new sensitive data types emerge.
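
Steps 1 and 2 amount to estimating how often a rule is right before letting it label anything. A minimal sketch of that check, assuming a reviewer has hand-marked a sample of simulated matches; the `ReviewedMatch` record, the file paths, and the 0.9 threshold are hypothetical and not something Purview exports or mandates:

```python
from dataclasses import dataclass

@dataclass
class ReviewedMatch:
    path: str      # hypothetical file location from a simulation sample
    correct: bool  # reviewer's verdict: should this file get the label?

def precision(matches: list[ReviewedMatch]) -> float:
    """Share of simulated matches the reviewer agreed with."""
    if not matches:
        raise ValueError("no reviewed matches to score")
    return sum(m.correct for m in matches) / len(matches)

def ready_to_enable(matches: list[ReviewedMatch], threshold: float = 0.9) -> bool:
    """Assumed go/no-go gate; pick a threshold that suits your risk appetite."""
    return precision(matches) >= threshold

sample = [
    ReviewedMatch("/sites/hr/payroll.xlsx", True),
    ReviewedMatch("/sites/hr/offers.docx", True),
    ReviewedMatch("/sites/it/ticket-export.csv", False),  # false positive
    ReviewedMatch("/sites/finance/customers.xlsx", True),
]

print(precision(sample))        # 0.75
print(ready_to_enable(sample))  # False
```

A stricter threshold makes sense for labels that encrypt, since a wrong label there blocks access rather than merely mislabelling.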

Limits and caveats

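  • Service-side auto-labelling is throttled by a daily cap on automatically labelled files. Microsoft documents the cap for the tenant as a whole rather than per policy (25,000 files per day historically, later raised to 100,000; check the current figure). Initial backlog processing can take weeks for large tenants.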
  • Trainable classifiers require training and validation cycles — they're not magic.
  • Client-side auto-labelling works only when users are signed into Office apps with the right licence.
  • Encryption applied by auto-labelling can break legacy workflows — pilot carefully on labels that encrypt.
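
The backlog caveat is simple arithmetic and worth running before a rollout. A rough sketch, assuming a fixed daily labelling cap; the function name and the input figures are placeholders, so substitute your own file counts and Microsoft's currently documented limit:

```python
import math

def backlog_days(files_to_label: int, files_per_day: int) -> int:
    """Whole days needed to work through a backlog at a fixed daily cap."""
    if files_per_day <= 0:
        raise ValueError("files_per_day must be positive")
    return math.ceil(files_to_label / files_per_day)

# Placeholder numbers: 1.2 million matching files at 25,000 labelled per day.
print(backlog_days(1_200_000, 25_000))  # 48
```

Running a simulation first gives a realistic value for the number of matching files.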

Licensing

Auto-labelling (both flavours) requires Microsoft 365 E5, Microsoft 365 E5 Compliance, or the Information Protection and Governance add-on.

For tenants serious about information protection and Copilot readiness, auto-labelling is a fundamental investment. User-applied labelling alone leaves too much content unmanaged in any non-trivial tenant.