
SharePoint search and crawl architecture

How SharePoint indexes content, what affects search quality, and how to influence ranking.

SharePoint's search experience, which reaches users through Microsoft Search, depends on a continuously updated index of tenant content. Understanding what gets indexed, how often, and how ranking works helps both with troubleshooting "I can't find that file" and with deliberately improving search quality.

What gets indexed

The SharePoint search index covers:

  • Files in SharePoint and OneDrive libraries — full text for supported file types (Word, Excel, PowerPoint, PDF, plain text, HTML, and many more).
  • List items — text columns, choice columns, custom columns.
  • Pages — modern and classic SharePoint pages.
  • Site metadata — site title, description, owner.
  • People — profile fields from Entra ID.
  • External content — third-party data via Microsoft Graph connectors.

Each indexed item carries its permissions so search results are trimmed per user — users see only what they're authorised to access.
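Security trimming can be pictured as a filter applied at query time against the permissions stored with each indexed item. The sketch below is a simplified model, not Microsoft's implementation; `IndexedItem`, `trim_results`, and the principal strings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedItem:
    """Hypothetical index entry: the item plus the principals allowed to read it."""
    title: str
    allowed_principals: frozenset[str]


def trim_results(results: list[IndexedItem], user: str, user_groups: set[str]) -> list[IndexedItem]:
    """Keep only items whose stored permissions grant access to the user or one of their groups."""
    principals = {user} | user_groups
    return [item for item in results if item.allowed_principals & principals]


results = [
    IndexedItem("Q3 board deck", frozenset({"group:board"})),
    IndexedItem("Expense policy", frozenset({"group:everyone"})),
    IndexedItem("Alice's draft", frozenset({"alice@contoso.com"})),
]

visible = trim_results(results, "bob@contoso.com", {"group:everyone"})
print([item.title for item in visible])  # ['Expense policy']
```

Because the permissions travel with each item, a permissions change is itself an indexing event: the stored set has to be refreshed before results reflect the new access.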

Crawl behaviour

SharePoint Online search uses continuous crawl: content is indexed as it changes, typically within minutes, and there are no crawl schedules for admins to configure. Events that trigger indexing include:

  • File created, modified, or moved.
  • List item created or modified.
  • Page published.
  • Site created.
  • Permissions changed.
  • Sensitivity label applied.
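As a rough mental model (not a description of SharePoint internals), these events can be thought of as feeding a queue that collapses repeated changes to the same item into a single pending reindex. `ChangeEvent`, `ReindexQueue`, and the event-kind strings are hypothetical.

```python
from __future__ import annotations

from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change notification, e.g. a file modified or permissions changed."""
    item_id: str
    kind: str


class ReindexQueue:
    """Collapses repeated events for the same item into one pending reindex."""

    def __init__(self) -> None:
        # item_id -> change kinds seen since the item was last indexed, in first-seen order
        self._pending: OrderedDict[str, set[str]] = OrderedDict()

    def on_event(self, event: ChangeEvent) -> None:
        self._pending.setdefault(event.item_id, set()).add(event.kind)

    def drain(self, batch_size: int) -> list[tuple[str, set[str]]]:
        """Pop up to batch_size items, oldest first, for the indexer to process."""
        batch = []
        while self._pending and len(batch) < batch_size:
            batch.append(self._pending.popitem(last=False))
        return batch


queue = ReindexQueue()
queue.on_event(ChangeEvent("doc-1", "modified"))
queue.on_event(ChangeEvent("doc-2", "created"))
queue.on_event(ChangeEvent("doc-1", "permissions_changed"))
for item_id, kinds in queue.drain(batch_size=10):
    print(item_id, sorted(kinds))
# doc-1 ['modified', 'permissions_changed']
# doc-2 ['created']
```

The batch size stands in for the service-side prioritisation described below: when many items are pending, each one waits longer for its turn.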

Initial indexing of a new site or a large content upload can take longer. Microsoft balances indexing work across the shared global service, so latency is higher at peak times.

Why search results are sometimes stale

When users report "I uploaded this file yesterday and search can't find it," the common causes are:

  • Recent upload: allow a few minutes, occasionally a few hours.
  • Indexing paused during very large bulk operations: Microsoft sometimes pauses indexing to protect service health.
  • Tenant recently resumed from preservation hold: re-indexing may still be in progress.
  • Permissions recently changed: search may briefly reflect the old permissions.
  • Content too short to index: extremely small documents may not surface.
  • Unsupported file type: some specialised formats aren't indexed full-text.

For investigation, the Search admin centre shows tenant-wide search health.
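To check whether one specific file is findable for one specific user, one option is to run the same query through the Microsoft Graph search API (`POST https://graph.microsoft.com/v1.0/search/query`) with that user's delegated token, since results are trimmed to what the signed-in user can access. The sketch below only builds the JSON request body; acquiring a token and sending the request are left out, and the query text is a placeholder.

```python
from __future__ import annotations

import json

GRAPH_SEARCH_URL = "https://graph.microsoft.com/v1.0/search/query"


def build_search_request(query: str, entity_types: list[str], size: int = 25) -> dict:
    """Build a request body for the Microsoft Graph search endpoint."""
    return {
        "requests": [
            {
                "entityTypes": entity_types,      # e.g. "driveItem" for files
                "query": {"queryString": query},  # the text the user typed
                "from": 0,                        # first page of results
                "size": size,                     # results per page
            }
        ]
    }


# Placeholder query for the file the user reported missing.
body = build_search_request("Q3 forecast", ["driveItem"])
print(json.dumps(body, indent=2))
```

If the file appears for the owner but not for the reporting user, permissions are the likelier cause; if it appears for nobody, indexing latency or file type is the likelier one.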

Ranking signals

Search results are ranked using signals including:

  • Relevance to the query: keyword match and semantic similarity.
  • Recency: recent content ranks higher, with the boost decaying over time.
  • Popularity: frequently clicked items rank higher.
  • Personal signals: content the user has interacted with ranks higher for that user.
  • People graph: content from people the user works with ranks higher.
  • Promoted results: admin-configured bookmarks appear above the ranked results.

Because of the personal signals, two users can see different results for the same query: relevance depends on each user's work context.
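One way to picture how these signals combine is a weighted score per result. The sketch below is purely illustrative: Microsoft does not publish its ranking model, and the `Candidate` fields, `WEIGHTS`, the 30-day half-life, and the click saturation point are all invented for the example. It shows recency decaying exponentially and per-user signals lifting an item for the user who worked on it.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Signals for one result as seen by one particular user (all fields hypothetical)."""
    title: str
    text_relevance: float       # 0..1: keyword and semantic match to the query
    age_days: float             # days since the item was last modified
    clicks: int                 # tenant-wide clicks on this item
    user_interacted: bool       # has this user opened or edited the item?
    author_is_colleague: bool   # does this user work with the author?


# Invented weights; the real ranking model is not published.
WEIGHTS = {"relevance": 0.5, "recency": 0.2, "popularity": 0.1, "personal": 0.1, "people": 0.1}
RECENCY_HALF_LIFE_DAYS = 30.0
CLICK_SATURATION = 1000  # clicks at which the popularity signal reaches its maximum


def score(c: Candidate) -> float:
    recency = 0.5 ** (c.age_days / RECENCY_HALF_LIFE_DAYS)  # halves every 30 days
    popularity = min(1.0, math.log1p(c.clicks) / math.log1p(CLICK_SATURATION))
    return (
        WEIGHTS["relevance"] * c.text_relevance
        + WEIGHTS["recency"] * recency
        + WEIGHTS["popularity"] * popularity
        + WEIGHTS["personal"] * float(c.user_interacted)
        + WEIGHTS["people"] * float(c.author_is_colleague)
    )


def rank(candidates: list[Candidate]) -> list[Candidate]:
    return sorted(candidates, key=score, reverse=True)


candidates = [
    Candidate("Expense policy (2019)", 0.8, 400, 900, False, False),
    Candidate("Team expense draft", 0.7, 2, 5, True, True),
]
print([c.title for c in rank(candidates)])  # ['Team expense draft', 'Expense policy (2019)']
```

The log damping on clicks is a design choice for the sketch: it keeps a handful of very popular items from drowning out relevance.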

Influencing search quality

Admin actions that improve search results:

Bookmarks and answers

For frequent, well-known queries, configure bookmarks in the Search admin centre: a search for "VPN" returns the VPN setup page, and "expense report" returns the expense system. This saves users from sifting through irrelevant results.
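Conceptually, a bookmark pins a known destination above the organic results when the query matches its keywords. The sketch below simplifies matching to an exact, case-insensitive keyword lookup; the `BOOKMARKS` table, its placeholder URLs, and `search_with_bookmarks` are hypothetical, and real bookmarks are configured in the admin centre rather than in code.

```python
from __future__ import annotations

# Hypothetical bookmark table: trigger keywords -> destination (placeholder URLs).
BOOKMARKS = {
    "vpn": "https://contoso.sharepoint.com/sites/it/VPN-setup",
    "expense report": "https://expenses.contoso.com",
}


def normalise(query: str) -> str:
    """Lower-case and collapse whitespace so 'Expense  Report ' matches 'expense report'."""
    return " ".join(query.lower().split())


def search_with_bookmarks(query: str, ranked_results: list[str]) -> list[str]:
    """Place a matching bookmark above the organic results without duplicating it."""
    bookmark = BOOKMARKS.get(normalise(query))
    if bookmark is None:
        return ranked_results
    return [bookmark] + [r for r in ranked_results if r != bookmark]


print(search_with_bookmarks("  VPN ", ["https://a.example", "https://b.example"]))
# ['https://contoso.sharepoint.com/sites/it/VPN-setup', 'https://a.example', 'https://b.example']
```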

Acronyms

Define organisation-specific acronyms so search can connect the short and long forms, for example "ARR" → "Annual Recurring Revenue."
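One plausible way a definitions table could be applied at query time is to rewrite each known acronym as an OR of both forms, so documents using either one match. This is a sketch of the idea, not how Microsoft Search uses acronym definitions; the `ACRONYMS` table and `expand_query` helper are hypothetical, and the output uses KQL-style `OR` syntax.

```python
import re

# Hypothetical organisation-specific acronym definitions.
ACRONYMS = {"ARR": "Annual Recurring Revenue", "OKR": "Objectives and Key Results"}


def expand_query(query: str) -> str:
    """Replace each known acronym (matched case-insensitively) with '(ACRONYM OR "expansion")'."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        expansion = ACRONYMS.get(word.upper())
        return f'({word} OR "{expansion}")' if expansion else word

    return re.sub(r"\b\w+\b", replace, query)


print(expand_query("ARR forecast"))  # (ARR OR "Annual Recurring Revenue") forecast
```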

Custom verticals

For specialised content, define search verticals that filter results to specific content types — files, pages, people, ServiceNow tickets, etc.

Featured sites

Mark specific sites as featured: they appear prominently in the user's organisation home and in relevant searches.

Microsoft Graph connectors

Index external content alongside SharePoint — Jira tickets, ServiceNow records, GitHub repos, file shares — for unified search across all enterprise data.

Content quality

  • Descriptive titles on pages and files surface better than generic names such as "Document 1".
  • Useful descriptions in site and library properties.
  • Sensitivity labels correctly applied — users can find content by label.
  • Up-to-date content — stale content with old dates ranks lower than current.

Operational considerations

  • Search analytics in the Search admin centre show top queries, zero-click queries (searches where no result was clicked), and trending content.
  • Treat the most frequent zero-click queries as candidates for new bookmarks.
  • Quarterly review: treat search administration as an ongoing operational practice, not a one-off project.
  • User feedback: provide a channel for "I couldn't find X" reports.
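If query-level data is available (the log format below is assumed, not Microsoft's export schema), finding bookmark candidates is a simple aggregation: normalise each query, count searches and zero-click searches, and rank the frequent queries that most often end without a click. `SearchEvent`, `bookmark_candidates`, and the thresholds are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchEvent:
    """One row of a hypothetical query log: the query text and whether any result was clicked."""
    query: str
    clicked: bool


def bookmark_candidates(
    events: Iterable[SearchEvent], min_searches: int = 3, top_n: int = 5
) -> list[tuple[str, int, int]]:
    """Return (query, searches, zero_click_searches) for the most frequent zero-click queries."""
    searches: Counter[str] = Counter()
    zero_clicks: Counter[str] = Counter()
    for event in events:
        query = " ".join(event.query.lower().split())  # treat "VPN " and "vpn" as one query
        searches[query] += 1
        if not event.clicked:
            zero_clicks[query] += 1
    rows = [
        (query, searches[query], zero_clicks[query])
        for query in searches
        if searches[query] >= min_searches and zero_clicks[query] > 0
    ]
    rows.sort(key=lambda row: (row[2], row[1]), reverse=True)  # most zero-click searches first
    return rows[:top_n]


log = (
    [SearchEvent("vpn", False)] * 4
    + [SearchEvent("VPN ", True)]
    + [SearchEvent("holiday calendar", True)] * 3
    + [SearchEvent("parking", False)] * 3
)
print(bookmark_candidates(log))  # [('vpn', 5, 4), ('parking', 3, 3)]
```

The `min_searches` threshold keeps one-off typos out of the list; only queries that recur are worth a bookmark.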

For Microsoft 365 customers with significant content volume, deliberate search administration is one of the higher-leverage admin investments. The search box gets used many times a day across the tenant; small improvements compound.