SharePoint search and crawl architecture
How SharePoint indexes content, what affects search quality, and how to influence ranking.
SharePoint's search experience, which users reach through Microsoft Search, depends on a continuously updated index of tenant content. Understanding what gets indexed, how often, and how ranking works helps both when troubleshooting "I can't find that file" reports and when deliberately improving search quality.
What gets indexed
The SharePoint search index covers:
- Files in SharePoint and OneDrive libraries — full text for supported file types (Word, Excel, PowerPoint, PDF, plain text, HTML, and many more).
- List items — text columns, choice columns, custom columns.
- Pages — modern and classic SharePoint pages.
- Site metadata — site title, description, owner.
- People — profile fields from Entra ID.
- External content — third-party data via Microsoft Graph connectors.
Each indexed item carries its permissions so search results are trimmed per user — users see only what they're authorised to access.
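As a rough illustration of permission trimming, the sketch below filters a toy in-memory index by the principals stored with each item. It is not how the service is implemented; `IndexedItem`, `allowed_principals`, `trim_results`, and the group names are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedItem:
    title: str
    # User or group identifiers captured alongside the item at index time (hypothetical model).
    allowed_principals: frozenset


def trim_results(items, user_principals):
    """Return only the items that the user, or one of the user's groups, may access."""
    user_principals = set(user_principals)
    return [item for item in items if item.allowed_principals & user_principals]


index = [
    IndexedItem("Q3 budget.xlsx", frozenset({"finance-team"})),
    IndexedItem("VPN setup guide", frozenset({"all-employees"})),
    IndexedItem("Board minutes.docx", frozenset({"executives"})),
]

# Alice belongs to two groups, so she sees two of the three items.
visible = trim_results(index, {"alice", "all-employees", "finance-team"})
visible_titles = [item.title for item in visible]
print(visible_titles)
```

Because the permissions are stored with the indexed item, a permission change has to be re-indexed before trimming reflects it, which is consistent with permission changes being an indexing trigger.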
Crawl behaviour
SharePoint Online crawls continuously: content is indexed as it changes, typically within minutes. There are no admin crawl schedules to configure. Specific events that trigger indexing:
- File created, modified, or moved.
- List item created or modified.
- Page published.
- Site created.
- Permissions changed.
- Sensitivity label applied.
Initial indexing of a new site or a large content upload can take longer. Microsoft prioritises indexing work across the global service, so latency is higher at peak times.
Why search results are sometimes stale
When users report "I uploaded this file yesterday and search can't find it," common causes:
- Recent upload: allow a few minutes to a few hours.
- Indexing pause for very large bulk operations: Microsoft sometimes pauses indexing to protect service health.
- Tenant just resumed from preservation hold: re-indexing is in progress.
- Permissions just changed: search may reflect the old permissions briefly.
- Content too short to index: extremely small documents may not surface.
- Unsupported file type: some specialised formats are not indexed for full text.
For investigation, the search admin centre shows tenant-wide search health.
Ranking signals
Search results are ranked by:
- Relevance to query — keyword match, semantic similarity.
- Recency — recent content ranks higher (with decay).
- Popularity — frequently-clicked items rank higher.
- Personal signals — content the user has interacted with ranks higher for that user.
- People graph — content from people you work with ranks higher.
- Promoted results — admin-configured bookmarks always appear at top.
The personal-signals layer means two users see different results for the same query — relevance depends on each user's work context.
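To make the interaction of these signals concrete, here is a toy scoring sketch. The weights, the 90-day half-life, the popularity formula, and every name in it (`Candidate`, `recency`, `score`, `rank`) are assumptions made for illustration; the actual ranking model is not public and is certainly more complex.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    relevance: float   # 0..1 match between query and item (assumed precomputed)
    age_days: float    # days since last modification
    clicks: int        # tenant-wide clicks on this item
    author: str
    promoted: bool = False


def recency(age_days, half_life_days=90.0):
    """Exponential decay: 1.0 for brand-new content, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def score(c, opened, colleagues):
    popularity = c.clicks / (c.clicks + 10)            # saturates towards 1.0
    personal = 1.0 if c.title in opened else 0.0       # user interacted with it before
    people = 1.0 if c.author in colleagues else 0.0    # written by someone the user works with
    return (0.5 * c.relevance + 0.2 * recency(c.age_days)
            + 0.1 * popularity + 0.1 * personal + 0.1 * people)


def rank(candidates, opened=(), colleagues=()):
    opened, colleagues = set(opened), set(colleagues)
    # Promoted items sort first; everything else is ordered by score.
    return sorted(candidates,
                  key=lambda c: (c.promoted, score(c, opened, colleagues)),
                  reverse=True)


candidates = [
    Candidate("Travel policy 2021", 0.8, 900, 200, "dana"),
    Candidate("Travel policy (current)", 0.8, 30, 20, "lee"),
    Candidate("Expense system", 0.3, 400, 5, "it", promoted=True),
]

new_hire = [c.title for c in rank(candidates)]
danas_teammate = [c.title for c in rank(candidates,
                                        opened={"Travel policy 2021"},
                                        colleagues={"dana"})]
print(new_hire)
print(danas_teammate)
```

In this sketch the promoted item leads for both users, while the two travel-policy documents swap places depending on whether the user has a history with the older one.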
Influencing search quality
Admin actions that improve search results:
Bookmarks and answers
For known frequent queries, configure bookmarks in the Search admin centre. For example, "VPN" returns the VPN setup page and "expense report" returns the expense system, so users do not have to sift through irrelevant results.
Acronyms
Define organisation-specific acronyms so that a search for the acronym returns its expansion, for example "ARR" → "Annual Recurring Revenue."
Custom verticals
For specialised content, define search verticals that filter results to specific content types — files, pages, people, ServiceNow tickets, etc.
Promoted content
Mark specific sites as featured — they appear prominently in the user's organisation home and in relevant searches.
Microsoft Graph connectors
Index external content alongside SharePoint — Jira tickets, ServiceNow records, GitHub repos, file shares — for unified search across all enterprise data.
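A connector pushes each external record into the index as an item with an access control list, searchable properties, and full-text content. The sketch below only builds such a request body. The overall shape follows the Microsoft Graph `externalItem` resource as commonly documented, but the ticket fields, the property names (`title`, `url`, `status`), and the placeholder tenant ID are hypothetical, and authentication and the actual HTTP call are left out.

```python
import json


def build_external_item(ticket: dict, tenant_id: str) -> dict:
    """Map a ticket from a hypothetical source system to an externalItem-style body."""
    return {
        # Access control travels with the item so results can be trimmed per user.
        "acl": [{"type": "everyone", "value": tenant_id, "accessType": "grant"}],
        # Hypothetical property names; they would have to match the schema
        # registered for the connection.
        "properties": {
            "title": ticket["title"],
            "url": ticket["url"],
            "status": ticket["status"],
        },
        # Full-text content that keyword queries match against.
        "content": {"value": ticket["body"], "type": "text"},
    }


ticket = {
    "title": "Laptop will not connect to VPN",
    "url": "https://tickets.example.com/12345",
    "status": "open",
    "body": "User reports the VPN client times out after login.",
}

payload = build_external_item(ticket, tenant_id="00000000-0000-0000-0000-000000000000")
body = json.dumps(payload, indent=2)
print(body)
```

Granting access to everyone in the tenant keeps the example short; a real connector would normally mirror the source system's own permissions in the `acl` list.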
Content quality
- Descriptive titles on pages and files surface better than "Document 1."
- Useful descriptions in site and library properties.
- Sensitivity labels correctly applied — users can find content by label.
- Up-to-date content — stale content with old dates ranks lower than current.
Operational considerations
- Search analytics in the Search admin centre show top queries, zero-click queries (no result clicked), trending content.
- Use top-zero-click queries as candidates for new bookmarks.
- Quarterly review: treat search administration as an ongoing operation rather than a one-off setup task.
- User feedback: provide a feedback channel for "I couldn't find X" reports.
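Turning zero-click queries into bookmark candidates can be sketched as a small aggregation. This assumes the analytics data has been exported as `(query, clicked)` pairs, which is a hypothetical format; `bookmark_candidates`, `min_count`, and `top_n` are illustrative names and not part of any admin tooling.

```python
from collections import Counter


def bookmark_candidates(query_log, existing_bookmarks=(), min_count=2, top_n=5):
    """Return the most frequent zero-click queries that have no bookmark yet.

    query_log: iterable of (query, clicked) pairs, where clicked is False
    when the user clicked no result for that query.
    """
    existing = {q.strip().lower() for q in existing_bookmarks}
    # Normalise case and whitespace so "VPN" and "vpn " count as the same query.
    zero_click = Counter(q.strip().lower() for q, clicked in query_log if not clicked)
    ranked = [(q, n) for q, n in zero_click.most_common()
              if n >= min_count and q not in existing]
    return ranked[:top_n]


log = [
    ("vpn", False), ("VPN", False), ("vpn ", False),
    ("expense report", True), ("expense report", False),
    ("holiday calendar", False), ("holiday calendar", False),
    ("org chart", False),
]

candidates_for_review = bookmark_candidates(log, existing_bookmarks=["Expense report"])
print(candidates_for_review)
```

The `min_count` threshold keeps one-off queries out of the review list, so the quarterly review can focus on queries that repeatedly fail.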
For Microsoft 365 customers with significant content volume, deliberate search administration is one of the higher-leverage admin investments. The search box gets used many times a day across the tenant; small improvements compound.