Power BI dataflows and Fabric dataflows
How dataflows centralise data preparation across many Power BI semantic models and reports.
Dataflows in Power BI / Microsoft Fabric centralise data preparation logic so many downstream semantic models can consume the same shaped, cleaned data without re-doing the work. For organisations with mature BI practice, dataflows are where ETL lives — once authored, reused everywhere.
What a dataflow is
A dataflow is essentially Power Query running in the cloud on a schedule:
- Source connections to your data (SQL Server, SharePoint, REST APIs, dozens of other sources).
- Transformations in Power Query M language (filter, transform, merge, calculated columns).
- Refresh schedule (hourly, daily, etc.).
- Output as entities that semantic models can query.
The pattern: dataflow produces clean, modelled tables; multiple semantic models import or DirectQuery from those tables; reports build on the models. Dataflow logic isn't repeated across every model.
Power BI dataflows vs Fabric Dataflow Gen2
The dataflow product has evolved:
Power BI dataflows (Gen1)
The original Power BI dataflows. Stored in:
- Workspace's underlying storage (typically a Common Data Service Lake or Azure Data Lake Storage Gen2 in Premium scenarios).
- Outputs are entities consumed by Power BI Desktop or other dataflows.
Still supported; broadly used.
Dataflow Gen2 in Fabric
The newer dataflow capability in Microsoft Fabric:
- Same Power Query authoring experience.
- Destination flexibility — output to OneLake lakehouses, Fabric warehouses, Azure SQL, Azure Data Lake, on-prem SQL Server.
- Direct Lake mode for downstream semantic models.
- Better performance for larger datasets.
- Integrated with Fabric workspace and pipeline scheduling.
For new development in Fabric tenants, Gen2 is the strategic choice. Existing Gen1 dataflows continue working.
Common use cases
Centralised dimension data
A Date table consumed by every analytical model:
- Author once as a dataflow.
- Refreshes daily.
- Every model imports the same Date table.
- Consistent date hierarchies across all reports.
Beats every model creating its own date table.
Pre-shaped fact tables
A Sales fact table at standardised granularity:
- Source from raw sales database with multiple joins.
- Apply business rules (currency conversion, regional groupings).
- Output as a clean fact table.
- Multiple sales reports build on the same dataflow output.
Combined source data
Customer data from CRM + ERP + customer-service system, combined into one clean Customer table.
Slowly Changing Dimensions (SCD)
Type 2 SCD handling in dataflows produces correct historical-attribute versioning that downstream models inherit.
Authoring
Dataflows are authored in the Power BI Service or Microsoft Fabric browser experience:
- Create dataflow in the relevant workspace.
- Connect to sources — same connector library as Power BI Desktop.
- Apply transformations using Power Query M.
- Save and refresh to populate the entities.
Power Query Desktop can be used for some development and import; the cloud-hosted authoring is the standard.
Refresh schedule
Dataflows refresh on schedule independently from downstream semantic models:
- Dataflow refreshes nightly at 02:00.
- Semantic models that consume it refresh at 03:00 — they get the freshly-prepared data.
- Reports built on the semantic models reflect current data when users open them.
Layered refresh schedule keeps the pipeline current.
Linked entities
A dataflow's output can be consumed by another dataflow — chain transformations:
- Source dataflow ingests raw data.
- Transformation dataflow further processes.
- Final dataflow applies business-specific shaping.
Useful for separating concerns — source ownership vs business modelling vs report-ready data.
Operational considerations
- Workspace governance — who can create / edit dataflows, who can consume.
- Refresh monitoring — failures should alert.
- Performance — large dataflows can take hours; tune Power Query for query folding.
- Cost — Fabric capacity consumes CUs during refresh; Premium dataflows use Premium capacity.
- Source-of-truth labelling — document which dataflows are authoritative.
When dataflows aren't the right tool
- One-off transformations for a single model — just do it in the model, no dataflow needed.
- Real-time data — dataflows refresh on schedule, not streaming. For real-time, use Real-Time Analytics in Fabric.
- Heavy data engineering at petabyte scale — use Spark notebooks or warehouses in Fabric.
For organisations with multiple analysts authoring related reports, dataflows are how you avoid the "every report has its own ETL" problem. Author once; consume everywhere; standardise across the BI estate.