Power BI dataflows and Fabric dataflows

Dataflows in Power BI and Microsoft Fabric centralise data preparation logic so that many downstream semantic models can consume the same shaped, cleaned data without redoing the work. For organisations with a mature BI practice, dataflows are where ETL lives: authored once, reused everywhere.

What a dataflow is

A dataflow is essentially Power Query running in the cloud on a schedule:

Source connections to your data (SQL Server, SharePoint, REST APIs, dozens of other sources).
Transformations in Power Query M language (filter, transform, merge, calculated columns).
Refresh schedule (hourly, daily, etc.).
Output as entities that semantic models can query.

The pattern: a dataflow produces clean, modelled tables; multiple semantic models import those tables or query them through DirectQuery; reports build on the models. The preparation logic lives in one place instead of being repeated in every model.

Power BI dataflows vs Fabric Dataflow Gen2

The dataflow product has evolved:

Power BI dataflows (Gen1)

The original Power BI dataflows:

Storage: Power BI-managed Azure Data Lake Storage Gen2 by default, in Common Data Model folder format, or the organisation's own Azure Data Lake Storage Gen2 account when one is connected to the workspace.
Outputs: entities (tables) consumed by Power BI Desktop or by other dataflows.

Still supported; broadly used.

Dataflow Gen2 in Fabric

The newer dataflow capability in Microsoft Fabric:

Same Power Query authoring experience.
Destination flexibility: output to Fabric lakehouses (in OneLake), Fabric warehouses, Azure SQL Database, and Azure Data Explorer, among other destinations.
Direct Lake mode for downstream semantic models when the output lands as Delta tables in a lakehouse or warehouse.
Better performance for larger datasets.
Integrated with Fabric workspace and pipeline scheduling.

For new development in Fabric tenants, Gen2 is the strategic choice. Existing Gen1 dataflows continue working.

Common use cases

Centralised dimension data

A Date table consumed by every analytical model:

Author once as a dataflow.
Refreshes daily.
Every model imports the same Date table.
Consistent date hierarchies across all reports.

Beats every model creating its own date table.
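
To make the shape of such a table concrete, here is a minimal sketch of the rows a shared Date table might contain. It is written in Python purely for illustration (a real dataflow would express this in Power Query M), and the column names and the date range are assumptions, not a prescribed schema.

```python
from datetime import date, timedelta


def build_date_table(start: date, end: date) -> list[dict]:
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            # Integer surrogate key in YYYYMMDD form (illustrative convention)
            "DateKey": current.year * 10000 + current.month * 100 + current.day,
            "Date": current,
            "Year": current.year,
            "Quarter": (current.month - 1) // 3 + 1,
            "Month": current.month,
            "DayOfWeek": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
        })
        current += timedelta(days=1)
    return rows


date_table = build_date_table(date(2024, 1, 1), date(2024, 12, 31))
print(len(date_table))  # 366, because 2024 is a leap year
```

Because every model reads the same rows, a change such as adding a fiscal-year column is made once and reaches every consumer on the next refresh.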

Pre-shaped fact tables

A Sales fact table at standardised granularity:

Source from raw sales database with multiple joins.
Apply business rules (currency conversion, regional groupings).
Output as a clean fact table.
Multiple sales reports build on the same dataflow output.
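
The steps above can be sketched as follows. This is an illustrative Python outline rather than dataflow code; the exchange rates, the country-to-region mapping, and the field names are all hypothetical.

```python
from decimal import Decimal

# Hypothetical reference data; a real dataflow would read these from a source
RATES_TO_USD = {
    "USD": Decimal("1.00"),
    "EUR": Decimal("1.10"),
    "GBP": Decimal("1.25"),
}
REGION_BY_COUNTRY = {"US": "Americas", "DE": "EMEA", "GB": "EMEA"}


def shape_sales(raw_rows):
    """Apply currency conversion and regional grouping to raw sales rows."""
    shaped = []
    for row in raw_rows:
        rate = RATES_TO_USD[row["currency"]]
        amount_usd = (Decimal(row["amount"]) * rate).quantize(Decimal("0.01"))
        shaped.append({
            "order_id": row["order_id"],
            # Countries missing from the mapping fall into a catch-all region
            "region": REGION_BY_COUNTRY.get(row["country"], "Other"),
            "amount_usd": amount_usd,
        })
    return shaped


raw_sales = [
    {"order_id": 1, "country": "DE", "currency": "EUR", "amount": "100.00"},
    {"order_id": 2, "country": "BR", "currency": "USD", "amount": "40.00"},
]
fact_sales = shape_sales(raw_sales)
```

Keeping the business rules in one function mirrors the point of the dataflow: every sales report sees the same converted amounts and the same regional groupings.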

Combined source data

Customer data from CRM + ERP + customer-service system, combined into one clean Customer table.
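
A minimal sketch of that combination, in Python for illustration only. It assumes the three systems already share a customer_id key, which is often not true in practice (matching records across systems is usually the hard part), and it assumes a simple precedence rule where the first source to supply a value wins. All field names are hypothetical.

```python
def combine_customers(crm, erp, support):
    """Merge per-system customer records keyed by a shared customer_id.

    Sources are processed in order; a later source only fills attributes
    that earlier sources left missing or empty.
    """
    combined = {}
    for source in (crm, erp, support):
        for record in source:
            target = combined.setdefault(record["customer_id"], {})
            for key, value in record.items():
                if value not in (None, "") and key not in target:
                    target[key] = value
    return sorted(combined.values(), key=lambda r: r["customer_id"])


crm = [{"customer_id": "C1", "name": "Acme Ltd", "email": ""}]
erp = [
    {"customer_id": "C1", "name": "ACME LIMITED", "credit_limit": 5000},
    {"customer_id": "C2", "name": "Globex", "credit_limit": 1000},
]
support = [{"customer_id": "C1", "email": "ops@acme.example", "open_tickets": 2}]

customers = combine_customers(crm, erp, support)
```

Here the CRM name is kept for C1, the ERP contributes the credit limit, and the customer-service system fills in the email address the CRM left blank.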

Slowly Changing Dimensions (SCD)

Handling Type 2 SCDs in a dataflow means attribute history is versioned once, with each change adding a new row that carries its own validity dates, and every downstream model inherits the same history.
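
The core of Type 2 handling can be sketched as below. This is a simplified Python illustration, not dataflow code: the column names (valid_from, valid_to), the tracked attribute, and the snapshot format are assumptions, and deletions from the source are not handled.

```python
from datetime import date


def apply_scd2(history, snapshot, as_of, key="customer_id", tracked=("segment",)):
    """Return a new history list with Type 2 changes from snapshot applied.

    A row with valid_to of None is the current version for its key.
    """
    result = [dict(row) for row in history]  # copy so the input is untouched
    current = {row[key]: row for row in result if row["valid_to"] is None}
    for incoming in snapshot:
        existing = current.get(incoming[key])
        if existing and all(existing[c] == incoming[c] for c in tracked):
            continue  # tracked attributes unchanged, nothing to version
        if existing:
            existing["valid_to"] = as_of  # close the superseded version
        new_row = {key: incoming[key], "valid_from": as_of, "valid_to": None}
        new_row.update({c: incoming[c] for c in tracked})
        result.append(new_row)
    return result


history = [
    {"customer_id": "C1", "segment": "SMB",
     "valid_from": date(2023, 1, 1), "valid_to": None},
]
snapshot = [
    {"customer_id": "C1", "segment": "Enterprise"},
    {"customer_id": "C2", "segment": "SMB"},
]
updated = apply_scd2(history, snapshot, as_of=date(2024, 6, 1))
```

After the call, the original C1 row is closed on 2024-06-01, a new current C1 row carries the Enterprise segment, and C2 appears as a new current row.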

Authoring

Dataflows are authored in the Power BI Service or Microsoft Fabric browser experience:

Create dataflow in the relevant workspace.
Connect to sources — same connector library as Power BI Desktop.
Apply transformations using Power Query M.
Save and refresh to populate the entities.

Queries can also be developed in the Power Query Editor in Power BI Desktop and then brought into a dataflow, but browser-based authoring is the standard.

Refresh schedule

Dataflows refresh on schedule independently from downstream semantic models:

Dataflow refreshes nightly at 02:00.
Semantic models that consume it refresh at 03:00 — they get the freshly-prepared data.
Reports built on the semantic models reflect current data when users open them.

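Staggering the schedules this way keeps the pipeline current, provided the dataflow refresh reliably finishes before the model refresh starts; if it overruns, the models pick up the previous day's data.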
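
The timing dependency in this schedule is simple arithmetic, sketched here in Python. The 45-minute and 75-minute durations are made-up figures for illustration; in practice the duration would come from refresh history.

```python
from datetime import datetime, timedelta


def refresh_gap(dataflow_start, dataflow_duration, model_start):
    """Return the slack between dataflow completion and model refresh start.

    A negative result means the model starts before the dataflow finishes.
    """
    return model_start - (dataflow_start + dataflow_duration)


dataflow_start = datetime(2024, 1, 1, 2, 0)   # 02:00
model_start = datetime(2024, 1, 1, 3, 0)      # 03:00

gap = refresh_gap(dataflow_start, timedelta(minutes=45), model_start)
overrun = refresh_gap(dataflow_start, timedelta(minutes=75), model_start)

print(gap)                      # 0:15:00
print(overrun < timedelta(0))   # True
```

A shrinking gap over successive refreshes is an early sign that the schedules need more separation or the dataflow needs tuning.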

Linked entities

A dataflow's output can be consumed by another dataflow (linked entities, called linked tables in current documentation), so transformations can be chained:

Source dataflow ingests raw data.
Transformation dataflow further processes.
Final dataflow applies business-specific shaping.

Useful for separating concerns — source ownership vs business modelling vs report-ready data.
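
When dataflows are chained, the upstream ones need to refresh first. The sketch below derives a valid refresh order from a dependency map using Python's standard library; the dataflow names are hypothetical, and this only illustrates the ordering idea, not how the service schedules refreshes.

```python
from graphlib import TopologicalSorter

# Hypothetical dataflow names; each maps to the dataflows it reads from
dependencies = {
    "raw_sales": set(),
    "cleaned_sales": {"raw_sales"},
    "sales_for_reporting": {"cleaned_sales"},
}

# static_order() yields every dataflow after all of its upstream dataflows
refresh_order = list(TopologicalSorter(dependencies).static_order())
print(refresh_order)  # ['raw_sales', 'cleaned_sales', 'sales_for_reporting']
```

Writing the dependencies down explicitly also documents which team owns which layer.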

Operational considerations

Workspace governance — who can create / edit dataflows, who can consume.
Refresh monitoring — failures should alert.
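Performance: large dataflows can take hours to refresh; keep transformations foldable so that filters and joins run at the source rather than in the dataflow engine.
Cost: refreshes consume capacity units (CUs) from the Fabric capacity; Premium dataflows consume Premium capacity.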
Source-of-truth labelling — document which dataflows are authoritative.
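
As an illustration of the refresh-monitoring point, the sketch below scans a list of refresh records and flags failures and unusually long runs. The record shape, the status values, and the two-hour threshold are all assumptions made for this example; they do not describe any particular API's response format.

```python
from datetime import timedelta


def find_refresh_problems(runs, max_duration=timedelta(hours=2)):
    """Return (dataflow, reason) pairs for failed or slow refresh runs."""
    problems = []
    for run in runs:
        if run["status"] != "Success":
            problems.append((run["dataflow"], "failed"))
        elif run["duration"] > max_duration:
            problems.append((run["dataflow"], "slow"))
    return problems


# Hypothetical refresh history records
runs = [
    {"dataflow": "dim_date", "status": "Success", "duration": timedelta(minutes=3)},
    {"dataflow": "fact_sales", "status": "Success", "duration": timedelta(hours=3)},
    {"dataflow": "dim_customer", "status": "Failed", "duration": timedelta(minutes=1)},
]

problems = find_refresh_problems(runs)
print(problems)  # [('fact_sales', 'slow'), ('dim_customer', 'failed')]
```

The resulting list could feed whatever alerting channel the team already uses.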

When dataflows aren't the right tool

One-off transformations for a single model — just do it in the model, no dataflow needed.
Real-time data: dataflows refresh on a schedule and do not stream. For real-time workloads, use Real-Time Intelligence in Fabric (previously named Real-Time Analytics).
Heavy data engineering at petabyte scale — use Spark notebooks or warehouses in Fabric.

For organisations with multiple analysts authoring related reports, dataflows are how you avoid the "every report has its own ETL" problem. Author once; consume everywhere; standardise across the BI estate.