Does the gold layer always have to be a star schema?

I get a version of this question often. In the gold layer of a medallion lakehouse, do you always recommend a star schema? It is a fair thing to ask, because the star schema is the default answer most of us reach for, and for good reason. I have built Kimball-style data warehouses for decades, and the star schema has earned its place a thousand times over.

A couple of years ago my answer would have been an unqualified yes. I believed there was always a way to shape the data into a star schema that the business could use. If a model felt awkward, that was a sign I had not found the right grain or the right conformed dimensions yet, not a sign that the star schema itself was the wrong tool. Then I met a group of users who changed my mind.

The reality that challenged the dogma

These users were not report consumers in the usual sense. They were analysts, and they worked the way analysts on the TV show "24" work. Think of the intelligence analysts piecing together fragments from many different sources, under time pressure, to make a fast decision. My users did the same thing with their own data. They pulled from many places at once and assembled an answer in the moment.

A lot of what they needed arrived from outside their own systems. They received emails from vendors and partners, took that external information on the fly, and blended it with data from their internal systems. They needed many different kinds of information, and a great deal of it landed in Excel.

I want to be honest about the Excel part, because normally I steer users away from Excel and toward governed Power BI reports, and that is usually the right call. This was a legitimate exception. Excel has real uses, and this was one of them. These analysts were already pulling data into spreadsheets and building formulas by hand, copying and pasting, matching and cleaning manually. It was genuinely impressive to watch. They flew through spreadsheets and built formulas on the fly faster than I could follow. It was also entirely manual, and much of the logic lived only in their heads, unsupported by any governed source of truth.

Why a star schema would have failed them

You can put a PivotTable over a star schema, and for many users that is exactly the right experience. For these analysts it would not have been enough. They would still have been hand-building their working datasets with formulas every time, because the data they needed was spread across multiple tables that they would have had to understand and join themselves.

That joining is the friction that matters here. A star schema asks the consumer to know which dimension connects to which fact and on what key. For a report author or a semantic model, that is a feature. For an analyst who is moving fast and blending in external data by hand, it is a tax on every single task. The friction of understanding and joining multiple tables would have made the star schema nearly impossible for them to benefit from. We would have shipped something technically correct and practically unusable.

What we built instead

After a lot of fretting, and I will admit this went against instinct, we built a very wide, flattened model. We took the dataset these analysts needed and flattened it into a Microsoft Fabric Direct Lake semantic model: one big wide table that combined the master data and the transactions together. No joins left for the user to perform.

I will be honest that the scale made me nervous. This is a very wide table, several hundred columns across, and the dataset runs to more than a hundred million rows. I expected to pay for that at query time. The Microsoft Fabric Direct Lake semantic model handled it far better than I anticipated. Queries come back quickly and the model stays responsive, even at that width and that row count. Direct Lake only pulls the columns a given query actually touches, so a very wide table does not cost you the way you might fear.

From Excel, the analysts connect to that model and pick the fields they need. This is Analyze in Excel against the Power BI semantic model. The model exposes the fields directly, so there is no manual table stitching. They get a PivotTable or a clean dataset where everything is already in one place, with no joins to get right. The more advanced users were trained to blend and join additional tables on top of that, but not every user could do that, and that was the point. The wide flat table met them where they were instead of demanding they come to the model.

The gold layer is not a single shape

Here is the reframing I want you to take away. The gold layer is not one shape, and you do not get only one of it. You can have more than one gold layer for different consumers.

In this case we built what I have started calling a gold Power Query layer: a flattened model purpose-built for these analysts. A conventional star-schema gold layer can still exist right alongside it, serving the reports and semantic models that want a star. Other gold layers can be shaped differently again for other use cases. None of them are wrong. They are answers to different questions.

  • A star-schema gold layer for report authors and Direct Lake semantic models that benefit from conformed dimensions and clean grain.
  • A wide, flattened gold layer for analysts who blend on the fly and cannot afford to join tables themselves.
  • Room for more shapes as new consumers show up with needs that neither of the first two serves well.
Medallion lakehouse with two gold layers Five data sources (ERP, CRM, Excel, SharePoint, and SQL) flow into a Bronze layer, which flows to a Silver layer. Silver branches into two gold layers: a star-schema Gold that serves Power BI, and a wide, flattened PQ Gold that serves Excel through Power Query. The two gold-to-consumer paths run in parallel. SOURCES BRONZE SILVER GOLD CONSUMERS ERP CRM Excel SharePoint SQL Bronze Silver Gold star schemas PQ Gold wide, flattened Power BI Excel Power Query
A medallion lakehouse with two gold layers: a star-schema Gold for Power BI, and a wide, flattened PQ Gold for Excel.

Most data warehouse and lakehouse architects, myself very much included until recently, would reflexively insist on the star schema. This is a case for looking twice. The job is to serve the consumer, not the dogma.

The payoff

Had we shipped a star schema, these analysts would have been stuck with something they could not use. Instead their work accelerated, because the datasets now arrive with everything already blended. They write far fewer Excel formulas and spend far less time piecing things together. They still blend in external datasets when they need to, because that is the nature of the work, but the job is much easier now that the internal side comes pre-assembled and trustworthy.

There is one more payoff, and it is the part I find most interesting. With clean, pre-blended, trustworthy data sitting in that model, the analysts can build reusable components and bring AI into the loop. They can use AI to combine the Power Query datasets with their external data and get what they need without writing queries or Excel formulas at all. The hard part of that has always been trusting the joins, and the flattened model takes that worry off the table.

Making clean, joined, trustworthy lakehouse data available for both people and AI to consume, without anyone having to worry about getting the joins right, has been a real unlock. That is worth going against instinct for. So the next time someone asks me whether the gold layer has to be a star schema, my honest answer is that it depends entirely on who is standing on the other side of it.