#PowerQuery – Joining to a Slowly Changing Dimension

I blogged previously about how to look up a surrogate key for a slowly changing dimension using DAX. This post is about how to do the same thing using Power Query.

I'm going to start off with the same 2 tables that I used in the previous blog post. One is a fact table and the other is my BU (Business Unit) table. I started by clicking on each table of data in Excel and choosing the "From Table" data source option.

image

And for each table I unchecked the "Load to worksheet" option and then clicked apply & save.

SNAGHTML179a06bf

Once I had done that for both tables my Power Query tool pane looked like the following, I have two queries defined, but neither of them is loading any data directly.

image

Now that we have our two source queries we want to use the Merge option in Power Query to join them together

image

The Merge option in Power Query is how you join matching rows in two tables together. I chose "Fact" as my first table as for each row in the Fact I want to find the matching BU_Key from the BU table.

image

You'll notice that at this point we can only choose columns for an equality match, there are no options for us to test that the Date in Fact is between the StartDate and EndDate in the BU table.

When we click on OK we end up with a result like the following which has our original rows from the Fact table and then a column called "NewColumn" which contains the 1 or more rows from the BU table which matched on the BU_Code column.

image

If we click on the little double arrow button in the header of the NewColumn column you get the following options:

image

We can choose to either expand or aggregate the rows in the nested table. Because we want to lookup the BU_Key we tick that as well as the StartDate and EndDate columns as we will need those later.

That gives us a result like the following:

image

Now we are getting close, but we still have one major issue. We now have 16 rows instead of our original 8 because each row in the Fact table is matching to multiple rows in the BU table as we have not done any filtering based on the start and end dates yet. Clicking on the filter button at the top of the "Date" column it initially looks like doing a date filter and choosing the "Between" option would be a solution.

image

But that only gives us the option to select fixed dates values from our data, not references to another column.

image

One solution would be to put in fixed dates and then manually edit the filter in the formula bar, but I wanted to see how far I could get without resorting to doing any advanced editing. The solution I came up with involved some minor code, but it can be done without manually editing the formula.

What I ended up doing was inserting a new custom column which we can then use to filter out the rows we don't want. So from the "Insert" tab on the ribbon I chose the "Insert Custom Column" option:

image

Then I entered the following expression to create a new column called "DateFilter" which will return a value of True if the Date from the current Fact row was between the StartDate and EndDate from the BU table.

= ( ( [Date] >= [NewColumn.StartDate] ) and ( [Date] <= [NewColumn.EndDate] ) )

image

That gives us the following result:

image

Then to filter down to just the "True" values we just need to click on the dropdown in the header of the "DateFilter" column and select the "TRUE" value in our filter.

image

We are now back to our original 8 rows.

image

Then we just need to do a little clean up. By holding the Ctrl key while clicking on the green columns above we can remove those columns. Then I just renamed "NewColumn.BU_Key" to BU_Key and clicked on the "Date" column and set it's type as date (which somehow did not get correctly get detected) we now end up with our finished table which we could choose to load into Excel or directly into a Power Pivot model.

image

Below is the Power Query Formula that was created as a result of the above steps. (this is just the merge query excluding the 2 source queries for "BU" and "Fact")

let
    Source = Table.NestedJoin(Fact,{"BU_Code"},BU,{"BU_Code"},"NewColumn"),
    #"Expand NewColumn" = Table.ExpandTableColumn(Source
            , "NewColumn"
            , {"BU_Key", "StartDate", "EndDate"}
            , {"NewColumn.BU_Key", "NewColumn.StartDate", "NewColumn.EndDate"}),
    InsertedCustom = Table.AddColumn(
            #"Expand NewColumn", "DateFilter"
            , each ( ( [Date] >= [NewColumn.StartDate] ) and ( [Date] <= [NewColumn.EndDate] ) )),
    FilteredRows = Table.SelectRows(InsertedCustom, each ([DateFilter] = true)),
    RemovedColumns = Table.RemoveColumns(
            FilteredRows,{"BU_Code", "NewColumn.StartDate", "NewColumn.EndDate", "DateFilter"}),
    RenamedColumns = Table.RenameColumns(RemovedColumns,{{"NewColumn.BU_Key", "BU_Key"}})
in
    RenamedColumns

If you want to manually tweak things you can go into the Advanced Editor and manually edit the formula to combine all three queries into one and you can also do away with the custom column and just do the between filtering inline. The following query shows the single query solution.

let
    Fact1 = Excel.CurrentWorkbook(){[Name="Fact"]}[Content],
    BU1 = Excel.CurrentWorkbook(){[Name="BU"]}[Content],
    Join = Table.NestedJoin(Fact1,{"BU_Code"},BU1,{"BU_Code"},"NewColumn"),
    #"Expand NewColumn" = Table.ExpandTableColumn(Join
            , "NewColumn"
            , {"BU_Key", "StartDate", "EndDate"}
            , {"NewColumn.BU_Key", "NewColumn.StartDate", "NewColumn.EndDate"}),
    FilteredRows = Table.SelectRows(#"Expand NewColumn"
            , each ( ( [Date] >= [NewColumn.StartDate] ) and ( [Date] <= [NewColumn.EndDate] ) )),

    RemovedColumns = Table.RemoveColumns(
            FilteredRows,{"BU_Code", "NewColumn.StartDate", "NewColumn.EndDate"}),
    RenamedColumns = Table.RenameColumns(RemovedColumns,{{"NewColumn.BU_Key", "BU_Key"}}),
    ChangedType = Table.TransformColumnTypes(RenamedColumns,{{"Date", type datetime}})
in
    ChangedType

If you are curious you can download the workbook I used for this blog post from my OneDrive: http://1drv.ms/1mux5O5

Print | posted on Monday, May 5, 2014 9:53 PM

Comments on this post

# re: #PowerQuery – Joining to a Slowly Changing Dimension

Requesting Gravatar...
You could have used lookup functionality for this instead. I had been using joins often, but have started moving away from them because you have the issue of modifying your fact tables.

Lookups are a little work to get right at first, but very handy, and like in excel they will return the first record if there are many. Unlike excel you can get a lookup to work on multiple parameters.

Here you would want to use Something like Table.AddColumn(Fact, "ReturnedRecord", (_) => BU{[StartDate <= _[Date], EndDate >= _[Date], BU_Code = _[BU_Code]]})

You Could also close that recordLookup with an additional field lookup if you were only interested in that one BU field. After Last curly bracket type [BU_Key].


When using nested joins to a fact table, before expanding, I will always test the row count to ensure it is equal to one to ensure that I do not screw up my table from an ambiguous lookup-via-join.
Left by DBExcelAccounting@Blogspot.Com on May 20, 2014 4:06 AM

# re: #PowerQuery – Joining to a Slowly Changing Dimension

Requesting Gravatar...
@Derek - as we discussed via email. Your approach you suggested here does not actually work. The record indexer syntax only does equality matches and does not hand the >= or <= conditions. It can be worked around using a function and Table.SelectRows, but that is more complex to setup and results in worse performance than the join approach if executed against a source that is capable of doing function folding.
Left by Darren Gosbell on Jun 01, 2014 2:17 PM

Your comment:

 (will show your gravatar)