The most important dimensions in every data warehouse is the date dimension. Almost every fact table has one or more surrogate keys that join to this date dimension. Normally, dates from a source table have to be translated into surrogate keys, by looking them up in the date dimension.
However, in DataVault, there is a nifty little trick to overcome this:
- create a source table that contains the dates and the details (Month, Year etc.) you want in your data marts.
- create a hub and a satellite to store the date and the details . Use the format of dates in your source system. If your source systems use a numeric format for dates (like 20151015) then use this format.
- create a surrogate key that matches the business key.
The combination of hub and satellite looks like this:
|20150101||20150101||January 1, 2015||1||January||2015||...|
|20150102||20150102||January 2, 2015||2||January||2015||...|
When loading a link that contains a date, use the date as a peg legged link.
Links can be partitioned, based on the smart surrogate key of the date hub. Partitioning is the process of dividing a single large table into multiple logical chunks. Partitioning improves the performance of the queries running against this partitioned table, enables the piecemeal backup / restore of historical data and improves the speed with which you can load your data.
- Fact tables can also be partitioned, thereby gaining the same benefits.
- Fact tables can be queried without joining the date dimension.
- Every date gets stored in the hub, so no more missing dimension records.