Staging
Stage tables are used to transform the data to a star schema or third normal form model. A stage table can be a fact or normalized table that only contains change data or a work table. In star schema data warehouses, the stage table brings all the dimensional joins together in preparation for publishing into the fact table.
A stage table is built from the Data Warehouse connection. Unless you are retrofitting an existing system, stage tables are typically built from one or more load or stage tables. They can utilize the surrogate keys from a number of dimension tables.
The normal steps for creating a stage table are defined below and are covered in this chapter. The steps are:
Identify the source transactional data that will ultimately constitute the fact or normalized table. If the data is sourced from multiple tables ascertain if a join between the source tables is possible, or if a series of passes will be required to populate the stage table. If the latter option is chosen then bespoke code is needed.
Using the 'drag and drop' functionality drag the table with the lowest granular data into a stage target. See
Building the Stage Table (on page
326).
Add columns from other source tables. See
Building the Stage Table (on page
326).
Add any relevant dimension table or normalized table keys. See
Building the Stage Table (on page
326).
Create the stage table in the database. See
Building the Stage Table (on page
326).
Build the update procedure. See
Generating the Staging Update Procedure (on page
330).
Test the update procedure and analyze the results. See
Tuning the Staging Update Process (on page
346).
Modify the update procedure as required. See
Tuning the Staging Update Process (on page
346).
Building the Stage Table
Building the stage table is potentially the most challenging part of the overall task of building a data warehouse analysis area. Most of the effort required is in the design phase, in terms of knowing what data needs to come into the fact table that will ultimately be built. This section assumes that the decision as to what to include has been made.