Slowly Changing Dimensions

Ralph introduced the concept of "slowly changing dimension (SCD)" attributes in 1996.

Slowly Changing Dimensions (SCD) - dimensions that evolve slowly over time, rather than changing the time base in the regular calendar. In the data warehouse, it is necessary to follow the changes of the attributes in the dimension in order to make the data historized. In other words, the implementation of an SCD type must allow users to assign the dimension attribute value appropriate for a given date. Example of these dimensions could be: customer, geography, employee.

There are many approaches to implement SCD, the most popular are:

o    Type 0 - The passive method

o    Type 1 - Overwrite the old value

o    Type 2 - Creating an additional new record

o    Type 3 - Addition of a new column

o    Type 4 - Using the historical table

o    Type 6 - Combining 1,2,3 type approaches (1 + 2 + 3 = 6)

Type 0 - The passive method.

In this method, no special action is performed on dimensional changes. Some dimension data may remain the same as the first time inserted, others may be overwritten.

Type 1 - Overwrite the old value.

In this method, no dimension change history is kept in the database. The old value is simply overwritten by the new one. This type is easy to maintain and often used for data whose changes are the result of processing corrections (for example: special characters for removal, correction of spelling mistakes).

Before the change:

EmployeeKey FirsName Title
13 Sidney Production Technician

After the change:

EmployeeKey FirsName Title
13 Sidney Production Supervisor
Type 2 - Creating an additional new record.

In this methodology, the entire history of dimension changes is kept in the database. You capture the attribute change by adding a new row with a new surrogate key to the dimension table. Both lines contain the natural key attribute. Also “effective date” and “current indicator” are used in this method. There could be only one recording with the current indicator set to 'Y'. For the "effective date" columns, namely start_date and end_date, the end_date for the current record is usually set to the value 9999-12-31.

Before the change:

EmployeeKey FirsName Title START_DATE END_DATE Current_Flag
13 Sidney Production Technician 01/08/2014 31/12/9999 Y

After the change:

EmployeeKey FirsName Title START_DATE END_DATE Current_Flag
13 Sidney Production Technician 01/08/2014 01/10/2015 N
25 Sidney Production Supervisor 02/10/2015 31/12/9999 Y
Type 3 - Adding a new column .

In this type, typically the current value and the previous dimension value are kept in the database. The new value is loaded in the 'current / new' column and the old one in the 'old / previous' column.

In general, the history is limited to the number of columns created to store historical data

Before the change:

EmployeeKey FirstName Current_Title Previous_Title
13 Sidney Production Technician Production Technician

After the change:

EmployeeKey FirstName Current_Title Previous_Title
13 Sidney Production Technician Production Supervisor
Type 4 - Using the Historical Table.

In this method, a separate historical table is used to track changes in historized attributes for each dimension.

The "main" dimension table only keeps current data, for example: the customer and customer_history tables.

Current table:

EmployeeKey FirstName Title
13 Sidney Production Technician

Historical table:

EmployeeKey FirsName Title START_DATE END_DATE
13 Sidney Production Technician 01/08/2014 01/10/2015
13 Sidney Production Supervisor 02/10/2015 01/01/2016
13 Sidney Production Control Manager 02/01/2016 31/12/9999
Type 6 - Combine type 1,2,3 (1 + 2 + 3 = 6) approaches.

In this type we have the following additional columns in the dimension table:

current_title - to maintain the current value of the attribute. All historical records have the same current value.

historical_title - to maintain the historical value of the attribute. All historical records can have different values.

start_date - to keep the start date of the attribute history.

end_date - to maintain the end date of the attribute history.

current_flag - to keep information about the most recent recording.

In this method, to capture the attribute change, we add a new record like in type 2. The information in current_title is replaced with the new one like in type 1. We store the history in a historical_title column like in the type 3.

EmployeeKey FirsName Current_Title Historical_Title START_DATE END_DATE Current_Flag
13 Sidney Production Control Manager Production Technician 01/08/2014 01/10/2015 N
25 Sidney Production Control Manager Production Supervisor 02/10/2015 01/01/2016 N
30 Sidney Production Control Manager Production Control Manager 02/01/2016 31/12/9999 Y
 

2 thoughts on "Slowly Changing Dimensions"

  1. I was curious if you ever changed the structure of your website?
    Its very well written; I love what you've got to say. But maybe you could have a little more
    in the way of being happy.

    You've got an awful lot of text for only having one or two
    images. Maybe you could space it out better?

    Reply

    1. Hi,
      Thank you for these remarks, I will take them into consideration and try to improve the best possible.

      Reply

Leave a Reply

Your email address Will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment is processed.