Ralph introduced the concept of "slowly changing dimension (SCD)" attributes in 1996.
Slowly Changing Dimensions (SCD) - dimensions that evolve slowly over time, rather than changing the time base in the regular calendar. In the data warehouse, it is necessary to follow the changes of the attributes in the dimension in order to make the data historized. In other words, the implementation of an SCD type must allow users to assign the dimension attribute value appropriate for a given date. Example of these dimensions could be: customer, geography, employee.
There are many approaches to implement SCD, the most popular are:
o Type 0 - The passive method
o Type 1 - Overwrite the old value
o Type 2 - Creating an additional new record
o Type 3 - Addition of a new column
o Type 4 - Using the historical table
o Type 6 - Combining 1,2,3 type approaches (1 + 2 + 3 = 6)
Type 0 - The passive method.
In this method, no special action is performed on dimensional changes. Some dimension data may remain the same as the first time inserted, others may be overwritten.
Type 1 - Overwrite the old value.
In this method, no dimension change history is kept in the database. The old value is simply overwritten by the new one. This type is easy to maintain and often used for data whose changes are the result of processing corrections (for example: special characters for removal, correction of spelling mistakes).
Before the change:
EmployeeKey | FirsName | Official Qualification |
13 | Sidney | Production Technician |
After the change:
EmployeeKey | FirsName | Official Qualification |
13 | Sidney | Production Supervisor |
Type 2 - Creating an additional new record.
In this methodology, the entire history of dimension changes is kept in the database. You capture the attribute change by adding a new row with a new surrogate key to the dimension table. Both lines contain the natural key attribute. Also “effective date” and “current indicator” are used in this method. There could be only one recording with the current indicator set to 'Y'. For the "effective date" columns, namely start_date and end_date, the end_date for the current record is usually set to the value 9999-12-31.
Before the change:
EmployeeKey | FirsName | Official Qualification | START_DATE | END_DATE | Current_Flag |
13 | Sidney | Production Technician | 01/08/2014 | 31/12/9999 | Y |
After the change:
EmployeeKey | FirsName | Official Qualification | START_DATE | END_DATE | Current_Flag |
13 | Sidney | Production Technician | 01/08/2014 | 01/10/2015 | N |
25 | Sidney | Production Supervisor | 02/10/2015 | 31/12/9999 | Y |
Type 3 - Adding a new column .
In this type, typically the current value and the previous dimension value are kept in the database. The new value is loaded in the 'current / new' column and the old one in the 'old / previous' column.
In general, the history is limited to the number of columns created to store historical data
Before the change:
EmployeeKey | FirstName | Current_Title | Previous_Title |
13 | Sidney | Production Technician | Production Technician |
After the change:
EmployeeKey | FirstName | Current_Title | Previous_Title |
13 | Sidney | Production Technician | Production Supervisor |
Type 4 - Using the Historical Table.
In this method, a separate historical table is used to track changes in historized attributes for each dimension.
The "main" dimension table only keeps current data, for example: the customer and customer_history tables.
Current table:
EmployeeKey | FirstName | Official Qualification |
13 | Sidney | Production Technician |
Historical table:
EmployeeKey | FirsName | Official Qualification | START_DATE | END_DATE |
13 | Sidney | Production Technician | 01/08/2014 | 01/10/2015 |
13 | Sidney | Production Supervisor | 02/10/2015 | 01/01/2016 |
13 | Sidney | Production Control Manager | 02/01/2016 | 31/12/9999 |
Type 6 - Combine type 1,2,3 (1 + 2 + 3 = 6) approaches.
In this type we have the following additional columns in the dimension table:
current_title - to maintain the current value of the attribute. All historical records have the same current value.
historical_title - to maintain the historical value of the attribute. All historical records can have different values.
start_date - to keep the start date of the attribute history.
end_date - to maintain the end date of the attribute history.
current_flag - to keep information about the most recent recording.
In this method, to capture the attribute change, we add a new record like in type 2. The information in current_title is replaced with the new one like in type 1. We store the history in a historical_title column like in the type 3.
EmployeeKey | FirsName | Current_Title | Historical_Title | START_DATE | END_DATE | Current_Flag |
13 | Sidney | Production Control Manager | Production Technician | 01/08/2014 | 01/10/2015 | N |
25 | Sidney | Production Control Manager | Production Supervisor | 02/10/2015 | 01/01/2016 | N |
30 | Sidney | Production Control Manager | Production Control Manager | 02/01/2016 | 31/12/9999 | Y |
2 thoughts on "Slowly Changing Dimensions"
I was curious if you ever changed the structure of your website?
Its very well written; I love what you've got to say. But maybe you could have a little more
in the way of being happy.
You've got an awful lot of text for only having one or two
images. Maybe you could space it out better?
data sidney
Hi,
Thank you for these remarks, I will take them into consideration and try to improve the best possible.
Ousama EL HOR