Slowly Changing Dimensions (SCDs) refer to the process of updating dimensions in a data warehouse when the dimension attributes change over time. There are three commonly recognized types of SCD updates: Type 1, Type 2, and Type 3. Each type has its own approach to handling dimension updates. Let's explore each type in more detail:
- Type 1 (Overwrite):
In Type 1 SCD updates, the old dimension attributes are simply overwritten with the new values. This means that no historical information is preserved. When a change occurs, the existing dimension record is updated with the new attribute values, and the previous values are lost. This approach is useful when historical data is not important or when it is acceptable to lose the historical context.
For example, let's consider a customer dimension. If a customer changes their phone number, a Type 1 update would overwrite the existing phone number with the new one, and the old phone number would be lost.
- Type 2 (Historical Tracking):
Type 2 SCD updates involve preserving the entire history of dimension attribute changes. In this approach, a new record is created for each change, and a surrogate key and effective date/time are assigned to each version of the dimension. This allows for tracking the historical values and the periods during which they were valid.
Using the previous example, with a Type 2 update, instead of overwriting the phone number, a new record would be created to represent the customer with the updated phone number. Both the old and new records would coexist in the dimension table, each with their respective effective dates.
- Type 3 (Partial History):
Type 3 SCD updates aim to strike a balance between Types 1 and 2 by preserving limited historical information. In this approach, a few selected attributes are updated in place, while others are stored in additional columns to maintain a limited history.
Continuing with the customer dimension example, with a Type 3 update, the original phone number may be overwritten with a new one, but the previous phone number would be stored in a separate column. This allows for some historical context while maintaining simplicity.
It's important to note that these types of SCD updates are not mutually exclusive, and different dimensions within a data warehouse can utilize different approaches based on their specific requirements.