A data warehouse is a collection of data from heterogeneous sources, organized under a unified schema. There are two approaches to building a data warehouse, the top-down approach and the bottom-up approach, and both are explained below.
1. Top-down approach
The essential components are discussed below:
- External sources: An external source is any source from which data is collected, regardless of the type of data. The data can be structured, semi-structured, or unstructured.
- Staging area: Since data extracted from external sources does not follow a particular format, it must be validated before it is loaded into the data warehouse. An ETL tool is recommended for this purpose:
  - E (Extract): The data is extracted from the external data sources.
  - T (Transform): The data is transformed into the standard format.
  - L (Load): The data is loaded into the target data store after being transformed into the standard format.
- Data warehouse: After cleansing, the data is stored in the data warehouse, which acts as the central repository. It stores the metadata together with the integrated data, while the data marts hold the function-specific subsets. Note that in this top-down approach the data warehouse stores data in its purest form.
- Data marts: A data mart is also part of the storage component. It stores the information of a particular function of an organization that is handled by a single authority. An organization can have as many data marts as it has functions. In other words, a data mart contains a subset of the data stored in the data warehouse.
- Data mining: Data mining is the practice of analyzing the large volume of data held in the data warehouse. It uses data mining algorithms to find hidden patterns in the database or data warehouse.
Inmon defines this approach as follows: the data warehouse is a central repository for the entire organization, and data marts are created from it only after the complete data warehouse has been built.
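The flow described above (extract, transform, load into the warehouse, then derive data marts from it) can be sketched roughly as follows. This is a minimal illustration using Python's built-in `sqlite3` module, not a reference implementation: the sample records, the table names `warehouse` and `mart_<dept>`, and the function names are all hypothetical and chosen only for this example.

```python
import sqlite3

# Hypothetical raw records from external sources with inconsistent formats.
RAW_RECORDS = [
    {"dept": "Sales", "amount": "1,200.50", "date": "2024-01-05"},
    {"dept": "sales ", "amount": "300", "date": "2024-01-06"},
    {"dept": "HR", "amount": "n/a", "date": "2024-01-06"},  # fails validation
    {"dept": "hr", "amount": "75.25", "date": "2024-01-07"},
]


def extract():
    """Extract: read records from the external sources (here, an in-memory list)."""
    return list(RAW_RECORDS)


def transform(records):
    """Transform: validate each record and convert it to one standard format."""
    clean = []
    for rec in records:
        try:
            amount = float(rec["amount"].replace(",", ""))
        except ValueError:
            continue  # drop records whose amount cannot be parsed
        clean.append((rec["dept"].strip().upper(), amount, rec["date"]))
    return clean


def load(conn, rows):
    """Load: write the standardized rows into the central warehouse table."""
    conn.execute("CREATE TABLE warehouse (dept TEXT, amount REAL, date TEXT)")
    conn.executemany("INSERT INTO warehouse VALUES (?, ?, ?)", rows)


def build_mart(conn, dept):
    """Top-down step: derive a data mart as a subset of the warehouse."""
    if not dept.isalnum():
        raise ValueError("dept must be alphanumeric to be used in a table name")
    table = f"mart_{dept.lower()}"
    conn.execute(f"CREATE TABLE {table} (dept TEXT, amount REAL, date TEXT)")
    conn.execute(
        f"INSERT INTO {table} SELECT dept, amount, date FROM warehouse WHERE dept = ?",
        (dept,),
    )
    return table


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))         # the warehouse is built first
sales_mart = build_mart(conn, "SALES")   # marts are created from it afterwards
```

Because every mart is filled by a query against the same `warehouse` table, all marts share the same cleaned values and column definitions, which is the consistency benefit this approach is credited with.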
Advantages of the top-down approach
- Since data marts are created from the data warehouse, this approach provides a consistent dimensional view across the data marts.
- This model is also considered the most robust to business changes, which is why large organizations prefer to follow this approach.
- Creating data marts from a data warehouse is easy.
Disadvantages of the top-down approach
- The cost and time needed to design and maintain the data warehouse are very high.
2. Bottom-up approach
- First, the data is pulled from external sources (just like the top-down approach).
- The data then passes through the staging area (as explained above) and is loaded into data marts instead of the data warehouse. The data marts are created first and provide reporting capability. Each data mart addresses a single business area.
- These data marts are integrated into the data warehouse.
This approach is attributed to Kimball: data marts are created first and provide a thin view for analysis, and the data warehouse is created after the data marts are complete.
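The reverse ordering can be sketched in the same way. In this minimal illustration, again using Python's built-in `sqlite3` module, each data mart is loaded directly from staged data and can answer reports immediately, and the warehouse is assembled from the marts afterwards. The staged rows, the table names, and the function names are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical staged rows, already cleaned, grouped by business area.
STAGED = {
    "sales": [("2024-01-05", 1200.5), ("2024-01-06", 300.0)],
    "hr": [("2024-01-07", 75.25)],
}


def build_mart(conn, area, rows):
    """Bottom-up step 1: load one business area straight into its own mart."""
    if not area.isalnum():
        raise ValueError("area must be alphanumeric to be used in a table name")
    table = f"mart_{area}"
    conn.execute(f"CREATE TABLE {table} (date TEXT, amount REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return table


def integrate(conn, marts):
    """Bottom-up step 2: integrate the existing marts into the warehouse."""
    conn.execute("CREATE TABLE warehouse (area TEXT, date TEXT, amount REAL)")
    for area, table in marts.items():
        conn.execute(
            f"INSERT INTO warehouse SELECT ?, date, amount FROM {table}",
            (area,),
        )


conn = sqlite3.connect(":memory:")
marts = {area: build_mart(conn, area, rows) for area, rows in STAGED.items()}

# Reports are available as soon as a mart exists, before any warehouse is built.
sales_total = conn.execute("SELECT SUM(amount) FROM mart_sales").fetchone()[0]

integrate(conn, marts)  # the warehouse is created last
```

Adding another business area here only means calling `build_mart` again and re-running the integration step, which mirrors how the warehouse is extended in this approach.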
Advantages of the bottom-up approach
- Because data marts are created first, reports are generated quickly.
- More data marts can be added over time, and in this way the data warehouse can be extended.
- In addition, the cost and time required to design this model are comparatively low.
Disadvantage of the bottom-up approach
- This model is not as robust as the top-down approach, because the dimensional view of the data marts is not as consistent as it is in the top-down approach.