When it comes to data warehouse (DWH) design, the two most widely discussed approaches are the Inmon and the Kimball methodologies. For years, people have debated which approach is better and more effective for businesses. However, there is still no definitive answer, as both methods have their benefits and drawbacks.
In this blog, we will discuss the basics of a data warehouse and its characteristics, and compare the two popular data warehouse approaches: Kimball vs. Inmon.
The key data warehouse concept is to give users access to a single version of the truth for timely business decision-making, reporting, and forecasting. A DWH functions as an information system that stores historical and cumulative data from one or more sources.
Data Warehouse Models refer to the architectural designs and structures used to organize and manage data within a data warehousing environment. These models dictate how data is stored, accessed, and utilized for analytical purposes. Major sections include:
The following are the four characteristics of a Data Warehouse:
Characteristics and Functions of Data Warehouse (Source: GeeksforGeeks)
A data warehouse functions as a repository. It helps organizations avoid the cost of maintaining separate storage systems and backing up data at an enterprise level. The prominent functions of the data warehouse are:
Normalization is a way of reorganizing data into separate, related tables. It helps meet two main requirements in an enterprise data warehouse: eliminating data redundancy and protecting data dependencies. Denormalization, on the other hand, deliberately reintroduces some redundancy so that queries need fewer joins and read faster.
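To make the trade-off concrete, here is a minimal sketch in Python. The record layout and the `normalize` / `denormalize` helpers are hypothetical and exist only for illustration: the flat rows repeat the customer's city on every order, while the normalized form stores it once per customer.

```python
# Hypothetical flat (denormalized) order records: the customer's city
# is repeated on every order row.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_city": "Austin", "amount": 120.0},
    {"order_id": 2, "customer_id": 10, "customer_city": "Austin", "amount": 80.0},
    {"order_id": 3, "customer_id": 11, "customer_city": "Boston", "amount": 45.5},
]


def normalize(rows):
    """Split flat rows into a customers table and an orders table."""
    customers = {}  # customer_id -> customer attributes, stored once
    orders = []     # one row per order, referencing the customer by id
    for row in rows:
        customers[row["customer_id"]] = {"customer_city": row["customer_city"]}
        orders.append(
            {
                "order_id": row["order_id"],
                "customer_id": row["customer_id"],
                "amount": row["amount"],
            }
        )
    return customers, orders


def denormalize(customers, orders):
    """Join the two tables back into flat rows (the read-friendly form)."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]


customers, orders = normalize(flat_orders)
rebuilt = denormalize(customers, orders)
```

In the normalized form, changing a customer's city means updating one entry in `customers`. In the flat form, reads need no join, but every matching order row would have to be updated.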
The main differences between data warehouse and database are summarized in the table below:
Database | Data Warehouse |
A database is a collection of related data. | A data warehouse serves as an information system that contains historical and cumulative data from one or several sources. |
A database is used for recording data. | A data warehouse is used for analyzing data. |
A database is an application-oriented collection of data. | A data warehouse is a subject-oriented collection of data. |
A database uses Online Transactional Processing (OLTP). | A data warehouse uses Online Analytical Processing (OLAP). |
Database tables are normalized, so their joins are more complicated. | Data warehouse tables are denormalized, so their joins are simpler. |
ER modeling techniques are used for designing. | Data modeling techniques are used for designing. |
Both data warehouse design methodologies have their own pros and cons. Let’s go through them in detail to figure out which one is better.
Initiated by Ralph Kimball, the Kimball data model follows a bottom-up approach to data warehouse architecture design in which data marts are first formed based on the business requirements.
The primary data sources are then evaluated, and an Extract, Transform and Load (ETL) tool is used to fetch data from several sources and load it into a staging area of the relational database server. Once data is uploaded to the data warehouse staging area, the next phase loads it into a dimensional data warehouse model that is denormalized by nature. This model partitions data into fact tables, which hold numeric transactional data, and dimension tables, which hold the reference information that supports the facts.
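The load from staging into the dimensional model can be sketched as follows. This is only an illustration under stated assumptions: the staging rows, the column names, and the `load_dimensional` helper are hypothetical, and a real ETL tool would also handle multiple dimensions, late-arriving data, and attribute changes.

```python
# Hypothetical staging rows, as they might arrive from an ETL extract.
staging = [
    {"sale_date": "2024-01-05", "product_name": "Laptop", "category": "Electronics",
     "quantity": 2, "revenue": 2400.0},
    {"sale_date": "2024-01-05", "product_name": "Desk", "category": "Furniture",
     "quantity": 1, "revenue": 300.0},
    {"sale_date": "2024-01-06", "product_name": "Laptop", "category": "Electronics",
     "quantity": 1, "revenue": 1200.0},
]


def load_dimensional(rows):
    """Split staging rows into a product dimension and a sales fact table."""
    products_by_name = {}  # natural key -> dimension row with a surrogate key
    fact_sales = []
    for row in rows:
        name = row["product_name"]
        if name not in products_by_name:
            # Assign the next surrogate key the first time a product is seen.
            products_by_name[name] = {
                "product_key": len(products_by_name) + 1,
                "product_name": name,
                "category": row["category"],
            }
        # The fact row keeps only the numeric measures plus the dimension key.
        fact_sales.append(
            {
                "product_key": products_by_name[name]["product_key"],
                "sale_date": row["sale_date"],
                "quantity": row["quantity"],
                "revenue": row["revenue"],
            }
        )
    return list(products_by_name.values()), fact_sales


dim_product, fact_sales = load_dimensional(staging)
```

The descriptive attributes (`product_name`, `category`) end up once per product in `dim_product`, while `fact_sales` holds one row per transaction.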
The star schema is the fundamental element of the dimensional data warehouse model. It combines a fact table with several dimension tables. Kimball dimensional modeling allows users to construct several star schemas to fulfill various reporting needs. An advantage of the star schema is that queries against small dimension tables run very quickly.
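As a rough illustration, a small star schema can be built with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) are assumptions chosen for this sketch, not part of any Kimball specification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);
-- The fact table sits at the center of the star and points at each dimension.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    quantity INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
INSERT INTO dim_date VALUES (20240105, '2024-01-05', '2024-01'), (20240210, '2024-02-10', '2024-02');
INSERT INTO fact_sales VALUES
    (1, 20240105, 2, 2400.0),
    (2, 20240105, 1, 300.0),
    (1, 20240210, 1, 1200.0);
""")

# A typical report: one join per dimension, then aggregate the measures.
revenue_by_category_month = conn.execute("""
SELECT p.category, d.month, SUM(f.revenue)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_key = f.product_key
JOIN dim_date AS d ON d.date_key = f.date_key
GROUP BY p.category, d.month
ORDER BY p.category, d.month
""").fetchall()
```

Every report follows the same shape: join the fact table to whichever dimensions the question needs, then group by dimension attributes.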
To integrate data, the Kimball approach to the data warehouse lifecycle relies on conformed dimensions. A conformed dimension is a dimension table (such as customer or product) that is shared across different fact tables within a data warehouse, or that exists as identical dimension tables in various Kimball data marts. This ensures that a single data item is used consistently across all the facts.
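The sketch below shows why a shared dimension matters. The data and the `total_by_segment` helper are hypothetical: two fact tables from different business processes reference the same customer dimension, so their measures can be compared on the same attribute.

```python
from collections import defaultdict

# One conformed customer dimension, shared by both fact tables below.
dim_customer = {
    1: {"name": "Acme", "segment": "Enterprise"},
    2: {"name": "Bolt", "segment": "SMB"},
}

# Two fact tables from different business processes (hypothetical data).
fact_orders = [
    {"customer_key": 1, "revenue": 500.0},
    {"customer_key": 2, "revenue": 120.0},
    {"customer_key": 1, "revenue": 250.0},
]
fact_tickets = [
    {"customer_key": 1, "tickets": 3},
    {"customer_key": 2, "tickets": 1},
]


def total_by_segment(fact_rows, measure):
    """Aggregate one measure by the shared 'segment' attribute."""
    totals = defaultdict(float)
    for row in fact_rows:
        segment = dim_customer[row["customer_key"]]["segment"]
        totals[segment] += row[measure]
    return dict(totals)


revenue = total_by_segment(fact_orders, "revenue")
tickets = total_by_segment(fact_tickets, "tickets")

# Because both results use the same segment values, they line up directly.
drill_across = {
    segment: (revenue.get(segment, 0.0), tickets.get(segment, 0.0))
    for segment in sorted(set(revenue) | set(tickets))
}
```

If each data mart kept its own customer table with its own segment labels, the final step would need a mapping between them, which is the inconsistency conformed dimensions are meant to avoid.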
An important design tool in Ralph Kimball’s data warehouse methodology is the enterprise bus matrix, or Kimball bus architecture, which records the facts as rows and the conformed dimensions as columns. The Kimball matrix, which is part of the bus architecture, shows how star schemas are constructed. Business management teams use it as an input to prioritize which row of the matrix should be implemented first.
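A bus matrix is easy to represent as a small data structure. In this sketch the business processes and dimensions are made-up examples, and the helper functions are hypothetical: each row is a business process (a fact table), and the set lists the dimensions that process uses.

```python
# Rows: business processes (facts). Values: the dimensions each one uses.
bus_matrix = {
    "Sales": {"Date", "Product", "Customer", "Store"},
    "Inventory": {"Date", "Product", "Store"},
    "Customer Support": {"Date", "Customer"},
}


def dimension_usage(matrix):
    """Invert the matrix: for each dimension, which processes use it?"""
    usage = {}
    for process, dimensions in matrix.items():
        for dimension in dimensions:
            usage.setdefault(dimension, set()).add(process)
    return usage


def conformed_dimensions(matrix):
    """Dimensions shared by more than one process must be conformed."""
    usage = dimension_usage(matrix)
    return sorted(d for d, processes in usage.items() if len(processes) > 1)


shared = conformed_dimensions(bus_matrix)
usage = dimension_usage(bus_matrix)
```

Which row to build first remains a business decision. The matrix only makes visible how much dimension work each row shares with the others.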
The Kimball approach to the data warehouse lifecycle also relies on conformed facts, i.e. measures that are defined identically across separately implemented data marts, so that the marts fit together within a robust shared architecture.
Figure 2. Basic Kimball Data Warehouse architecture explained (Source: Zentut)
Some of the main benefits of the Kimball Data Warehousing Concept include:
Kimball Approach to Data Warehouse Lifecycle (Source: Kimball Group)
Some of the drawbacks of the Kimball Data Warehousing design concept include:
Bill Inmon, the father of data warehousing, proposed developing a data warehouse by first identifying the main subject areas and entities the enterprise works with, such as customer, product, vendor, and so on. Bill Inmon’s definition of a data warehouse is that it is a “subject-oriented, nonvolatile, integrated, time-variant collection of data in support of management’s decisions.”
The approach then creates a detailed logical model for every primary entity. For instance, a logical model is constructed for products with all the attributes associated with that entity. This logical model could include ten diverse entities under product, including all the details, such as business drivers, aspects, relationships, dependencies, and affiliations.
The Bill Inmon design approach uses the normalized form for building the entity structure, avoiding data redundancy as much as possible. This makes business requirements easier to identify clearly and prevents data update anomalies. Moreover, the advantage of this top-down approach in database design is that it is robust to business changes and provides a dimensional perspective of data across data marts.
Next, the physical model is constructed, which follows the normalized structure. This Bill Inmon model creates a single source of truth for the whole business. Data loading becomes less complex due to the normalized structure of the model. However, using this arrangement for querying is challenging as it includes numerous tables and links.
The Inmon data warehouse methodology proposes constructing separate data marts for each division, such as finance, marketing, and sales. All the data entering the data warehouse is integrated. The data warehouse acts as a single data source for the various data marts to ensure integrity and consistency across the enterprise.
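The flow from a normalized warehouse to a departmental data mart can be sketched with Python's built-in `sqlite3` module. The tables (`department`, `account`, `ledger_entry`) and the `finance_mart` name are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized warehouse tables (hypothetical names): each fact is stored once.
CREATE TABLE department (department_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE account (
    account_id INTEGER PRIMARY KEY,
    department_id INTEGER REFERENCES department(department_id),
    label TEXT
);
CREATE TABLE ledger_entry (
    entry_id INTEGER PRIMARY KEY,
    account_id INTEGER REFERENCES account(account_id),
    amount REAL
);
INSERT INTO department VALUES (1, 'Finance'), (2, 'Marketing');
INSERT INTO account VALUES (10, 1, 'Payroll'), (11, 1, 'Audit'), (20, 2, 'Campaigns');
INSERT INTO ledger_entry VALUES (100, 10, 5000.0), (101, 11, 750.0), (102, 20, 1200.0), (103, 10, 250.0);

-- The finance data mart is derived from the warehouse, never loaded directly.
CREATE TABLE finance_mart AS
SELECT a.label AS account_label, SUM(e.amount) AS total_amount
FROM ledger_entry AS e
JOIN account AS a ON a.account_id = e.account_id
JOIN department AS d ON d.department_id = a.department_id
WHERE d.name = 'Finance'
GROUP BY a.label;
""")

finance_rows = conn.execute(
    "SELECT account_label, total_amount FROM finance_mart ORDER BY account_label"
).fetchall()
```

The multi-table join is paid once when the mart is built. Finance users then query the small, flat `finance_mart` table, and a marketing mart built the same way would draw on the same integrated source.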
Figure 3. Basic Bill Inmon data warehousing architecture explained (Source: Stanford University)
The Bill Inmon design approach offers the following benefits:
The possible drawbacks of this approach are as follows:
Now that we’ve evaluated the Kimball vs. Inmon approach and seen the advantages and drawbacks of both these methods, the question arises: Which one of these data warehouse concepts would best serve your business?
Both approaches treat the data warehouse as a central repository that supports business reporting, and both use ETL concepts for data loading. However, the main difference lies in how data is modeled and loaded into the data warehouse.
The approach used for data warehouse construction influences the initial delivery time of the warehousing project and its ability to accommodate future changes in the ETL design.
Still not sure how to resolve the Kimball vs. Inmon dilemma? We can help you decide which of these data warehouse approaches would best improve your data quality management framework.
We’ve narrowed down a few aspects that can help you decide between the two approaches.
Both the Kimball and Inmon data warehouse concepts can be used to design data warehouse models successfully. In fact, several enterprises use a blend of both approaches (called a hybrid data model).
In the hybrid data model, the Inmon method is used to create the integrated, normalized data warehouse. The Kimball method is then followed to develop data marts using the star schema.
It’s impossible to claim which approach is better as both methods have their benefits and drawbacks, working well in different situations. A data warehouse designer has to choose a method, depending on the various factors discussed in this article.
Lastly, for any method to be effective, it has to be well thought out, explored in depth, and developed to satisfy your company’s business intelligence reporting requirements.
Astera Data Warehouse Builder offers an integrated platform to design, deploy, and test large-volume data warehouses and automate the processes to reach meaningful insights quickly, without the hassle of writing ETL code.
Organizations are moving toward data warehouse automation to save costs, maximize productivity, and get actionable insights quicker. Data Warehousing Automation allows you to quickly build high-quality data marts, build self-regulating data pipelines, and deliver relevant insights to decision-makers via BI and analytics tools.
Data Warehousing Automation eliminates the most time-consuming part in populating a data warehouse: writing ETL/ELT code. As no SQL hand coding is required, developers can focus their energy on working at a logical level (design level) to create more efficient integration flows.
In addition, automation helps you design an agile data warehouse infrastructure. The result is a more adaptable, responsive data repository that can be queried efficiently and delivers valuable insights in seconds.
In a nutshell, removing manual intervention in the planning, modeling, and deployment steps allows you to build a better-quality data warehouse in a matter of weeks or even days.