Skip to main content

Chapter 8: Data Warehousing and Business Intelligence

8.1 Data Warehouse Architecture

A data warehouse is a system used for reporting and data analysis, and is considered a core component of business intelligence.

8.1.1 Components of a Data Warehouse

  1. Data Sources
  2. ETL (Extract, Transform, Load) Process
  3. Data Warehouse Database
  4. Data Marts
  5. Metadata Repository
  6. Query and Analysis Tools

8.1.2 Data Warehouse Models

  1. Star Schema
  2. Snowflake Schema
  3. Galaxy Schema

Example of a Star Schema:

CREATE TABLE DimCustomer (
CustomerKey INT PRIMARY KEY,
CustomerName VARCHAR(100),
City VARCHAR(50),
State VARCHAR(50)
);

CREATE TABLE DimProduct (
ProductKey INT PRIMARY KEY,
ProductName VARCHAR(100),
Category VARCHAR(50)
);

CREATE TABLE DimDate (
DateKey INT PRIMARY KEY,
Date DATE,
Year INT,
Month INT,
Quarter INT
);

CREATE TABLE FactSales (
SalesKey INT PRIMARY KEY,
CustomerKey INT,
ProductKey INT,
DateKey INT,
SalesAmount DECIMAL(10,2),
FOREIGN KEY (CustomerKey) REFERENCES DimCustomer(CustomerKey),
FOREIGN KEY (ProductKey) REFERENCES DimProduct(ProductKey),
FOREIGN KEY (DateKey) REFERENCES DimDate(DateKey)
);

8.2 ETL Processes

ETL stands for Extract, Transform, and Load. It's the process of taking data from source systems and bringing it into the data warehouse.

8.2.1 Extract

Involves extracting data from homogeneous or heterogeneous data sources.

8.2.2 Transform

Involves converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database.

Transformation processes:

  1. Cleaning
  2. Standardizing
  3. Deduplicating
  4. Sorting
  5. Aggregating

8.2.3 Load

Involves loading the transformed data into the end target, usually the data warehouse database.

Example of an ETL process using SQL:

-- Extract
INSERT INTO StagingCustomers (CustomerID, Name, City, State)
SELECT CustomerID, CustomerName, City, State
FROM SourceSystem.Customers;

-- Transform
UPDATE StagingCustomers
SET City = UPPER(City),
State = UPPER(State);

-- Load
INSERT INTO DimCustomer (CustomerKey, CustomerName, City, State)
SELECT CustomerID, Name, City, State
FROM StagingCustomers;

8.3 OLAP vs. OLTP

8.3.1 OLTP (Online Transaction Processing)

  • Handles day-to-day transactions
  • Optimized for write operations
  • Deals with current data

8.3.2 OLAP (Online Analytical Processing)

  • Used for complex queries and analysis
  • Optimized for read operations
  • Deals with historical data

Key Differences:

AspectOLTPOLAP
PurposeDay-to-day operationsData analysis
DataCurrent, operationalHistorical, consolidated
DatabaseNormalized, ER modelDenormalized, Star schema
QueriesSimple, standardizedComplex, ad hoc
PerformanceTransaction speedQuery speed

8.4 Business Intelligence

Business Intelligence (BI) refers to technologies, applications and practices for the collection, integration, analysis, and presentation of business information.

8.4.1 BI Tools

  1. Tableau
  2. Power BI
  3. QlikView
  4. SAP BusinessObjects

8.4.2 Key BI Features

  1. Reporting
  2. Dashboards
  3. Data Visualization
  4. Ad Hoc Analysis
  5. Predictive Analytics

8.5 Real-life Example: Building a Data Warehouse for a Multinational Corporation

Let's consider building a data warehouse for a multinational retail corporation to analyze sales data across different regions and product categories.

Step 1: Design the Data Warehouse Schema

We'll use a star schema for our data warehouse:

-- Dimension Tables
CREATE TABLE DimStore (
StoreKey INT PRIMARY KEY,
StoreID VARCHAR(10),
StoreName VARCHAR(100),
City VARCHAR(50),
Country VARCHAR(50),
OpenDate DATE
);

CREATE TABLE DimProduct (
ProductKey INT PRIMARY KEY,
ProductID VARCHAR(10),
ProductName VARCHAR(100),
Category VARCHAR(50),
SubCategory VARCHAR(50),
Brand VARCHAR(50)
);

CREATE TABLE DimDate (
DateKey INT PRIMARY KEY,
Date DATE,
Day INT,
Month INT,
Quarter INT,
Year INT,
IsHoliday BIT
);

-- Fact Table
CREATE TABLE FactSales (
SalesKey INT PRIMARY KEY,
StoreKey INT,
ProductKey INT,
DateKey INT,
SalesAmount DECIMAL(10,2),
Quantity INT,
Discount DECIMAL(5,2),
FOREIGN KEY (StoreKey) REFERENCES DimStore(StoreKey),
FOREIGN KEY (ProductKey) REFERENCES DimProduct(ProductKey),
FOREIGN KEY (DateKey) REFERENCES DimDate(DateKey)
);

Step 2: Implement ETL Processes

Here's a simplified ETL process using SQL:

-- Extract and transform store data
INSERT INTO DimStore (StoreKey, StoreID, StoreName, City, Country, OpenDate)
SELECT
ROW_NUMBER() OVER (ORDER BY StoreID),
StoreID,
StoreName,
City,
Country,
OpenDate
FROM SourceSystem.Stores;

-- Extract and transform product data
INSERT INTO DimProduct (ProductKey, ProductID, ProductName, Category, SubCategory, Brand)
SELECT
ROW_NUMBER() OVER (ORDER BY ProductID),
ProductID,
ProductName,
Category,
SubCategory,
Brand
FROM SourceSystem.Products;

-- Extract and transform sales data
INSERT INTO FactSales (SalesKey, StoreKey, ProductKey, DateKey, SalesAmount, Quantity, Discount)
SELECT
ROW_NUMBER() OVER (ORDER BY s.SalesID),
ds.StoreKey,
dp.ProductKey,
dd.DateKey,
s.SalesAmount,
s.Quantity,
s.Discount
FROM SourceSystem.Sales s
JOIN DimStore ds ON s.StoreID = ds.StoreID
JOIN DimProduct dp ON s.ProductID = dp.ProductID
JOIN DimDate dd ON s.SaleDate = dd.Date;

Step 3: Create OLAP Cubes

Using SQL Server Analysis Services (SSAS), we can create OLAP cubes for faster analysis:

<Cube name="Sales">
<MeasureGroups>
<MeasureGroup name="FactSales">
<Measures>
<Measure name="Sales Amount" column="SalesAmount"/>
<Measure name="Quantity" column="Quantity"/>
</Measures>
</MeasureGroup>
</MeasureGroups>
<Dimensions>
<Dimension name="Store" table="DimStore"/>
<Dimension name="Product" table="DimProduct"/>
<Dimension name="Date" table="DimDate"/>
</Dimensions>
</Cube>