Skip to main content

Chapter 3: Advanced SQL

3.1 Joins and Subqueries

Joins and subqueries are powerful SQL features that allow you to combine data from multiple tables and perform complex queries.

3.1.1 Joins

Joins allow you to combine rows from two or more tables based on a related column between them.

Types of Joins:

  1. INNER JOIN
  2. LEFT (OUTER) JOIN
  3. RIGHT (OUTER) JOIN
  4. FULL (OUTER) JOIN

Example of INNER JOIN:

SELECT Orders.OrderID, Customers.Name, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;

3.1.2 Subqueries

A subquery is a query nested inside another query. It can be used in various parts of an SQL statement.

Example of a subquery in WHERE clause:

SELECT ProductName, UnitPrice
FROM Products
WHERE UnitPrice > (SELECT AVG(UnitPrice) FROM Products);

3.2 Aggregate Functions and Grouping

Aggregate functions perform calculations on a set of values and return a single result. They are often used with the GROUP BY clause.

3.2.1 Common Aggregate Functions

  • COUNT()
  • SUM()
  • AVG()
  • MAX()
  • MIN()

Example:

SELECT CategoryID, AVG(UnitPrice) as AveragePrice
FROM Products
GROUP BY CategoryID;

3.2.2 HAVING Clause

The HAVING clause is used to filter the results of GROUP BY based on an aggregate function.

Example:

SELECT CategoryID, AVG(UnitPrice) as AveragePrice
FROM Products
GROUP BY CategoryID
HAVING AVG(UnitPrice) > 50;

3.3 Indexes and Query Optimization

Indexes are used to speed up data retrieval operations on database tables.

3.3.1 Creating Indexes

CREATE INDEX idx_lastname
ON Employees (LastName);

3.3.2 Types of Indexes

  1. Single-Column Indexes
  2. Composite Indexes
  3. Unique Indexes
  4. Clustered Indexes
  5. Non-Clustered Indexes

3.3.3 Query Optimization Techniques

  1. Use appropriate indexes
  2. Avoid using functions in WHERE clauses
  3. Use JOIN instead of subqueries when possible
  4. Limit the use of wildcard characters at the beginning of a LIKE pattern

Example of optimizing a query:

Before optimization:

SELECT * FROM Orders
WHERE YEAR(OrderDate) = 2023;

After optimization:

SELECT * FROM Orders
WHERE OrderDate >= '2023-01-01' AND OrderDate < '2024-01-01';

3.4 Real-life Example: Analyzing Sales Data for a Retail Chain

Let's consider a retail chain with multiple stores across different regions. We'll use advanced SQL techniques to analyze their sales data.

Step 1: Table Structure

CREATE TABLE Stores (
StoreID INT PRIMARY KEY,
StoreName VARCHAR(100),
Region VARCHAR(50)
);

CREATE TABLE Products (
ProductID INT PRIMARY KEY,
ProductName VARCHAR(100),
Category VARCHAR(50),
UnitPrice DECIMAL(10, 2)
);

CREATE TABLE Sales (
SaleID INT PRIMARY KEY,
StoreID INT,
ProductID INT,
SaleDate DATE,
Quantity INT,
FOREIGN KEY (StoreID) REFERENCES Stores(StoreID),
FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);

Step 2: Sample Queries

  1. Total sales by region:
SELECT s.Region, SUM(p.UnitPrice * sa.Quantity) as TotalSales
FROM Sales sa
JOIN Stores s ON sa.StoreID = s.StoreID
JOIN Products p ON sa.ProductID = p.ProductID
GROUP BY s.Region;
  1. Top 5 bestselling products:
SELECT p.ProductName, SUM(sa.Quantity) as TotalQuantitySold
FROM Sales sa
JOIN Products p ON sa.ProductID = p.ProductID
GROUP BY p.ProductID, p.ProductName
ORDER BY TotalQuantitySold DESC
LIMIT 5;
  1. Stores with above-average sales:
SELECT s.StoreName, SUM(p.UnitPrice * sa.Quantity) as TotalSales
FROM Sales sa
JOIN Stores s ON sa.StoreID = s.StoreID
JOIN Products p ON sa.ProductID = p.ProductID
GROUP BY s.StoreID, s.StoreName
HAVING TotalSales > (
SELECT AVG(StoreSales)
FROM (
SELECT SUM(p.UnitPrice * sa.Quantity) as StoreSales
FROM Sales sa
JOIN Products p ON sa.ProductID = p.ProductID
GROUP BY sa.StoreID
) as AverageSales
);
  1. Monthly sales trend:
SELECT 
EXTRACT(YEAR FROM SaleDate) as Year,
EXTRACT(MONTH FROM SaleDate) as Month,
SUM(p.UnitPrice * sa.Quantity) as MonthlySales
FROM Sales sa
JOIN Products p ON sa.ProductID = p.ProductID
GROUP BY Year, Month
ORDER BY Year, Month;

Step 3: Optimizing Queries

To optimize these queries, we can create the following indexes:

CREATE INDEX idx_sales_date ON Sales(SaleDate);
CREATE INDEX idx_sales_store ON Sales(StoreID);
CREATE INDEX idx_sales_product ON Sales(ProductID);

These indexes will significantly improve the performance of our queries, especially for large datasets.

Understanding how to write and optimize complex SQL queries like these is crucial for database professionals. In real-world scenarios, you'll often need to extract meaningful insights from large datasets, and your ability to do so efficiently can greatly impact business decision-making processes.