Query Processing, Indexing (B-tree, Hashing), and Advanced Database Systems

Unit 5 (CLO02, CLO05)

When you execute a SQL query, the database doesn't just blindly run it. Behind the scenes, the query processor analyzes, transforms, and optimizes it to find the fastest execution path. Understanding query processing, indexing, and advanced database architectures is crucial for building high-performance applications that scale from single servers to globally distributed systems handling petabytes of data.

The Basics

Query Processing

Definition: Activities to transform high-level query into efficient execution plan

Steps:

  1. Parsing and Translation

    • Check syntax
    • Verify relations and attributes exist
    • Translate SQL to internal representation (relational algebra)
  2. Optimization

    • Generate alternative execution plans
    • Estimate cost of each plan
    • Choose plan with lowest cost
  3. Evaluation

    • Execute chosen plan
    • Return results to user

Why Optimization Matters:

  • Same query, different plans
  • Execution time can vary by orders of magnitude
  • Example:
    • Bad plan: 10 minutes
    • Good plan: 1 second
    • 600x speedup!

Indexing

Purpose: Fast data retrieval without scanning entire table

Analogy: Book index vs reading cover to cover

Trade-offs:

  • Benefit: Fast SELECT queries
  • Cost: Slower INSERT/UPDATE/DELETE, storage overhead

Types:

  1. B-tree Index: Balanced tree, range queries
  2. Hash Index: Hash table, equality queries
  3. Bitmap Index: Bit vectors, low-cardinality columns
  4. Full-text Index: Text search

Advanced Database Systems

Beyond traditional single-server relational databases:

  1. Distributed Databases: Data across multiple sites
  2. Data Warehouses: Historical data for analytics
  3. NoSQL: Non-relational, scalable
  4. Mobile Databases: Lightweight, synchronization
  5. Web Databases: Internet-scale, cloud-native

Technical Details

QUERY PROCESSING IN DETAIL

1. SELECTION

Algorithms:

A1 (Linear Search):

  • Scan entire file
  • Cost: br (number of blocks)
  • Use when: No index, unsorted

A2 (Binary Search):

  • Requires sorted file
  • Cost: ⌈log₂(br)⌉ + ⌈sc/fr⌉ (sc = number of matching records, fr = records per block)
  • Use when: Sorted on search key

A3 (Primary Index, Equality):

  • Use B+-tree or hash index
  • Cost: HTi + 1 (tree height + 1 data block)
  • Use when: Primary key equality

A4 (Primary Index, Range):

  • Use B+-tree index
  • Cost: HTi + b (blocks in range)
  • Use when: Range query on primary key

A5 (Secondary Index):

  • May retrieve many blocks
  • Cost: HTi + n (n matching records)
  • Use when: Secondary key query

2. SORTING

External Sort-Merge:

Given: M memory buffers, br file blocks

Pass 0 (Create Runs):

  • Read M blocks, sort in memory, write
  • Creates ⌈br/M⌉ runs of up to M blocks each

Pass 1, 2, ... (Merge):

  • Merge M-1 runs at a time
  • Number of passes: ⌈log_{M-1}(br/M)⌉

Total Cost:

  • Read + Write each block in each pass
  • 2br × (number of passes)

Example:

  • File: 1000 blocks
  • Memory: 5 buffers
  • Pass 0: ⌈1000/5⌉ = 200 runs
  • Pass 1: ⌈200/4⌉ = 50 runs
  • Pass 2: ⌈50/4⌉ = 13 runs
  • Pass 3: ⌈13/4⌉ = 4 runs
  • Pass 4: ⌈4/4⌉ = 1 run
  • Total: 5 passes, cost = 2×1000×5 = 10,000 block accesses
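The run and pass counts above follow mechanically from the formulas; a small sketch in Python (helper name is illustrative):

```python
import math

def external_sort_stats(file_blocks: int, memory_buffers: int):
    """Count passes and total block I/O for external sort-merge.
    Assumes memory_buffers >= 3 (one buffer reserved for merge output)."""
    runs = math.ceil(file_blocks / memory_buffers)   # Pass 0: sorted runs
    passes = 1                                       # Pass 0 counts as a pass
    fan_in = memory_buffers - 1                      # merge fan_in runs at a time
    while runs > 1:
        runs = math.ceil(runs / fan_in)
        passes += 1
    return passes, 2 * file_blocks * passes          # read + write per pass

passes, cost = external_sort_stats(1000, 5)
# 200 -> 50 -> 13 -> 4 -> 1 runs: 5 passes, 10,000 block accesses
```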

3. JOIN ALGORITHMS

Nested-Loop Join:

for each tuple tr in R:
    for each tuple ts in S:
        if tr and ts match:
            output <tr, ts>

Cost: br + nr × bs
(Read R once; for each R tuple, scan entire S)

Block Nested-Loop Join:

for each block Br of R:
    for each block Bs of S:
        for each tuple tr in Br:
            for each tuple ts in Bs:
                if match: output

Cost: br + br × bs
(For each R block, scan entire S)

Better: Use smaller relation as outer

Indexed Nested-Loop Join:

for each tuple tr in R:
    use index to find matching tuples in S
    for each match: output

Cost: br + nr × c
(c = cost of index lookup)

Best when: S has index on join attribute

Sort-Merge Join:

Algorithm:

  1. Sort R on join attribute
  2. Sort S on join attribute
  3. Merge sorted relations

Cost:

  • Sort R: 2br × ⌈log_{M-1}(br/M)⌉
  • Sort S: 2bs × ⌈log_{M-1}(bs/M)⌉
  • Merge: br + bs
  • Total: Sort cost + br + bs

Best when: Relations already sorted or result needed sorted

Hash Join:

Build Phase:

  1. Hash smaller relation R into M-1 buckets
  2. Keep hash table in memory

Probe Phase:

  1. Hash each tuple of S
  2. Probe corresponding bucket
  3. Output matches

Cost: 3(br + bs)
(Partitioning reads and writes each relation once; the join phase reads each partition once more)

Best when: Equi-join, enough memory for hash table

Most commonly used in practice!
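The build/probe core of a hash join fits in a few lines of Python. This is an in-memory sketch only: the disk partitioning step of the Grace/hybrid variants is omitted, and the relation and attribute names are illustrative:

```python
from collections import defaultdict

def hash_join(r, s, r_key, s_key):
    """In-memory hash join: build a hash table on R (the smaller relation),
    then probe it once per S tuple."""
    table = defaultdict(list)          # build phase
    for tr in r:
        table[tr[r_key]].append(tr)
    out = []                           # probe phase
    for ts in s:
        for tr in table.get(ts[s_key], []):
            out.append({**tr, **ts})
    return out

depts = [{"dept": 10, "dname": "IT"}]
orders = [{"dept": 10, "emp": 1}, {"dept": 20, "emp": 2}]
result = hash_join(depts, orders, "dept", "dept")
# only the dept-10 order finds a match
```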

QUERY OPTIMIZATION

Cost-Based Optimization:

Factors:

  1. Access Cost: Disk I/O
  2. Storage Cost: Intermediate results
  3. Computation Cost: CPU operations
  4. Memory Cost: Buffers needed
  5. Communication Cost: Distributed systems

Statistics Needed:

  • nr: Number of tuples in relation r
  • br: Number of blocks containing tuples of r
  • lr: Size of tuple in relation r
  • fr: Blocking factor (tuples per block)
  • V(A, r): Number of distinct values of attribute A in r

Selectivity Estimation:

Equality: σA=v(r)

  • Selectivity: 1/V(A, r)
  • Result size: nr / V(A, r)

Range: σA≥v(r)

  • Selectivity: (max(A) - v) / (max(A) - min(A))
  • Result size: nr × selectivity

Conjunction: σA=v ∧ B=w(r)

  • Selectivity: s(A) × s(B) (independence assumption)

Disjunction: σA=v ∨ B=w(r)

  • Selectivity: s(A) + s(B) - s(A) × s(B)

Join Size Estimation:

Natural Join: R ⋈ S

If R.A is key for R:

  • Size ≤ |S| (each S tuple matches at most one R tuple)

If S.A is key for S:

  • Size ≤ |R|

If neither is key:

  • Size ≤ |R| × |S| / max(V(A,R), V(A,S))
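Under the stated uniformity and independence assumptions, these estimators are one-liners (function names are illustrative):

```python
def eq_result_size(n_r, v_a_r):
    """sigma_{A=v}(r): assume values spread uniformly over V(A, r) values."""
    return n_r / v_a_r

def range_selectivity(v, a_min, a_max):
    """sigma_{A>=v}(r) under the same uniformity assumption."""
    return (a_max - v) / (a_max - a_min)

def conj_selectivity(s_a, s_b):
    """sigma_{A=v AND B=w}: independence assumption."""
    return s_a * s_b

def disj_selectivity(s_a, s_b):
    """sigma_{A=v OR B=w}: inclusion-exclusion."""
    return s_a + s_b - s_a * s_b

def join_size_bound(n_r, n_s, v_a_r, v_a_s):
    """|R join S| bound when the join attribute is a key in neither relation."""
    return n_r * n_s / max(v_a_r, v_a_s)

# e.g. 10,000 employees over 50 departments -> about 200 per department
```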

Transformation Rules:

  1. σc1∧c2(r) = σc1(σc2(r))
    (Cascade selections)

  2. σc1(σc2(r)) = σc2(σc1(r))
    (Commutative selections)

  3. πL1(πL2(r)) = πL1(r) if L1 ⊆ L2
    (Cascade projections)

  4. σc(r ⋈ s) = σc(r) ⋈ s if c uses only r attributes
    (Push selection through join)

  5. r ⋈ s = s ⋈ r
    (Join commutativity)

  6. (r ⋈ s) ⋈ t = r ⋈ (s ⋈ t)
    (Join associativity)

Optimization Strategy:

  1. Push selections down (reduce tuples early)
  2. Push projections down (reduce attributes early)
  3. Combine selections and cross-products into joins
  4. Choose join order using dynamic programming
  5. Use indexes when beneficial

Example Optimization:

Original:

πName(σDeptName='IT' ∧ Salary>50000(Employee × Department))

Problems:

  • Cartesian product very expensive
  • Selection after product

Optimized:

πName(σSalary>50000(Employee) ⋈ σDeptName='IT'(Department))

Benefits:

  • Selections pushed down (smaller intermediate results)
  • Product replaced with join
  • May use indexes on selections

B-TREE INDEX

Properties:

  • Balanced tree: All leaves at same depth
  • Order n: Each node has [⌈n/2⌉, n] keys (except root)
  • Root has [1, n] keys
  • Internal node with k keys has k+1 children

Structure:

Internal Node: [P₁ K₁ P₂ K₂ ... Pₙ Kₙ Pₙ₊₁]
- Kᵢ: Search keys
- Pᵢ: Pointers to children
- Keys in Pᵢ < Kᵢ < keys in Pᵢ₊₁

Search Algorithm:

Search(key K, node N):
    if N is leaf:
        return record with key K in N
    else:
        find i such that Kᵢ ≤ K < Kᵢ₊₁
        return Search(K, child Pᵢ₊₁)

Cost:

  • Height h = ⌈log_⌈n/2⌉(N)⌉ where N = number of keys
  • Search cost: h disk reads

Insertion:

  1. Search for leaf where key should go
  2. If leaf has space, insert key
  3. If leaf full, split:
    • Create new leaf
    • Distribute keys evenly
    • Promote middle key to parent
    • Recursively split parent if full

Deletion:

  1. Search for key and remove
  2. If node underflows (< ⌈n/2⌉ keys):
    • Try to borrow from sibling
    • If can't borrow, merge with sibling
    • Recursively fix parent

B+-TREE INDEX

Differences from B-tree:

  1. All data in leaves

    • Internal nodes only have keys + pointers
    • Leaves contain actual data records or pointers
  2. Leaves linked

    • Sequential access without tree traversal
    • Efficient range queries
  3. Redundant keys

    • Keys in internal nodes repeated in leaves

Structure:

Internal Node: [P₁ K₁ P₂ K₂ ... Pₙ]
- Guide searches, no data

Leaf Node: [K₁ Data₁ K₂ Data₂ ... Kₙ Dataₙ Next]
- Contains actual records
- Linked to next leaf (→)

Advantages:

  • Higher fanout (more keys per internal node)
  • Shorter tree (fewer disk I/Os)
  • Efficient range scans (follow leaf links)
  • Most databases use B+-tree (not B-tree)

Example B+-tree (order 3):

                    [30]
                   /    \
             [10,20]    [40,50]
            /  |  \     /  |  \
Leaves: [5,7][12,15][25,28][35,38][45,48][55,60]
         ↓     ↓      ↓       ↓       ↓       ↓
Leaves linked: → → → → → →

Range Query Example:

Find all keys 12 ≤ key ≤ 45:

  1. Search for 12 → finds leaf [12,15]
  2. Follow leaf links: [12,15] → [25,28] → [35,38] → [45,48]
  3. Stop when key > 45
  4. Cost: O(log n) + number of result blocks
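The leaf-link walk behind this range query can be sketched in Python. The `Leaf` class and the `bisect`-based lookup below are illustrative stand-ins for the real root-to-leaf descent:

```python
import bisect

class Leaf:
    def __init__(self, keys):
        self.keys = keys
        self.next = None  # link to the right sibling leaf

def link(leaves):
    for a, b in zip(leaves, leaves[1:]):
        a.next = b

def range_query(leaves, lo, hi):
    """Locate the first candidate leaf (stand-in for the tree descent),
    then follow leaf links, stopping once a key exceeds hi."""
    firsts = [leaf.keys[0] for leaf in leaves]
    i = max(0, bisect.bisect_right(firsts, lo) - 1)
    node, out = leaves[i], []
    while node:
        for k in node.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        node = node.next
    return out

leaves = [Leaf([5, 7]), Leaf([12, 15]), Leaf([25, 28]),
          Leaf([35, 38]), Leaf([45, 48]), Leaf([55, 60])]
link(leaves)
# 12 <= key <= 45 touches four leaves via the sibling links
```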

HASHING INDEX

Static Hashing:

Hash Function: h(K) = K mod M

Buckets: M buckets, each can hold multiple records

Search:

1. Compute h(K)
2. Access bucket h(K)
3. Search bucket for key K

Cost: 1-2 disk reads (bucket + maybe overflow)

Collision Handling:

  • Chaining: Link overflow buckets
  • Open addressing: Probe next bucket

Problem: Fixed M, poor for growing data

Dynamic Hashing:

Extendible Hashing:

  • Use d bits of hash value
  • d grows/shrinks dynamically
  • Directory of 2^d entries
  • Buckets split when full

Example:

d=2 (use 2 bits)
Directory:
00 → Bucket A
01 → Bucket B
10 → Bucket C
11 → Bucket D

Insert causes overflow in Bucket A:
- Split using the 3rd bit: A1 (last bits 000) and A2 (last bits 100)
- The directory must now distinguish 3 bits, so double it (d=3)

Linear Hashing:

  • No directory
  • Buckets split in linear order
  • Uses two hash functions
  • Gradual growth

Hash vs B+-tree:

Feature     Hash        B+-tree
--------    --------    -----------------
Equality    O(1)        O(log n)
Range       O(n)        O(log n + result)
Ordered     No          Yes
Dynamic     Complex     Easy

Use hash for: Exact match lookups (primary keys)
Use B+-tree for: Everything else (default)

DISTRIBUTED DATABASES

Definition: Database stored across multiple sites connected by network

Types:

Homogeneous: Same DBMS at all sites
Heterogeneous: Different DBMS at different sites

Data Distribution:

1. Fragmentation:

Horizontal:

  • Split rows
  • Example: Customers_East, Customers_West

Vertical:

  • Split columns
  • Example: Employee_Public(ID, Name), Employee_Private(ID, Salary)

Hybrid:

  • Both horizontal and vertical

2. Replication:

Full Replication:

  • Complete copy at each site
  • Fast reads, slow writes
  • High availability

Partial Replication:

  • Some fragments replicated
  • Balance availability and cost

No Replication:

  • Each fragment at one site only
  • Fragmentation only

Advantages:

  • Local autonomy
  • Improved performance (data locality)
  • Improved reliability (no single point of failure)
  • Easier expansion

Challenges:

  • Distributed query processing
  • Distributed transaction management
  • Maintaining consistency (replication)
  • Increased complexity

Distributed Transactions:

Two-Phase Commit (2PC):

Phase 1 (Prepare):

  1. Coordinator sends "Prepare" to all participants
  2. Participants execute transaction
  3. Participants respond "Ready" or "Abort"

Phase 2 (Commit/Abort):

  1. If all "Ready": Coordinator sends "Commit"
  2. If any "Abort": Coordinator sends "Abort"
  3. Participants commit/abort and ACK
  4. Coordinator completes

Problem: Blocking if coordinator fails
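The coordinator's side of 2PC can be sketched as a simple vote-collection loop. Participants are modeled as plain callables; a real implementation adds write-ahead logging, timeouts, and recovery:

```python
def two_phase_commit(participants):
    """Coordinator's view of 2PC: collect votes, then broadcast the decision.
    Each participant is a callable that answers 'Prepare' with 'Ready' or
    'Abort', and acknowledges the final decision."""
    # Phase 1 (Prepare): every participant votes
    votes = [p("Prepare") for p in participants]
    # Phase 2 (Commit/Abort): commit only on a unanimous 'Ready'
    decision = "Commit" if all(v == "Ready" for v in votes) else "Abort"
    for p in participants:
        p(decision)  # participants apply the decision and ACK
    return decision

always_ready = lambda msg: "Ready"
always_abort = lambda msg: "Abort" if msg == "Prepare" else "ACK"
# one 'Abort' vote is enough to abort globally
```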

Three-Phase Commit (3PC):

  • Adds "Pre-commit" phase
  • Non-blocking
  • More complex

CAP Theorem:

Can achieve at most 2 of 3:

  • Consistency: All nodes see same data
  • Availability: Every request gets response
  • Partition Tolerance: System works despite network failures

Trade-offs:

  • Traditional RDBMS: CA (sacrifice P)
  • NoSQL: Often AP or CP

DATA WAREHOUSING

Definition: Subject-oriented, integrated, time-variant, non-volatile collection for decision support

Characteristics:

  1. Subject-Oriented:

    • Organized by business subjects (Sales, Inventory)
    • Not by applications
  2. Integrated:

    • Data from multiple sources
    • Consistent naming, encoding, attributes
  3. Time-Variant:

    • Historical data
    • Timestamped
  4. Non-Volatile:

    • Read-only (mostly)
    • Loaded and accessed, not updated

Architecture:

Data Sources (OLTP)
   ↓
ETL (Extract, Transform, Load)
   ↓
Data Warehouse (Central)
   ↓
Data Marts (Departments)
   ↓
OLAP / BI Tools
   ↓
Reports / Dashboards

ETL Process:

Extract:

  • Read from source systems
  • Incremental or full extraction

Transform:

  • Cleanse data (handle nulls, errors)
  • Convert data types
  • Standardize formats
  • Apply business rules
  • Aggregate data

Load:

  • Write to data warehouse
  • Incremental or full load
  • Schedule: nightly, hourly, real-time

Dimensional Modeling:

Star Schema:

     DimTime
        |
        |
DimProduct - FactSales - DimCustomer
        |
        |
     DimStore

Fact Table: Measures (sales, quantity, revenue)
Dimension Tables: Context (who, what, when, where, why)

Snowflake Schema:

  • Normalized dimensions
  • Dimension tables have foreign keys to other dimension tables
  • More complex, less redundant

OLAP Operations:

  1. Roll-Up: Aggregate to higher level

    • Daily → Monthly → Yearly
  2. Drill-Down: Detail to lower level

    • Yearly → Quarterly → Monthly
  3. Slice: Fix one dimension

    • Sales in 2024
  4. Dice: Fix multiple dimensions

    • Sales in 2024, Q1, West region
  5. Pivot: Rotate cube

    • Rows ↔ Columns

DATA MINING

Definition: Discovering patterns, correlations, trends in large datasets

KDD Process:

  1. Data Selection: Choose relevant data
  2. Pre-processing: Clean, integrate
  3. Transformation: Normalize, aggregate
  4. Data Mining: Apply algorithms
  5. Interpretation: Evaluate patterns
  6. Knowledge: Act on insights

Techniques:

1. Association Rules:

Market Basket Analysis:

{Bread, Butter} ⇒ {Milk}
Support = 30% (30% of transactions have all three)
Confidence = 80% (80% with bread and butter also have milk)

Apriori Algorithm:

  1. Find frequent itemsets (support ≥ threshold)
  2. Generate association rules (confidence ≥ threshold)

2. Classification:

Assign items to predefined classes:

  • Decision Trees (ID3, C4.5, CART)
  • Naive Bayes
  • Support Vector Machines (SVM)
  • Neural Networks

Example: Email spam classification

3. Clustering:

Group similar items (no predefined classes):

  • K-Means
  • Hierarchical Clustering
  • DBSCAN (density-based)

Example: Customer segmentation

4. Regression:

Predict numeric values:

  • Linear Regression
  • Logistic Regression
  • Polynomial Regression

Example: House price prediction

5. Anomaly Detection:

Identify outliers:

  • Statistical methods
  • Distance-based
  • Density-based

Example: Fraud detection, intrusion detection

MOBILE AND WEB DATABASES

Mobile Databases:

Characteristics:

  • Limited resources (CPU, memory, battery)
  • Intermittent connectivity
  • Location-aware
  • Lightweight

Examples:

  • SQLite (embedded)
  • Realm
  • Couchbase Lite

Challenges:

  • Synchronization with server
  • Offline operation
  • Conflict resolution
  • Battery efficiency

Synchronization Strategies:

1. Full Sync:

  • Download entire database
  • Simple but inefficient

2. Incremental Sync:

  • Only changed data
  • Requires tracking changes

3. Bidirectional Sync:

  • Client ↔ Server
  • Conflict resolution needed

Conflict Resolution:

  • Last Write Wins (LWW)
  • Version Vectors
  • Application-specific logic

Web Databases:

Backend (Server-Side):

  • Traditional RDBMS (MySQL, PostgreSQL)
  • NoSQL (MongoDB, Cassandra)
  • Cloud databases (Aurora, Cloud SQL)

Frontend (Client-Side):

  • IndexedDB (browser storage)
  • LocalStorage (key-value, 5-10MB limit)
  • WebSQL (deprecated)

Architecture Patterns:

1. Traditional (3-Tier):

Browser → Web Server → Database

2. RESTful API:

Frontend (React/Angular)
   ↓ HTTP/REST
Backend (Node.js/Django)
   ↓ SQL/NoSQL
Database

3. Microservices:

Frontend → API Gateway → Service1 (DB1)
                       → Service2 (DB2)
                       → Service3 (DB3)

4. Serverless:

Frontend → Cloud Functions → Managed Database
(AWS Lambda, Firebase)

Scalability Strategies:

Vertical Scaling:

  • Bigger server (more CPU, RAM)
  • Limited by hardware

Horizontal Scaling:

  • More servers
  • Requires distribution strategy

Caching:

  • Redis, Memcached
  • Reduce database load

Content Delivery Network (CDN):

  • Cache static content
  • Serve from edge locations

Cloud Databases:

SQL:

  • Amazon RDS
  • Google Cloud SQL
  • Azure SQL Database

NoSQL:

  • Amazon DynamoDB
  • MongoDB Atlas
  • Google Firestore

Data Warehouse:

  • Amazon Redshift
  • Google BigQuery
  • Snowflake

Advantages:

  • Managed (less ops)
  • Scalable
  • High availability
  • Pay-as-you-go

Disadvantages:

  • Vendor lock-in
  • Cost (can be high at scale)
  • Less control

Examples

COMPREHENSIVE EXAMPLES

EXAMPLE 1: Query Optimization

Original Query:

SELECT E.Name, D.DeptName
FROM Employee E, Department D, Salary S
WHERE E.DeptID = D.DeptID
  AND E.EmpID = S.EmpID
  AND D.DeptName = 'IT'
  AND S.Salary > 50000;

Inefficient Plan:

1. Employee × Department × Salary  -- Huge cartesian product!
2. σ(DeptName='IT' ∧ Salary>50000)
3. π(Name, DeptName)

Cost: |Employee| × |Department| × |Salary| scans

Optimized Plan:

1. Result1 = σ(DeptName='IT')(Department)   -- Filter early: 1 dept
2. Result2 = σ(Salary>50000)(Salary)        -- Filter early: ~10% of rows
3. Result3 = Result1 ⋈(DeptID) Employee     -- Join filtered dept with employees
4. Result4 = Result3 ⋈(EmpID) Result2       -- Join with filtered salaries
5. π(Name, DeptName)(Result4)

Improvements:

  • Push selections down (reduce intermediate results)
  • Replace cartesian products with joins
  • Use indexes on DeptName and Salary

Cost Reduction: 1000x or more!

EXAMPLE 2: Join Algorithm Selection

Query:

SELECT *
FROM Orders O, Customer C
WHERE O.CustomerID = C.CustomerID;

Scenario A: No Index

  • Orders: 1M rows, 10,000 blocks
  • Customer: 100K rows, 1,000 blocks

Block Nested-Loop:

Cost = 10,000 + 10,000 × 1,000 = 10,010,000 block reads

Sort-Merge (M=100 buffers):

Sort Orders: 2 × 10,000 × ⌈log₉₉(100)⌉ = 40,000
Sort Customer: 2 × 1,000 × ⌈log₉₉(10)⌉ = 2,000
Merge: 10,000 + 1,000 = 11,000
Total = 53,000 block reads

Hash Join:

Partition: 2 × (10,000 + 1,000) = 22,000
Join: 10,000 + 1,000 = 11,000
Total = 33,000 block reads

Winner: Hash Join (300x faster than Nested-Loop!)

Scenario B: Index on Customer.CustomerID

Indexed Nested-Loop:

Cost = 10,000 + 1,000,000 × 2 = 2,010,000 block reads
(For each Order, 2 reads: index + data)

Winner (B): Still Hash Join! (~60x faster than Indexed Nested-Loop)
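Plugging the scenario's block counts into the cost formulas from the join-algorithms section gives the comparison directly; a quick check (sort costs taken from the Sort-Merge figures above):

```python
def block_nested_loop(b_outer, b_inner):
    return b_outer + b_outer * b_inner

def sort_merge(b_r, b_s, sort_r_cost, sort_s_cost):
    return sort_r_cost + sort_s_cost + b_r + b_s

def hash_join(b_r, b_s):
    return 3 * (b_r + b_s)

def indexed_nested_loop(b_outer, n_outer, lookup_cost):
    return b_outer + n_outer * lookup_cost

b_orders, b_cust = 10_000, 1_000     # Orders and Customer block counts
costs = {
    "block-nl":   block_nested_loop(b_orders, b_cust),
    "sort-merge": sort_merge(b_orders, b_cust, 40_000, 2_000),
    "hash":       hash_join(b_orders, b_cust),
    "indexed-nl": indexed_nested_loop(b_orders, 1_000_000, 2),
}
# hash join is cheapest in both scenarios
```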

EXAMPLE 3: B+-tree Operations

B+-tree (order 3: leaves hold at most 2 keys, internal nodes up to 3):

Initial Tree:

             [20]
            /    \
       [10,15]  [25,30]
       /  |  \    /  |  \
    [5,7][12,13][18,19][22,23][27,28][32,35]
     ↓     ↓       ↓      ↓      ↓      ↓
  (leaves linked)

Insert 14:

  1. Search finds leaf [12,13]
  2. Leaf full (2-key leaves), must split
  3. Split [12,13,14] → [12] and [13,14]
  4. Promote 13 to parent [10,15] → [10,13,15]

Result:

             [20]
            /    \
      [10,13,15]  [25,30]
      /  |  |  \    /  |  \
   [5,7][12][13,14][18,19][22,23][27,28][32,35]

Insert 16, 17 (causes cascade):

Leaf [18,19] becomes [16,17,18,19] (overflow!)
Split → [16,17] and [18,19], copy 18 up
Parent [10,13,15] becomes [10,13,15,18] (overflow!)
Split parent → [10,13] and [15,18], promote 15 to root
Root [20] becomes [15,20]

Result:

           [15,20]
          /   |   \
    [10,13] [15,18] [25,30]
      ...     ...     ...

Delete 12:

  1. Remove 12 from leaf [12]
  2. Leaf empty (underflow!)
  3. Try to borrow from sibling [13,14]
  4. Success: Redistribute → [13] and [14]
  5. Update parent key 13 → 14

EXAMPLE 4: Hash Index

Static Hash (M=5 buckets):

h(K) = K mod 5

Insert: 5, 15, 25, 3, 8, 13, 18, 23, 28

Bucket 0: 5 → 15 → 25 (overflow chain!)
Bucket 1: empty
Bucket 2: empty
Bucket 3: 3 → 8 → 13 → 18 → 23 → 28 (long chain!)
Bucket 4: empty

Problem: Clustering, poor distribution
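The clustering above is easy to reproduce (the `bucketize` helper is hypothetical):

```python
from collections import defaultdict

def bucketize(keys, m):
    """Static hashing with h(K) = K mod M; returns bucket -> keys."""
    buckets = defaultdict(list)
    for k in keys:
        buckets[k % m].append(k)
    return dict(buckets)

b = bucketize([5, 15, 25, 3, 8, 13, 18, 23, 28], 5)
# bucket 0 holds [5, 15, 25]; bucket 3 holds a 6-key overflow chain;
# buckets 1, 2, 4 stay empty
```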

Extendible Hash (d=2):

h(K) = last 2 bits of K

Directory (2² = 4 entries):
00 → Bucket A: [4, 8, 12]
01 → Bucket B: [5, 9, 13]
10 → Bucket C: [6, 10, 14]
11 → Bucket D: [7, 11, 15]

Insert 16 (binary: ...10000, last 2 bits: 00):
- Goes to Bucket A
- Bucket A full (assume capacity=3)

Split:
- Use the 3rd bit: keys whose last 3 bits are 000 (8, 16) go to new Bucket A'
- Keys whose last 3 bits are 100 (4, 12) stay in Bucket A
- Increase directory depth to d=3

New Directory (2³ = 8 entries):
000 → Bucket A': [8, 16]
001 → Bucket B
010 → Bucket C
011 → Bucket D
100 → Bucket A: [4, 12]
101 → Bucket B
110 → Bucket C
111 → Bucket D
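A quick bit-arithmetic check of where each key lands after the split (note 16 = 10000₂, so its low three bits are 000):

```python
def last_bits(k, d):
    """Extendible hashing directories index on the low-order d bits of h(K);
    here h(K) = K for simplicity."""
    return k & ((1 << d) - 1)

keys = [4, 8, 12, 16]
# at depth 2 all four keys collide on bits 00 ...
assert {last_bits(k, 2) for k in keys} == {0}
# ... but depth 3 separates them into buckets 000 and 100
groups = {}
for k in keys:
    groups.setdefault(last_bits(k, 3), []).append(k)
# 000 -> [8, 16], 100 -> [4, 12]
```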

EXAMPLE 5: Distributed Query

Query:

SELECT *
FROM Orders O, Customer C
WHERE O.CustomerID = C.CustomerID
  AND C.City = 'Mumbai';

Data Distribution:

  • Orders: Horizontally partitioned by region (Site 1: North, Site 2: South)
  • Customer: Replicated at all sites

Strategy 1: Centralized

  1. Ship Orders from Site 1 to coordinator
  2. Ship Orders from Site 2 to coordinator
  3. Join at coordinator

Cost: Transfer all Orders (expensive!)

Strategy 2: Distributed Join

  1. Apply σ(City='Mumbai')(Customer) at each site
  2. Join locally at each site
  3. Union results

Cost: Transfer only Mumbai customers (small!)

Example Numbers:

  • Orders: 1M rows × 1KB = 1GB
  • Customers: 100K rows × 500B = 50MB
  • Mumbai Customers: 5K rows = 2.5MB

Cost Comparison:

  • Centralized: 1GB network transfer
  • Distributed: 2.5MB network transfer (400x less!)

EXAMPLE 6: Star Schema

Fact Table: FactSales

CREATE TABLE FactSales (
    SaleID INT PRIMARY KEY,
    DateKey INT,           -- FK to DimDate
    ProductKey INT,        -- FK to DimProduct
    CustomerKey INT,       -- FK to DimCustomer
    StoreKey INT,          -- FK to DimStore
    Quantity INT,
    TotalAmount DECIMAL(10,2)
);

Dimension Tables:

CREATE TABLE DimDate (
    DateKey INT PRIMARY KEY,
    Date DATE,
    Day INT,
    Month INT,
    Quarter INT,
    Year INT,
    DayOfWeek VARCHAR(10)
);

CREATE TABLE DimProduct (
    ProductKey INT PRIMARY KEY,
    ProductName VARCHAR(100),
    Category VARCHAR(50),
    Brand VARCHAR(50),
    UnitPrice DECIMAL(10,2)
);

CREATE TABLE DimCustomer (
    CustomerKey INT PRIMARY KEY,
    CustomerName VARCHAR(100),
    City VARCHAR(50),
    State VARCHAR(50),
    Country VARCHAR(50)
);

CREATE TABLE DimStore (
    StoreKey INT PRIMARY KEY,
    StoreName VARCHAR(100),
    City VARCHAR(50),
    Manager VARCHAR(100)
);

OLAP Query:

-- Total sales by quarter and product category
SELECT 
    D.Year,
    D.Quarter,
    P.Category,
    SUM(F.TotalAmount) AS TotalSales
FROM FactSales F
JOIN DimDate D ON F.DateKey = D.DateKey
JOIN DimProduct P ON F.ProductKey = P.ProductKey
WHERE D.Year = 2024
GROUP BY D.Year, D.Quarter, P.Category
ORDER BY D.Quarter, TotalSales DESC;

Roll-Up (Aggregate to Year):

SELECT 
    D.Year,
    P.Category,
    SUM(F.TotalAmount) AS TotalSales
FROM FactSales F
JOIN DimDate D ON F.DateKey = D.DateKey
JOIN DimProduct P ON F.ProductKey = P.ProductKey
WHERE D.Year = 2024
GROUP BY D.Year, P.Category;

Drill-Down (To Month):

SELECT 
    D.Year,
    D.Month,
    P.Category,
    SUM(F.TotalAmount) AS TotalSales
FROM FactSales F
JOIN DimDate D ON F.DateKey = D.DateKey
JOIN DimProduct P ON F.ProductKey = P.ProductKey
WHERE D.Year = 2024 AND D.Quarter = 1
GROUP BY D.Year, D.Month, P.Category;

EXAMPLE 7: Association Rule Mining

Transaction Database:

T1: {Bread, Milk, Eggs}
T2: {Bread, Butter}
T3: {Milk, Butter, Eggs}
T4: {Bread, Milk, Butter}
T5: {Bread, Milk, Eggs, Butter}

Apriori Algorithm (min_support=40%, min_confidence=60%):

Step 1: Find frequent 1-itemsets

{Bread}: 4/5 = 80% ✓
{Milk}: 4/5 = 80% ✓
{Butter}: 4/5 = 80% ✓
{Eggs}: 3/5 = 60% ✓

Step 2: Generate frequent 2-itemsets

{Bread, Milk}: 3/5 = 60% ✓
{Bread, Butter}: 3/5 = 60% ✓
{Milk, Butter}: 3/5 = 60% ✓
{Milk, Eggs}: 3/5 = 60% ✓
{Bread, Eggs}: 2/5 = 40% ✓
{Butter, Eggs}: 2/5 = 40% ✓

Step 3: Generate frequent 3-itemsets

{Bread, Milk, Butter}: 2/5 = 40% ✓
{Bread, Milk, Eggs}: 2/5 = 40% ✓
{Milk, Butter, Eggs}: 2/5 = 40% ✓

Step 4: Generate association rules

{Bread, Milk} ⇒ {Butter}: 
  Support = 40%, Confidence = 2/3 = 67% ✓

{Bread, Butter} ⇒ {Milk}:
  Support = 40%, Confidence = 2/3 = 67% ✓

{Milk, Butter} ⇒ {Bread}:
  Support = 40%, Confidence = 2/3 = 67% ✓

{Milk, Eggs} ⇒ {Bread}:
  Support = 40%, Confidence = 2/3 = 67% ✓
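These support and confidence figures can be verified mechanically over the five transactions (plain Python, no libraries):

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk", "Eggs"},
    {"Bread", "Butter"},
    {"Milk", "Butter", "Eggs"},
    {"Bread", "Milk", "Butter"},
    {"Bread", "Milk", "Eggs", "Butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)."""
    return support(lhs | rhs) / support(lhs)

items = {"Bread", "Milk", "Butter", "Eggs"}
frequent2 = [set(c) for c in combinations(sorted(items), 2)
             if support(set(c)) >= 0.4]
# all six pairs clear the 40% support threshold, as listed in Step 2
```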

Actionable Insight:

  • Place Bread, Milk, Butter close together
  • Bundle discount for combo purchase

EXAMPLE 8: Mobile Database Synchronization

Scenario: Sales app with offline capability

Local SQLite (Mobile):

CREATE TABLE Orders (
    OrderID INTEGER PRIMARY KEY,
    CustomerID INTEGER,
    Amount DECIMAL(10,2),
    Status TEXT,
    CreatedAt DATETIME,
    ModifiedAt DATETIME,
    SyncStatus TEXT  -- 'pending', 'synced', 'conflict'
);

Server Database (PostgreSQL):

CREATE TABLE Orders (
    OrderID SERIAL PRIMARY KEY,
    CustomerID INTEGER,
    Amount DECIMAL(10,2),
    Status TEXT,
    CreatedAt TIMESTAMP,
    ModifiedAt TIMESTAMP,
    Version INTEGER  -- For conflict detection
);

Sync Process:

  1. Upload (Mobile → Server):
def sync_upload():
    pending = db.execute("SELECT * FROM Orders WHERE SyncStatus='pending'")
    for order in pending:
        try:
            # Send to server
            response = api.post('/orders/sync', order)
            if response.status == 'success':
                db.execute("UPDATE Orders SET SyncStatus='synced' WHERE OrderID=?", order.id)
            elif response.status == 'conflict':
                db.execute("UPDATE Orders SET SyncStatus='conflict' WHERE OrderID=?", order.id)
        except NetworkError:
            break  # Will retry later
  2. Download (Server → Mobile):
def sync_download():
    last_sync = get_last_sync_time()
    changes = api.get(f'/orders/changes?since={last_sync}')
    
    for change in changes:
        local = db.execute("SELECT * FROM Orders WHERE OrderID=?", change.id)
        
        if not local:
            # New record
            db.insert(change)
        elif local.ModifiedAt < change.ModifiedAt:
            # Server newer
            db.update(change)
        else:
            # Local newer (conflict)
            handle_conflict(local, change)

Conflict Resolution (Last Write Wins):

def handle_conflict(local, server):
    if server.ModifiedAt > local.ModifiedAt:
        # Server wins
        db.update(server)
    else:
        # Local wins, push again
        db.execute("UPDATE Orders SET SyncStatus='pending' WHERE OrderID=?", local.id)

EXAMPLE 9: NoSQL Data Modeling

Relational (Normalized):

CREATE TABLE Users (
    UserID INT PRIMARY KEY,
    Username VARCHAR(50),
    Email VARCHAR(100)
);

CREATE TABLE Posts (
    PostID INT PRIMARY KEY,
    UserID INT,
    Title VARCHAR(200),
    Content TEXT,
    CreatedAt DATETIME,
    FOREIGN KEY (UserID) REFERENCES Users(UserID)
);

CREATE TABLE Comments (
    CommentID INT PRIMARY KEY,
    PostID INT,
    UserID INT,
    Text TEXT,
    CreatedAt DATETIME,
    FOREIGN KEY (PostID) REFERENCES Posts(PostID)
);

Query (3 joins!):

SELECT U.Username, P.Title, C.Text
FROM Users U
JOIN Posts P ON U.UserID = P.UserID
JOIN Comments C ON P.PostID = C.PostID
WHERE U.UserID = 123;

NoSQL (MongoDB - Denormalized):

{
  "_id": ObjectId("..."),
  "username": "john_doe",
  "email": "john@example.com",
  "posts": [
    {
      "postId": 1,
      "title": "My First Post",
      "content": "Hello world!",
      "createdAt": ISODate("2024-01-15"),
      "comments": [
        {
          "commentId": 1,
          "userId": 456,
          "username": "jane_doe",
          "text": "Great post!",
          "createdAt": ISODate("2024-01-15")
        }
      ]
    }
  ]
}

Query (No joins!):

db.users.findOne({ _id: 123 })

Trade-offs:

  • Pros: Faster reads (no joins), natural document structure
  • Cons: Data duplication, update complexity (update username everywhere)

EXAMPLE 10: Cloud Database Auto-Scaling

AWS Aurora Configuration:

# Application Load
Time    Connections    CPU%    Action
08:00   100           20%     Normal (1 reader)
10:00   500           60%     Add reader (2 readers)
12:00   1500          80%     Add reader (3 readers)
14:00   2000          90%     Add reader (4 readers)
16:00   800           40%     Remove reader (3 readers)
20:00   200           25%     Remove reader (2 readers)
00:00   50            10%     Remove reader (1 reader)

Auto-Scaling Policy:

{
  "minCapacity": 1,
  "maxCapacity": 5,
  "scaleUpThreshold": {
    "cpu": 70,
    "connections": 1000
  },
  "scaleDownThreshold": {
    "cpu": 30,
    "connections": 200,
    "cooldownMinutes": 15
  }
}

Cost Optimization:

  • Peak hours (10:00-16:00): 4 instances × $0.50/hr = $2.00/hr × 6hrs = $12
  • Off-peak: 1-2 instances × $0.50/hr = ~$0.75/hr × 18hrs = $13.50
  • Daily Total: ~$25.50
  • vs Fixed 4 instances: $48/day
  • Savings: ~47%
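The daily arithmetic checks out; the $0.50/hr rate is the scenario's own assumption, and the off-peak figure averages the 1-2 instances running outside peak hours:

```python
rate = 0.50                      # assumed $/hr per instance
peak_cost = 4 * rate * 6         # 4 readers during the 6 peak hours
off_peak_cost = 1.5 * rate * 18  # averaging 1.5 instances off-peak
daily = peak_cost + off_peak_cost
fixed = 4 * rate * 24            # always-on 4-instance baseline
savings = 1 - daily / fixed
# daily ~$25.50 vs $48.00 fixed -> roughly 47% saved
```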

Real-World Use

PRACTICAL GUIDELINES

1. Query Optimization Tips:

Use EXPLAIN/EXPLAIN ANALYZE:

-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM Orders WHERE CustomerID = 123;

-- MySQL
EXPLAIN
SELECT * FROM Orders WHERE CustomerID = 123;

Output Analysis:

  • Seq Scan: Full table scan (BAD for large tables)
  • Index Scan: Using index (GOOD)
  • Index Only Scan: Only index, no table access (BEST)
  • Nested Loop: Good for small joins
  • Hash Join: Good for large equi-joins
  • Merge Join: Good for pre-sorted data

Common Problems and Fixes:

Problem: Full Table Scan

-- BAD
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024;
-- Function on column prevents index use

-- GOOD
SELECT * FROM Orders 
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';
-- Index on OrderDate can be used

Problem: Type Mismatch

-- BAD
SELECT * FROM Orders WHERE OrderID = '123';  -- String for INT column

-- GOOD
SELECT * FROM Orders WHERE OrderID = 123;

Problem: OR Conditions

-- BAD
SELECT * FROM Orders 
WHERE Status = 'Pending' OR Status = 'Shipped';

-- GOOD
SELECT * FROM Orders 
WHERE Status IN ('Pending', 'Shipped');
-- Or even better with separate queries and UNION if selective

2. Indexing Strategy:

When to Create Index:

  • Columns in WHERE clause
  • Columns in JOIN conditions
  • Columns in ORDER BY/GROUP BY
  • Foreign keys

When NOT to Create Index:

  • Small tables (< 1000 rows)
  • Columns with low selectivity (e.g., Gender: M/F)
  • Frequently updated columns
  • Rarely queried columns

Index Types by Use Case:

-- B-tree (default): Range queries, sorted access
CREATE INDEX idx_orders_date ON Orders(OrderDate);

-- Hash: Exact match only (PostgreSQL)
CREATE INDEX idx_customers_email ON Customers USING HASH (Email);

-- Covering index: Includes all queried columns
CREATE INDEX idx_orders_covering ON Orders(CustomerID, OrderDate, TotalAmount);

-- Partial index: Only subset of rows
CREATE INDEX idx_active_orders ON Orders(OrderDate) 
WHERE Status = 'Active';

-- Composite index: Multiple columns
CREATE INDEX idx_orders_customer_date ON Orders(CustomerID, OrderDate);
-- Good for: WHERE CustomerID=? AND OrderDate=?
-- NOT for: WHERE OrderDate=? (wrong order!)

Index Maintenance:

-- PostgreSQL: Rebuild bloated index
REINDEX INDEX idx_orders_date;

-- MySQL: Rebuild table and indexes
OPTIMIZE TABLE Orders;

-- Check index usage
SELECT 
    schemaname, tablename, indexname, 
    idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0;  -- Never used!

3. Distributed Database Best Practices:

Data Partitioning:

-- PostgreSQL: Declarative partitioning
CREATE TABLE Orders (
    OrderID BIGINT,
    OrderDate DATE,
    ...
) PARTITION BY RANGE (OrderDate);

CREATE TABLE Orders_2023 PARTITION OF Orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE Orders_2024 PARTITION OF Orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

Benefits:

  • Query only relevant partitions
  • Drop old partitions easily
  • Better manageability
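The routing logic behind range partitioning can be sketched in plain Python; the partition names and bounds mirror the illustrative DDL above (lower bound inclusive, upper bound exclusive, as in PostgreSQL):

```python
from datetime import date

# Illustrative partition map mirroring FOR VALUES FROM ... TO ... above
PARTITIONS = [
    ("Orders_2023", date(2023, 1, 1), date(2024, 1, 1)),
    ("Orders_2024", date(2024, 1, 1), date(2025, 1, 1)),
]

def route(order_date):
    # Pick the partition whose half-open range contains the date
    for name, lo, hi in PARTITIONS:
        if lo <= order_date < hi:
            return name
    raise ValueError("no partition for " + order_date.isoformat())

print(route(date(2023, 6, 15)))  # Orders_2023
print(route(date(2024, 1, 1)))   # Orders_2024 (upper bound of 2023 is exclusive)
```

A query with a WHERE clause on OrderDate can be pruned to just these partitions, which is where the speedup comes from.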

Replication Configuration:

-- PostgreSQL: Streaming replication
-- On primary (both settings take effect after a server restart)
ALTER SYSTEM SET wal_level = 'replica';
ALTER SYSTEM SET max_wal_senders = 5;

-- On replica (settings in postgresql.conf; PostgreSQL 12+ marks a replica with a standby.signal file)
primary_conninfo = 'host=primary_host port=5432 user=replicator'
hot_standby = on

Load Balancing:

# Application-level read/write splitting
def get_db_connection(readonly=False):
    if readonly:
        return connect_to_replica()  # Read from replica
    else:
        return connect_to_primary()  # Write to primary

# Usage
with get_db_connection(readonly=True) as conn:
    results = conn.execute("SELECT ...")  # Read query

with get_db_connection(readonly=False) as conn:
    conn.execute("UPDATE ...")  # Write query
    conn.commit()

4. Data Warehouse Development:

ETL Best Practices:

Incremental Load (Better than Full Load):

-- Track last processed timestamp
CREATE TABLE ETL_Metadata (
    TableName VARCHAR(50) PRIMARY KEY,
    LastLoadTime TIMESTAMP
);

-- Extract only new/modified records
INSERT INTO DW_Orders
SELECT *
FROM OLTP_Orders
WHERE ModifiedAt > (
    SELECT LastLoadTime 
    FROM ETL_Metadata 
    WHERE TableName = 'Orders'
);

-- Update metadata
UPDATE ETL_Metadata
SET LastLoadTime = CURRENT_TIMESTAMP
WHERE TableName = 'Orders';
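A minimal sketch of this watermark-based incremental load, using sqlite3 to stand in for both the OLTP source and the warehouse (all names and timestamps are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE OLTP_Orders (OrderID INTEGER, ModifiedAt TEXT);
CREATE TABLE DW_Orders   (OrderID INTEGER, ModifiedAt TEXT);
CREATE TABLE ETL_Metadata (TableName TEXT PRIMARY KEY, LastLoadTime TEXT);
INSERT INTO ETL_Metadata VALUES ('Orders', '2024-01-01T00:00:00');
INSERT INTO OLTP_Orders VALUES (1, '2023-12-31T09:00:00');  -- already loaded
INSERT INTO OLTP_Orders VALUES (2, '2024-01-02T10:00:00');  -- new since watermark
""")

def incremental_load(now):
    # Extract only rows modified after the stored watermark
    db.execute("""
        INSERT INTO DW_Orders
        SELECT * FROM OLTP_Orders
        WHERE ModifiedAt > (SELECT LastLoadTime FROM ETL_Metadata
                            WHERE TableName = 'Orders')""")
    # Advance the watermark so the next run skips these rows
    db.execute("UPDATE ETL_Metadata SET LastLoadTime = ? WHERE TableName = 'Orders'",
               (now,))

incremental_load('2024-01-03T00:00:00')
loaded = db.execute("SELECT OrderID FROM DW_Orders").fetchall()
print(loaded)  # only the order modified after the watermark crossed over
```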

Slowly Changing Dimensions (SCD Type 2):

CREATE TABLE DimCustomer (
    CustomerKey INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- Surrogate key (auto-generated)
    CustomerID INT,                -- Business key
    CustomerName VARCHAR(100),
    City VARCHAR(50),
    ValidFrom DATE,
    ValidTo DATE,
    IsCurrent BOOLEAN
);

-- When customer moves to new city
-- Old record
UPDATE DimCustomer
SET ValidTo = CURRENT_DATE, IsCurrent = FALSE
WHERE CustomerID = 123 AND IsCurrent = TRUE;

-- New record
INSERT INTO DimCustomer (CustomerID, CustomerName, City, ValidFrom, IsCurrent)
VALUES (123, 'John Doe', 'New York', CURRENT_DATE, TRUE);

-- Now fact table maintains history
-- FactSales joins to CustomerKey (not CustomerID)
-- Can report sales by customer's city at time of sale
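The same Type 2 logic can be exercised end to end with sqlite3; the surrogate key is auto-generated here and the data values are made up:

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("""CREATE TABLE DimCustomer (
    CustomerKey INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    CustomerID INTEGER, City TEXT,
    ValidFrom TEXT, ValidTo TEXT, IsCurrent INTEGER)""")
dw.execute("INSERT INTO DimCustomer (CustomerID, City, ValidFrom, IsCurrent) "
           "VALUES (123, 'Boston', '2020-01-01', 1)")

def scd2_move(customer_id, new_city, today):
    # Close out the current row, then open a new current row
    dw.execute("UPDATE DimCustomer SET ValidTo = ?, IsCurrent = 0 "
               "WHERE CustomerID = ? AND IsCurrent = 1", (today, customer_id))
    dw.execute("INSERT INTO DimCustomer (CustomerID, City, ValidFrom, IsCurrent) "
               "VALUES (?, ?, ?, 1)", (customer_id, new_city, today))

scd2_move(123, 'New York', '2024-06-01')
rows = dw.execute("SELECT City, IsCurrent FROM DimCustomer "
                  "ORDER BY CustomerKey").fetchall()
print(rows)  # old Boston row closed, New York row current; history preserved
```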

Aggregation Tables (OLAP Cubes):

-- Pre-aggregate for common queries
CREATE TABLE AggSalesByMonth AS
SELECT 
    DATE_TRUNC('month', OrderDate) AS Month,
    ProductID,
    SUM(Quantity) AS TotalQuantity,
    SUM(TotalAmount) AS TotalRevenue
FROM FactSales
GROUP BY Month, ProductID;

CREATE INDEX idx_agg_month ON AggSalesByMonth(Month, ProductID);

-- Query aggregate (much faster!)
SELECT * FROM AggSalesByMonth
WHERE Month = '2024-01-01';

5. Mobile Database Patterns:

Offline-First Architecture:

// React Native + SQLite (react-native-sqlite-storage, promise API)
import SQLite from 'react-native-sqlite-storage';
import NetInfo from '@react-native-community/netinfo';

SQLite.enablePromise(true);

class OfflineDB {
  async init() {
    this.db = await SQLite.openDatabase({ name: 'app.db' });
  }
  
  async createOrder(order) {
    // Always save locally first
    const [result] = await this.db.executeSql(
      'INSERT INTO Orders (data, syncStatus) VALUES (?, ?)',
      [JSON.stringify(order), 'pending']
    );
    
    // Try to sync immediately if online
    if (await this.isOnline()) {
      this.syncPendingOrders();
    }
    
    return result.insertId;
  }
  
  async syncPendingOrders() {
    const [pending] = await this.db.executeSql(
      'SELECT * FROM Orders WHERE syncStatus = ?',
      ['pending']
    );
    
    for (const order of pending.rows.raw()) {
      try {
        const response = await fetch('/api/orders', {
          method: 'POST',
          body: order.data
        });
        
        if (response.ok) {
          await this.db.executeSql(
            'UPDATE Orders SET syncStatus = ? WHERE id = ?',
            ['synced', order.id]
          );
        }
      } catch (error) {
        // Network error, will retry later
        console.log('Sync failed, will retry');
      }
    }
  }
  
  async isOnline() {
    // navigator.onLine is unreliable in React Native; use NetInfo instead
    const state = await NetInfo.fetch();
    return state.isConnected;
  }
}

// Periodic background sync
const db = new OfflineDB();
db.init().then(() => {
  setInterval(async () => {
    if (await db.isOnline()) {
      db.syncPendingOrders();
    }
  }, 60000);  // Every minute
});

6. Performance Monitoring:

Key Metrics:

-- PostgreSQL: Slow queries (needs the pg_stat_statements extension;
-- on PostgreSQL 13+ the columns are total_exec_time, min_exec_time, max_exec_time)
SELECT 
    query,
    calls,
    total_time / calls AS avg_time,
    min_time,
    max_time
FROM pg_stat_statements
WHERE total_time / calls > 1000  -- > 1 second avg (times are in milliseconds)
ORDER BY total_time DESC
LIMIT 10;

-- Cache hit ratio (should be > 90%)
SELECT 
    sum(blks_hit) * 100.0 / sum(blks_hit + blks_read) AS cache_hit_ratio
FROM pg_stat_database;

-- Index usage
SELECT 
    schemaname, tablename,
    idx_scan,  -- Index scans
    seq_scan,  -- Sequential scans
    idx_scan / (seq_scan + idx_scan + 0.001) AS index_usage_ratio
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;

-- Largest tables (starting point for a bloat check)
SELECT 
    schemaname, tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

7. Cloud Database Cost Optimization:

Strategies:

Use Read Replicas for Read-Heavy Workloads:

Primary (Expensive): Writes only
Replica 1, 2, 3 (Cheaper): Distribute reads

Schedule Non-Production Databases:

# Stop dev/test databases outside business hours
aws rds stop-db-instance --db-instance-identifier dev-db

# Schedule with Lambda + CloudWatch Events
# Save ~60% on dev/test costs

Right-Size Instances:

# Monitor actual usage (a time window and statistic are required)
aws cloudwatch get-metric-statistics \
    --namespace AWS/RDS \
    --metric-name CPUUtilization \
    --dimensions Name=DBInstanceIdentifier,Value=my-db \
    --statistics Average \
    --period 3600 \
    --start-time 2024-01-01T00:00:00Z \
    --end-time 2024-01-08T00:00:00Z

# If consistently < 40%, downgrade instance
# db.m5.xlarge ($280/mo) → db.m5.large ($140/mo)

Use Reserved Instances:

On-Demand: $0.50/hr = $360/mo
1-Year Reserved: $0.30/hr = $216/mo (40% savings)
3-Year Reserved: $0.20/hr = $144/mo (60% savings)
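A quick check of the arithmetic above, assuming a 720-hour month (30 days × 24 h):

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month

def monthly(rate_per_hour):
    return rate_per_hour * HOURS_PER_MONTH

on_demand  = monthly(0.50)               # 360.0
one_year   = monthly(0.30)               # 216.0
three_year = monthly(0.20)               # 144.0
savings_1y = 1 - one_year / on_demand    # 40% savings
savings_3y = 1 - three_year / on_demand  # 60% savings
print(on_demand, one_year, three_year, savings_1y, savings_3y)
```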

Archive Old Data:

-- Move old data to cheaper storage (S3)
-- Orders > 2 years old
COPY (
    SELECT * FROM Orders 
    WHERE OrderDate < CURRENT_DATE - INTERVAL '2 years'
) TO '/tmp/old_orders.csv' WITH (FORMAT csv, HEADER);

-- Upload to S3, then delete from DB
DELETE FROM Orders 
WHERE OrderDate < CURRENT_DATE - INTERVAL '2 years';

-- Query old data from S3 (AWS Athena) when needed

For exams

IMPORTANT EXAM QUESTIONS

Query Processing:

  1. Explain the steps involved in query processing.

    • Parsing: Check syntax, validate schema
    • Translation: Convert SQL to relational algebra
    • Optimization: Generate and evaluate execution plans
    • Evaluation: Execute chosen plan
  2. What is query optimization? Why is it important?

    • Finding efficient execution plan among alternatives
    • Same query can have vastly different costs (orders of magnitude)
    • Critical for performance at scale
  3. Compare selection algorithms (A1-A5). When is each used?

    • A1 (Linear): No index, unsorted
    • A2 (Binary): Sorted file
    • A3 (Primary Index): Primary key equality
    • A4 (Range): Range query with index
    • A5 (Secondary): Secondary key query
  4. Explain external sort-merge algorithm with an example.

    • Pass 0: Create sorted runs of size M
    • Pass 1+: Merge M-1 runs at a time
    • Cost: 2br × (number of passes)
    • Example with specific numbers
  5. Compare join algorithms: Nested-Loop, Sort-Merge, Hash Join.

    • Nested-Loop: Simple, expensive (worst case nr × bs + br block transfers)
    • Sort-Merge: Good for sorted/large joins
    • Hash Join: Best for equi-joins, most common in practice
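Since hash join is the workhorse in practice, here is a minimal in-memory sketch: build a hash table on the smaller input, probe with the larger (column names are illustrative):

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join: hash the build input, then probe it row by row."""
    table = defaultdict(list)
    for row in build_rows:                  # build phase
        table[row[build_key]].append(row)
    out = []
    for row in probe_rows:                  # probe phase
        for match in table.get(row[probe_key], []):
            out.append({**match, **row})
    return out

customers = [{"CustomerID": 1, "Name": "Ada"}, {"CustomerID": 2, "Name": "Bob"}]
orders = [{"OrderID": 10, "CustomerID": 1}, {"OrderID": 11, "CustomerID": 1},
          {"OrderID": 12, "CustomerID": 3}]
joined = hash_join(customers, orders, "CustomerID", "CustomerID")
print(joined)  # orders 10 and 11 match Ada; order 12 has no matching customer
```

Real systems partition both inputs to disk when the build side exceeds memory (grace hash join), but the build/probe idea is the same.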

Indexing:

  1. What is indexing? Compare B-tree and Hash indexes.

    • Index: Data structure for fast retrieval
    • B-tree: Range queries, sorted access, O(log n)
    • Hash: Exact match only, O(1), no ordering
  2. Explain B+-tree structure and operations.

    • All data in leaves
    • Leaves linked for range scans
    • Insert: May cause splits
    • Delete: May cause merges/redistribution
  3. Why are B+-trees preferred over B-trees in databases?

    • Higher fanout (more keys per node)
    • Efficient range scans (leaf links)
    • All data in leaves (simpler)
  4. Explain extendible hashing with an example.

    • Dynamic hash structure
    • Directory grows/shrinks with d bits
    • Buckets split on overflow
    • No performance degradation with growth
  5. When would you use a hash index vs B-tree index?

    • Hash: Primary key lookups, equality predicates
    • B-tree: Range queries, ORDER BY, default choice
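The trade-off can be illustrated with stdlib stand-ins: a sorted list plus bisect plays the role of a B-tree (ordered, range-scannable), while a dict plays the hash index (equality only):

```python
import bisect

keys = sorted([17, 3, 42, 8, 25])           # "B-tree": keys kept in order
hash_index = {k: f"row-{k}" for k in keys}  # "hash index": equality lookups

def range_scan(lo, hi):
    # O(log n) to find the start, then walk the ordered keys
    i = bisect.bisect_left(keys, lo)
    j = bisect.bisect_right(keys, hi)
    return keys[i:j]

print(range_scan(5, 30))   # [8, 17, 25] -- easy when keys are ordered
print(hash_index.get(25))  # row-25      -- O(1) equality lookup
# A range query against the hash index would have to inspect every key.
```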

Distributed Databases:

  1. What is a distributed database? What are its advantages?

    • Database across multiple sites
    • Advantages: Autonomy, performance, reliability, scalability
  2. Explain data fragmentation with examples.

    • Horizontal: Split rows (geographical partitioning)
    • Vertical: Split columns (separate sensitive data)
    • Hybrid: Both
  3. What is the difference between replication and fragmentation?

    • Fragmentation: Split data (no duplicates)
    • Replication: Copy data (duplicates)
    • Can be combined
  4. Explain Two-Phase Commit protocol.

    • Phase 1 (Prepare): Coordinator asks participants to prepare
    • Phase 2 (Commit/Abort): Coordinator decides based on votes
    • Ensures atomicity in distributed transactions
  5. What is CAP theorem? Explain with examples.

    • Consistency, Availability, Partition tolerance
    • Can achieve at most 2 of 3
    • SQL: CA, NoSQL: AP or CP
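A toy coordinator for Two-Phase Commit, modeling only the message flow (no failures, timeouts, or logging):

```python
class Participant:
    def __init__(self, name, will_commit=True):
        self.name, self.will_commit = name, will_commit
        self.state = "active"

    def prepare(self):            # Phase 1: vote yes/no
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def finish(self, decision):   # Phase 2: obey the coordinator
        self.state = decision

def two_phase_commit(participants):
    # Phase 1 (Prepare): every participant must vote yes
    votes = [p.prepare() for p in participants]
    decision = "committed" if all(votes) else "aborted"
    # Phase 2 (Commit/Abort): broadcast the global decision
    for p in participants:
        p.finish(decision)
    return decision

ok = two_phase_commit([Participant("A"), Participant("B")])
bad = two_phase_commit([Participant("A"), Participant("B", will_commit=False)])
print(ok, bad)  # committed aborted -- one 'no' vote aborts everyone
```

The real protocol also forces participants to log their vote durably before replying, which is what makes the outcome atomic across crashes.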

Data Warehousing:

  1. What is a data warehouse? How does it differ from OLTP?

    • Subject-oriented, integrated, time-variant, non-volatile
    • OLTP: Current data, transactional, normalized
    • DW: Historical data, analytical, denormalized
  2. Explain star schema with an example.

    • Fact table (center): Measures
    • Dimension tables (points): Context
    • Simple structure, denormalized
  3. What is the difference between star and snowflake schema?

    • Star: Denormalized dimensions (simple, redundant)
    • Snowflake: Normalized dimensions (complex, less redundant)
  4. Explain OLAP operations: Roll-up, Drill-down, Slice, Dice.

    • Roll-up: Aggregate to higher level (month → year)
    • Drill-down: Detail to lower level (year → month)
    • Slice: Fix one dimension
    • Dice: Fix multiple dimensions
  5. What is Slowly Changing Dimension? Explain Type 2 SCD.

    • Dimensions that change over time
    • Type 2: Keep history with validity dates
    • New row for each change, maintains full history
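Roll-up (month to year) can be sketched over a tiny in-memory fact table; the figures are made up:

```python
from collections import defaultdict

# (year, month) -> sales amount
monthly_sales = {("2023", "01"): 100, ("2023", "02"): 150, ("2024", "01"): 200}

def roll_up(by_month):
    # Aggregate away the month level, keeping the year level
    by_year = defaultdict(int)
    for (year, _month), amount in by_month.items():
        by_year[year] += amount
    return dict(by_year)

yearly = roll_up(monthly_sales)
print(yearly)  # {'2023': 250, '2024': 200}
# Drill-down is the inverse: return to monthly_sales for the detail rows.
```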

Data Mining:

  1. What is data mining? Name major techniques.

    • Discovering patterns in large datasets
    • Classification, Clustering, Association Rules, Regression, Anomaly Detection
  2. Explain association rule mining with an example.

    • Market basket analysis: {A, B} ⇒ {C}
    • Support: Frequency of itemset
    • Confidence: Conditional probability
    • Apriori algorithm
  3. What is the difference between classification and clustering?

    • Classification: Supervised (predefined classes)
    • Clustering: Unsupervised (discover groupings)
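Support and confidence for a rule like {bread} ⇒ {butter} can be computed directly over a toy set of baskets (made-up data):

```python
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
]

def support(itemset):
    # Fraction of baskets containing every item in the set
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(lhs, rhs):
    # P(rhs | lhs) = support(lhs ∪ rhs) / support(lhs)
    return support(lhs | rhs) / support(lhs)

print(support({"bread", "milk"}))         # 2 of 4 baskets = 0.5
print(confidence({"bread"}, {"butter"}))  # 2 of the 3 bread baskets
```

Apriori scales this idea up by only counting itemsets whose subsets already meet the support threshold.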

Advanced Topics:

  1. What is MVCC? How does it differ from locking?

    • Multi-Version Concurrency Control
    • Keeps multiple versions of data
    • Readers don't block writers
    • Used in PostgreSQL, Oracle
  2. Explain NoSQL databases. When are they preferred over SQL?

    • Non-relational, schema-less
    • Types: Document, Key-Value, Column-family, Graph
    • Preferred for: Scale, flexibility, specific access patterns
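A toy version-chain store illustrating why MVCC readers never block writers; timestamps here are simple counters rather than real transaction IDs:

```python
class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.ts = 0

    def write(self, key, value):
        self.ts += 1        # each write commits a new version
        self.versions.setdefault(key, []).append((self.ts, value))
        return self.ts

    def read(self, key, snapshot_ts):
        # Reader never blocks: it finds the newest version <= its snapshot
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

store = MVCCStore()
snapshot = store.write("x", "old")  # a long-running reader snapshots here
store.write("x", "new")             # a writer updates x concurrently
print(store.read("x", snapshot))    # old -- the reader's view stays stable
print(store.read("x", store.ts))    # new -- a fresh snapshot sees the update
```

PostgreSQL's implementation additionally needs VACUUM to reclaim versions no active snapshot can still see.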

QUICK REVISION

Query Processing:

  • Parse → Optimize → Execute
  • Cost-based optimization
  • Join algorithms: Nested-Loop < Sort-Merge < Hash

Indexing:

  • B+-tree: Default, range queries
  • Hash: Equality only
  • Trade-off: Fast reads vs slow writes

B+-tree:

  • Balanced, O(log n)
  • All data in leaves
  • Leaves linked
  • Insert/Delete may split/merge

Distributed DB:

  • Fragmentation: Split data
  • Replication: Copy data
  • 2PC: Atomic distributed transactions
  • CAP: Pick 2 of 3

Data Warehouse:

  • Subject-oriented, integrated, time-variant
  • Star schema: Fact + Dimensions
  • OLAP: Roll-up, Drill-down, Slice, Dice
  • ETL: Extract, Transform, Load

Data Mining:

  • Association: Market basket
  • Classification: Supervised
  • Clustering: Unsupervised

Advanced:

  • MVCC: Multiple versions
  • NoSQL: Scale, flexibility
  • Mobile: Offline-first, sync

Key points

KEY TAKEAWAYS

Query optimization can improve performance by orders of magnitude: Same query, different plans, vastly different costs

Hash join is most common in practice: Efficient for equi-joins, used by most query optimizers

B+-tree is the default index: Versatile, supports range queries, efficient for most workloads

Push selections and projections down: Reduce intermediate result sizes early in execution

Indexing trades read speed for write speed: Create indexes on frequently queried columns, avoid over-indexing

Distributed databases offer scalability and reliability: But add complexity in query processing and transaction management

CAP theorem guides distributed system design: Can't have all three: Consistency, Availability, Partition tolerance

Data warehouses are optimized for analytics: Denormalized, historical, read-heavy workload

Star schema simplifies OLAP: Fact table + dimension tables, intuitive for business users

ETL is critical for data quality: Extract, Transform, Load - transformation is where data cleaning happens

Data mining discovers hidden patterns: Classification, clustering, association rules enable data-driven decisions

Mobile databases require offline-first design: Local storage + synchronization for reliable mobile apps

NoSQL trades ACID for scale: Different models (document, key-value, graph) for different access patterns

MVCC enables high concurrency: Readers don't block writers, used by PostgreSQL and Oracle

Cloud databases offer elasticity: Auto-scaling, managed services, but watch costs and vendor lock-in

EXPLAIN is your best friend: Always analyze query plans before optimizing

REMEMBER: Modern database systems are complex ecosystems. Understanding query processing, indexing strategies, and advanced architectures (distributed, warehouse, NoSQL) is essential for building scalable, high-performance applications. Always profile before optimizing, and choose the right tool for the job!