Transaction Processing
Imagine transferring $500 from your savings to checking account. The bank's system must: (1) deduct $500 from savings, (2) add $500 to checking. What if the system crashes between steps? You'd lose $500! Transactions ensure that either both steps happen, or neither does—no in-between states.
🛡️ Interactive: ACID Properties
Hover over each property to understand transaction guarantees
🔄 Transaction State Diagram
The Basics
From master.txt:
A transaction is a single logical unit of work that accesses and possibly modifies the contents of a database using read and write operations.
DBMS enforces ACID:
- Atomicity: all-or-nothing; if a transaction aborts, its changes are not visible; if it commits, its changes become visible.
- Consistency: integrity constraints remain satisfied before and after the transaction.
- Isolation: changes in one transaction are not visible to others until commit.
- Durability: once committed, updates persist on disk even after failures.
This is why a money transfer must either complete both updates or roll back completely if the system fails mid-way.
Technical Details
Transaction states (master)
- Active: instructions are executing
- Partially committed: all operations done, changes in buffer/main memory
- Committed: changes made permanent in the database
- Failed: an instruction fails
- Aborted: rollback to undo partial effects
- Terminated: system is consistent and ready for new transactions
Schedules (master)
Serial schedule
No transaction starts until the running transaction ends.
Concurrent (non-serial) schedule
Operations of multiple transactions are interleaved. This improves CPU utilization but can lead to inconsistency if not controlled.
The notes classify concurrent schedules into serializable and non-serializable.
Serializability (master)
Conflict serializability
Conflicting operations satisfy all:
- belong to different transactions
- operate on the same data item
- at least one is a write
To test conflict serializability using a precedence graph:
- create a node for each transaction
- add an edge Ti -> Tj when an operation of Ti must occur before a conflicting operation of Tj
- schedule is conflict-serializable iff the graph has no cycle
View serializability
Two schedules are view equivalent if they preserve:
- initial reads
- final writes
- updated reads
Concurrency-control techniques (master)
Lock-based protocols
Lock modes:
- Shared (S): read
- Exclusive (X): read + write
Two-phase locking (2PL) ensures conflict-serializable schedules:
- Phase 1 (growing): acquire locks only
- Phase 2 (shrinking): release locks only
Notes highlight common drawbacks: deadlocks and cascading rollback.
Strict / rigorous 2PL (master):
- Strict 2PL: hold exclusive locks till commit/abort
- Rigorous 2PL: hold all locks till commit/abort (serial order matches commit order)
Timestamp-based protocols
Each transaction gets a timestamp; the timestamp order defines the serial order.
Maintain per item Q:
- WTS(Q): largest timestamp of any successful write(Q)
- RTS(Q): largest timestamp of any successful read(Q)
Rules from the notes:
- read(Q): reject and rollback if TS(Ti) <= WTS(Q); else allow and set RTS(Q) = max(RTS(Q), TS(Ti))
- write(Q): reject and rollback if TS(Ti) < RTS(Q) or TS(Ti) < WTS(Q); else allow and set WTS(Q) = TS(Ti)
Validation-based (optimistic) protocol
Three phases:
- read/execute to local variables
- validation
- write phase if validated; otherwise rollback
Recoverability and recovery (master)
Recoverable vs irrecoverable schedules
- Recoverable: if a transaction performs a dirty read, its commit is delayed until the writer commits or aborts.
- Irrecoverable: a transaction commits after reading an uncommitted value that is later rolled back.
Kinds mentioned: cascading, cascadeless, and strict schedules.
Failures
- transaction failure (logical/system)
- system crash
- disk failure / physical problems
Logs, undo, redo
System log stores transaction boundaries and update records.
- Undo of <Ti, X, V1, V2> restores old value V1.
- Redo restores new value V2.
Recovery techniques in the notes:
- Deferred update: no-undo/redo (redo at commit)
- Immediate update: undo/redo (log is written before applying updates)
Examples
Example 1: Transaction in SQL
-- Bank Transfer: Savings to Checking
BEGIN TRANSACTION;
-- Step 1: Deduct from savings
UPDATE Account
SET Balance = Balance - 500
WHERE AccountID = 101 AND AccountType = 'Savings';
-- Step 2: Add to checking
UPDATE Account
SET Balance = Balance + 500
WHERE AccountID = 101 AND AccountType = 'Checking';
-- Verify balance constraints
IF (SELECT Balance FROM Account WHERE AccountID=101 AND AccountType='Savings') < 0
ROLLBACK;
ELSE
COMMIT;
END TRANSACTION;
Example 2: Deadlock Scenario
| Time | Transaction T1 | Transaction T2 |
|---|---|---|
| t1 | BEGIN | |
| t2 | BEGIN | |
| t3 | LOCK A (Exclusive) | |
| t4 | LOCK B (Exclusive) | |
| t5 | UPDATE A SET ... | |
| t6 | UPDATE B SET ... | |
| t7 | LOCK B (Wait for T2) | |
| t8 | LOCK A (Wait for T1) | |
| t9 | DEADLOCK DETECTED | |
| t10 | T2 ABORTED (Victim) | |
| t11 | LOCK B (Acquired) | |
| t12 | COMMIT |
Example 3: Recovery Scenario
Log Contents:
<T1 start>
<T1, X, 100, 150>
<T1, Y, 200, 250>
<T1 commit>
<T2 start>
<T2, X, 150, 200>
<T2, Y, 250, 300>
--- CRASH ---
Recovery:
- Redo T1 (committed): X=150, Y=250
- Undo T2 (not committed): X=150 (revert T2's change), Y=250
Final State: X=150, Y=250
Example 4: Isolation Levels in Action
-- Session 1
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION;
SELECT Balance FROM Account WHERE ID=101; -- Reads 1000
-- (Session 2 updates Balance to 500 but doesn't commit)
SELECT Balance FROM Account WHERE ID=101; -- Reads 500 (DIRTY READ!)
COMMIT;
-- With READ COMMITTED
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
SELECT Balance FROM Account WHERE ID=101; -- Reads 1000
-- (Session 2 updates to 500 but doesn't commit)
SELECT Balance FROM Account WHERE ID=101; -- Still reads 1000
COMMIT;
-- With SERIALIZABLE (PostgreSQL)
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT SUM(Balance) FROM Account; -- Reads sum
-- (Session 2 inserts new account)
SELECT SUM(Balance) FROM Account; -- Same sum (no phantom)
COMMIT;
Real-World Use
Best Practices:
-
Transaction Design:
- Keep transactions short
- Acquire locks in consistent order (prevent deadlock)
- Avoid user interaction within transaction
- Handle exceptions (rollback on error)
-
Choosing Isolation Level:
- READ UNCOMMITTED: Never (except analytics)
- READ COMMITTED: Default for most applications
- REPEATABLE READ: Financial transactions, reports
- SERIALIZABLE: Critical data, when correctness > performance
-
Performance Tuning:
- Monitor lock wait times
- Identify long-running transactions
- Use row-level locking (not table-level)
- Consider optimistic locking for low contention
-
Deadlock Prevention:
- Access tables in same order
- Use timeouts
- Keep transactions short
- Retry aborted transactions
-
Application-Level Considerations:
- Implement retry logic
- Use connection pooling
- Consider eventual consistency (NoSQL)
- Saga pattern for distributed transactions
Real-World Usage:
- Banking: SERIALIZABLE for transfers
- E-commerce: READ COMMITTED for order processing
- Social media: READ COMMITTED, eventual consistency for likes
- Stock trading: SERIALIZABLE with strict locking
- Analytics: READ UNCOMMITTED (stale data acceptable)
For exams
Key Questions:
- Explain ACID properties with examples
- Draw transaction state diagram
- What problems arise from concurrent transactions? Give examples.
- Explain Two-Phase Locking protocol. How does it ensure serializability?
- Compare timestamp-based vs lock-based concurrency control
- What is a deadlock? Explain detection and prevention strategies.
- Explain Write-Ahead Logging and its importance in recovery
- Describe ARIES recovery algorithm (Analysis, Redo, Undo)
- Compare SQL isolation levels. Which allows which anomalies?
- Given a schedule, determine if it's serializable
- Show recovery steps given transaction log and crash point
- Explain MVCC. Why do readers not block writers?
Key points
Core Concepts:
• ACID properties guarantee reliable transactions
• Atomicity: All or nothing (enforced by logging and recovery)
• Consistency: Maintain constraints (enforced by application + DBMS)
• Isolation: Concurrent transactions don't interfere (concurrency control)
• Durability: Committed changes survive crashes (write-ahead logging)
• Two-Phase Locking ensures serializability but can deadlock
• MVCC allows high concurrency by keeping multiple versions
• Recovery uses logs to redo committed and undo uncommitted transactions
• Isolation levels balance consistency with performance
Quick Quiz
1. What does the 'A' in ACID stand for?
2. Which property ensures changes persist after crashes?
3. What does Isolation prevent?