Transaction Processing

Unit 4•CLO05

Imagine transferring $500 from your savings to checking account. The bank's system must: (1) deduct $500 from savings, (2) add $500 to checking. What if the system crashes between steps? You'd lose $500! Transactions ensure that either both steps happen, or neither does—no in-between states.

🛡️ Interactive: ACID Properties

Hover over each property to understand transaction guarantees

⚛️

Atomicity

⚖️

Consistency

🔒

Isolation

💾

Durability

🔄 Transaction State Diagram

The Basics

From master.txt:

A transaction is a single logical unit of work that accesses and possibly modifies the contents of a database using read and write operations.

DBMS enforces ACID:

Atomicity: all-or-nothing; if a transaction aborts, its changes are not visible; if it commits, its changes become visible.
Consistency: integrity constraints remain satisfied before and after the transaction.
Isolation: changes in one transaction are not visible to others until commit.
Durability: once committed, updates persist on disk even after failures.

This is why a money transfer must either complete both updates or roll back completely if the system fails mid-way.

Technical Details

Transaction states (master)

Active: instructions are executing
Partially committed: all operations done, changes in buffer/main memory
Committed: changes made permanent in the database
Failed: an instruction fails
Aborted: rollback to undo partial effects
Terminated: system is consistent and ready for new transactions

Schedules (master)

Serial schedule

No transaction starts until the running transaction ends.

Concurrent (non-serial) schedule

Operations of multiple transactions are interleaved. This improves CPU utilization but can lead to inconsistency if not controlled.

The notes classify concurrent schedules into serializable and non-serializable.

Serializability (master)

Conflict serializability

Conflicting operations satisfy all:

belong to different transactions
operate on the same data item
at least one is a write

To test conflict serializability using a precedence graph:

create a node for each transaction
add an edge Ti -> Tj when an operation of Ti must occur before a conflicting operation of Tj
schedule is conflict-serializable iff the graph has no cycle

View serializability

Two schedules are view equivalent if they preserve:

initial reads
final writes
updated reads

Concurrency-control techniques (master)

Lock-based protocols

Lock modes:

Shared (S): read
Exclusive (X): read + write

Two-phase locking (2PL) ensures conflict-serializable schedules:

Two-Phase Locking (2PL) Phases

Phase 1 (growing): acquire locks only
Phase 2 (shrinking): release locks only

Notes highlight common drawbacks: deadlocks and cascading rollback.

Strict / rigorous 2PL (master):

Strict 2PL: hold exclusive locks till commit/abort
Rigorous 2PL: hold all locks till commit/abort (serial order matches commit order)

Timestamp-based protocols

Each transaction gets a timestamp; the timestamp order defines the serial order.

Maintain per item Q:

WTS(Q): largest timestamp of any successful write(Q)
RTS(Q): largest timestamp of any successful read(Q)

Rules from the notes:

read(Q): reject and rollback if TS(Ti) <= WTS(Q); else allow and set RTS(Q) = max(RTS(Q), TS(Ti))
write(Q): reject and rollback if TS(Ti) < RTS(Q) or TS(Ti) < WTS(Q); else allow and set WTS(Q) = TS(Ti)

Validation-based (optimistic) protocol

Three phases:

read/execute to local variables
validation
write phase if validated; otherwise rollback

Recoverability and recovery (master)

Recoverable vs irrecoverable schedules

Recoverable: if a transaction performs a dirty read, its commit is delayed until the writer commits or aborts.
Irrecoverable: a transaction commits after reading an uncommitted value that is later rolled back.

Kinds mentioned: cascading, cascadeless, and strict schedules.

Failures

transaction failure (logical/system)
system crash
disk failure / physical problems

Logs, undo, redo

System log stores transaction boundaries and update records.

Undo of <Ti, X, V1, V2> restores old value V1.
Redo restores new value V2.

Recovery techniques in the notes:

Deferred update: no-undo/redo (redo at commit)
Immediate update: undo/redo (log is written before applying updates)

Examples

Example 1: Transaction in SQL

-- Bank Transfer: Savings to Checking
BEGIN TRANSACTION;

-- Step 1: Deduct from savings
UPDATE Account
SET Balance = Balance - 500
WHERE AccountID = 101 AND AccountType = 'Savings';

-- Step 2: Add to checking
UPDATE Account
SET Balance = Balance + 500
WHERE AccountID = 101 AND AccountType = 'Checking';

-- Verify balance constraints
IF (SELECT Balance FROM Account WHERE AccountID=101 AND AccountType='Savings') < 0
ROLLBACK;
ELSE
COMMIT;
END TRANSACTION;

Example 2: Deadlock Scenario

Time	Transaction T1	Transaction T2
t1	BEGIN
t2		BEGIN
t3	LOCK A (Exclusive)
t4		LOCK B (Exclusive)
t5	UPDATE A SET ...
t6		UPDATE B SET ...
t7	LOCK B (Wait for T2)
t8		LOCK A (Wait for T1)
t9	DEADLOCK DETECTED
t10	T2 ABORTED (Victim)
t11	LOCK B (Acquired)
t12	COMMIT

Example 3: Recovery Scenario

Log Contents:
<T1 start>
<T1, X, 100, 150>
<T1, Y, 200, 250>
<T1 commit>
<T2 start>
<T2, X, 150, 200>
<T2, Y, 250, 300>
--- CRASH ---

Recovery:

Redo T1 (committed): X=150, Y=250
Undo T2 (not committed): X=150 (revert T2's change), Y=250

Final State: X=150, Y=250

Example 4: Isolation Levels in Action

-- Session 1
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION;
SELECT Balance FROM Account WHERE ID=101; -- Reads 1000
-- (Session 2 updates Balance to 500 but doesn't commit)
SELECT Balance FROM Account WHERE ID=101; -- Reads 500 (DIRTY READ!)
COMMIT;

-- With READ COMMITTED
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
SELECT Balance FROM Account WHERE ID=101; -- Reads 1000
-- (Session 2 updates to 500 but doesn't commit)
SELECT Balance FROM Account WHERE ID=101; -- Still reads 1000
COMMIT;

-- With SERIALIZABLE (PostgreSQL)
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT SUM(Balance) FROM Account; -- Reads sum
-- (Session 2 inserts new account)
SELECT SUM(Balance) FROM Account; -- Same sum (no phantom)
COMMIT;

Real-World Use

Best Practices:

Transaction Design:
- Keep transactions short
- Acquire locks in consistent order (prevent deadlock)
- Avoid user interaction within transaction
- Handle exceptions (rollback on error)
Choosing Isolation Level:
- READ UNCOMMITTED: Never (except analytics)
- READ COMMITTED: Default for most applications
- REPEATABLE READ: Financial transactions, reports
- SERIALIZABLE: Critical data, when correctness > performance
Performance Tuning:
- Monitor lock wait times
- Identify long-running transactions
- Use row-level locking (not table-level)
- Consider optimistic locking for low contention
Deadlock Prevention:
- Access tables in same order
- Use timeouts
- Keep transactions short
- Retry aborted transactions
Application-Level Considerations:
- Implement retry logic
- Use connection pooling
- Consider eventual consistency (NoSQL)
- Saga pattern for distributed transactions

Real-World Usage:

Banking: SERIALIZABLE for transfers
E-commerce: READ COMMITTED for order processing
Social media: READ COMMITTED, eventual consistency for likes
Stock trading: SERIALIZABLE with strict locking
Analytics: READ UNCOMMITTED (stale data acceptable)

For exams

Key Questions:

Explain ACID properties with examples
Draw transaction state diagram
What problems arise from concurrent transactions? Give examples.
Explain Two-Phase Locking protocol. How does it ensure serializability?
Compare timestamp-based vs lock-based concurrency control
What is a deadlock? Explain detection and prevention strategies.
Explain Write-Ahead Logging and its importance in recovery
Describe ARIES recovery algorithm (Analysis, Redo, Undo)
Compare SQL isolation levels. Which allows which anomalies?
Given a schedule, determine if it's serializable
Show recovery steps given transaction log and crash point
Explain MVCC. Why do readers not block writers?

Key points

Core Concepts:

• ACID properties guarantee reliable transactions
• Atomicity: All or nothing (enforced by logging and recovery)
• Consistency: Maintain constraints (enforced by application + DBMS)
• Isolation: Concurrent transactions don't interfere (concurrency control)
• Durability: Committed changes survive crashes (write-ahead logging)
• Two-Phase Locking ensures serializability but can deadlock
• MVCC allows high concurrency by keeping multiple versions
• Recovery uses logs to redo committed and undo uncommitted transactions
• Isolation levels balance consistency with performance

Quick Quiz

1. What does the 'A' in ACID stand for?

2. Which property ensures changes persist after crashes?

3. What does Isolation prevent?