Transaction Processing

Unit 4CLO05

Imagine transferring $500 from your savings to checking account. The bank's system must: (1) deduct $500 from savings, (2) add $500 to checking. What if the system crashes between steps? You'd lose $500! Transactions ensure that either both steps happen, or neither does—no in-between states.

🛡️ Interactive: ACID Properties

Hover over each property to understand transaction guarantees

⚛️
A
Atomicity
⚖️
C
Consistency
🔒
I
Isolation
💾
D
Durability

🔄 Transaction State Diagram

ACTIVEPARTIALLYCOMMITTEDCOMMITTEDFAILEDABORTEDLast statementexecutedSuccessErrorErrorRollback✓ Green = Success Path⚠ Orange = Processing✗ Red = Failure Path

The Basics

From master.txt:

A transaction is a single logical unit of work that accesses and possibly modifies the contents of a database using read and write operations.

DBMS enforces ACID:

  • Atomicity: all-or-nothing; if a transaction aborts, its changes are not visible; if it commits, its changes become visible.
  • Consistency: integrity constraints remain satisfied before and after the transaction.
  • Isolation: changes in one transaction are not visible to others until commit.
  • Durability: once committed, updates persist on disk even after failures.

This is why a money transfer must either complete both updates or roll back completely if the system fails mid-way.

Technical Details

Transaction states (master)

  • Active: instructions are executing
  • Partially committed: all operations done, changes in buffer/main memory
  • Committed: changes made permanent in the database
  • Failed: an instruction fails
  • Aborted: rollback to undo partial effects
  • Terminated: system is consistent and ready for new transactions

Schedules (master)

Serial schedule

No transaction starts until the running transaction ends.

Concurrent (non-serial) schedule

Operations of multiple transactions are interleaved. This improves CPU utilization but can lead to inconsistency if not controlled.

The notes classify concurrent schedules into serializable and non-serializable.

Serializability (master)

Conflict serializability

Conflicting operations satisfy all:

  1. belong to different transactions
  2. operate on the same data item
  3. at least one is a write

To test conflict serializability using a precedence graph:

  1. create a node for each transaction
  2. add an edge Ti -> Tj when an operation of Ti must occur before a conflicting operation of Tj
  3. schedule is conflict-serializable iff the graph has no cycle

View serializability

Two schedules are view equivalent if they preserve:

  1. initial reads
  2. final writes
  3. updated reads

Concurrency-control techniques (master)

Lock-based protocols

Lock modes:

  • Shared (S): read
  • Exclusive (X): read + write

Two-phase locking (2PL) ensures conflict-serializable schedules:

Two-Phase Locking (2PL) Phases

  • Phase 1 (growing): acquire locks only
  • Phase 2 (shrinking): release locks only

Notes highlight common drawbacks: deadlocks and cascading rollback.

Strict / rigorous 2PL (master):

  • Strict 2PL: hold exclusive locks till commit/abort
  • Rigorous 2PL: hold all locks till commit/abort (serial order matches commit order)

Timestamp-based protocols

Each transaction gets a timestamp; the timestamp order defines the serial order.

Maintain per item Q:

  • WTS(Q): largest timestamp of any successful write(Q)
  • RTS(Q): largest timestamp of any successful read(Q)

Rules from the notes:

  • read(Q): reject and rollback if TS(Ti) <= WTS(Q); else allow and set RTS(Q) = max(RTS(Q), TS(Ti))
  • write(Q): reject and rollback if TS(Ti) < RTS(Q) or TS(Ti) < WTS(Q); else allow and set WTS(Q) = TS(Ti)

Validation-based (optimistic) protocol

Three phases:

  1. read/execute to local variables
  2. validation
  3. write phase if validated; otherwise rollback

Recoverability and recovery (master)

Recoverable vs irrecoverable schedules

  • Recoverable: if a transaction performs a dirty read, its commit is delayed until the writer commits or aborts.
  • Irrecoverable: a transaction commits after reading an uncommitted value that is later rolled back.

Kinds mentioned: cascading, cascadeless, and strict schedules.

Failures

  • transaction failure (logical/system)
  • system crash
  • disk failure / physical problems

Logs, undo, redo

System log stores transaction boundaries and update records.

  • Undo of <Ti, X, V1, V2> restores old value V1.
  • Redo restores new value V2.

Recovery techniques in the notes:

  • Deferred update: no-undo/redo (redo at commit)
  • Immediate update: undo/redo (log is written before applying updates)

Examples

Example 1: Transaction in SQL

-- Bank Transfer: Savings to Checking
BEGIN TRANSACTION;

-- Step 1: Deduct from savings
UPDATE Account
SET Balance = Balance - 500
WHERE AccountID = 101 AND AccountType = 'Savings';

-- Step 2: Add to checking
UPDATE Account
SET Balance = Balance + 500
WHERE AccountID = 101 AND AccountType = 'Checking';

-- Verify balance constraints
IF (SELECT Balance FROM Account WHERE AccountID=101 AND AccountType='Savings') < 0
ROLLBACK;
ELSE
COMMIT;
END TRANSACTION;

Example 2: Deadlock Scenario

TimeTransaction T1Transaction T2
t1BEGIN
t2BEGIN
t3LOCK A (Exclusive)
t4LOCK B (Exclusive)
t5UPDATE A SET ...
t6UPDATE B SET ...
t7LOCK B (Wait for T2)
t8LOCK A (Wait for T1)
t9DEADLOCK DETECTED
t10T2 ABORTED (Victim)
t11LOCK B (Acquired)
t12COMMIT

Example 3: Recovery Scenario

Log Contents:
<T1 start>
<T1, X, 100, 150>
<T1, Y, 200, 250>
<T1 commit>
<T2 start>
<T2, X, 150, 200>
<T2, Y, 250, 300>
--- CRASH ---

Recovery:

  1. Redo T1 (committed): X=150, Y=250
  2. Undo T2 (not committed): X=150 (revert T2's change), Y=250

Final State: X=150, Y=250

Example 4: Isolation Levels in Action

-- Session 1
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION;
SELECT Balance FROM Account WHERE ID=101; -- Reads 1000
-- (Session 2 updates Balance to 500 but doesn't commit)
SELECT Balance FROM Account WHERE ID=101; -- Reads 500 (DIRTY READ!)
COMMIT;

-- With READ COMMITTED
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
SELECT Balance FROM Account WHERE ID=101; -- Reads 1000
-- (Session 2 updates to 500 but doesn't commit)
SELECT Balance FROM Account WHERE ID=101; -- Still reads 1000
COMMIT;

-- With SERIALIZABLE (PostgreSQL)
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT SUM(Balance) FROM Account; -- Reads sum
-- (Session 2 inserts new account)
SELECT SUM(Balance) FROM Account; -- Same sum (no phantom)
COMMIT;

Real-World Use

Best Practices:

  1. Transaction Design:

    • Keep transactions short
    • Acquire locks in consistent order (prevent deadlock)
    • Avoid user interaction within transaction
    • Handle exceptions (rollback on error)
  2. Choosing Isolation Level:

    • READ UNCOMMITTED: Never (except analytics)
    • READ COMMITTED: Default for most applications
    • REPEATABLE READ: Financial transactions, reports
    • SERIALIZABLE: Critical data, when correctness > performance
  3. Performance Tuning:

    • Monitor lock wait times
    • Identify long-running transactions
    • Use row-level locking (not table-level)
    • Consider optimistic locking for low contention
  4. Deadlock Prevention:

    • Access tables in same order
    • Use timeouts
    • Keep transactions short
    • Retry aborted transactions
  5. Application-Level Considerations:

    • Implement retry logic
    • Use connection pooling
    • Consider eventual consistency (NoSQL)
    • Saga pattern for distributed transactions

Real-World Usage:

  • Banking: SERIALIZABLE for transfers
  • E-commerce: READ COMMITTED for order processing
  • Social media: READ COMMITTED, eventual consistency for likes
  • Stock trading: SERIALIZABLE with strict locking
  • Analytics: READ UNCOMMITTED (stale data acceptable)

For exams

Key Questions:

  1. Explain ACID properties with examples
  2. Draw transaction state diagram
  3. What problems arise from concurrent transactions? Give examples.
  4. Explain Two-Phase Locking protocol. How does it ensure serializability?
  5. Compare timestamp-based vs lock-based concurrency control
  6. What is a deadlock? Explain detection and prevention strategies.
  7. Explain Write-Ahead Logging and its importance in recovery
  8. Describe ARIES recovery algorithm (Analysis, Redo, Undo)
  9. Compare SQL isolation levels. Which allows which anomalies?
  10. Given a schedule, determine if it's serializable
  11. Show recovery steps given transaction log and crash point
  12. Explain MVCC. Why do readers not block writers?

Key points

Core Concepts:

• ACID properties guarantee reliable transactions
• Atomicity: All or nothing (enforced by logging and recovery)
• Consistency: Maintain constraints (enforced by application + DBMS)
• Isolation: Concurrent transactions don't interfere (concurrency control)
• Durability: Committed changes survive crashes (write-ahead logging)
• Two-Phase Locking ensures serializability but can deadlock
• MVCC allows high concurrency by keeping multiple versions
• Recovery uses logs to redo committed and undo uncommitted transactions
• Isolation levels balance consistency with performance

Quick Quiz

1. What does the 'A' in ACID stand for?

2. Which property ensures changes persist after crashes?

3. What does Isolation prevent?