Relational Database Design
Database design is the art and science of organizing data to minimize redundancy while ensuring integrity. Poor design leads to data anomalies—inconsistencies that creep in during updates, deletions, or insertions. Normalization is the systematic process that transforms poorly designed schemas into well-structured ones.
The Basics
Normalization decomposes relations with anomalies into smaller, well-structured relations. The goal is to reduce redundancy and avoid anomalies without losing information.
The classic anomalies are:
- Update anomaly: the same fact is repeated in multiple rows, so changing it requires multiple updates (and missing one row makes the database inconsistent).
- Insertion anomaly: you cannot insert a new fact because another attribute (unrelated to that fact) would be forced to NULL or missing.
- Deletion anomaly: deleting one fact unintentionally deletes another important fact.
The engine behind normalization is the functional dependency (FD).
- X → Y means: if two tuples agree on X, they must agree on Y.
- A candidate key can be tested using attribute closure (the same technique used in the worked examples below).
Technical Details
Functional dependencies and inference
Armstrong's axioms
- Reflexivity: if Y ⊆ X then X → Y
- Augmentation: if X → Y then XZ → YZ
- Transitivity: if X → Y and Y → Z then X → Z
Common derived rules: union (X → Y and X → Z imply X → YZ), decomposition (X → YZ implies X → Y and X → Z), pseudotransitivity (X → Y and WY → Z imply WX → Z).
Attribute closure (to find keys)
Compute X+ under the FD set. If X+ contains all attributes of R, then X is a superkey; if no proper subset is a superkey, then X is a candidate key.
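Closure is mechanical enough to automate. Below is a minimal Python sketch; representing FDs as (left, right) pairs of attribute sets is my own choice, not a standard library API:

```python
def closure(attrs, fds):
    """Compute the attribute closure X+ of `attrs` under the FD set `fds`.

    `fds` is a list of (lhs, rhs) pairs of attribute-name sets,
    e.g. ({"A", "B"}, {"C", "D"}) encodes AB -> CD.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD: once the closure contains all of X, it gains all of Y.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result
```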
Normal forms (progressive refinement)
1NF
All attribute values are atomic (single-valued, with no repeating groups or nested relations).
2NF
In 1NF, and no non-prime attribute is partially dependent on a candidate key; that is, no non-prime attribute may depend on a proper subset of any candidate key.
3NF
In 2NF, and no non-prime attribute is transitively dependent on the key.
Practical FD test: for every non-trivial X → Y, either:
- X is a superkey, or
- Y is a prime attribute.
BCNF
For every non-trivial FD X → Y, X must be a superkey.
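Both the practical 3NF test and the BCNF condition can be checked mechanically. The sketch below builds on the closure() helper from the previous snippet; the function names and the brute-force key search are my own, sized for exam-style schemas rather than production use:

```python
from itertools import combinations

def is_superkey(attrs, all_attrs, fds):
    # X is a superkey iff X+ covers every attribute of R.
    return closure(attrs, fds) == set(all_attrs)

def candidate_keys(all_attrs, fds):
    # Brute force, smallest subsets first, skipping supersets of known keys.
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            s = set(combo)
            if is_superkey(s, all_attrs, fds) and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def violations(all_attrs, fds, form="BCNF"):
    """Return the non-trivial FDs that violate BCNF (or 3NF)."""
    prime = set().union(*candidate_keys(all_attrs, fds))
    bad = []
    for lhs, rhs in fds:
        rhs_new = set(rhs) - set(lhs)              # drop the trivial part
        if not rhs_new or is_superkey(lhs, all_attrs, fds):
            continue                               # trivial, or X is a superkey
        if form == "3NF" and rhs_new <= prime:
            continue                               # 3NF allows prime right sides
        bad.append((set(lhs), rhs_new))
    return bad
```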
Decomposition properties
Lossless join
Joining the decomposed relations must reconstruct exactly the original relation, with no spurious tuples. For a binary decomposition into R1 and R2, the standard test is that the shared attributes R1 ∩ R2 must functionally determine all of R1 or all of R2.
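This binary test takes a single closure computation. A minimal sketch, reusing the closure() helper from earlier:

```python
def lossless_binary(r1, r2, fds):
    """Test the standard condition for a two-way decomposition: it is
    lossless iff the shared attributes determine all of R1 or all of R2."""
    common = set(r1) & set(r2)
    cplus = closure(common, fds)   # assumes the closure() sketch above
    return set(r1) <= cplus or set(r2) <= cplus
```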
Dependency preservation
In a decomposition of R into R1, R2, ..., every dependency of R must either hold entirely within some decomposed relation or be derivable from the union of the dependencies that hold on the decomposed relations.
Example: if R(A,B,C,D) has A → BC and we decompose into R1(ABC) and R2(AD), the decomposition is dependency-preserving because A → BC is enforced entirely within R1(ABC).
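Whether a single FD survives a decomposition can be tested without computing the projected FD sets explicitly, using the standard iterative algorithm: grow a set Z from X by repeatedly closing Z's projection onto each fragment, then check that Y landed inside Z. A sketch, again reusing closure():

```python
def fd_preserved(lhs, rhs, decomposition, fds):
    """Check whether lhs -> rhs is enforceable from the fragments alone.

    `decomposition` is a list of attribute sets, one per fragment Ri.
    """
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # Close Z's projection onto Ri, keep only what lies back in Ri.
            gained = closure(z & set(ri), fds) & set(ri)
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z
```

For the example above, fd_preserved({"A"}, {"B", "C"}, [{"A", "B", "C"}, {"A", "D"}], [({"A"}, {"B", "C"})]) returns True because A → BC is recoverable entirely inside R1(ABC).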
Examples
Worked example (2NF)
Given R(A,B,C,D) with FD set { AB → CD, B → C }.
Step 1: Candidate key (closure method)
- Compute AB+: start from {A, B}; AB → CD adds C and D, so AB+ = ABCD (all attributes).
- Check minimality: A+ = A and B+ = BC, so neither attribute alone is a superkey.
- Therefore AB is a candidate key.
Step 2: Check 2NF
- Prime attributes: A, B
- Non-prime attributes: C, D
- AB → CD is fine (full dependency on the key)
- B → C violates 2NF (a non-prime attribute depends on a proper subset of a composite key)
Step 3: Convert to 2NF (decompose on the partial dependency)
- R1(B, C)
- R2(A, B, D)
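The helpers sketched in Technical Details reproduce this analysis (assuming the closure() and violations() functions defined there):

```python
fds = [({"A", "B"}, {"C", "D"}), ({"B"}, {"C"})]
attrs = {"A", "B", "C", "D"}

print(sorted(closure({"A", "B"}, fds)))  # ['A', 'B', 'C', 'D'] -> AB is a key
print(sorted(closure({"B"}, fds)))       # ['B', 'C'] -> B -> C is partial
print(violations(attrs, fds))            # [({'B'}, {'C'})] -> also breaks BCNF
```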
Worked example (3NF)
Given R(X,Y,Z) with FD set { X → Y, Y → Z }.
Step 1: Candidate key
X+ = XYZ so X is a candidate key.
Step 2: Check 3NF
FD Y → Z violates 3NF because Y is not a superkey and Z is not prime.
Step 3: Convert to 3NF (decompose on the transitive dependency)
- R1(X, Y)
- R2(Y, Z)
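Running the same helpers on this schema confirms each step, including that the decomposition is lossless (the shared attribute Y determines all of R2):

```python
fds = [({"X"}, {"Y"}), ({"Y"}, {"Z"})]
attrs = {"X", "Y", "Z"}

print(candidate_keys(attrs, fds))                    # [{'X'}]
print(violations(attrs, fds, form="3NF"))            # [({'Y'}, {'Z'})]
print(lossless_binary({"X", "Y"}, {"Y", "Z"}, fds))  # True
```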
Real-World Use
Practical design workflow
- Draft relations from requirements.
- List likely FDs (business rules, uniqueness constraints).
- Use closure to confirm candidate keys (as in master examples).
- Normalize to 3NF for most OLTP designs.
- Ensure decompositions are lossless and preferably dependency-preserving (important both in exams and in real enforcement).
Engineering trade-offs
- Over-normalization increases joins; denormalization may be justified for read-heavy reporting.
- If BCNF breaks dependency preservation and complicates enforcement, 3NF is often the better compromise.
- Enforce FDs using constraints: PK/FK/UNIQUE/CHECK (and sometimes triggers).
For exams
Exam prompts
- Find candidate keys using attribute closure.
- Identify whether a relation is in 2NF/3NF/BCNF given FDs.
- Convert to 2NF/3NF using decomposition (like the worked problems above).
- Define and illustrate update/insertion/deletion anomalies.
- State Armstrong's axioms and use them to infer an FD.
- Check decomposition for lossless join and dependency preservation.
- Explain the practical difference between 3NF and BCNF.
Key points
- Anomalies are the symptom; FDs are the cause.
- Closure is the most reliable tool to reason about keys.
- 2NF removes partial dependency; 3NF removes transitive dependency.
- Good decompositions are lossless and ideally dependency-preserving.
- 3NF is commonly used in practice; BCNF is stricter but may complicate enforcement.
Quick Quiz
1. What does 1NF require?
2. Which is the strictest normal form?
3. Normalization helps reduce which problem?