Keys, Constraints, Functional Dependencies, and Normalization
Poor database design leads to redundancy, inconsistency, and anomalies. A student's address stored in 10 different tables? Deleting a course accidentally deletes all student records? These nightmares are preventable through proper database design using keys, constraints, and normalization. Understanding these concepts is fundamental to creating efficient, maintainable databases.
The Basics
Keys
Keys uniquely identify tuples and establish relationships between tables.
1. Super Key
- Any set of attributes that uniquely identifies a tuple
- Can have redundant attributes
- Example: {StudentID}, {StudentID, Name}, {StudentID, Email} are all super keys
2. Candidate Key
- Minimal super key (no redundant attributes)
- Multiple candidate keys possible
- Example: {StudentID}, {Email} both uniquely identify students
3. Primary Key
- Selected candidate key
- One per table
- Cannot be NULL
- Example: StudentID chosen as primary key
4. Foreign Key
- References primary key in another table
- Establishes relationships
- Can be NULL (unless specified NOT NULL)
- Enforces referential integrity
5. Composite Key
- Primary key with multiple attributes
- Used when single attribute insufficient
- Example: {StudentID, CourseID} in Enrollment table
6. Alternate Key
- Candidate keys not chosen as primary key
- Example: If StudentID is primary, Email is alternate key
Integrity Constraints
1. Domain Constraints
- Define valid values for attributes
- Data types, CHECK constraints
- Example: Age BETWEEN 18 AND 100
2. Entity Integrity
- Primary key cannot be NULL
- Ensures each entity uniquely identifiable
- Enforced automatically by PRIMARY KEY constraint
3. Referential Integrity
- Foreign key must reference existing primary key OR be NULL
- Maintains consistency across relationships
- CASCADE options handle dependent operations
Functional Dependencies (FDs)
Definition: X → Y means if two tuples agree on X, they must agree on Y
Example: StudentID → Name (StudentID determines Name)
Properties:
- Reflexive: If Y ⊆ X, then X → Y
- Transitive: If X → Y and Y → Z, then X → Z
- Does NOT hold: If X → Y, then Y → X (not symmetric)
Armstrong's Axioms
Three fundamental rules for deriving FDs:
1. Reflexivity: If Y ⊆ X, then X → Y
Example: {StudentID, Name} → StudentID
2. Augmentation: If X → Y, then XZ → YZ
Example: If StudentID → Name, then {StudentID, CourseID} → {Name, CourseID}
3. Transitivity: If X → Y and Y → Z, then X → Z
Example: StudentID → DeptID and DeptID → DeptName, so StudentID → DeptName
Derived Rules:
4. Union: If X → Y and X → Z, then X → YZ
5. Decomposition: If X → YZ, then X → Y and X → Z
6. Pseudotransitivity: If X → Y and WY → Z, then WX → Z
Normalization
Process of organizing data to minimize redundancy and avoid anomalies.
Anomalies:
Update Anomaly: Same data repeated, changing one instance leaves inconsistency
Insertion Anomaly: Cannot insert data without unrelated data
Deletion Anomaly: Deleting data unintentionally deletes other data
Normal Forms:
1NF (First Normal Form):
- All attributes contain atomic (indivisible) values
- No repeating groups or arrays
- Each cell contains single value
2NF (Second Normal Form):
- In 1NF
- No partial dependencies (non-prime attribute dependent on part of candidate key)
- Only applies to tables with composite keys
3NF (Third Normal Form):
- In 2NF
- No transitive dependencies (non-prime → non-prime)
- For every X → Y: Either X is superkey OR Y is prime attribute
BCNF (Boyce-Codd Normal Form):
- Stricter than 3NF
- For every X → Y: X must be superkey
- Eliminates all anomalies due to FDs
4NF (Fourth Normal Form):
- In BCNF
- No multivalued dependencies
- MVD: X →→ Y means X determines set of Y values independently
5NF (Fifth Normal Form / PJNF):
- In 4NF
- No join dependencies
- Cannot be decomposed further without loss
Technical Details
DETAILED KEY CONCEPTS
Attribute Closure (Finding Keys):
Algorithm to find X+:
- Result := X
- For each FD Y → Z in F:
If Y ⊆ Result, then Result := Result ∪ Z - Repeat until Result doesn't change
Example:
Given R(A, B, C, D) with FDs: {A → B, B → C, C → D}
Find A+:
- Start: A+ = {A}
- Apply A → B: A+ = {A, B}
- Apply B → C: A+ = {A, B, C}
- Apply C → D: A+ = {A, B, C, D}
- Conclusion: A is a superkey (contains all attributes)
Minimal Cover (Canonical Cover):
Minimal set of FDs equivalent to original set.
Algorithm:
- Decompose: Convert right sides to single attributes
- Remove redundant: Test each FD for redundancy
- Remove extraneous: Test each attribute in left side
Example:
F = {A → BC, B → C, A → B, AB → C}
Step 1: Decompose
F' = {A → B, A → C, B → C, A → B, AB → C}
Step 2: Remove redundant A → B (duplicate)
F'' = {A → B, A → C, B → C, AB → C}
Step 3: Remove AB → C (derivable from A → B and B → C)
Minimal Cover: {A → B, B → C, A → C}
Actually, A → C derivable from A → B → C
Final Minimal Cover: {A → B, B → C}
NORMAL FORMS IN DEPTH
1NF (First Normal Form)
Violations:
- Multivalued attributes: Phone = {123-4567, 234-5678}
- Composite attributes not broken down: Name = "John Doe"
- Repeating groups: Course1, Course2, Course3 columns
Example Violation:
| StudentID | Name | Phones |
|---|---|---|
| 101 | Alice | 123-4567, 234-5678 |
1NF Solution:
| StudentID | Name | Phone |
|---|---|---|
| 101 | Alice | 123-4567 |
| 101 | Alice | 234-5678 |
2NF (Second Normal Form)
Only applies to tables with composite primary keys.
Violation: Non-prime attribute partially dependent on primary key
Example:
R(StudentID, CourseID, StudentName, CourseName, Grade)
Primary Key: {StudentID, CourseID}
FDs:
- StudentID → StudentName (partial dependency!)
- CourseID → CourseName (partial dependency!)
- {StudentID, CourseID} → Grade
Problems:
- Update: Change student name in multiple rows
- Insert: Can't add student without enrolling in course
- Delete: Delete last enrollment, lose student info
2NF Solution:
Decompose:
- Student(StudentID, StudentName)
- Course(CourseID, CourseName)
- Enrollment(StudentID, CourseID, Grade)
3NF (Third Normal Form)
Violation: Transitive dependency (non-prime → non-prime)
Example:
R(StudentID, DeptID, DeptName, DeptHead)
Primary Key: StudentID
FDs:
- StudentID → DeptID
- DeptID → DeptName (transitive!)
- DeptID → DeptHead (transitive!)
Problems:
- Update: Change dept name in multiple student records
- Insert: Can't add department without students
- Delete: Delete last student in dept, lose dept info
3NF Solution:
- Student(StudentID, DeptID)
- Department(DeptID, DeptName, DeptHead)
BCNF (Boyce-Codd Normal Form)
Stricter than 3NF: Every determinant must be a superkey
Example where 3NF ≠ BCNF:
R(Professor, Course, Student)
FDs:
- {Professor, Course} → Student
- {Student, Course} → Professor
Candidate Keys: {Professor, Course}, {Student, Course}
All attributes are prime, so it's in 3NF.
But: Student, Course → Professor where {Student, Course} is not a superkey in context.
Actually, {Student, Course} IS a candidate key, so this IS in BCNF.
Better Example:
R(Court, StartTime, EndTime)
FD: Court → EndTime (one court can only end at one time, based on rules)
Candidate Key: {Court, StartTime}
Court → EndTime violates BCNF (Court is not a superkey)
BCNF Solution:
- R1(Court, EndTime)
- R2(Court, StartTime)
4NF (Fourth Normal Form)
Multivalued Dependency (MVD): X →→ Y
Meaning: For each X value, there's a set of Y values independent of other attributes.
Example:
R(Professor, Course, Phone)
| Professor | Course | Phone |
|---|---|---|
| Dr. Smith | DB | 111-1111 |
| Dr. Smith | DB | 222-2222 |
| Dr. Smith | AI | 111-1111 |
| Dr. Smith | AI | 222-2222 |
Professor →→ Course (Professor determines set of courses)
Professor →→ Phone (Professor determines set of phones)
These are independent!
Problems:
- Redundancy: Phone numbers repeated for each course
- Update: Change phone, must update multiple rows
- Insert: Can't add phone without adding for all courses
4NF Solution:
- ProfessorCourse(Professor, Course)
- ProfessorPhone(Professor, Phone)
5NF (Fifth Normal Form / Project-Join Normal Form)
Join Dependency: R can be losslessly decomposed into R1, R2, ..., Rn
Example:
R(Supplier, Part, Project)
Meaning: Supplier supplies Part to Project
Business Rules:
- If supplier S supplies part P
- And project J uses part P
- And supplier S works on project J
- Then supplier S supplies part P to project J
This creates a join dependency!
5NF Solution:
- SupplierPart(Supplier, Part)
- ProjectPart(Project, Part)
- SupplierProject(Supplier, Project)
Verification: JOIN these three recovers original without spurious tuples
DECOMPOSITION PROPERTIES
1. Lossless Join Decomposition
Decomposition R → R1, R2 is lossless if:
R = R1 ⋈ R2
Test: R1 ∩ R2 → R1 or R1 ∩ R2 → R2
Example:
R(A, B, C) with A → B
Decompose: R1(A, B), R2(A, C)
R1 ∩ R2 = {A}
A → B (attributes of R1), so lossless!
2. Dependency Preservation
All FDs can be checked without joining.
Example:
R(A, B, C) with FDs: A → B, B → C
Decompose: R1(A, B), R2(B, C)
Check FDs:
- A → B: Check in R1 ✓
- B → C: Check in R2 ✓
- Dependency preserved!
Trade-off: BCNF guarantees lossless but may not preserve dependencies
3NF guarantees both lossless and dependency preservation
NORMALIZATION ALGORITHM
3NF Synthesis Algorithm:
Input: Relation R, FDs F
Output: 3NF decomposition
- Find minimal cover Fc
- For each FD X → Y in Fc, create relation(X, Y)
- If no relation contains a candidate key of R, add one
- Eliminate redundant relations
BCNF Decomposition Algorithm:
Input: Relation R, FDs F
Output: BCNF decomposition
- If R in BCNF, done
- Find FD X → Y that violates BCNF
- Decompose: R1 = XY, R2 = R - (Y - X)
- Recursively decompose R1 and R2
Example:
R(A, B, C, D)
F = {A → B, B → C, C → D}
Minimal Cover: Same (already minimal)
3NF Synthesis:
- R1(A, B) from A → B
- R2(B, C) from B → C
- R3(C, D) from C → D
- Candidate key: A (check: A+ = ABCD)
- R1 contains A, so done
Result: R1(A, B), R2(B, C), R3(C, D)
Examples
COMPREHENSIVE EXAMPLES
EXAMPLE 1: Finding Keys
R(A, B, C, D, E)
FDs: {A → B, BC → E, ED → A}
Find all candidate keys:
Method: Attribute Classification
- Left side only: C, D (must be in every key)
- Right side only: B, E (never in key)
- Both sides: A
- Neither: None
Start with {C, D}:
CD+ = {C, D}
Not a key (doesn't include all attributes)
Try adding A: ACD+
- A → B: ACD+ = {A, C, D, B}
- BC → E: ACD+ = {A, C, D, B, E}
- All attributes! ACD is a candidate key.
Try adding others systematically...
Result: Candidate keys = {ACD}
EXAMPLE 2: 1NF Conversion
Unnormalized:
| OrderID | CustomerName | Items |
|---|---|---|
| 1 | John | Book, Pen, Notebook |
| 2 | Jane | Laptop |
Problems:
- Items column contains multiple values (not atomic)
- Can't easily search for specific item
- Can't efficiently count items per order
1NF Solution:
| OrderID | CustomerName | Item |
|---|---|---|
| 1 | John | Book |
| 1 | John | Pen |
| 1 | John | Notebook |
| 2 | Jane | Laptop |
EXAMPLE 3: 2NF Conversion
Table in 1NF:
StudentCourse(StudentID, CourseID, StudentName, CourseName, Instructor, Grade)
Primary Key: {StudentID, CourseID}
FDs:
- StudentID → StudentName (partial dependency!)
- CourseID → CourseName (partial dependency!)
- CourseID → Instructor (partial dependency!)
- {StudentID, CourseID} → Grade
Anomalies:
- Update: Change student name requires updating all enrollment records
- Insert: Can't add student without enrolling in a course
- Delete: Drop last course, lose student information
2NF Solution:
Student(StudentID, StudentName)
Course(CourseID, CourseName, Instructor)
Enrollment(StudentID, CourseID, Grade)
Verification:
- No partial dependencies
- All non-prime attributes fully dependent on entire primary key
EXAMPLE 4: 3NF Conversion
Table in 2NF:
Employee(EmpID, DeptID, DeptName, DeptLocation, Salary)
Primary Key: EmpID
FDs:
- EmpID → DeptID
- EmpID → Salary
- DeptID → DeptName (transitive!)
- DeptID → DeptLocation (transitive!)
Transitive Path: EmpID → DeptID → DeptName
Anomalies:
- Update: Change department name in all employee records
- Insert: Can't add department without employees
- Delete: Remove last employee, lose department info
3NF Solution:
Employee(EmpID, DeptID, Salary)
Department(DeptID, DeptName, DeptLocation)
EXAMPLE 5: BCNF Conversion
Table in 3NF:
ClassSchedule(Classroom, Period, Course, Professor)
FDs:
- {Classroom, Period} → Course
- {Classroom, Period} → Professor
- {Professor, Course} → Classroom (assuming one prof teaches one course in one room)
Candidate Keys:
- {Classroom, Period}
- {Professor, Course, Period} (need Period to uniquely identify)
Actually, let's simplify:
Better Example:
R(StudentID, Major, Advisor)
FDs:
- StudentID → Major
- Major → Advisor (each major has one advisor)
Candidate Key: StudentID
Check BCNF:
- StudentID → Major: StudentID is superkey ✓
- Major → Advisor: Major is NOT superkey ✗
Violates BCNF!
BCNF Solution:
R1(Major, Advisor)
R2(StudentID, Major)
Verification:
- R1: Major → Advisor, Major is candidate key ✓
- R2: StudentID → Major, StudentID is candidate key ✓
Note: This might not preserve the FD StudentID → Advisor directly
To check: Join R1 and R2, derive StudentID → Advisor through Major
EXAMPLE 6: 4NF Conversion
Table with MVD:
ProfessorCourseTextbook(Professor, Course, Textbook)
| Professor | Course | Textbook |
|---|---|---|
| Smith | DB | Book A |
| Smith | DB | Book B |
| Smith | AI | Book A |
| Smith | AI | Book B |
MVDs:
- Professor →→ Course (Smith teaches {DB, AI})
- Professor →→ Textbook (Smith uses {Book A, Book B})
Problem: Every course-textbook combination created (cross product)
4NF Solution:
ProfessorCourse(Professor, Course)
ProfessorTextbook(Professor, Textbook)
| Professor | Course |
|---|---|
| Smith | DB |
| Smith | AI |
| Professor | Textbook |
|---|---|
| Smith | Book A |
| Smith | Book B |
EXAMPLE 7: 5NF Conversion
Relation:
SupplyPartProject(Supplier, Part, Project)
Business Rules:
- Supplier supplies specific parts
- Project uses specific parts
- Supplier works on specific projects
- If conditions 1, 2, 3 met → Supplier supplies Part to Project
Data:
| Supplier | Part | Project |
|---|---|---|
| S1 | P1 | J1 |
| S1 | P2 | J1 |
| S2 | P1 | J2 |
Join Dependency: Can losslessly decompose into three binary relations
5NF Solution:
SupplierPart(Supplier, Part)
PartProject(Part, Project)
SupplierProject(Supplier, Project)
Verification:
JOIN these three:
SupplierPart ⋈ PartProject ⋈ SupplierProject = Original relation
EXAMPLE 8: Lossless vs Lossy Decomposition
Lossless Example:
R(A, B, C) with FD: A → B
Decompose: R1(A, B), R2(A, C)
Test:
- R1 ∩ R2 = {A}
- A → AB (attributes of R1) ✓
- Lossless!
Lossy Example:
R(A, B, C) with FD: A → B
Decompose: R1(A, B), R2(B, C)
Original Data:
| A | B | C |
|---|---|---|
| a1 | b1 | c1 |
| a2 | b1 | c2 |
R1:
| A | B |
|---|---|
| a1 | b1 |
| a2 | b1 |
R2:
| B | C |
|---|---|
| b1 | c1 |
| b1 | c2 |
Join R1 ⋈ R2:
| A | B | C |
|---|---|---|
| a1 | b1 | c1 |
| a1 | b1 | c2 |
| a2 | b1 | c1 |
| a2 | b1 | c2 |
Spurious tuples created! Lossy decomposition!
EXAMPLE 9: Dependency Preservation
R(A, B, C) with FDs: {A → B, B → C, A → C}
Decomposition 1:
R1(A, B), R2(B, C)
Check:
- A → B: In R1 ✓
- B → C: In R2 ✓
- A → C: Derivable (A → B in R1, B → C in R2, transitivity)
- Dependency preserved!
Decomposition 2:
R1(A, B), R2(A, C)
Check:
- A → B: In R1 ✓
- B → C: Cannot check without JOIN ✗
- NOT dependency preserved
EXAMPLE 10: Complete Normalization Workflow
Original Relation:
R(OrderID, OrderDate, CustomerID, CustomerName, CustomerCity, ProductID, ProductName, Quantity, UnitPrice)
Sample Data:
| OrderID | OrderDate | CustomerID | CustomerName | CustomerCity | ProductID | ProductName | Qty | Price |
|---|---|---|---|---|---|---|---|---|
| 1 | 2024-01-01 | C1 | Alice | NYC | P1 | Laptop | 2 | 1000 |
| 1 | 2024-01-01 | C1 | Alice | NYC | P2 | Mouse | 1 | 25 |
| 2 | 2024-01-02 | C2 | Bob | LA | P1 | Laptop | 1 | 1000 |
FDs:
- OrderID → OrderDate, CustomerID
- CustomerID → CustomerName, CustomerCity
- ProductID → ProductName, UnitPrice
- {OrderID, ProductID} → Quantity
Primary Key: {OrderID, ProductID}
Check 1NF: All atomic ✓
Check 2NF:
- CustomerID → CustomerName (partial dependency on OrderID part)
- CustomerID → CustomerCity (partial dependency)
- ProductID → ProductName (partial dependency)
- ProductID → UnitPrice (partial dependency)
Violates 2NF!
Convert to 2NF:
Order(OrderID, OrderDate, CustomerID)
Customer(CustomerID, CustomerName, CustomerCity)
Product(ProductID, ProductName, UnitPrice)
OrderDetail(OrderID, ProductID, Quantity)
Check 3NF:
In Order: OrderID → CustomerID → CustomerName (transitive!)
Violates 3NF! Customer info transitively dependent
Already fixed: Customer separated in 2NF conversion
Final Schema (3NF):
- Order(OrderID, OrderDate, CustomerID)
- Customer(CustomerID, CustomerName, CustomerCity)
- Product(ProductID, ProductName, UnitPrice)
- OrderDetail(OrderID, ProductID, Quantity)
Check BCNF:
All determinants are candidate keys ✓
In BCNF!
Real-World Use
PRACTICAL DATABASE DESIGN
Design Workflow:
-
Requirements Analysis
- Identify entities
- Identify attributes
- Identify relationships
- Document business rules
-
Conceptual Design (ER Model)
- Create ER diagram
- Define entities and relationships
- Identify keys
-
Logical Design (Relational Model)
- Convert ER to tables
- Define FDs
- Normalize to 3NF/BCNF
-
Physical Design
- Choose data types
- Create indexes
- Consider denormalization for performance
-
Implementation
- CREATE TABLE statements
- Add constraints
- Create relationships
Real-World Considerations:
When to Denormalize:
- Read-heavy applications (more SELECTs than UPDATEs)
- Performance critical queries
- Data warehouse / reporting
- Calculated columns (storing aggregates)
Example:
Instead of joining Order and OrderDetail for total:
Add TotalAmount column to Order (updated by trigger)
When to Normalize:
- Write-heavy applications
- Data integrity critical (banking, medical)
- Frequent updates
- OLTP systems
Design Patterns:
1. Lookup Tables:
-- Instead of storing country names repeatedly
Country(CountryID, CountryName)
Customer(CustomerID, Name, CountryID)
2. Type Tables:
-- For categorization
ProductType(TypeID, TypeName)
Product(ProductID, ProductName, TypeID)
3. Audit Tables:
-- Track changes
EmployeeHistory(HistoryID, EmpID, OldSalary, NewSalary, ChangeDate)
4. Bridge Tables (Many-to-Many):
Student(StudentID, Name)
Course(CourseID, CourseName)
Enrollment(StudentID, CourseID, Grade)
Common Mistakes:
Mistake 1: Over-normalization
Bad: Separate table for every attribute
Good: Balance normalization with performance
Mistake 2: Under-normalization
Bad: All data in one table
Good: At least 3NF for OLTP
Mistake 3: Ignoring Business Rules
Bad: Generic schema ignoring domain
Good: Model specific business logic
Mistake 4: Poor Naming
Bad: Table1, Col1, FK1
Good: Employee, FirstName, DepartmentID
Mistake 5: Missing Constraints
Bad: No foreign keys, no checks
Good: Enforce referential integrity
Tools:
-
ER Diagram Tools:
- MySQL Workbench
- draw.io
- ERDPlus
- Lucidchart
-
Normalization Tools:
- Paper and pencil (best for learning!)
- Spreadsheet for FD analysis
-
SQL Tools:
- MySQL Workbench
- pgAdmin (PostgreSQL)
- SQL Server Management Studio
- DBeaver (multi-platform)
Performance Tips:
- Index foreign keys for join performance
- Index WHERE clause columns frequently queried
- Composite indexes for multi-column queries
- Avoid over-indexing (slows INSERT/UPDATE)
- Analyze query plans to find bottlenecks
Documentation:
Always document:
- ER diagrams
- Table descriptions
- FDs and business rules
- Normalization decisions
- Denormalization rationale
- Index strategy
Industry Standards:
- OLTP (Banking, E-commerce): 3NF minimum, BCNF preferred
- OLAP (Data Warehouse): Star schema (denormalized for queries)
- Hybrid: 3NF for transaction tables, materialized views for reporting
For exams
IMPORTANT EXAM QUESTIONS
Keys:
-
Define and distinguish between super key, candidate key, and primary key with examples.
- Super key: Any set uniquely identifying tuple
- Candidate key: Minimal super key
- Primary key: Selected candidate key
-
Find all candidate keys for R(A,B,C,D) with FDs: {A→B, B→C, D→A}
Solution: Find attribute closure for combinations starting with attributes only on left/neither side -
Explain composite key and foreign key with examples.
- Composite: Multiple attributes form key
- Foreign: References primary key in another table
Functional Dependencies:
-
Define functional dependency. Give examples.
X → Y: If two tuples agree on X, they agree on Y -
State Armstrong's Axioms and derive union rule.
Reflexivity, Augmentation, Transitivity
Union: From X→Y and X→Z, derive X→YZ -
Find attribute closure of AB for R(A,B,C,D) with FDs: {A→C, B→D, C→B}
AB+ = {A,B,C,D} -
What is minimal cover? Find minimal cover for F = {A→B, B→C, A→C, AB→D}
Remove redundant FDs: {A→B, B→C, AB→D}
(A→C derivable from A→B→C)
Normalization:
-
Explain 1NF, 2NF, 3NF, and BCNF with examples.
1NF: Atomic values
2NF: No partial dependency
3NF: No transitive dependency
BCNF: Every determinant is superkey -
Given relation and FDs, determine normal form.
Check each normal form definition systematically -
Convert given relation to 2NF/3NF/BCNF. Show steps.
Identify violations, decompose, verify -
Explain update, insertion, and deletion anomalies with examples.
- Update: Redundant data, inconsistent updates
- Insertion: Can't insert without unrelated data
- Deletion: Lose important data unintentionally
-
What is the difference between 3NF and BCNF?
3NF: X→Y where X superkey OR Y prime
BCNF: X→Y where X must be superkey
BCNF stricter -
Explain 4NF and multivalued dependencies.
MVD: X→→Y independent set of values
4NF: No non-trivial MVDs -
What is 5NF? Give an example.
No join dependencies
Cannot decompose further without loss
Decomposition:
-
What is lossless join decomposition? How to test?
R = R1 ⋈ R2
Test: R1∩R2 → R1 or R1∩R2 → R2 -
Explain dependency preservation. Why important?
Check FDs without joining
Important for constraint enforcement -
Show lossy decomposition example with spurious tuples.
Provide data showing join creates extra tuples -
3NF vs BCNF trade-off: dependency preservation.
3NF preserves dependencies, BCNF may not
3NF often preferred in practice
Algorithms:
-
Describe 3NF synthesis algorithm.
Minimal cover → relation per FD → add key → remove redundant -
Describe BCNF decomposition algorithm.
Find violation → split → recurse
Constraints:
-
Explain entity integrity and referential integrity.
Entity: PK not NULL
Referential: FK references existing PK or NULL -
What are CASCADE options in foreign keys?
CASCADE: Delete/update related rows
SET NULL: Set FK to NULL
RESTRICT: Reject operation
QUICK REVISION
Keys Quick Reference:
- Super Key ⊇ Candidate Key ⊇ Primary Key
- Composite Key: Multiple attributes
- Foreign Key: References PK in another table
Armstrong's Axioms:
- Reflexivity: Y⊆X ⇒ X→Y
- Augmentation: X→Y ⇒ XZ→YZ
- Transitivity: X→Y, Y→Z ⇒ X→Z
Normal Forms:
- 1NF: Atomic values
- 2NF: 1NF + No partial dependency
- 3NF: 2NF + No transitive dependency
- BCNF: Every determinant is superkey
- 4NF: BCNF + No MVDs
- 5NF: 4NF + No join dependencies
Anomalies:
- Update: Redundancy causes inconsistency
- Insert: Can't insert without unrelated data
- Delete: Lose unrelated data
Decomposition:
- Lossless: R = R1 ⋈ R2
- Dependency Preserving: Check FDs without join
Test Shortcuts:
- Attribute closure: Find keys
- Minimal cover: Simplify FDs
- Check BCNF: Every left side a superkey?
- Lossless: Common attributes determine one side?
Key points
KEY TAKEAWAYS
✓ Keys uniquely identify tuples: Super key ⊇ Candidate key ⊇ Primary key
✓ Foreign keys establish relationships and enforce referential integrity
✓ Functional dependency X → Y: X determines Y (X and Y uniquely paired)
✓ Armstrong's Axioms are foundation for deriving FDs: Reflexivity, Augmentation, Transitivity
✓ Attribute closure (X+) finds all attributes determined by X (used to find keys)
✓ Normalization eliminates anomalies through structured decomposition
✓ 1NF requires atomic values (no arrays, no repeating groups)
✓ 2NF eliminates partial dependencies (applies to composite keys)
✓ 3NF eliminates transitive dependencies (non-prime → non-prime)
✓ BCNF is stricter than 3NF: Every determinant must be superkey
✓ 4NF eliminates multivalued dependencies (independent sets of values)
✓ 5NF eliminates join dependencies (cannot decompose further)
✓ Lossless decomposition: Join recovers original relation without spurious tuples
✓ Dependency preservation: Check constraints without joining tables
✓ Trade-off: BCNF may not preserve dependencies; 3NF guarantees both lossless and preservation
✓ Practical design: Normalize to 3NF/BCNF for OLTP, denormalize for OLAP/reporting
✓ Anomalies prevented: Update anomaly, insertion anomaly, deletion anomaly
REMEMBER: Normalization is not about perfection—it's about eliminating redundancy and anomalies while maintaining data integrity. Most real-world OLTP systems use 3NF as a good balance between theory and practice!