Relational Algebra and Operations

Unit 2•CLO04

Relational algebra provides a formal foundation for relational database operations. It's the theoretical basis for SQL and helps understand how queries are optimized. Just as arithmetic has operators (+, -, ×, ÷), relational algebra has operators that manipulate tables.

The Basics

Relational algebra is a procedural query language consisting of operations that take one or two relations as input and produce a new relation as output.

Fundamental Operations:

1. Selection (σ): Selects rows that satisfy a condition
σcondition(Relation)

2. Projection (π): Selects specific columns
πcolumns(Relation)

3. Union (∪): Combines tuples from two relations
R ∪ S

4. Set Difference (−): Tuples in R but not in S
R − S

5. Cartesian Product (×): All combinations of tuples
R × S

6. Rename (ρ): Renames relation or attributes
ρnew_name(Relation)

Derived Operations:

7. Intersection (∩): R ∩ S = R − (R − S)

8. Join (⋈): Combines related tuples from two relations

Natural Join (⋈): Equi-join on common attributes
Theta Join (⋈θ): Join with arbitrary condition
Equi-join: Join with equality condition
Outer Joins: Include unmatched tuples
• Left Outer Join (⟕): All from left, matched from right
• Right Outer Join (⟖): All from right, matched from left
• Full Outer Join (⟗): All from both

9. Division (÷): R ÷ S finds tuples in R that match all tuples in S

Properties:

Closure Property: Result of operation is also a relation

Composability: Operations can be nested
πname(σsalary>50000(Employee))

Technical Details

DETAILED OPERATIONS:

1. SELECTION (σ)

Syntax: σ<condition>(Relation)

Selects tuples that satisfy the condition.

Conditions:

Comparison: =, ≠, <, >, ≤, ≥
Logical: ∧ (AND), ∨ (OR), ¬ (NOT)

Properties:

Commutative: σc1(σc2(R)) = σc2(σc1(R))
Combine: σc1(σc2(R)) = σ(c1 ∧ c2)(R)
Idempotent: σc(σc(R)) = σc(R)

Cost: O(n) where n = number of tuples

2. PROJECTION (π)

Syntax: π<attribute_list>(Relation)

Selects specified attributes, removes duplicates.

Properties:

Not commutative: πA(πB(R)) ≠ πB(πA(R)) unless A ⊆ B
Idempotent: πA(πA(R)) = πA(R)
Cascade: πA1(πA2(...πAn(R))) = πA1(R) if A1 ⊆ A2 ⊆ ... ⊆ An

Cost: O(n) + duplicate elimination

3. UNION (∪)

Syntax: R ∪ S

Combines tuples from both relations, removes duplicates.

Requirements (Union-compatible):

Same number of attributes
Corresponding attributes have compatible domains

Properties:

Commutative: R ∪ S = S ∪ R
Associative: (R ∪ S) ∪ T = R ∪ (S ∪ T)
Idempotent: R ∪ R = R

Cost: O(n + m) where n = |R|, m = |S|

4. SET DIFFERENCE (−)

Syntax: R − S

Tuples in R but not in S.

Requirements: Union-compatible

Properties:

NOT commutative: R − S ≠ S − R
NOT associative: (R − S) − T ≠ R − (S − T)

Identity: R − ∅ = R
Subset: If R ⊆ S, then R − S = ∅

5. CARTESIAN PRODUCT (×)

Syntax: R × S

All possible combinations of tuples from R and S.

Result:

Attributes: All from R, all from S (rename if duplicates)
Tuples: |R| × |S|

Properties:

NOT commutative (attribute order differs)
Associative: (R × S) × T = R × (S × T)

Cost: O(n × m)

6. RENAME (ρ)

Syntax:

ρX(R): Rename relation R to X
ρ(A1, A2, ..., An)(R): Rename attributes
ρX(A1, A2, ..., An)(R): Rename both

Used to:

Resolve name conflicts
Self-joins
Clarify attribute names

7. INTERSECTION (∩)

Syntax: R ∩ S

Tuples in both R and S.

Derived: R ∩ S = R − (R − S)
Also: R ∩ S = S − (S − R)

Requirements: Union-compatible

Properties:

Commutative: R ∩ S = S ∩ R
Associative: (R ∩ S) ∩ T = R ∩ (S ∩ T)
Idempotent: R ∩ R = R

8. JOIN OPERATIONS

Natural Join (⋈)

Syntax: R ⋈ S

Equi-join on all common attributes, one copy retained.

Algorithm:

Find common attributes
Equi-join on common attributes
Project to remove duplicate columns

Example:
Employee(EmpID, Name, DeptID)
Department(DeptID, DeptName)

Employee ⋈ Department:
Result has: EmpID, Name, DeptID, DeptName

Theta Join (⋈θ)

Syntax: R ⋈θ S where θ is a condition

General join with arbitrary condition.

Example: Employee ⋈salary>50000 Payroll

Expressed as: σsalary>50000(Employee × Payroll)

Equi-Join

Theta join where condition is equality (=).

Outer Joins

Left Outer Join (R ⟕ S):

All tuples from R
Matched tuples from S
Unmatched tuples from R padded with NULL for S attributes

Right Outer Join (R ⟖ S):

All tuples from S
Matched tuples from R
Unmatched tuples from S padded with NULL for R attributes

Full Outer Join (R ⟗ S):

All tuples from both
Unmatched tuples padded with NULL

9. DIVISION (÷)

Syntax: R ÷ S

Finds tuples in R that are associated with ALL tuples in S.

Use case: "Find X such that for all Y in S, (X, Y) is in R"

Example: Find students enrolled in ALL courses
Students_Courses ÷ Required_Courses

Algorithm:
R(A, B) ÷ S(B) = πA(R) − πA((πA(R) × S) − R)

QUERY OPTIMIZATION WITH RELATIONAL ALGEBRA

Equivalence Rules:

Cascade of σ: σc1(σc2(R)) = σ(c1 ∧ c2)(R)
Commutativity of σ: σc1(σc2(R)) = σc2(σc1(R))
Cascade of π: πlist1(πlist2(...(πlistn(R)))) = πlist1(R)
Commuting σ with π: πA1,A2,...,An(σc(R)) = σc(πA1,A2,...,An(R)) if c only uses A1,...,An
Commutativity of ⋈: R ⋈ S = S ⋈ R
Associativity of ⋈: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
Distribute σ over ⋈: σc(R ⋈ S) = σc(R) ⋈ S (if c only uses R attributes)
Distribute π over ⋈: πL(R ⋈ S) = πL1(R) ⋈ πL2(S) where L = L1 ∪ L2

Optimization Strategy:

Perform selections early (reduce tuples)
Perform projections early (reduce attributes)
Combine selections and products into joins
Use most restrictive selection first
Evaluate selections in order of decreasing cardinality reduction

Examples

EXAMPLE SCHEMA:

Employee Table:

EmpID	Name	DeptID	Salary
101	Alice	10	60000
102	Bob	20	55000
103	Charlie	10	70000
104	Diana	30	50000

Department Table:

DeptID	DeptName	Location
10	IT	NYC
20	HR	LA
30	Sales	Chicago

Project Table:

ProjID	ProjName	DeptID
P1	Alpha	10
P2	Beta	20

EXAMPLE 1: SELECTION

Query: Find employees with salary > 55000

Relational Algebra:
σsalary>55000(Employee)

Result:

EmpID	Name	DeptID	Salary
101	Alice	10	60000
103	Charlie	10	70000

SQL:
SELECT * FROM Employee WHERE Salary > 55000;

EXAMPLE 2: PROJECTION

Query: Get employee names and salaries

Relational Algebra:
πName,Salary(Employee)

Result:

Name	Salary
Alice	60000
Bob	55000
Charlie	70000
Diana	50000

SQL:
SELECT Name, Salary FROM Employee;

EXAMPLE 3: COMBINED SELECTION AND PROJECTION

Query: Names of employees in Dept 10

Relational Algebra:
πName(σDeptID=10(Employee))

Result:

Name
Alice
Charlie

SQL:
SELECT Name FROM Employee WHERE DeptID = 10;

EXAMPLE 4: UNION

Query: All IDs from Employee and Department tables

Relational Algebra:
πEmpID(Employee) ∪ πDeptID(Department)

Note: Must rename for union-compatibility
ρ(ID)(πEmpID(Employee)) ∪ ρ(ID)(πDeptID(Department))

SQL:
SELECT EmpID AS ID FROM Employee
UNION
SELECT DeptID AS ID FROM Department;

EXAMPLE 5: SET DIFFERENCE

Query: Department IDs that have no employees

Relational Algebra:
πDeptID(Department) − πDeptID(Employee)

Result:

DeptID
(None in this example, all depts have employees)

SQL:
SELECT DeptID FROM Department
EXCEPT
SELECT DISTINCT DeptID FROM Employee;

EXAMPLE 6: CARTESIAN PRODUCT

Query: All combinations of employees and projects

Relational Algebra:
Employee × Project

Result: 4 employees × 2 projects = 8 tuples

SQL:
SELECT * FROM Employee, Project;
-- or
SELECT * FROM Employee CROSS JOIN Project;

EXAMPLE 7: NATURAL JOIN

Query: Employees with their department names

Relational Algebra:
Employee ⋈ Department

Result:

EmpID	Name	DeptID	Salary	DeptName	Location
101	Alice	10	60000	IT	NYC
102	Bob	20	55000	HR	LA
103	Charlie	10	70000	IT	NYC
104	Diana	30	50000	Sales	Chicago

SQL:
SELECT *
FROM Employee
NATURAL JOIN Department;
-- or explicitly
SELECT e.*, d.DeptName, d.Location
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID;

EXAMPLE 8: THETA JOIN

Query: Employees in departments located in NYC or LA

Relational Algebra:
Employee ⋈Location IN ('NYC','LA') Department

SQL:
SELECT e.*
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID
WHERE d.Location IN ('NYC', 'LA');

EXAMPLE 9: LEFT OUTER JOIN

Assume we add a department with no employees:

Department Table (updated):

DeptID	DeptName	Location
10	IT	NYC
20	HR	LA
30	Sales	Chicago
40	Finance	Boston

Query: All departments with their employees (including depts with no employees)

Relational Algebra:
Department ⟕ Employee

Result:

DeptID	DeptName	Location	EmpID	Name	Salary
10	IT	NYC	101	Alice	60000
10	IT	NYC	103	Charlie	70000
20	HR	LA	102	Bob	55000
30	Sales	Chicago	104	Diana	50000
40	Finance	Boston	NULL	NULL	NULL

SQL:
SELECT *
FROM Department d
LEFT OUTER JOIN Employee e ON d.DeptID = e.DeptID;

EXAMPLE 10: DIVISION

Enrollment Table:

StudentID	CourseID
S1	C1
S1	C2
S1	C3
S2	C1
S2	C3
S3	C1
S3	C2
S3	C3

Required_Courses Table:

CourseID
C1
C2
C3

Query: Find students enrolled in ALL required courses

Relational Algebra:
πStudentID,CourseID(Enrollment) ÷ πCourseID(Required_Courses)

Result:

StudentID
S1
S3

SQL (using double negation):
SELECT DISTINCT StudentID
FROM Enrollment e1
WHERE NOT EXISTS (
SELECT CourseID
FROM Required_Courses
WHERE CourseID NOT IN (
SELECT CourseID
FROM Enrollment e2
WHERE e2.StudentID = e1.StudentID
)
);

EXAMPLE 11: COMPLEX QUERY

Query: Names and salaries of IT department employees earning > 55000

Relational Algebra:
πName,Salary(σDeptName='IT' ∧ Salary>55000(Employee ⋈ Department))

Optimized (push selection down):
πName,Salary(σSalary>55000(Employee) ⋈ σDeptName='IT'(Department))

Result:

Name	Salary
Alice	60000
Charlie	70000

SQL:
SELECT e.Name, e.Salary
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID
WHERE d.DeptName = 'IT' AND e.Salary > 55000;

EXAMPLE 12: QUERY OPTIMIZATION

Original Query:
πName(σDeptName='IT'(Employee × Department))

Problems:

Cartesian product creates huge intermediate result
Selection after product

Optimized Query:
πName(σDeptID=DeptID(σSalary>55000(Employee) × σDeptName='IT'(Department)))

Better:
πName(σSalary>55000(Employee) ⋈ σDeptName='IT'(Department))

Best:
πName(σSalary>55000(Employee ⋈ σDeptName='IT'(Department)))

Real-World Use

PRACTICAL APPLICATIONS:

1. Database Query Optimization:

Understanding relational algebra helps optimize SQL queries.

Bad Query:
SELECT Name
FROM Employee, Department
WHERE Employee.DeptID = Department.DeptID
AND Department.DeptName = 'IT'
AND Salary > 55000;

Execution Plan:

Cartesian product (expensive!)
Filter conditions
Project Name

Better Query (optimizer does this):
SELECT e.Name
FROM (SELECT * FROM Employee WHERE Salary > 55000) e
JOIN (SELECT * FROM Department WHERE DeptName = 'IT') d
ON e.DeptID = d.DeptID;

2. Query Translation:

SQL → Relational Algebra → Optimized Algebra → Execution Plan

3. Understanding Join Algorithms:

Different join strategies implement ⋈:

Nested Loop Join: Iterate outer, for each tuple iterate inner
Hash Join: Hash one relation, probe with other
Sort-Merge Join: Sort both, merge

4. Design Query Processors:

Relational algebra forms basis of query processors in DBMS.

5. Verify Query Correctness:

Two SQL queries equivalent if their relational algebra expressions equivalent.

CONVERSION EXAMPLES:

SQL to Relational Algebra:

SQL:
SELECT Name, Salary
FROM Employee
WHERE DeptID = 10 AND Salary > 60000;

Relational Algebra:
πName,Salary(σDeptID=10 ∧ Salary>60000(Employee))

SQL:
SELECT e.Name, d.DeptName
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID
WHERE e.Salary > 55000;

Relational Algebra:
πName,DeptName(σSalary>55000(Employee ⋈ Department))

Relational Algebra to SQL:

πName(σSalary>(SELECT AVG(Salary) FROM Employee)(Employee))

SQL:
SELECT Name
FROM Employee
WHERE Salary > (SELECT AVG(Salary) FROM Employee);

TOOLS FOR PRACTICE:

RelaX (Web-based):
- Visual relational algebra
- Execute operations
- See results
RA (Relational Algebra Interpreter):
- Command-line tool
- Text-based queries
Database Textbook Simulators:
- Many textbooks provide online tools

INTERVIEW QUESTIONS:

Explain difference between natural join and theta join
Why is projection not commutative?
How to express intersection using other operations?
What is the result cardinality of R × S?
Give SQL for relational division
Why push selections down in query optimization?
Explain left vs right vs full outer join with examples

For exams

IMPORTANT EXAM QUESTIONS:

1. Define the following relational algebra operations with syntax and examples:

Selection (σ)
Projection (π)
Union (∪)
Set Difference (−)
Cartesian Product (×)
Join (⋈)

2. What are the conditions for two relations to be union-compatible?
Answer: Same number of attributes, corresponding attributes have compatible domains

3. Explain the difference between natural join and equi-join.
Answer: Natural join removes duplicate columns, equi-join keeps all columns

4. Given two tables, perform the following operations:

σcondition(R)
πattributes(R)
R ∪ S
R − S
R × S
R ⋈ S
R ⟕ S (Left Outer Join)

5. Convert the following SQL to relational algebra:
SELECT Name, Salary
FROM Employee
WHERE DeptID = 10 AND Salary > 50000;

6. Convert the following relational algebra to SQL:
πName,DeptName(σSalary>55000(Employee ⋈ Department))

7. Explain query optimization using relational algebra equivalence rules.
Show how to optimize:
πName(σDeptName='IT'(Employee × Department))

8. What is relational algebra division? Give an example use case.
Answer: R ÷ S finds tuples in R associated with ALL tuples in S
Example: Find students enrolled in all required courses

9. Compare and contrast the following:

Inner Join vs Outer Join
Left Outer Join vs Right Outer Join vs Full Outer Join
Natural Join vs Theta Join

10. Given a complex query, show step-by-step evaluation:
πName(σDeptID=10 ∧ Salary>60000(Employee ⋈ Department))

11. Explain the properties of relational algebra operations:
- Which operations are commutative?
- Which operations are associative?
- Which operations are idempotent?

12. Show how to express the following derived operations using fundamental operations:
- Intersection: R ∩ S = R − (R − S)
- Theta Join: R ⋈θ S = σθ(R × S)
- Natural Join using theta join

QUICK REVISION FORMULAS:

Basic Operations:

σcondition(R): Filter rows
πattributes(R): Filter columns
R ∪ S: Union (remove duplicates)
R − S: Difference (in R, not in S)
R × S: Cartesian product (all combinations)
ρnew_name(R): Rename

Derived Operations:

R ∩ S = R − (R − S): Intersection
R ⋈ S: Natural join (equi-join on common attributes)
R ⋈θ S = σθ(R × S): Theta join
R ⟕ S: Left outer join
R ⟖ S: Right outer join
R ⟗ S: Full outer join
R ÷ S: Division

Cardinality:

|σc(R)| ≤ |R|
|πA(R)| ≤ |R|
|R ∪ S| ≤ |R| + |S|
|R − S| ≤ |R|
|R × S| = |R| × |S|
|R ⋈ S| ≤ |R| × |S|

Optimization Rules:

Push selections down
Push projections down
Combine selections: σc1(σc2(R)) = σ(c1 ∧ c2)(R)
Combine products with selections into joins
Use most restrictive selection first

Key points

KEY TAKEAWAYS:

✓ Relational algebra is the theoretical foundation of SQL and query processing

✓ Six fundamental operations: Selection (σ), Projection (π), Union (∪), Set Difference (−), Cartesian Product (×), Rename (ρ)

✓ Derived operations: Intersection, Join, Division (can be expressed using fundamental operations)

✓ Selection (σ) filters rows; Projection (π) filters columns

✓ Union requires union-compatible relations (same number/type of attributes)

✓ Cartesian product creates all combinations; size = |R| × |S|

✓ Natural join (⋈) is equi-join on common attributes with duplicate columns removed

✓ Outer joins preserve unmatched tuples from one or both relations

✓ Division (÷) finds "for all" relationships (e.g., students in ALL courses)

✓ Query optimization uses equivalence rules to rewrite queries efficiently

✓ Push selections down early to reduce intermediate result sizes

✓ Combine products with selections to form joins (more efficient)

✓ Understanding relational algebra helps write better SQL and understand query plans

✓ Closure property: Result of any operation is also a relation (can nest operations)

✓ Composability: Operations can be combined to form complex queries

REMEMBER: Relational algebra is procedural (how to compute), SQL is declarative (what to compute). The DBMS translates SQL to optimized relational algebra for execution!

←

Database Models: Relational, Network, Hierarchical, Object-Oriented

Tuple and Domain Relational Calculus

→