Relational Algebra and Operations

Unit 2CLO04

Relational algebra provides a formal foundation for relational database operations. It's the theoretical basis for SQL and helps understand how queries are optimized. Just as arithmetic has operators (+, -, ×, ÷), relational algebra has operators that manipulate tables.

The Basics

Relational algebra is a procedural query language consisting of operations that take one or two relations as input and produce a new relation as output.

Fundamental Operations:

1. Selection (σ): Selects rows that satisfy a condition
σcondition(Relation)

2. Projection (π): Selects specific columns
πcolumns(Relation)

3. Union (∪): Combines tuples from two relations
R ∪ S

4. Set Difference (−): Tuples in R but not in S
R − S

5. Cartesian Product (×): All combinations of tuples
R × S

6. Rename (ρ): Renames relation or attributes
ρnew_name(Relation)

Derived Operations:

7. Intersection (∩): R ∩ S = R − (R − S)

8. Join (⋈): Combines related tuples from two relations

  • Natural Join (⋈): Equi-join on common attributes
  • Theta Join (⋈θ): Join with arbitrary condition
  • Equi-join: Join with equality condition
  • Outer Joins: Include unmatched tuples
    • Left Outer Join (⟕): All from left, matched from right
    • Right Outer Join (⟖): All from right, matched from left
    • Full Outer Join (⟗): All from both

9. Division (÷): R ÷ S finds tuples in R that match all tuples in S

Properties:

Closure Property: Result of operation is also a relation

Composability: Operations can be nested
πname(σsalary>50000(Employee))

Technical Details

DETAILED OPERATIONS:

1. SELECTION (σ)

Syntax: σ<condition>(Relation)

Selects tuples that satisfy the condition.

Conditions:

  • Comparison: =, ≠, <, >, ≤, ≥
  • Logical: ∧ (AND), ∨ (OR), ¬ (NOT)

Properties:

  • Commutative: σc1(σc2(R)) = σc2(σc1(R))
  • Combine: σc1(σc2(R)) = σ(c1 ∧ c2)(R)
  • Idempotent: σc(σc(R)) = σc(R)

Cost: O(n) where n = number of tuples

2. PROJECTION (π)

Syntax: π<attribute_list>(Relation)

Selects specified attributes, removes duplicates.

Properties:

  • Not commutative: πA(πB(R)) ≠ πB(πA(R)) unless A ⊆ B
  • Idempotent: πA(πA(R)) = πA(R)
  • Cascade: πA1(πA2(...πAn(R))) = πA1(R) if A1 ⊆ A2 ⊆ ... ⊆ An

Cost: O(n) + duplicate elimination

3. UNION (∪)

Syntax: R ∪ S

Combines tuples from both relations, removes duplicates.

Requirements (Union-compatible):

  • Same number of attributes
  • Corresponding attributes have compatible domains

Properties:

  • Commutative: R ∪ S = S ∪ R
  • Associative: (R ∪ S) ∪ T = R ∪ (S ∪ T)
  • Idempotent: R ∪ R = R

Cost: O(n + m) where n = |R|, m = |S|

4. SET DIFFERENCE (−)

Syntax: R − S

Tuples in R but not in S.

Requirements: Union-compatible

Properties:

  • NOT commutative: R − S ≠ S − R
  • NOT associative: (R − S) − T ≠ R − (S − T)

Identity: R − ∅ = R
Subset: If R ⊆ S, then R − S = ∅

5. CARTESIAN PRODUCT (×)

Syntax: R × S

All possible combinations of tuples from R and S.

Result:

  • Attributes: All from R, all from S (rename if duplicates)
  • Tuples: |R| × |S|

Properties:

  • NOT commutative (attribute order differs)
  • Associative: (R × S) × T = R × (S × T)

Cost: O(n × m)

6. RENAME (ρ)

Syntax:

  • ρX(R): Rename relation R to X
  • ρ(A1, A2, ..., An)(R): Rename attributes
  • ρX(A1, A2, ..., An)(R): Rename both

Used to:

  • Resolve name conflicts
  • Self-joins
  • Clarify attribute names

7. INTERSECTION (∩)

Syntax: R ∩ S

Tuples in both R and S.

Derived: R ∩ S = R − (R − S)
Also: R ∩ S = S − (S − R)

Requirements: Union-compatible

Properties:

  • Commutative: R ∩ S = S ∩ R
  • Associative: (R ∩ S) ∩ T = R ∩ (S ∩ T)
  • Idempotent: R ∩ R = R

8. JOIN OPERATIONS

Natural Join (⋈)

Syntax: R ⋈ S

Equi-join on all common attributes, one copy retained.

Algorithm:

  1. Find common attributes
  2. Equi-join on common attributes
  3. Project to remove duplicate columns

Example:
Employee(EmpID, Name, DeptID)
Department(DeptID, DeptName)

Employee ⋈ Department:
Result has: EmpID, Name, DeptID, DeptName

Theta Join (⋈θ)

Syntax: R ⋈θ S where θ is a condition

General join with arbitrary condition.

Example: Employee ⋈salary>50000 Payroll

Expressed as: σsalary>50000(Employee × Payroll)

Equi-Join

Theta join where condition is equality (=).

Outer Joins

Left Outer Join (R ⟕ S):

  • All tuples from R
  • Matched tuples from S
  • Unmatched tuples from R padded with NULL for S attributes

Right Outer Join (R ⟖ S):

  • All tuples from S
  • Matched tuples from R
  • Unmatched tuples from S padded with NULL for R attributes

Full Outer Join (R ⟗ S):

  • All tuples from both
  • Unmatched tuples padded with NULL

9. DIVISION (÷)

Syntax: R ÷ S

Finds tuples in R that are associated with ALL tuples in S.

Use case: "Find X such that for all Y in S, (X, Y) is in R"

Example: Find students enrolled in ALL courses
Students_Courses ÷ Required_Courses

Algorithm:
R(A, B) ÷ S(B) = πA(R) − πA((πA(R) × S) − R)

QUERY OPTIMIZATION WITH RELATIONAL ALGEBRA

Equivalence Rules:

  1. Cascade of σ: σc1(σc2(R)) = σ(c1 ∧ c2)(R)
  2. Commutativity of σ: σc1(σc2(R)) = σc2(σc1(R))
  3. Cascade of π: πlist1(πlist2(...(πlistn(R)))) = πlist1(R)
  4. Commuting σ with π: πA1,A2,...,An(σc(R)) = σc(πA1,A2,...,An(R)) if c only uses A1,...,An
  5. Commutativity of ⋈: R ⋈ S = S ⋈ R
  6. Associativity of ⋈: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
  7. Distribute σ over ⋈: σc(R ⋈ S) = σc(R) ⋈ S (if c only uses R attributes)
  8. Distribute π over ⋈: πL(R ⋈ S) = πL1(R) ⋈ πL2(S) where L = L1 ∪ L2

Optimization Strategy:

  1. Perform selections early (reduce tuples)
  2. Perform projections early (reduce attributes)
  3. Combine selections and products into joins
  4. Use most restrictive selection first
  5. Evaluate selections in order of decreasing cardinality reduction

Examples

EXAMPLE SCHEMA:

Employee Table:

EmpIDNameDeptIDSalary
101Alice1060000
102Bob2055000
103Charlie1070000
104Diana3050000

Department Table:

DeptIDDeptNameLocation
10ITNYC
20HRLA
30SalesChicago

Project Table:

ProjIDProjNameDeptID
P1Alpha10
P2Beta20

EXAMPLE 1: SELECTION

Query: Find employees with salary > 55000

Relational Algebra:
σsalary>55000(Employee)

Result:

EmpIDNameDeptIDSalary
101Alice1060000
103Charlie1070000

SQL:
SELECT * FROM Employee WHERE Salary > 55000;

EXAMPLE 2: PROJECTION

Query: Get employee names and salaries

Relational Algebra:
πName,Salary(Employee)

Result:

NameSalary
Alice60000
Bob55000
Charlie70000
Diana50000

SQL:
SELECT Name, Salary FROM Employee;

EXAMPLE 3: COMBINED SELECTION AND PROJECTION

Query: Names of employees in Dept 10

Relational Algebra:
πName(σDeptID=10(Employee))

Result:

Name
Alice
Charlie

SQL:
SELECT Name FROM Employee WHERE DeptID = 10;

EXAMPLE 4: UNION

Query: All IDs from Employee and Department tables

Relational Algebra:
πEmpID(Employee) ∪ πDeptID(Department)

Note: Must rename for union-compatibility
ρ(ID)(πEmpID(Employee)) ∪ ρ(ID)(πDeptID(Department))

SQL:
SELECT EmpID AS ID FROM Employee
UNION
SELECT DeptID AS ID FROM Department;

EXAMPLE 5: SET DIFFERENCE

Query: Department IDs that have no employees

Relational Algebra:
πDeptID(Department) − πDeptID(Employee)

Result:

DeptID
(None in this example, all depts have employees)

SQL:
SELECT DeptID FROM Department
EXCEPT
SELECT DISTINCT DeptID FROM Employee;

EXAMPLE 6: CARTESIAN PRODUCT

Query: All combinations of employees and projects

Relational Algebra:
Employee × Project

Result: 4 employees × 2 projects = 8 tuples

SQL:
SELECT * FROM Employee, Project;
-- or
SELECT * FROM Employee CROSS JOIN Project;

EXAMPLE 7: NATURAL JOIN

Query: Employees with their department names

Relational Algebra:
Employee ⋈ Department

Result:

EmpIDNameDeptIDSalaryDeptNameLocation
101Alice1060000ITNYC
102Bob2055000HRLA
103Charlie1070000ITNYC
104Diana3050000SalesChicago

SQL:
SELECT *
FROM Employee
NATURAL JOIN Department;
-- or explicitly
SELECT e.*, d.DeptName, d.Location
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID;

EXAMPLE 8: THETA JOIN

Query: Employees in departments located in NYC or LA

Relational Algebra:
Employee ⋈Location IN ('NYC','LA') Department

SQL:
SELECT e.*
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID
WHERE d.Location IN ('NYC', 'LA');

EXAMPLE 9: LEFT OUTER JOIN

Assume we add a department with no employees:

Department Table (updated):

DeptIDDeptNameLocation
10ITNYC
20HRLA
30SalesChicago
40FinanceBoston

Query: All departments with their employees (including depts with no employees)

Relational Algebra:
Department ⟕ Employee

Result:

DeptIDDeptNameLocationEmpIDNameSalary
10ITNYC101Alice60000
10ITNYC103Charlie70000
20HRLA102Bob55000
30SalesChicago104Diana50000
40FinanceBostonNULLNULLNULL

SQL:
SELECT *
FROM Department d
LEFT OUTER JOIN Employee e ON d.DeptID = e.DeptID;

EXAMPLE 10: DIVISION

Enrollment Table:

StudentIDCourseID
S1C1
S1C2
S1C3
S2C1
S2C3
S3C1
S3C2
S3C3

Required_Courses Table:

CourseID
C1
C2
C3

Query: Find students enrolled in ALL required courses

Relational Algebra:
πStudentID,CourseID(Enrollment) ÷ πCourseID(Required_Courses)

Result:

StudentID
S1
S3

SQL (using double negation):
SELECT DISTINCT StudentID
FROM Enrollment e1
WHERE NOT EXISTS (
SELECT CourseID
FROM Required_Courses
WHERE CourseID NOT IN (
SELECT CourseID
FROM Enrollment e2
WHERE e2.StudentID = e1.StudentID
)
);

EXAMPLE 11: COMPLEX QUERY

Query: Names and salaries of IT department employees earning > 55000

Relational Algebra:
πName,Salary(σDeptName='IT' ∧ Salary>55000(Employee ⋈ Department))

Optimized (push selection down):
πName,Salary(σSalary>55000(Employee) ⋈ σDeptName='IT'(Department))

Result:

NameSalary
Alice60000
Charlie70000

SQL:
SELECT e.Name, e.Salary
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID
WHERE d.DeptName = 'IT' AND e.Salary > 55000;

EXAMPLE 12: QUERY OPTIMIZATION

Original Query:
πName(σDeptName='IT'(Employee × Department))

Problems:

  1. Cartesian product creates huge intermediate result
  2. Selection after product

Optimized Query:
πName(σDeptID=DeptID(σSalary>55000(Employee) × σDeptName='IT'(Department)))

Better:
πName(σSalary>55000(Employee) ⋈ σDeptName='IT'(Department))

Best:
πName(σSalary>55000(Employee ⋈ σDeptName='IT'(Department)))

Real-World Use

PRACTICAL APPLICATIONS:

1. Database Query Optimization:

Understanding relational algebra helps optimize SQL queries.

Bad Query:
SELECT Name
FROM Employee, Department
WHERE Employee.DeptID = Department.DeptID
AND Department.DeptName = 'IT'
AND Salary > 55000;

Execution Plan:

  1. Cartesian product (expensive!)
  2. Filter conditions
  3. Project Name

Better Query (optimizer does this):
SELECT e.Name
FROM (SELECT * FROM Employee WHERE Salary > 55000) e
JOIN (SELECT * FROM Department WHERE DeptName = 'IT') d
ON e.DeptID = d.DeptID;

2. Query Translation:

SQL → Relational Algebra → Optimized Algebra → Execution Plan

3. Understanding Join Algorithms:

Different join strategies implement ⋈:

  • Nested Loop Join: Iterate outer, for each tuple iterate inner
  • Hash Join: Hash one relation, probe with other
  • Sort-Merge Join: Sort both, merge

4. Design Query Processors:

Relational algebra forms basis of query processors in DBMS.

5. Verify Query Correctness:

Two SQL queries equivalent if their relational algebra expressions equivalent.

CONVERSION EXAMPLES:

SQL to Relational Algebra:

SQL:
SELECT Name, Salary
FROM Employee
WHERE DeptID = 10 AND Salary > 60000;

Relational Algebra:
πName,Salary(σDeptID=10 ∧ Salary>60000(Employee))

SQL:
SELECT e.Name, d.DeptName
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID
WHERE e.Salary > 55000;

Relational Algebra:
πName,DeptName(σSalary>55000(Employee ⋈ Department))

Relational Algebra to SQL:

πName(σSalary>(SELECT AVG(Salary) FROM Employee)(Employee))

SQL:
SELECT Name
FROM Employee
WHERE Salary > (SELECT AVG(Salary) FROM Employee);

TOOLS FOR PRACTICE:

  1. RelaX (Web-based):

    • Visual relational algebra
    • Execute operations
    • See results
  2. RA (Relational Algebra Interpreter):

    • Command-line tool
    • Text-based queries
  3. Database Textbook Simulators:

    • Many textbooks provide online tools

INTERVIEW QUESTIONS:

  1. Explain difference between natural join and theta join
  2. Why is projection not commutative?
  3. How to express intersection using other operations?
  4. What is the result cardinality of R × S?
  5. Give SQL for relational division
  6. Why push selections down in query optimization?
  7. Explain left vs right vs full outer join with examples

For exams

IMPORTANT EXAM QUESTIONS:

1. Define the following relational algebra operations with syntax and examples:

  • Selection (σ)
  • Projection (π)
  • Union (∪)
  • Set Difference (−)
  • Cartesian Product (×)
  • Join (⋈)

2. What are the conditions for two relations to be union-compatible?
Answer: Same number of attributes, corresponding attributes have compatible domains

3. Explain the difference between natural join and equi-join.
Answer: Natural join removes duplicate columns, equi-join keeps all columns

4. Given two tables, perform the following operations:

  • σcondition(R)
  • πattributes(R)
  • R ∪ S
  • R − S
  • R × S
  • R ⋈ S
  • R ⟕ S (Left Outer Join)

5. Convert the following SQL to relational algebra:
SELECT Name, Salary
FROM Employee
WHERE DeptID = 10 AND Salary > 50000;

6. Convert the following relational algebra to SQL:
πName,DeptName(σSalary>55000(Employee ⋈ Department))

7. Explain query optimization using relational algebra equivalence rules.
Show how to optimize:
πName(σDeptName='IT'(Employee × Department))

8. What is relational algebra division? Give an example use case.
Answer: R ÷ S finds tuples in R associated with ALL tuples in S
Example: Find students enrolled in all required courses

9. Compare and contrast the following:

  • Inner Join vs Outer Join
  • Left Outer Join vs Right Outer Join vs Full Outer Join
  • Natural Join vs Theta Join

10. Given a complex query, show step-by-step evaluation:
πName(σDeptID=10 ∧ Salary>60000(Employee ⋈ Department))

11. Explain the properties of relational algebra operations:
- Which operations are commutative?
- Which operations are associative?
- Which operations are idempotent?

12. Show how to express the following derived operations using fundamental operations:
- Intersection: R ∩ S = R − (R − S)
- Theta Join: R ⋈θ S = σθ(R × S)
- Natural Join using theta join

QUICK REVISION FORMULAS:

Basic Operations:

  • σcondition(R): Filter rows
  • πattributes(R): Filter columns
  • R ∪ S: Union (remove duplicates)
  • R − S: Difference (in R, not in S)
  • R × S: Cartesian product (all combinations)
  • ρnew_name(R): Rename

Derived Operations:

  • R ∩ S = R − (R − S): Intersection
  • R ⋈ S: Natural join (equi-join on common attributes)
  • R ⋈θ S = σθ(R × S): Theta join
  • R ⟕ S: Left outer join
  • R ⟖ S: Right outer join
  • R ⟗ S: Full outer join
  • R ÷ S: Division

Cardinality:

  • |σc(R)| ≤ |R|
  • |πA(R)| ≤ |R|
  • |R ∪ S| ≤ |R| + |S|
  • |R − S| ≤ |R|
  • |R × S| = |R| × |S|
  • |R ⋈ S| ≤ |R| × |S|

Optimization Rules:

  1. Push selections down
  2. Push projections down
  3. Combine selections: σc1(σc2(R)) = σ(c1 ∧ c2)(R)
  4. Combine products with selections into joins
  5. Use most restrictive selection first

Key points

KEY TAKEAWAYS:

Relational algebra is the theoretical foundation of SQL and query processing

Six fundamental operations: Selection (σ), Projection (π), Union (∪), Set Difference (−), Cartesian Product (×), Rename (ρ)

Derived operations: Intersection, Join, Division (can be expressed using fundamental operations)

Selection (σ) filters rows; Projection (π) filters columns

Union requires union-compatible relations (same number/type of attributes)

Cartesian product creates all combinations; size = |R| × |S|

Natural join (⋈) is equi-join on common attributes with duplicate columns removed

Outer joins preserve unmatched tuples from one or both relations

Division (÷) finds "for all" relationships (e.g., students in ALL courses)

Query optimization uses equivalence rules to rewrite queries efficiently

Push selections down early to reduce intermediate result sizes

Combine products with selections to form joins (more efficient)

Understanding relational algebra helps write better SQL and understand query plans

Closure property: Result of any operation is also a relation (can nest operations)

Composability: Operations can be combined to form complex queries

REMEMBER: Relational algebra is procedural (how to compute), SQL is declarative (what to compute). The DBMS translates SQL to optimized relational algebra for execution!