Relational Algebra and Operations
Relational algebra provides a formal foundation for relational database operations. It's the theoretical basis for SQL and helps understand how queries are optimized. Just as arithmetic has operators (+, -, ×, ÷), relational algebra has operators that manipulate tables.
The Basics
Relational algebra is a procedural query language consisting of operations that take one or two relations as input and produce a new relation as output.
Fundamental Operations:
1. Selection (σ): Selects rows that satisfy a condition
σcondition(Relation)
2. Projection (π): Selects specific columns
πcolumns(Relation)
3. Union (∪): Combines tuples from two relations
R ∪ S
4. Set Difference (−): Tuples in R but not in S
R − S
5. Cartesian Product (×): All combinations of tuples
R × S
6. Rename (ρ): Renames relation or attributes
ρnew_name(Relation)
Derived Operations:
7. Intersection (∩): R ∩ S = R − (R − S)
8. Join (⋈): Combines related tuples from two relations
- Natural Join (⋈): Equi-join on common attributes
- Theta Join (⋈θ): Join with arbitrary condition
- Equi-join: Join with equality condition
- Outer Joins: Include unmatched tuples
• Left Outer Join (⟕): All from left, matched from right
• Right Outer Join (⟖): All from right, matched from left
• Full Outer Join (⟗): All from both
9. Division (÷): R ÷ S finds tuples in R that match all tuples in S
Properties:
Closure Property: Result of operation is also a relation
Composability: Operations can be nested
πname(σsalary>50000(Employee))
Technical Details
DETAILED OPERATIONS:
1. SELECTION (σ)
Syntax: σ<condition>(Relation)
Selects tuples that satisfy the condition.
Conditions:
- Comparison: =, ≠, <, >, ≤, ≥
- Logical: ∧ (AND), ∨ (OR), ¬ (NOT)
Properties:
- Commutative: σc1(σc2(R)) = σc2(σc1(R))
- Combine: σc1(σc2(R)) = σ(c1 ∧ c2)(R)
- Idempotent: σc(σc(R)) = σc(R)
Cost: O(n) where n = number of tuples
2. PROJECTION (π)
Syntax: π<attribute_list>(Relation)
Selects specified attributes, removes duplicates.
Properties:
- Not commutative: πA(πB(R)) ≠ πB(πA(R)) unless A ⊆ B
- Idempotent: πA(πA(R)) = πA(R)
- Cascade: πA1(πA2(...πAn(R))) = πA1(R) if A1 ⊆ A2 ⊆ ... ⊆ An
Cost: O(n) + duplicate elimination
3. UNION (∪)
Syntax: R ∪ S
Combines tuples from both relations, removes duplicates.
Requirements (Union-compatible):
- Same number of attributes
- Corresponding attributes have compatible domains
Properties:
- Commutative: R ∪ S = S ∪ R
- Associative: (R ∪ S) ∪ T = R ∪ (S ∪ T)
- Idempotent: R ∪ R = R
Cost: O(n + m) where n = |R|, m = |S|
4. SET DIFFERENCE (−)
Syntax: R − S
Tuples in R but not in S.
Requirements: Union-compatible
Properties:
- NOT commutative: R − S ≠ S − R
- NOT associative: (R − S) − T ≠ R − (S − T)
Identity: R − ∅ = R
Subset: If R ⊆ S, then R − S = ∅
5. CARTESIAN PRODUCT (×)
Syntax: R × S
All possible combinations of tuples from R and S.
Result:
- Attributes: All from R, all from S (rename if duplicates)
- Tuples: |R| × |S|
Properties:
- NOT commutative (attribute order differs)
- Associative: (R × S) × T = R × (S × T)
Cost: O(n × m)
6. RENAME (ρ)
Syntax:
- ρX(R): Rename relation R to X
- ρ(A1, A2, ..., An)(R): Rename attributes
- ρX(A1, A2, ..., An)(R): Rename both
Used to:
- Resolve name conflicts
- Self-joins
- Clarify attribute names
7. INTERSECTION (∩)
Syntax: R ∩ S
Tuples in both R and S.
Derived: R ∩ S = R − (R − S)
Also: R ∩ S = S − (S − R)
Requirements: Union-compatible
Properties:
- Commutative: R ∩ S = S ∩ R
- Associative: (R ∩ S) ∩ T = R ∩ (S ∩ T)
- Idempotent: R ∩ R = R
8. JOIN OPERATIONS
Natural Join (⋈)
Syntax: R ⋈ S
Equi-join on all common attributes, one copy retained.
Algorithm:
- Find common attributes
- Equi-join on common attributes
- Project to remove duplicate columns
Example:
Employee(EmpID, Name, DeptID)
Department(DeptID, DeptName)
Employee ⋈ Department:
Result has: EmpID, Name, DeptID, DeptName
Theta Join (⋈θ)
Syntax: R ⋈θ S where θ is a condition
General join with arbitrary condition.
Example: Employee ⋈salary>50000 Payroll
Expressed as: σsalary>50000(Employee × Payroll)
Equi-Join
Theta join where condition is equality (=).
Outer Joins
Left Outer Join (R ⟕ S):
- All tuples from R
- Matched tuples from S
- Unmatched tuples from R padded with NULL for S attributes
Right Outer Join (R ⟖ S):
- All tuples from S
- Matched tuples from R
- Unmatched tuples from S padded with NULL for R attributes
Full Outer Join (R ⟗ S):
- All tuples from both
- Unmatched tuples padded with NULL
9. DIVISION (÷)
Syntax: R ÷ S
Finds tuples in R that are associated with ALL tuples in S.
Use case: "Find X such that for all Y in S, (X, Y) is in R"
Example: Find students enrolled in ALL courses
Students_Courses ÷ Required_Courses
Algorithm:
R(A, B) ÷ S(B) = πA(R) − πA((πA(R) × S) − R)
QUERY OPTIMIZATION WITH RELATIONAL ALGEBRA
Equivalence Rules:
- Cascade of σ: σc1(σc2(R)) = σ(c1 ∧ c2)(R)
- Commutativity of σ: σc1(σc2(R)) = σc2(σc1(R))
- Cascade of π: πlist1(πlist2(...(πlistn(R)))) = πlist1(R)
- Commuting σ with π: πA1,A2,...,An(σc(R)) = σc(πA1,A2,...,An(R)) if c only uses A1,...,An
- Commutativity of ⋈: R ⋈ S = S ⋈ R
- Associativity of ⋈: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
- Distribute σ over ⋈: σc(R ⋈ S) = σc(R) ⋈ S (if c only uses R attributes)
- Distribute π over ⋈: πL(R ⋈ S) = πL1(R) ⋈ πL2(S) where L = L1 ∪ L2
Optimization Strategy:
- Perform selections early (reduce tuples)
- Perform projections early (reduce attributes)
- Combine selections and products into joins
- Use most restrictive selection first
- Evaluate selections in order of decreasing cardinality reduction
Examples
EXAMPLE SCHEMA:
Employee Table:
| EmpID | Name | DeptID | Salary |
|---|---|---|---|
| 101 | Alice | 10 | 60000 |
| 102 | Bob | 20 | 55000 |
| 103 | Charlie | 10 | 70000 |
| 104 | Diana | 30 | 50000 |
Department Table:
| DeptID | DeptName | Location |
|---|---|---|
| 10 | IT | NYC |
| 20 | HR | LA |
| 30 | Sales | Chicago |
Project Table:
| ProjID | ProjName | DeptID |
|---|---|---|
| P1 | Alpha | 10 |
| P2 | Beta | 20 |
EXAMPLE 1: SELECTION
Query: Find employees with salary > 55000
Relational Algebra:
σsalary>55000(Employee)
Result:
| EmpID | Name | DeptID | Salary |
|---|---|---|---|
| 101 | Alice | 10 | 60000 |
| 103 | Charlie | 10 | 70000 |
SQL:
SELECT * FROM Employee WHERE Salary > 55000;
EXAMPLE 2: PROJECTION
Query: Get employee names and salaries
Relational Algebra:
πName,Salary(Employee)
Result:
| Name | Salary |
|---|---|
| Alice | 60000 |
| Bob | 55000 |
| Charlie | 70000 |
| Diana | 50000 |
SQL:
SELECT Name, Salary FROM Employee;
EXAMPLE 3: COMBINED SELECTION AND PROJECTION
Query: Names of employees in Dept 10
Relational Algebra:
πName(σDeptID=10(Employee))
Result:
| Name |
|---|
| Alice |
| Charlie |
SQL:
SELECT Name FROM Employee WHERE DeptID = 10;
EXAMPLE 4: UNION
Query: All IDs from Employee and Department tables
Relational Algebra:
πEmpID(Employee) ∪ πDeptID(Department)
Note: Must rename for union-compatibility
ρ(ID)(πEmpID(Employee)) ∪ ρ(ID)(πDeptID(Department))
SQL:
SELECT EmpID AS ID FROM Employee
UNION
SELECT DeptID AS ID FROM Department;
EXAMPLE 5: SET DIFFERENCE
Query: Department IDs that have no employees
Relational Algebra:
πDeptID(Department) − πDeptID(Employee)
Result:
| DeptID |
|---|
| (None in this example, all depts have employees) |
SQL:
SELECT DeptID FROM Department
EXCEPT
SELECT DISTINCT DeptID FROM Employee;
EXAMPLE 6: CARTESIAN PRODUCT
Query: All combinations of employees and projects
Relational Algebra:
Employee × Project
Result: 4 employees × 2 projects = 8 tuples
SQL:
SELECT * FROM Employee, Project;
-- or
SELECT * FROM Employee CROSS JOIN Project;
EXAMPLE 7: NATURAL JOIN
Query: Employees with their department names
Relational Algebra:
Employee ⋈ Department
Result:
| EmpID | Name | DeptID | Salary | DeptName | Location |
|---|---|---|---|---|---|
| 101 | Alice | 10 | 60000 | IT | NYC |
| 102 | Bob | 20 | 55000 | HR | LA |
| 103 | Charlie | 10 | 70000 | IT | NYC |
| 104 | Diana | 30 | 50000 | Sales | Chicago |
SQL:
SELECT *
FROM Employee
NATURAL JOIN Department;
-- or explicitly
SELECT e.*, d.DeptName, d.Location
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID;
EXAMPLE 8: THETA JOIN
Query: Employees in departments located in NYC or LA
Relational Algebra:
Employee ⋈Location IN ('NYC','LA') Department
SQL:
SELECT e.*
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID
WHERE d.Location IN ('NYC', 'LA');
EXAMPLE 9: LEFT OUTER JOIN
Assume we add a department with no employees:
Department Table (updated):
| DeptID | DeptName | Location |
|---|---|---|
| 10 | IT | NYC |
| 20 | HR | LA |
| 30 | Sales | Chicago |
| 40 | Finance | Boston |
Query: All departments with their employees (including depts with no employees)
Relational Algebra:
Department ⟕ Employee
Result:
| DeptID | DeptName | Location | EmpID | Name | Salary |
|---|---|---|---|---|---|
| 10 | IT | NYC | 101 | Alice | 60000 |
| 10 | IT | NYC | 103 | Charlie | 70000 |
| 20 | HR | LA | 102 | Bob | 55000 |
| 30 | Sales | Chicago | 104 | Diana | 50000 |
| 40 | Finance | Boston | NULL | NULL | NULL |
SQL:
SELECT *
FROM Department d
LEFT OUTER JOIN Employee e ON d.DeptID = e.DeptID;
EXAMPLE 10: DIVISION
Enrollment Table:
| StudentID | CourseID |
|---|---|
| S1 | C1 |
| S1 | C2 |
| S1 | C3 |
| S2 | C1 |
| S2 | C3 |
| S3 | C1 |
| S3 | C2 |
| S3 | C3 |
Required_Courses Table:
| CourseID |
|---|
| C1 |
| C2 |
| C3 |
Query: Find students enrolled in ALL required courses
Relational Algebra:
πStudentID,CourseID(Enrollment) ÷ πCourseID(Required_Courses)
Result:
| StudentID |
|---|
| S1 |
| S3 |
SQL (using double negation):
SELECT DISTINCT StudentID
FROM Enrollment e1
WHERE NOT EXISTS (
SELECT CourseID
FROM Required_Courses
WHERE CourseID NOT IN (
SELECT CourseID
FROM Enrollment e2
WHERE e2.StudentID = e1.StudentID
)
);
EXAMPLE 11: COMPLEX QUERY
Query: Names and salaries of IT department employees earning > 55000
Relational Algebra:
πName,Salary(σDeptName='IT' ∧ Salary>55000(Employee ⋈ Department))
Optimized (push selection down):
πName,Salary(σSalary>55000(Employee) ⋈ σDeptName='IT'(Department))
Result:
| Name | Salary |
|---|---|
| Alice | 60000 |
| Charlie | 70000 |
SQL:
SELECT e.Name, e.Salary
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID
WHERE d.DeptName = 'IT' AND e.Salary > 55000;
EXAMPLE 12: QUERY OPTIMIZATION
Original Query:
πName(σDeptName='IT'(Employee × Department))
Problems:
- Cartesian product creates huge intermediate result
- Selection after product
Optimized Query:
πName(σDeptID=DeptID(σSalary>55000(Employee) × σDeptName='IT'(Department)))
Better:
πName(σSalary>55000(Employee) ⋈ σDeptName='IT'(Department))
Best:
πName(σSalary>55000(Employee ⋈ σDeptName='IT'(Department)))
Real-World Use
PRACTICAL APPLICATIONS:
1. Database Query Optimization:
Understanding relational algebra helps optimize SQL queries.
Bad Query:
SELECT Name
FROM Employee, Department
WHERE Employee.DeptID = Department.DeptID
AND Department.DeptName = 'IT'
AND Salary > 55000;
Execution Plan:
- Cartesian product (expensive!)
- Filter conditions
- Project Name
Better Query (optimizer does this):
SELECT e.Name
FROM (SELECT * FROM Employee WHERE Salary > 55000) e
JOIN (SELECT * FROM Department WHERE DeptName = 'IT') d
ON e.DeptID = d.DeptID;
2. Query Translation:
SQL → Relational Algebra → Optimized Algebra → Execution Plan
3. Understanding Join Algorithms:
Different join strategies implement ⋈:
- Nested Loop Join: Iterate outer, for each tuple iterate inner
- Hash Join: Hash one relation, probe with other
- Sort-Merge Join: Sort both, merge
4. Design Query Processors:
Relational algebra forms basis of query processors in DBMS.
5. Verify Query Correctness:
Two SQL queries equivalent if their relational algebra expressions equivalent.
CONVERSION EXAMPLES:
SQL to Relational Algebra:
SQL:
SELECT Name, Salary
FROM Employee
WHERE DeptID = 10 AND Salary > 60000;
Relational Algebra:
πName,Salary(σDeptID=10 ∧ Salary>60000(Employee))
SQL:
SELECT e.Name, d.DeptName
FROM Employee e
JOIN Department d ON e.DeptID = d.DeptID
WHERE e.Salary > 55000;
Relational Algebra:
πName,DeptName(σSalary>55000(Employee ⋈ Department))
Relational Algebra to SQL:
πName(σSalary>(SELECT AVG(Salary) FROM Employee)(Employee))
SQL:
SELECT Name
FROM Employee
WHERE Salary > (SELECT AVG(Salary) FROM Employee);
TOOLS FOR PRACTICE:
-
RelaX (Web-based):
- Visual relational algebra
- Execute operations
- See results
-
RA (Relational Algebra Interpreter):
- Command-line tool
- Text-based queries
-
Database Textbook Simulators:
- Many textbooks provide online tools
INTERVIEW QUESTIONS:
- Explain difference between natural join and theta join
- Why is projection not commutative?
- How to express intersection using other operations?
- What is the result cardinality of R × S?
- Give SQL for relational division
- Why push selections down in query optimization?
- Explain left vs right vs full outer join with examples
For exams
IMPORTANT EXAM QUESTIONS:
1. Define the following relational algebra operations with syntax and examples:
- Selection (σ)
- Projection (π)
- Union (∪)
- Set Difference (−)
- Cartesian Product (×)
- Join (⋈)
2. What are the conditions for two relations to be union-compatible?
Answer: Same number of attributes, corresponding attributes have compatible domains
3. Explain the difference between natural join and equi-join.
Answer: Natural join removes duplicate columns, equi-join keeps all columns
4. Given two tables, perform the following operations:
- σcondition(R)
- πattributes(R)
- R ∪ S
- R − S
- R × S
- R ⋈ S
- R ⟕ S (Left Outer Join)
5. Convert the following SQL to relational algebra:
SELECT Name, Salary
FROM Employee
WHERE DeptID = 10 AND Salary > 50000;
6. Convert the following relational algebra to SQL:
πName,DeptName(σSalary>55000(Employee ⋈ Department))
7. Explain query optimization using relational algebra equivalence rules.
Show how to optimize:
πName(σDeptName='IT'(Employee × Department))
8. What is relational algebra division? Give an example use case.
Answer: R ÷ S finds tuples in R associated with ALL tuples in S
Example: Find students enrolled in all required courses
9. Compare and contrast the following:
- Inner Join vs Outer Join
- Left Outer Join vs Right Outer Join vs Full Outer Join
- Natural Join vs Theta Join
10. Given a complex query, show step-by-step evaluation:
πName(σDeptID=10 ∧ Salary>60000(Employee ⋈ Department))
11. Explain the properties of relational algebra operations:
- Which operations are commutative?
- Which operations are associative?
- Which operations are idempotent?
12. Show how to express the following derived operations using fundamental operations:
- Intersection: R ∩ S = R − (R − S)
- Theta Join: R ⋈θ S = σθ(R × S)
- Natural Join using theta join
QUICK REVISION FORMULAS:
Basic Operations:
- σcondition(R): Filter rows
- πattributes(R): Filter columns
- R ∪ S: Union (remove duplicates)
- R − S: Difference (in R, not in S)
- R × S: Cartesian product (all combinations)
- ρnew_name(R): Rename
Derived Operations:
- R ∩ S = R − (R − S): Intersection
- R ⋈ S: Natural join (equi-join on common attributes)
- R ⋈θ S = σθ(R × S): Theta join
- R ⟕ S: Left outer join
- R ⟖ S: Right outer join
- R ⟗ S: Full outer join
- R ÷ S: Division
Cardinality:
- |σc(R)| ≤ |R|
- |πA(R)| ≤ |R|
- |R ∪ S| ≤ |R| + |S|
- |R − S| ≤ |R|
- |R × S| = |R| × |S|
- |R ⋈ S| ≤ |R| × |S|
Optimization Rules:
- Push selections down
- Push projections down
- Combine selections: σc1(σc2(R)) = σ(c1 ∧ c2)(R)
- Combine products with selections into joins
- Use most restrictive selection first
Key points
KEY TAKEAWAYS:
✓ Relational algebra is the theoretical foundation of SQL and query processing
✓ Six fundamental operations: Selection (σ), Projection (π), Union (∪), Set Difference (−), Cartesian Product (×), Rename (ρ)
✓ Derived operations: Intersection, Join, Division (can be expressed using fundamental operations)
✓ Selection (σ) filters rows; Projection (π) filters columns
✓ Union requires union-compatible relations (same number/type of attributes)
✓ Cartesian product creates all combinations; size = |R| × |S|
✓ Natural join (⋈) is equi-join on common attributes with duplicate columns removed
✓ Outer joins preserve unmatched tuples from one or both relations
✓ Division (÷) finds "for all" relationships (e.g., students in ALL courses)
✓ Query optimization uses equivalence rules to rewrite queries efficiently
✓ Push selections down early to reduce intermediate result sizes
✓ Combine products with selections to form joins (more efficient)
✓ Understanding relational algebra helps write better SQL and understand query plans
✓ Closure property: Result of any operation is also a relation (can nest operations)
✓ Composability: Operations can be combined to form complex queries
REMEMBER: Relational algebra is procedural (how to compute), SQL is declarative (what to compute). The DBMS translates SQL to optimized relational algebra for execution!