Introduction to Database

Unit 1β€’CLO01

Database systems form the backbone of modern information management, handling everything from banking transactions to social media interactions. A Database Management System (DBMS) is a collection of interrelated data and a set of programs to access that data, providing a convenient and efficient way to store and retrieve database information. The primary goal is to manage data in a way that minimizes redundancy and ensures data integrity while allowing multiple users concurrent access.

Evolution of Database Models

See how database technology evolved over time

Database Models Evolution

Evolution of Database Models

The Basics

A database is a collection of related data organized to serve multiple applications efficiently. Data represents recordable facts with implicit meaning, such as names, addresses, and balances. A DBMS serves as an interface between the database and end users or application programs.

Data vs Information (Quick Comparison)

DataInformation
Raw facts and figuresProcessed, organized, and meaningful output
May be unorganized (numbers, symbols, text)Interpretable and useful for decision-making
Input to a systemOutput of processing
Example: 95, 88, 76Example: Average = 86.33, Grade = A
By itself may not add valueAdds value and context

Key Characteristics of the Database Approach

  1. Self-describing nature: Database contains not only data but also metadata (data about data) in the system catalog

  2. Insulation between programs and data: Program-data independence allows changing data structure without modifying programs

  3. Support for multiple views: Different users can have different perspectives of the same data

  4. Sharing of data and multiuser transaction processing: Multiple users can access database simultaneously while maintaining consistency

Advantages of DBMS

Advantages of Using a DBMS

  • Controlling redundancy: Eliminates duplicate data storage
  • Restricting unauthorized access: Security and authorization mechanisms
  • Providing persistent storage: Data outlives programs that create it
  • Providing backup and recovery: Protection against hardware/software failures
  • Enforcing integrity constraints: Rules to maintain data accuracy
  • Permitting inference and actions using rules: Database triggers and stored procedures

Drawbacks of File Systems

Disadvantages of File-Based Systems

The file-based approach suffers from data redundancy, inconsistency, difficulty in accessing data, data isolation, integrity problems, atomicity issues, concurrent access anomalies, and security problems β€” all of which are addressed by a DBMS.

Technical Details

DATABASE MODELS EVOLUTION:

Database Models Overview

  1. HIERARCHICAL MODEL (1960s-1970s):

Hierarchical Data Model

Structure: Tree-like hierarchy with parent-child relationships
Characteristics:
β€’ One-to-many (1:N) relationships only
β€’ Single root for each tree
β€’ Each child has exactly one parent
β€’ Navigation through explicit paths

Example: IMS (Information Management System) by IBM
Limitations:
β€’ Difficulty handling many-to-many relationships
β€’ Data redundancy when representing complex relationships
β€’ Limited query flexibility
β€’ Reorganization is difficult

  1. NETWORK MODEL (1970s):

Network Data Model

Structure: Graph-based with records (nodes) and sets (edges)
Characteristics:
β€’ Many-to-many (M:N) relationships supported
β€’ Records connected through pointers
β€’ Set construct for 1:N relationships
β€’ CODASYL (Conference on Data Systems Languages) standard

Example: IDMS (Integrated Database Management System)
Advantages over Hierarchical:
β€’ More flexible relationship representation
β€’ Better for complex queries
Limitations:
β€’ Complex navigation logic
β€’ Difficult to modify structure
β€’ Application programs tightly coupled with database structure

  1. RELATIONAL MODEL (1970-present):

Relational Data Model

Foundation: E.F. Codd's relational model (1970)
Structure: Data organized in tables (relations)

Core Concepts:
β€’ Relation: Table with rows (tuples) and columns (attributes)
β€’ Domain: Set of allowed values for an attribute
β€’ Primary Key: Unique identifier for tuples
β€’ Foreign Key: Reference to primary key in another relation

Characteristics:
β€’ Simple tabular structure
β€’ Data independence (logical and physical)
β€’ Declarative query language (SQL)
β€’ Mathematical foundation (relational algebra/calculus)
β€’ ACID properties (Atomicity, Consistency, Isolation, Durability)

Advantages:
β€’ Structural independence: Changes in structure don't affect applications
β€’ Ad hoc query capability: Flexible data retrieval
β€’ Data integrity through constraints
β€’ Normalization to reduce redundancy

Examples: MySQL, PostgreSQL, Oracle, SQL Server, DB2

COMPARATIVE ANALYSIS:

FeatureHierarchicalNetworkRelational
StructureTreeGraphTables
Relationships1:NM:NAny
NavigationProceduralProceduralDeclarative
FlexibilityLowMediumHigh
Query LanguageProprietaryProprietarySQL (Standard)
Data IndependenceLowLowHigh
PerformanceFastFastModerate

Why Relational Model Dominates:

  1. Simplicity: Easy to understand and use
  2. Flexibility: Ad hoc queries without predefined access paths
  3. Data Independence: Separation of logical and physical views
  4. Standard Language: SQL is universally adopted
  5. Mathematical Foundation: Formal query optimization
  6. ACID Guarantees: Reliable transaction processing

Examples

EXAMPLE 1: HIERARCHICAL DATABASE

University Structure (Tree):
University
|
-------------------------
| |
Department Department
(Computer Science) (Mathematics)
| |
---------- ----------
| | | |
Faculty Students Faculty Students

Limitation: A student double-majoring requires duplication or complex workarounds.

EXAMPLE 2: NETWORK DATABASE

Student-Course Registration (Graph):
Students <-------- Enrollment --------> Courses
| | |
| (Properties: Grade, |
| Semester) |
| |
Departments Instructors

Network allows M:N relationships directly but requires complex pointer navigation.

EXAMPLE 3: RELATIONAL DATABASE

University Schema (Tables):

STUDENTS Table:

StudentIDNameMajorEmail
101John DoeCSjohn@uni.edu
102Jane SmithMathjane@uni.edu
103Bob JohnsonCSbob@uni.edu

COURSES Table:

CourseIDCourseNameCreditsDepartment
CS101Intro to Programming3CS
CS202Database Systems4CS
MATH101Calculus I4Math

ENROLLMENT Table (Junction):

StudentIDCourseIDSemesterGrade
101CS101Fall2024A
101CS202Fall2024B+
102MATH101Fall2024A
103CS101Fall2024B

SQL Query Examples:

-- Find all students enrolled in Database Systems
SELECT S.Name, S.Email
FROM STUDENTS S
JOIN ENROLLMENT E ON S.StudentID = E.StudentID
JOIN COURSES C ON E.CourseID = C.CourseID
WHERE C.CourseName = 'Database Systems';

-- Count enrollments per course
SELECT C.CourseName, COUNT(E.StudentID) as Enrollment_Count
FROM COURSES C
LEFT JOIN ENROLLMENT E ON C.CourseID = E.CourseID
GROUP BY C.CourseName;

REAL-WORLD APPLICATIONS:

  1. Banking Systems: Account management, transactions, customer data
    β€’ Relational model ensures ACID properties for financial transactions
    β€’ Complex queries for fraud detection and reporting

  2. Airline Reservation Systems: Flight schedules, bookings, passenger info
    β€’ High concurrency handling thousands of simultaneous bookings
    β€’ Complex relationships between flights, passengers, seats

  3. E-Commerce Platforms: Products, orders, customers, inventory
    β€’ Flexible schema for diverse product catalogs
    β€’ Transaction support for payment processing

  4. Healthcare Systems: Patient records, appointments, prescriptions
    β€’ Data integrity critical for patient safety
    β€’ HIPAA compliance through access control

Real-World Use

PRACTICAL USAGE AND IMPLEMENTATION:

  1. Choosing the Right Model:
    β€’ Use Relational for: General-purpose applications, ACID requirements, complex queries
    β€’ Consider NoSQL (modern evolution) for: Massive scale, flexible schema, specific access patterns

  2. Database Design Process:
    Step 1: Requirements Analysis - Understand what data to store and how it will be used
    Step 2: Conceptual Design - Create ER diagram showing entities and relationships
    Step 3: Logical Design - Convert ER to relational schema with normalization
    Step 4: Physical Design - Optimize with indexes, partitioning, clustering

  3. Real Implementation Example (MySQL):

-- Create database
CREATE DATABASE UniversityDB;
USE UniversityDB;

-- Create tables with constraints
CREATE TABLE Students (
StudentID INT PRIMARY KEY AUTO_INCREMENT,
Name VARCHAR(100) NOT NULL,
Email VARCHAR(100) UNIQUE,
Major VARCHAR(50),
EnrollmentDate DATE DEFAULT CURRENT_DATE
);

CREATE TABLE Courses (
CourseID VARCHAR(10) PRIMARY KEY,
CourseName VARCHAR(100) NOT NULL,
Credits INT CHECK (Credits > 0),
Department VARCHAR(50)
);

CREATE TABLE Enrollment (
StudentID INT,
CourseID VARCHAR(10),
Semester VARCHAR(20),
Grade CHAR(2),
PRIMARY KEY (StudentID, CourseID, Semester),
FOREIGN KEY (StudentID) REFERENCES Students(StudentID)
ON DELETE CASCADE,
FOREIGN KEY (CourseID) REFERENCES Courses(CourseID)
ON UPDATE CASCADE
);

  1. Industry Best Practices:
    β€’ Always use foreign keys to maintain referential integrity
    β€’ Create indexes on frequently queried columns
    β€’ Regular backups and disaster recovery plans
    β€’ Monitor performance and optimize queries
    β€’ Implement proper access control and authentication

  2. Career Relevance:
    β€’ Database Administrator (DBA): Manages database systems, ensures performance and security
    β€’ Data Engineer: Designs and builds data pipelines
    β€’ Backend Developer: Implements database interactions in applications
    β€’ Data Analyst: Writes complex SQL queries for business insights

For exams

IMPORTANT EXAM QUESTIONS:

  1. Define Database and DBMS. What are the main characteristics of the database approach?
    Answer: Cover self-describing nature, program-data independence, multiple views, sharing

  2. List and explain the advantages of using a DBMS over file systems.
    Key points: Redundancy control, restricted access, persistent storage, backup/recovery, integrity constraints

  3. Compare and contrast Hierarchical, Network, and Relational database models.
    Create table showing: Structure, relationships supported, advantages, disadvantages, examples

  4. Why did the Relational Model become dominant over Hierarchical and Network models?
    Focus on: Simplicity, data independence, SQL standardization, flexibility

  5. Explain the concept of data independence with examples.
    Distinguish between logical and physical data independence

  6. What are the disadvantages of file-based systems that led to database systems?
    Points: Data redundancy, inconsistency, difficulty in access, integrity problems, atomicity issues, security problems

  7. Describe the structure and limitations of the Hierarchical database model.
    Include: Tree structure, 1:N relationships, navigation, IMS example

  8. Draw a simple example showing how M:N relationships are handled differently in Hierarchical, Network, and Relational models.

  9. What is a view in a database? Why are multiple views important?
    Explain: Security, customization, simplicity for different user groups

  10. Explain the term 'self-describing' nature of a database system.
    Focus on: System catalog, metadata, data dictionary

QUICK REVISION POINTS:
β€’ Database = Collection of related data
β€’ DBMS = Software to manage database
β€’ Hierarchical = Tree (1:N only)
β€’ Network = Graph (M:N supported)
β€’ Relational = Tables (Most flexible)
β€’ SQL = Standard query language for relational databases
β€’ Data Independence = Change structure without affecting programs

Key points

KEY TAKEAWAYS:

βœ“ A database is a collection of related data; DBMS is the software that manages it

βœ“ Database approach provides: data independence, controlled redundancy, concurrent access, backup/recovery, security

βœ“ Three major historical models: Hierarchical (tree), Network (graph), Relational (tables)

βœ“ Hierarchical model: Simple but limited to 1:N relationships, used in IBM's IMS

βœ“ Network model: More flexible with M:N relationships but complex navigation

βœ“ Relational model: Dominant today due to simplicity, SQL standardization, and data independence

βœ“ Relational databases organize data in tables with rows (tuples) and columns (attributes)

βœ“ SQL provides declarative querying - specify WHAT you want, not HOW to get it

βœ“ Modern applications almost exclusively use relational or NoSQL (evolution of database concepts)

βœ“ Understanding database evolution helps appreciate why certain design decisions were made

βœ“ Database systems are critical infrastructure for: banking, healthcare, e-commerce, government, education

βœ“ Career opportunities: DBA, Data Engineer, Backend Developer, Data Analyst all require database knowledge

REMEMBER: The shift from Hierarchical β†’ Network β†’ Relational represents increasing flexibility and decreasing complexity for users, while the system handles complexity internally.

ER Model Overview

Object-Oriented Data Model

Quick Quiz

1. Which database model uses a tree-like structure?

2. What does DBMS stand for?

3. Which model is most widely used today?