πŸ”’ Private Site

This site is password-protected.

Unit V β€” Trees and Graphs

Syllabus Coverage: Binary Tree β€” representation, traversals (preorder, inorder, postorder). BST β€” insertion, deletion. AVL Tree. Graphs β€” representation (adjacency matrix, adjacency list). DFS, BFS, Topological sorting, Shortest Path algorithms. Priority Queues, Heap structures. Hashing β€” techniques and Hash functions. (09 Hours)



Glossary β€” Key Terms

Term Meaning
Tree A hierarchical, non-linear data structure with a root node and subtrees (parent-child relationship).
Binary Tree A tree where each node has at most 2 children (left and right).
BST Binary Search Tree β€” left child < parent < right child.
AVL Tree A self-balancing BST (height difference of subtrees ≀ 1).
Graph A set of vertices (nodes) connected by edges.
BFS Breadth-First Search β€” explores level by level using a queue.
DFS Depth-First Search β€” explores as deep as possible using a stack/recursion.
Heap A complete binary tree where parent β‰₯ children (max-heap) or parent ≀ children (min-heap).
Hashing Mapping data to a fixed-size table index using a hash function for $O(1)$ average access.
Collision When two different keys map to the same hash index.

1 β€” Trees β€” Fundamentals


1.1 Tree Terminology

            A           ← Root (level 0)
          /   \
        B       C       ← level 1
       / \       \
      D   E       F     ← level 2
         / \
        G   H           ← level 3
Term Definition Example from above
Root Top-most node (no parent) A
Parent Node with children B is parent of D, E
Child Node below a parent D, E are children of B
Leaf Node with no children D, G, H, F
Sibling Nodes with same parent B and C
Depth Number of edges from root to node Depth of E = 2
Height Number of edges from node to deepest leaf Height of B = 2
Height of tree Height of root 3
Degree Number of children of a node Degree of E = 2, Degree of D = 0
Level Same as depth here, numbered from 0 (some books use level = depth + 1) Level of A = 0
Subtree A node and all its descendants Subtree rooted at B = {B, D, E, G, H}

1.2 Binary Tree

A Binary Tree is a tree data structure where each node has at most two children, typically called the left child and right child.

Properties:

  • Maximum nodes at level $l$ = $2^l$
  • Maximum total nodes with height $h$ = $2^{h+1} - 1$
  • Minimum height for $n$ nodes = $\lfloor \log_2 n \rfloor$
  • Number of leaf nodes $= n_0$, nodes with 2 children $= n_2$, then: $n_0 = n_2 + 1$

1.3 Types of Binary Trees

Type Definition
Full (Strict) Binary Tree Every node has 0 or 2 children (no node has exactly 1 child)
Complete Binary Tree All levels completely filled except possibly the last, which is filled left to right
Perfect Binary Tree All internal nodes have 2 children and all leaves are at the same level
Degenerate (Skewed) Tree Each node has at most 1 child (essentially a linked list)
   Full:         Complete:        Perfect:       Skewed:
     A              A                A             A
    / \            / \             /   \            \
   B   C          B   C           B     C           B
  / \            / \   \         / \   / \           \
 D   E          D   E   F      D   E F   G           C
                                                      \
                                                       D

1.4 K-ary Trees

A K-ary tree is a rooted tree where each node has at most $k$ children. A binary tree is a special case with $k = 2$. A ternary tree has $k = 3$.

Key Formulas for K-ary Trees:

Property Formula
Max nodes at level $l$ $k^l$
Max total nodes (height $h$) $\displaystyle\frac{k^{h+1} - 1}{k - 1}$
Min height for $n$ nodes $\lceil \log_k [n(k-1) + 1] \rceil - 1$
Internal nodes ($I$) & Total nodes ($N$) $N = kI + 1$
Leaf nodes ($L$) & Internal nodes ($I$) $L = (k-1)I + 1$
Leaf nodes ($L$) & Total nodes ($N$) $L = \displaystyle\frac{N(k-1) + 1}{k}$

πŸ” Understanding the Relationships:

  • $N = kI + 1$ β€” Each internal node contributes $k$ children. The β€œ+1” accounts for the root, which is no one’s child. So total nodes = total children + 1 = kI + 1.
  • $L = (k-1)I + 1$ β€” From $N = I + L$ and $N = kI + 1$, we get $I + L = kI + 1$, so $L = (k-1)I + 1$.
  • For binary tree ($k=2$): $L = I + 1$ β€” the number of leaves is one more than internal nodes. This is the familiar result.

Example: A ternary tree ($k = 3$) with 4 internal nodes:

  • Total nodes: $N = 3(4) + 1 = 13$
  • Leaf nodes: $L = (3-1)(4) + 1 = 9$
  • Verify: $I + L = 4 + 9 = 13 = N$ βœ”

Maximum nodes in a full K-ary tree of height 3 with $k = 3$:

\[\frac{3^{3+1} - 1}{3 - 1} = \frac{81 - 1}{2} = 40 \text{ nodes}\]

Quick Reference β€” Binary vs K-ary:

Property Binary ($k=2$) K-ary (general)
Max nodes at height $h$ $2^{h+1} - 1$ $(k^{h+1}-1)/(k-1)$
Leaves vs Internal $L = I + 1$ $L = (k-1)I + 1$
Total vs Internal $N = 2I + 1$ $N = kI + 1$

1.5 Binary Tree Representations

1. Array Representation

For a node at index $i$ (0-indexed):

  • Left child at index $2i + 1$
  • Right child at index $2i + 2$
  • Parent at index $\lfloor(i - 1) / 2\rfloor$
Tree:       A
           / \
          B   C
         / \
        D   E

Array: [A, B, C, D, E, _, _]
Index:  0  1  2  3  4  5  6

2. Linked Representation

typedef struct TreeNode {
    int data;
    struct TreeNode* left;
    struct TreeNode* right;
} TreeNode;

TreeNode* createTreeNode(int value) {
    TreeNode* node = (TreeNode*)malloc(sizeof(TreeNode));
    node->data = value;
    node->left = NULL;
    node->right = NULL;
    return node;
}

πŸ” Why This Logic: Same pattern as linked list createNode, but with TWO child pointers (left, right) initialized to NULL. A freshly created tree node is a leaf β€” no children yet. The NULL values are critical: traversal algorithms check if (root == NULL) to know when they’ve hit a dead end.


1.6 Binary Tree Traversals

Traversal = Visiting every node in the tree exactly once in a specific order.

Three depth-first traversals:

  • In-order (LNR): Left β†’ Node β†’ Right
  • Pre-order (NLR): Node β†’ Left β†’ Right
  • Post-order (LRN): Left β†’ Right β†’ Node

Level-order traversal (BFS): Visit nodes level by level (left to right).

Example Tree

        1
       / \
      2   3
     / \   \
    4   5   6

Recursive Implementations

// In-order: L β†’ N β†’ R
void inorder(TreeNode* root) {
    if (root == NULL) return;
    inorder(root->left);
    printf("%d ", root->data);
    inorder(root->right);
}
// Output: 4 2 5 1 3 6

// Pre-order: N β†’ L β†’ R
void preorder(TreeNode* root) {
    if (root == NULL) return;
    printf("%d ", root->data);
    preorder(root->left);
    preorder(root->right);
}
// Output: 1 2 4 5 3 6

// Post-order: L β†’ R β†’ N
void postorder(TreeNode* root) {
    if (root == NULL) return;
    postorder(root->left);
    postorder(root->right);
    printf("%d ", root->data);
}
// Output: 4 5 2 6 3 1

πŸ” Why This Logic (All Three Traversals):

  • if (root == NULL) return β€” Base case for ALL traversals. When we reach a null child, there’s nothing to process. This is what stops the recursion from going infinitely.
  • The ONLY difference between the three is the position of printf:
    • In-order (LNR): Print BETWEEN the two recursive calls. For BSTs, this visits nodes in sorted order because all left-subtree values < node < all right-subtree values.
    • Pre-order (NLR): Print BEFORE recursive calls. Visits the root first β€” useful for copying/serializing a tree (you need to know the root before its children).
    • Post-order (LRN): Print AFTER recursive calls. Visits children before parent β€” useful for deletion (you must delete children before the parent) and evaluating expression trees.
  • Time: $O(n)$ for all three β€” every node is visited exactly once.
  • Space: $O(h)$ where $h$ is the tree height β€” the recursion stack depth. For a balanced tree $h = \log n$, for a skewed tree $h = n$.

Level-Order Traversal (using Queue)

void levelOrder(TreeNode* root) {
    if (root == NULL) return;
    
    Queue* q = createQueue();
    enqueue(q, root);
    
    while (!isEmpty(q)) {
        TreeNode* current = dequeue(q);
        printf("%d ", current->data);
        
        if (current->left)  enqueue(q, current->left);
        if (current->right) enqueue(q, current->right);
    }
}
// Output: 1 2 3 4 5 6

πŸ” Why This Logic:

  • Why a queue? β€” A queue processes nodes in FIFO order. When we enqueue all children of the current level, they wait in line while the remaining nodes of the current level are processed first. This naturally produces level-by-level output.
  • enqueue(q, root) first β€” Seed the queue with the root. This is level 0.
  • Dequeue β†’ process β†’ enqueue children β€” For each node we process, we add its children to the BACK of the queue. By the time we finish all nodes at level $k$, all level $k+1$ nodes are in the queue.
  • Why not recursion? β€” The depth-first nature of recursion processes one branch fully before the other. Level-order needs to interleave between branches. A queue-based iterative approach is the natural fit.
  • This is BFS on a tree! β€” Level-order traversal IS breadth-first search applied to a tree structure.

Detailed In-order Trace for the tree above:

Call inorder(1)
  Call inorder(2)              ← go left
    Call inorder(4)            ← go left
      Call inorder(NULL) β†’ return
      Print 4                  βœ“
      Call inorder(NULL) β†’ return
    Print 2                    βœ“
    Call inorder(5)            ← go right
      Call inorder(NULL) β†’ return
      Print 5                  βœ“
      Call inorder(NULL) β†’ return
  Print 1                      βœ“
  Call inorder(3)              ← go right
    Call inorder(NULL) β†’ return
    Print 3                    βœ“
    Call inorder(6)            ← go right
      Call inorder(NULL) β†’ return
      Print 6                  βœ“
      Call inorder(NULL) β†’ return

Result: 4 2 5 1 3 6

Quick Trick to Remember:

  • Pre-order β†’ NLR β†’ Node is printed first (pre = before)
  • In-order β†’ LNR β†’ Node is printed in the middle
  • Post-order β†’ LRN β†’ Node is printed last (post = after)

In-order traversal of a BST gives elements in sorted order!


1.7 Constructing Binary Tree from Two Traversals

Key Principle: A binary tree can be uniquely reconstructed from two traversals if one of them is inorder:

  • Inorder + Preorder β†’ Unique tree βœ”
  • Inorder + Postorder β†’ Unique tree βœ”
  • Preorder + Postorder β†’ NOT unique (only works for full binary trees) βœ—

Algorithm β€” Inorder + Preorder:

  1. The first element of preorder is the root.
  2. Find the root in the inorder sequence β€” elements to its left form the left subtree, elements to its right form the right subtree.
  3. Recursively apply steps 1-2 to the left and right subtrees.

Given:

  • Inorder: D, B, E, A, F, C
  • Preorder: A, B, D, E, C, F

Step 1: Preorder first = A (root)

Step 2: In inorder, A splits: Left = {D, B, E}, Right = {F, C}

      A
     / \
  [D,B,E] [F,C]

Step 3: Left subtree β€” preorder portion = {B, D, E}. First = B (root). In inorder {D, B, E}: Left = {D}, Right = {E}

      A
     / \
    B   ?
   / \
  D   E

Step 4: Right subtree β€” preorder portion = {C, F}. First = C (root). In inorder {F, C}: Left = {F}, Right = {}

      A
     / \
    B   C
   / \ /
  D  E F

Verification:

  • Inorder (L-Root-R): D, B, E, A, F, C βœ”
  • Preorder (Root-L-R): A, B, D, E, C, F βœ”
typedef struct TreeNode {
    int data;
    struct TreeNode *left, *right;
} TreeNode;

TreeNode* buildTree(int inorder[], int preorder[], int inStart, int inEnd, int* preIndex) {
    if (inStart > inEnd) return NULL;
    
    TreeNode* node = (TreeNode*)malloc(sizeof(TreeNode));
    node->data = preorder[(*preIndex)++];
    node->left = node->right = NULL;
    
    if (inStart == inEnd) return node;  // Leaf node
    
    // Find root in inorder
    int inIndex = inStart;  // initialized defensively; the loop below locates the root
    for (int i = inStart; i <= inEnd; i++) {
        if (inorder[i] == node->data) { inIndex = i; break; }
    }
    
    // Recursively build left and right subtrees
    node->left = buildTree(inorder, preorder, inStart, inIndex - 1, preIndex);
    node->right = buildTree(inorder, preorder, inIndex + 1, inEnd, preIndex);
    
    return node;
}

πŸ” Line-by-Line Logic:

  • preorder[(*preIndex)++] β€” The preorder traversal visits root first. We consume elements from preorder one by one (left to right) as we build nodes. The preIndex is shared across all recursive calls.
  • Finding inIndex β€” Locating the root in inorder divides the array into left subtree elements (before root) and right subtree elements (after root).
  • Left subtree first β€” We build the left subtree BEFORE the right. This is critical because preorder is Root-Left-Right, so the next elements in preorder belong to the left subtree.
  • Why inorder is essential β€” Preorder/Postorder tell us which node is the root, but ONLY inorder tells us which nodes go left vs. right. Without inorder, the partition is ambiguous.

1.8 Expression Trees

Definition: An Expression Tree is a binary tree that represents an arithmetic expression:

  • Leaf nodes = operands (numbers/variables)
  • Internal nodes = operators (+, -, *, /)

Example: Expression (A + B) * (C - D)

        *
       / \
      +   -
     / \ / \
    A  B C  D

Traversals give different notations:

Traversal Result Notation
Inorder (L-Root-R) A + B * C - D Infix (with implicit parentheses)
Preorder (Root-L-R) * + A B - C D Prefix (Polish)
Postorder (L-R-Root) A B + C D - * Postfix (Reverse Polish)

Building an Expression Tree from Postfix:

  1. Scan postfix expression left to right
  2. If operand: Create a leaf node, push onto stack
  3. If operator: Pop two nodes (right, then left), create a new node with operator, attach popped nodes as children, push new node
TreeNode* buildExprTree(char postfix[]) {
    TreeNode* stack[100];
    int top = -1;
    
    for (int i = 0; postfix[i] != '\0'; i++) {
        TreeNode* node = (TreeNode*)malloc(sizeof(TreeNode));
        node->data = postfix[i];
        node->left = node->right = NULL;
        
        if (isalnum(postfix[i])) {
            // Operand β€” push as leaf
            stack[++top] = node;
        } else {
            // Operator β€” pop two children
            node->right = stack[top--];
            node->left = stack[top--];
            stack[++top] = node;
        }
    }
    return stack[top];  // Root of expression tree
}

πŸ” Why This Works: Postfix naturally gives us operands before their operator. When we see an operator, its two operands are the most recently seen values (on the stack). This is the same logic as postfix evaluation β€” but instead of computing, we’re building a tree.


1.9 Catalan Number & Counting Binary Trees

The $n$-th Catalan Number counts the number of structurally distinct binary trees with $n$ nodes:

\[C_n = \frac{1}{n+1}\binom{2n}{n} = \frac{(2n)!}{(n+1)! \cdot n!}\]
$n$ $C_n$ Meaning
0 1 1 empty tree
1 1 1 single-node tree
2 2 Left-child only OR right-child only
3 5 5 distinct shapes
4 14 14 distinct shapes
5 42 42 distinct shapes

The 5 distinct binary trees with 3 nodes:

    o        o        o        o        o
   /        /          \        \      / \
  o        o            o        o    o   o
   \      /              \      /    (balanced)
    o    o                o    o

Unlabeled vs Labeled Trees:

  • Unlabeled (structural): $C_n$ distinct shapes β€” Catalan number
  • Labeled (distinct values at each node): $C_n \times n!$ β€” each shape can have $n!$ different label arrangements

For $n = 3$: Unlabeled = 5 shapes. Labeled = $5 \times 6 = 30$ distinct labeled trees.

Catalan Number also counts:

  • Number of valid stack permutations of $n$ elements
  • Number of ways to parenthesize $n+1$ factors
  • Number of distinct BSTs with keys $1, 2, \ldots, n$
  • Number of paths from $(0,0)$ to $(n,n)$ that don’t cross the diagonal

1.10 General Tree to Binary Tree Conversion

Left-Child Right-Sibling (LCRS) Representation: Any general tree (where nodes can have any number of children) can be converted to a binary tree using the rule:

  • Left pointer β†’ first (leftmost) child
  • Right pointer β†’ next sibling

Example:

General Tree:               Binary Tree (LCRS):
       A                           A
     / | \                        /
    B  C  D                      B
   / \    |                     / \
  E   F   G                   E   C
                                \   \
                                 F   D
                                    /
                                   G

Conversion Steps:

  1. Keep the leftmost child as the left child
  2. Link siblings as right children of each other
  3. Remove all links from parent to non-leftmost children

Step-by-step for the above tree:

  1. A’s children: B, C, D β†’ B is left child of A. C is right child of B. D is right child of C.
  2. B’s children: E, F β†’ E is left child of B. F is right child of E.
  3. D’s children: G β†’ G is left child of D.

Verification: Every node has at most 2 children (left = first child, right = next sibling) βœ”

Why LCRS Matters:

  • Any $m$-ary tree (where each node has up to $m$ children) can be represented as a binary tree.
  • For 3-ary trees (ternary trees): each node has up to 3 children, but LCRS converts it to a binary tree.
  • Max internal nodes in an $m$-ary tree of height $h$: $\frac{m^h - 1}{m - 1}$ (geometric series)
  • For a full 3-ary tree with $n$ internal nodes (every internal node has exactly 3 children): number of leaves = $2n + 1$, from $L = (k-1)I + 1$

2 β€” Binary Search Tree (BST)


2.1 BST Property

A Binary Search Tree is a binary tree where for every node:

  • All values in the left subtree are less than the node’s value
  • All values in the right subtree are greater than the node’s value
  • Both left and right subtrees are also BSTs
        50
       /  \
     30    70
    / \   / \
  20  40 60  80

In-order: 20 30 40 50 60 70 80 (sorted!)

2.2 BST Search

TreeNode* search(TreeNode* root, int key) {
    if (root == NULL || root->data == key)
        return root;
    
    if (key < root->data)
        return search(root->left, key);
    else
        return search(root->right, key);
}

πŸ” Why This Logic:

  • root == NULL β€” Base case: reached a dead end, key doesn’t exist. Return NULL.
  • root->data == key β€” Found it! Return this node.
  • key < root->data β†’ go left β€” The BST property guarantees everything smaller is in the left subtree. We can safely ignore the entire right subtree.
  • key > root->data β†’ go right β€” Similarly, everything larger is in the right subtree.
  • Why $O(\log n)$ average? β€” Each comparison eliminates half the tree (like binary search on a sorted array). We follow one path from root to leaf, and a balanced tree has height $\log n$.
  • Why $O(n)$ worst case? β€” A skewed tree (all nodes on one side) has height $n$, so the search degrades to linear.

Search 40 in the BST above:

Step 1: Compare 40 with 50 β†’ 40 < 50 β†’ go LEFT
Step 2: Compare 40 with 30 β†’ 40 > 30 β†’ go RIGHT
Step 3: Compare 40 with 40 β†’ FOUND! βœ“

Only 3 comparisons (vs. up to 7 in linear search).


2.3 BST Insertion

New elements are always inserted at a leaf position.

TreeNode* insert(TreeNode* root, int value) {
    if (root == NULL)
        return createTreeNode(value);
    
    if (value < root->data)
        root->left = insert(root->left, value);
    else if (value > root->data)
        root->right = insert(root->right, value);
    // If value == root->data, do nothing (no duplicates)
    
    return root;
}

πŸ” Line-by-Line Logic:

  • if (root == NULL) return createTreeNode(value) β€” We’ve found the correct empty spot! Create a new leaf node here. This is the base case.
  • root->left = insert(root->left, value) β€” The recursive call returns the (possibly new) subtree root. By assigning it back to root->left, we link the new node into the tree. If the left subtree existed, it returns unchanged. If it was NULL, it returns the new node.
  • This β€œreturn and reassign” pattern is elegant β€” it handles both the case where a new node is created and where the subtree is unchanged, without explicit if checks.
  • No duplicates β€” If value == root->data, we simply return without inserting. This is a design choice; some BSTs allow duplicates (e.g., in the right subtree).
  • New nodes are ALWAYS inserted as leaves β€” We traverse down to a NULL position and create the node there. No existing node is ever moved.

Insert 25 into the BST:

        50                    50
       /  \                  /  \
     30    70     β†’        30    70
    / \   / \             / \   / \
  20  40 60  80         20  40 60  80
                         \
                          25  ← new node

Path: 50 β†’ 30 β†’ 20 (right of 20 because 25 > 20)

2.4 BST Deletion

Three cases when deleting a node:

Case 1: Leaf Node (No Children)

Simply remove the node.

Delete 20:      50                 50
               /  \               /  \
             30    70    β†’      30    70
            / \   / \            \   / \
          20  40 60  80          40 60  80

Case 2: One Child

Replace the node with its only child.

Delete 30 (has right child 40):
        50                 50
       /  \               /  \
     30    70    β†’      40    70
      \   / \               / \
      40 60  80            60  80

Case 3: Two Children

Replace with in-order successor (smallest in right subtree) OR in-order predecessor (largest in left subtree), then delete that successor/predecessor.

Delete 50 (has two children):
        50                 60  ← in-order successor of 50
       /  \               /  \
     30    70    β†’      30    70
    / \   / \          / \      \
  20  40 60  80      20  40     80

Complete BST Deletion Code

TreeNode* findMin(TreeNode* root) {
    while (root->left != NULL)
        root = root->left;
    return root;
}

TreeNode* deleteNode(TreeNode* root, int key) {
    if (root == NULL) return NULL;
    
    if (key < root->data)
        root->left = deleteNode(root->left, key);
    else if (key > root->data)
        root->right = deleteNode(root->right, key);
    else {
        // Node found
        
        // Case 1: Leaf node
        if (root->left == NULL && root->right == NULL) {
            free(root);
            return NULL;
        }
        
        // Case 2: One child
        if (root->left == NULL) {
            TreeNode* temp = root->right;
            free(root);
            return temp;
        }
        if (root->right == NULL) {
            TreeNode* temp = root->left;
            free(root);
            return temp;
        }
        
        // Case 3: Two children
        TreeNode* successor = findMin(root->right);
        root->data = successor->data;
        root->right = deleteNode(root->right, successor->data);
    }
    return root;
}

πŸ” Line-by-Line Logic:

  • findMin β€” The in-order successor (smallest value in right subtree) is found by going left as far as possible. This gives the smallest value that’s still larger than the deleted node.
  • Recursive navigation β€” if (key < root->data) go left, else go right. Same as search. Once found, handle three cases:
  • Case 1 (Leaf): Both children are NULL. Simply free and return NULL. The parent’s pointer gets NULL via the recursive reassignment.
  • Case 2 (One child): The node has exactly one child. Return that child (which replaces the deleted node). The subtree β€œmoves up” one level. E.g., deleting a node with only a right child: return root->right β€” the parent now points to the grandchild.
  • Case 3 (Two children): The most complex case. We can’t just remove the node β€” both subtrees need a parent. Solution: copy the in-order successor’s value into this node, then delete the successor (which must be Case 1 or 2, since it has no left child).
  • Why in-order successor? β€” It’s the smallest value larger than the deleted node, so replacing with it preserves the BST property. In-order predecessor (largest in left subtree) works equally well.

2.5 BST Complexity

Operation Average Worst (Skewed)
Search $O(\log n)$ $O(n)$
Insert $O(\log n)$ $O(n)$
Delete $O(\log n)$ $O(n)$
In-order traversal $O(n)$ $O(n)$

Worst case happens when elements are inserted in sorted (ascending or descending) order β€” the tree becomes a skewed tree (essentially a linked list):

Insert 1, 2, 3, 4, 5 in order:
1
 \
  2
   \
    3
     \
      4
       \
        5       ← height = 4 = n-1 ⟹ O(n) per operation

This is why we need balanced BSTs like AVL trees!


2.6 Constructing BST from Preorder/Postorder

Key Insight: Unlike general binary trees, a BST can be uniquely reconstructed from preorder alone or postorder alone (because the BST property determines the left/right partition).

Algorithm β€” BST from Preorder:

  1. First element is the root
  2. Find the first element greater than root β€” everything before it goes to the left subtree, everything after goes to the right subtree
  3. Recursively build left and right subtrees

Preorder: 10, 5, 1, 7, 40, 50

  1. Root = 10
  2. Elements < 10: {5, 1, 7} β†’ left subtree. Elements > 10: {40, 50} β†’ right subtree
  3. Left: Root = 5, {1} left, {7} right
  4. Right: Root = 40, {} left, {50} right
       10
      /  \
    5     40
   / \      \
  1   7     50
TreeNode* buildBSTFromPreorder(int preorder[], int* index, int min, int max, int n) {
    if (*index >= n) return NULL;
    
    int val = preorder[*index];
    if (val < min || val > max) return NULL;
    
    TreeNode* node = (TreeNode*)malloc(sizeof(TreeNode));
    node->data = val;
    (*index)++;
    
    node->left = buildBSTFromPreorder(preorder, index, min, val - 1, n);
    node->right = buildBSTFromPreorder(preorder, index, val + 1, max, n);
    
    return node;
}
// Call: int idx = 0; root = buildBSTFromPreorder(pre, &idx, INT_MIN, INT_MAX, n);

πŸ” Line-by-Line Logic:

  • min, max bounds β€” Each recursive call narrows the valid range. The left subtree of a node with value val can only have values in [min, val-1], and the right subtree in [val+1, max].
  • if (val < min || val > max) return NULL β€” The current preorder value doesn’t belong in this subtree. Return NULL so the parent tries the other subtree.
  • Time: $O(n)$ β€” Each element is visited exactly once. The min/max bounds eliminate the need for searching.

BST from Postorder:

The logic is the same, but scan the postorder array from right to left and build the right subtree before the left:

TreeNode* buildBSTFromPostorder(int postorder[], int* index, int min, int max) {
    if (*index < 0) return NULL;
    
    int val = postorder[*index];
    if (val < min || val > max) return NULL;
    
    TreeNode* node = (TreeNode*)malloc(sizeof(TreeNode));
    node->data = val;
    (*index)--;
    
    // Build RIGHT subtree first (reverse of preorder logic)
    node->right = buildBSTFromPostorder(postorder, index, val + 1, max);
    node->left = buildBSTFromPostorder(postorder, index, min, val - 1);
    
    return node;
}
// Call: int idx = n - 1; root = buildBSTFromPostorder(post, &idx, INT_MIN, INT_MAX);

Number of distinct BSTs with keys $1, 2, \ldots, n$ = Catalan number $C_n$. This is because each of the $C_n$ structural shapes admits exactly one assignment of the keys $1, \ldots, n$ that satisfies the BST property (the in-order traversal must read $1, 2, \ldots, n$), so counting BSTs reduces to counting shapes.


Practice Questions β€” BST

Q1. Construct a BST by inserting: 45, 15, 79, 90, 10, 55, 12, 20, 50

Q2. Delete nodes 15, 45, and 79 from the BST constructed in Q1. Show the tree after each deletion.

Q3. What is the in-order, pre-order, and post-order traversal of the tree in Q1?

Q4. Write recursive and iterative versions of BST search.

Q5. What happens if you insert a sorted array into a BST? How can you build a balanced BST from a sorted array?


3 β€” AVL Tree


3.1 Concept & Balance Factor

An AVL Tree (Adelson-Velsky and Landis) is a self-balancing BST where the balance factor of every node is $-1$, $0$, or $+1$.

\[\text{Balance Factor (BF)} = \text{Height(Left Subtree)} - \text{Height(Right Subtree)}\]
If $|BF| > 1$ at any node after an insertion or deletion, the tree is rebalanced using rotations.
Balanced (AVL):       Unbalanced (NOT AVL):
     30  BF=0              30  BF=2
    / \                   /
  20   40  BF=0         20    BF=1
                        /
                      10       ← BF of 30 = 2 > 1

3.2 AVL Rotations

Imbalance Type When It Occurs Rotation
LL (Left-Left) Inserted in left subtree of left child Single Right Rotation
RR (Right-Right) Inserted in right subtree of right child Single Left Rotation
LR (Left-Right) Inserted in right subtree of left child Left Rotation on child, then Right Rotation
RL (Right-Left) Inserted in left subtree of right child Right Rotation on child, then Left Rotation

LL Rotation (Right Rotation)

Problem (LL):             After Right Rotation:
       30                      20
      /                       / \
    20              β†’       10   30
   /
  10
TreeNode* rightRotate(TreeNode* y) {
    TreeNode* x = y->left;
    TreeNode* T2 = x->right;
    
    // Perform rotation
    x->right = y;
    y->left = T2;
    
    return x;  // New root
}

πŸ” Line-by-Line Logic:

  • When to use: LL imbalance β€” the left subtree of the left child is too heavy.
  • x = y->left β€” x is the left child that will become the new root of this subtree.
  • T2 = x->right β€” Save x’s right subtree. This is the subtree that needs to be β€œrehomed” because x’s right will soon point to y.
  • x->right = y β€” The old root y becomes the right child of x. This is the actual rotation.
  • y->left = T2 β€” The saved subtree T2 goes to y’s left (where x used to be). This works because all values in T2 are between x and y (greater than x, less than y).
  • Returns x as the new subtree root. The BST property is preserved!

RR Rotation (Left Rotation)

Problem (RR):        After Left Rotation:
  10                      20
   \                     / \
    20          β†’      10   30
     \
      30
TreeNode* leftRotate(TreeNode* x) {
    TreeNode* y = x->right;
    TreeNode* T2 = y->left;
    
    // Perform rotation
    y->left = x;
    x->right = T2;
    
    return y;  // New root
}

πŸ” Why This Logic:

  • Mirror of right rotation β€” Used for RR imbalance (right subtree of right child is too heavy).
  • y = x->right β€” The right child becomes the new root.
  • T2 = y->left β€” Save y’s left subtree (values between x and y).
  • y->left = x; x->right = T2 β€” x drops down to be y’s left child, and T2 fills the gap at x’s right.
  • LR and RL cases are just TWO rotations in sequence: LR = left-rotate child then right-rotate parent. RL = right-rotate child then left-rotate parent. They handle the β€œzig-zag” imbalance patterns.

LR Rotation (Left-Right Double Rotation)

Problem (LR):          Step 1: Left rotate 10    Step 2: Right rotate 30
    30                     30                         20
   /                      /                          / \
  10            β†’        20              β†’         10   30
   \                    /
    20                 10

RL Rotation (Right-Left Double Rotation)

Problem (RL):          Step 1: Right rotate 30    Step 2: Left rotate 10
  10                     10                           20
   \                      \                          / \
    30          β†’          20              β†’        10   30
   /                        \
  20                         30

3.3 AVL Insertion

Insert 3, 2, 1, 4, 5, 6, 7 into an AVL tree:

Insert 3:

  3 (BF=0)

Insert 2:

    3 (BF=1)
   /
  2 (BF=0)

Insert 1: (BF of 3 = 2 β†’ LL case, right rotate)

      3 (BF=2)                    2 (BF=0)
     /                           / \
    2 (BF=1)       β†’           1     3
   /
  1

Right rotation at 3

Insert 4:

    2 (BF=-1)
   / \
  1   3 (BF=-1)
       \
        4

Insert 5: (BF of 3 = -2 β†’ RR case, left rotate at 3)

    2 (BF=-2)                    2 (BF=-1)
   / \                          / \
  1   3 (BF=-2)     β†’        1     4 (BF=0)
       \                           / \
        4 (BF=-1)                3     5
         \
          5

Left rotation at 3
Then BF of 2 = -1, still ok

Insert 6: (BF of 2 = -2 β†’ RR case, left rotate at 2)

      2 (BF=-2)                    4 (BF=0)
     / \                          / \
    1    4 (BF=-1)   β†’          2     5 (BF=-1)
        / \                    / \     \
       3   5 (BF=-1)         1   3     6
            \
             6

Left rotation at 2

Insert 7: (BF of 5 = -2 β†’ RR case, left rotate at 5)

        4 (BF=-1)                    4 (BF=0)
       / \                          / \
      2     5 (BF=-2)   β†’        2     6 (BF=0)
     / \     \                   / \   / \
    1   3     6 (BF=-1)        1   3  5   7
               \
                7

Left rotation at 5

Final AVL tree:

        4
       / \
      2   6
     / \ / \
    1  3 5  7

Height = 2 for 7 nodes (optimal: $\lfloor\log_2 7\rfloor = 2$)

AVL trees guarantee $O(\log n)$ for search, insert, and delete, even in the worst case, unlike plain BSTs, which can degrade to $O(n)$.

| Operation | BST (worst) | AVL (worst) |
|---|---|---|
| Search | $O(n)$ | $O(\log n)$ |
| Insert | $O(n)$ | $O(\log n)$ |
| Delete | $O(n)$ | $O(\log n)$ |

3.4 AVL Deletion

AVL Deletion follows the same three cases as BST deletion, but after deletion, the tree may become unbalanced. We must check balance factors and apply rotations bottom-up along the path from the deleted node to the root.

Steps:

  1. Delete the node using standard BST deletion (3 cases: leaf, one child, two children)
  2. Starting from the deleted node’s parent, walk up to the root
  3. At each node, recompute the balance factor
  4. If $ |BF| > 1 $, apply the appropriate rotation (LL, RR, LR, RL)
  5. Unlike insertion, deletion may require rotations at multiple nodes along the path

Delete 8 from this AVL tree:

        10
       /  \
      5    15
     / \   /
    3   8  12
   /
  1

Step 1: Delete 8 (leaf node β€” just remove it)

        10
       /  \
      5    15
     /     /
    3     12
   /
  1

Step 2: Check balance factors:

  • Node 5: BF = height(left) - height(right) = 2 - 0 = 2 β†’ UNBALANCED!
  • Node 5’s left child (3) has BF = 1 β†’ LL case (left-heavy, left child is left-heavy)

Step 3: Right rotation at 5:

        10
       /  \
      3    15
     / \   /
    1   5  12

All balance factors are now -1, 0, or 1. βœ”

Deletion vs Insertion in AVL:

  • Insertion can cause at most one rebalancing (one single or double rotation)
  • Deletion can cause rebalancing at multiple nodes up to the root β€” the rotation at one level can cause imbalance at a higher level
  • Deletion is therefore slightly more complex than insertion

3.5 Min/Max Nodes for AVL Tree of Height h

Minimum nodes in an AVL tree of height $h$:

\[N(h) = N(h-1) + N(h-2) + 1\]

with base cases $N(0) = 1$, $N(1) = 2$.

This creates a Fibonacci-like sequence.

| Height $h$ | Min nodes $N(h)$ | Max nodes |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 2 | 3 |
| 2 | 4 | 7 |
| 3 | 7 | 15 |
| 4 | 12 | 31 |
| 5 | 20 | 63 |
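
The recurrence and the table values can be cross-checked mechanically; a small sketch (the function names `minAVLNodes` and `maxNodes` are ours):

```c
#include <assert.h>

// Minimum nodes in an AVL tree of height h:
// N(h) = N(h-1) + N(h-2) + 1, with N(0) = 1, N(1) = 2
int minAVLNodes(int h) {
    if (h == 0) return 1;
    if (h == 1) return 2;
    return minAVLNodes(h - 1) + minAVLNodes(h - 2) + 1;
}

// Maximum nodes: a full binary tree of height h has 2^(h+1) - 1 nodes
int maxNodes(int h) {
    return (1 << (h + 1)) - 1;
}
```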

Maximum nodes in an AVL tree of height $h$ (same as any binary tree):

\[N_{\max}(h) = 2^{h+1} - 1\]

Why Fibonacci-like?

An AVL tree of height $h$ with minimum nodes must have subtrees of heights $h-1$ and $h-2$ (the maximum allowed imbalance). So:

  • One subtree is an AVL tree of height $h-1$ with minimum nodes β†’ $N(h-1)$
  • The other subtree is an AVL tree of height $h-2$ with minimum nodes β†’ $N(h-2)$
  • Plus the root β†’ $+1$

This is why AVL trees have height $O(\log n)$: since $N(h) \approx \phi^h$ (where $\phi = 1.618…$), we get $h \leq 1.44 \log_2(n+2)$.

Maximum height of an AVL tree with $n$ nodes: $h \leq 1.44 \log_2 n$


Practice Questions β€” AVL Tree

Q1. Insert 50, 20, 60, 10, 8, 15, 32, 46, 11, 48 into an AVL tree. Show rotations at each rebalancing step.

Q2. What is a balance factor? For what values of BF is a tree AVL-balanced?

Q3. Explain all four types of AVL rotations with diagrams.

Q4. What is the maximum height of an AVL tree with $n$ nodes?

Q5. Compare BST and AVL tree. Why don’t we always use AVL trees?


4 β€” Graphs


4.1 Graph Terminology

A Graph $G = (V, E)$ consists of:

  • $V$ = set of vertices (nodes)
  • $E$ = set of edges (connections between vertices)

| Term | Definition |
|---|---|
| Directed Graph (Digraph) | Edges have direction: $(u, v) \neq (v, u)$ |
| Undirected Graph | Edges have no direction: $\{u, v\} = \{v, u\}$ |
| Weighted Graph | Edges have associated weights/costs |
| Adjacent | Two vertices connected by an edge |
| Degree | Number of edges connected to a vertex |
| In-degree / Out-degree | (For directed) Number of incoming / outgoing edges |
| Path | Sequence of vertices connected by edges |
| Cycle | A path that starts and ends at the same vertex |
| Connected Graph | Every vertex is reachable from every other vertex |
| DAG | Directed Acyclic Graph — directed graph with no cycles |
Undirected:          Directed:           Weighted:
  1 --- 2             1 β†’ 2               1 --5-- 2
  |   / |             ↓   ↓               |      /|
  |  /  |             3 β†’ 4               3    2  7
  | /   |                                 |  /    |
  3 --- 4                                 4 ---3- 5

4.2 Graph Representations

1. Adjacency Matrix

A 2D array of size $V \times V$. matrix[i][j] = 1 if edge exists from vertex $i$ to $j$.

Graph:            Adjacency Matrix:
  0 - 1             0  1  2  3
  |   |         0 [ 0  1  1  0 ]
  2 - 3         1 [ 1  0  0  1 ]
                2 [ 1  0  0  1 ]
                3 [ 0  1  1  0 ]
#define V 4
int adj[V][V] = {0};

void addEdge(int u, int v) {
    adj[u][v] = 1;
    adj[v][u] = 1;  // For undirected graph
}

πŸ” Why This Logic:

  • Symmetric matrix β€” For an undirected graph, if there’s an edge from $u$ to $v$, there’s also one from $v$ to $u$. So we set BOTH adj[u][v] and adj[v][u].
  • Why $O(V^2)$ space? β€” We allocate a full $V \times V$ matrix even if the graph has very few edges. For sparse graphs (few edges), this wastes a lot of memory.
  • $O(1)$ edge check β€” The major advantage: just look up adj[u][v]. No traversal needed.
  • For directed graphs, only set adj[u][v] = 1 (remove the second line).

Space: $O(V^2)$
Check edge: $O(1)$
Find all neighbors: $O(V)$

2. Adjacency List

An array of linked lists. list[i] stores all neighbors of vertex $i$.

Graph:            Adjacency List:
  0 - 1          0 β†’ [1] β†’ [2]
  |   |          1 β†’ [0] β†’ [3]
  2 - 3          2 β†’ [0] β†’ [3]
                 3 β†’ [1] β†’ [2]
typedef struct AdjListNode {
    int dest;
    struct AdjListNode* next;
} AdjListNode;

typedef struct {
    AdjListNode* head;
} AdjList;

AdjList graph[V];

void addEdge(int u, int v) {
    AdjListNode* node = (AdjListNode*)malloc(sizeof(AdjListNode));
    node->dest = v;
    node->next = graph[u].head;
    graph[u].head = node;
    
    // For undirected, add reverse edge too
    node = (AdjListNode*)malloc(sizeof(AdjListNode));
    node->dest = u;
    node->next = graph[v].head;
    graph[v].head = node;
}

πŸ” Why This Logic:

  • Head insertion β€” New neighbor is added at the HEAD of the linked list (same as stack push). This gives $O(1)$ insertion instead of $O(degree)$ if appending at the end.
  • Two insertions for undirected β€” Edge $(u,v)$ means $v$ is a neighbor of $u$ AND $u$ is a neighbor of $v$. Both adjacency lists must be updated.
  • Space: $O(V + E)$ β€” We store each edge twice (in both directions) for undirected graphs, plus one list head per vertex. Much better than $O(V^2)$ for sparse graphs.
  • Trade-off vs matrix β€” Checking if a specific edge exists requires traversing the linked list ($O(degree)$), whereas a matrix gives $O(1)$. But iterating over all neighbors is $O(degree)$ here vs $O(V)$ for a matrix.

Space: $O(V + E)$
Check edge: $O(degree)$
Find all neighbors: $O(degree)$

| Feature | Adjacency Matrix | Adjacency List |
|---|---|---|
| Space | $O(V^2)$ | $O(V + E)$ |
| Add edge | $O(1)$ | $O(1)$ |
| Check edge | $O(1)$ | $O(V)$ |
| Dense graphs | Better | Wasteful |
| Sparse graphs | Wasteful | Better |

4.3 Breadth-First Search (BFS)

BFS explores a graph level by level, visiting all neighbors of a node before moving deeper. Uses a queue.

void BFS(int start) {
    int visited[V] = {0};
    int queue[V], front = 0, rear = 0;
    
    visited[start] = 1;
    queue[rear++] = start;
    
    while (front < rear) {
        int current = queue[front++];
        printf("%d ", current);
        
        for (int i = 0; i < V; i++) {
            if (adj[current][i] == 1 && !visited[i]) {
                visited[i] = 1;
                queue[rear++] = i;
            }
        }
    }
}

πŸ” Line-by-Line Logic:

  • visited array β€” Prevents revisiting nodes (which would cause infinite loops in graphs with cycles). Set to 1 when a node is first discovered, not when it’s processed.
  • visited[start] = 1 BEFORE enqueue β€” Critical! We mark visited when ENQUEUEING, not when dequeuing. If we wait until dequeue, a node could be enqueued multiple times by different neighbors. This wastes time and causes duplicate processing.
  • Queue drives the traversal β€” FIFO order means: process all neighbors of node A before processing neighbors of node B (if A was discovered first). This creates the level-by-level behavior.
  • adj[current][i] == 1 && !visited[i] β€” Check: (1) is there an edge? (2) haven’t visited this neighbor yet? If both true, discover it.
  • Time: $O(V^2)$ with matrix β€” For each vertex, we scan the entire row of the adjacency matrix (all $V$ entries). With an adjacency list, it’s $O(V + E)$ because we only visit actual neighbors.

BFS starting from vertex 0:

Graph:        0 - 1 - 4
              |   |
              2 - 3

Queue states:
[0]       β†’ Visit 0, add neighbors 1, 2 β†’ [1, 2]
[1, 2]    β†’ Visit 1, add neighbors 3, 4 β†’ [2, 3, 4]
[2, 3, 4] β†’ Visit 2, add neighbor 3 (already visited) β†’ [3, 4]
[3, 4]    β†’ Visit 3, (neighbors visited) β†’ [4]
[4]       β†’ Visit 4, (neighbors visited) β†’ []

BFS Order: 0 β†’ 1 β†’ 2 β†’ 3 β†’ 4

Time: $O(V + E)$ (adjacency list) or $O(V^2)$ (adjacency matrix)
Space: $O(V)$
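
The same traversal can be packaged as a self-contained, testable routine. In this sketch `NV`, `bfs`, and the `out[]` visit-order array are our names (NV avoids clashing with the notes' `V` macro), and the matrix in the usage below encodes the example graph above:

```c
#include <assert.h>

#define NV 5   // number of vertices

// BFS on an adjacency matrix; records the visit order in out[]
// and returns the number of vertices reached.
int bfs(int adj[NV][NV], int start, int out[NV]) {
    int visited[NV] = {0};
    int queue[NV], front = 0, rear = 0, count = 0;

    visited[start] = 1;            // mark when ENQUEUED, not when dequeued
    queue[rear++] = start;

    while (front < rear) {
        int current = queue[front++];
        out[count++] = current;    // process in FIFO (level-by-level) order
        for (int i = 0; i < NV; i++)
            if (adj[current][i] && !visited[i]) {
                visited[i] = 1;
                queue[rear++] = i;
            }
    }
    return count;
}
```

Running it on the graph 0-1, 0-2, 1-3, 1-4, 2-3 from vertex 0 reproduces the order 0, 1, 2, 3, 4 traced above.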


4.4 Depth-First Search (DFS)

DFS explores a graph by going as deep as possible before backtracking. Uses a stack (or recursion).

int visited[V] = {0};

void DFS(int vertex) {
    visited[vertex] = 1;
    printf("%d ", vertex);
    
    for (int i = 0; i < V; i++) {
        if (adj[vertex][i] == 1 && !visited[i])
            DFS(i);
    }
}

πŸ” Why This Logic:

  • Recursion IS the stack β€” Each recursive call pushes a frame onto the call stack. When we backtrack (return), we pop back to the previous node. This is why DFS uses a stack (either explicitly or via recursion).
  • visited[vertex] = 1 immediately β€” Mark as visited BEFORE exploring neighbors. This prevents infinite loops in cyclic graphs.
  • DFS(i) goes deep first β€” When we find an unvisited neighbor, we immediately dive into it (recursive call) before checking other neighbors of the current vertex. We only return to check the next neighbor after fully exploring the first one’s subtree.
  • Contrast with BFS β€” BFS explores all neighbors at the current depth before going deeper. DFS explores one neighbor’s entire subtree before moving to the next neighbor. Think of BFS as β€œhorizontal” and DFS as β€œvertical”.
  • Global visited array β€” Declared outside the function so it persists across recursive calls. Each call modifies the same array.

DFS starting from vertex 0:

Graph:        0 - 1 - 4
              |   |
              2 - 3

DFS(0): Visit 0
  DFS(1): Visit 1
    DFS(3): Visit 3
      DFS(2): Visit 2
        (neighbors 0, 3 already visited)
      return to 3
    return to 1
    DFS(4): Visit 4
    return to 1
  return to 0

DFS Order: 0 β†’ 1 β†’ 3 β†’ 2 β†’ 4

Time: $O(V + E)$
Space: $O(V)$ (for visited array + recursion stack)
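
For comparison with the BFS version, here is the recursive traversal as a self-contained routine (`NV`, `dfs`, and the `out[]` array are our names; the graph in the usage below is the same example):

```c
#include <assert.h>

#define NV 5

// Recursive DFS; appends each newly visited vertex to out[].
void dfs(int adj[NV][NV], int vertex, int visited[NV], int out[], int* count) {
    visited[vertex] = 1;               // mark BEFORE exploring neighbours
    out[(*count)++] = vertex;
    for (int i = 0; i < NV; i++)
        if (adj[vertex][i] && !visited[i])
            dfs(adj, i, visited, out, count);   // dive deep before the next neighbour
}
```

From vertex 0 on the graph 0-1, 0-2, 1-3, 1-4, 2-3 this yields 0, 1, 3, 2, 4, matching the trace above.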

| Feature | BFS | DFS |
|---|---|---|
| Data structure | Queue | Stack / Recursion |
| Order | Level by level | Go deep, then backtrack |
| Shortest path (unweighted) | ✔ Yes | ✗ No |
| Memory | More (stores entire level) | Less (stores path) |
| Use cases | Shortest path, level-order | Cycle detection, topological sort |

4.5 Topological Sorting

Topological Sort is a linear ordering of vertices in a DAG (Directed Acyclic Graph) such that for every directed edge $(u, v)$, vertex $u$ comes before $v$.

Application: Task scheduling, course prerequisites, build dependencies

DFS-based Topological Sort

int stack[V], top = -1;
int visited[V] = {0};

void topologicalDFS(int vertex) {
    visited[vertex] = 1;
    
    for (int i = 0; i < V; i++) {
        if (adj[vertex][i] == 1 && !visited[i])
            topologicalDFS(i);
    }
    
    stack[++top] = vertex;  // Push after all descendants processed
}

void topologicalSort() {
    for (int i = 0; i < V; i++) {
        if (!visited[i])
            topologicalDFS(i);
    }
    
    // Print in reverse stack order
    printf("Topological Order: ");
    while (top >= 0)
        printf("%d ", stack[top--]);
}

πŸ” Line-by-Line Logic:

  • stack[++top] = vertex AFTER recursing on all neighbors β€” This is the key insight. A vertex is pushed to the stack only after ALL its dependencies (descendants) have been processed. So the vertex that finishes last (the one with the most dependencies) ends up at the bottom of the stack.
  • Why DFS-based? β€” DFS naturally explores all descendants of a node before returning. By pushing to stack on return (post-order), nodes that depend on others are guaranteed to come BEFORE their dependencies when we pop.
  • for (int i = 0; i < V; i++) in the driver β€” We call DFS from every unvisited vertex because the graph may be disconnected. Some vertices might not be reachable from any single starting point.
  • Print in reverse stack order β€” The stack’s top has the vertex that was pushed last (finished last = has no dependencies), which should come FIRST in topological order. So we pop and print.
  • Only works on DAGs β€” A cyclic graph has no valid topological ordering (chicken-and-egg problem). If the graph has a cycle, this algorithm still produces output, but it won’t be a valid topological sort.

Topological Sort Example:

Course prerequisites:
C1 β†’ C3, C2 β†’ C3, C3 β†’ C4, C3 β†’ C5

    C1 ──→ C3 ──→ C4
            ↑  \
    C2 ─────┘   └──→ C5

One valid topological order: C1, C2, C3, C4, C5
Another: C2, C1, C3, C5, C4

Multiple valid orderings may exist!
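
The course-prerequisite example can be run through the DFS-based sort directly. A self-contained sketch (C1..C5 mapped to vertices 0..4; `NV`, `topoDFS`, `topoSort` are our names), where the test checks the defining property, that every edge $(u, v)$ places $u$ before $v$:

```c
#include <assert.h>

#define NV 5

void topoDFS(int adj[NV][NV], int v, int visited[NV], int stack[], int* top) {
    visited[v] = 1;
    for (int i = 0; i < NV; i++)
        if (adj[v][i] && !visited[i])
            topoDFS(adj, i, visited, stack, top);
    stack[++(*top)] = v;   // push only after all descendants are finished
}

// Fills order[] with a topological ordering of the DAG.
void topoSort(int adj[NV][NV], int order[NV]) {
    int visited[NV] = {0};
    int stack[NV], top = -1;
    for (int i = 0; i < NV; i++)       // cover disconnected components
        if (!visited[i])
            topoDFS(adj, i, visited, stack, &top);
    for (int i = 0; i < NV; i++)       // pop: reverse of finishing order
        order[i] = stack[top - i];
}
```

Because several orderings are valid, the right check is the edge property, not one fixed output sequence.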


4.6 Shortest Path β€” Dijkstra's Algorithm

Dijkstra’s Algorithm finds the shortest path from a source vertex to all other vertices in a weighted graph with non-negative weights.

Greedy approach: Always process the unvisited vertex with the smallest known distance.

Algorithm Steps

  1. Initialize distance of source = 0, all others = ∞
  2. Mark all vertices as unvisited
  3. Pick unvisited vertex with minimum distance
  4. Update distances of its neighbors
  5. Repeat until all vertices are visited
#define INF 99999

void dijkstra(int graph[V][V], int src) {
    int dist[V], visited[V];
    
    for (int i = 0; i < V; i++) {
        dist[i] = INF;
        visited[i] = 0;
    }
    dist[src] = 0;
    
    for (int count = 0; count < V - 1; count++) {
        // Find minimum distance unvisited vertex
        int min = INF, u = -1;
        for (int v = 0; v < V; v++)
            if (!visited[v] && dist[v] < min) {
                min = dist[v];
                u = v;
            }
        
        if (u == -1) break;  // Remaining vertices are unreachable (disconnected graph)
        
        visited[u] = 1;
        
        // Update distances of neighbors
        for (int v = 0; v < V; v++) {
            if (!visited[v] && graph[u][v] && 
                dist[u] + graph[u][v] < dist[v])
                dist[v] = dist[u] + graph[u][v];
        }
    }
    
    // Print results
    printf("Vertex  Distance from Source %d\n", src);
    for (int i = 0; i < V; i++)
        printf("  %d \t\t %d\n", i, dist[i]);
}

πŸ” Line-by-Line Logic:

  • dist[i] = INF (initialization) β€” We assume all vertices are unreachable initially. Only the source has dist[src] = 0.
  • Greedy: pick minimum unvisited vertex β€” The inner loop finds the unvisited vertex u with the smallest known distance. This is the key greedy choice: we commit to this vertex’s shortest path.
  • Why greedy works here β€” With non-negative weights, once we pick the minimum-distance unvisited vertex, no future path through other vertices can be shorter (adding more non-negative edges only increases distance).
  • Relaxation: dist[u] + graph[u][v] < dist[v] β€” If going through u provides a shorter path to v than the current known distance, update it. This is called β€œrelaxing” the edge $(u,v)$.
  • !visited[v] && graph[u][v] β€” Only consider unvisited neighbors with actual edges (graph[u][v] != 0). Visited vertices already have their final shortest distance.
  • $V-1$ iterations β€” We process one vertex per iteration. After $V-1$ iterations, all vertices are visited (source was already done).
  • Time: $O(V^2)$ β€” Finding minimum takes $O(V)$ per iteration, and there are $V$ iterations. Using a min-heap (priority queue) reduces this to $O((V+E)\log V)$.
  • Limitation: Doesn’t work with negative edge weights. For that, use Bellman-Ford.

Dijkstra’s Trace β€” Source = 0:

Graph:
    0 --4-- 1
    |       |
    8       2
    |       |
    4 --7-- 2 --1-- 3

Weighted Adjacency Matrix:
      0   1   2   3   4
  0 [ 0   4   0   0   8 ]
  1 [ 4   0   0   2   0 ]
  2 [ 0   0   0   1   7 ]
  3 [ 0   2   1   0   0 ]
  4 [ 8   0   7   0   0 ]

| Step | Visited | dist[0] | dist[1] | dist[2] | dist[3] | dist[4] | Pick |
|---|---|---|---|---|---|---|---|
| Init | {} | 0 | ∞ | ∞ | ∞ | ∞ | 0 |
| 1 | {0} | 0 | 4 | ∞ | ∞ | 8 | 1 |
| 2 | {0,1} | 0 | 4 | ∞ | 6 | 8 | 3 |
| 3 | {0,1,3} | 0 | 4 | 7 | 6 | 8 | 2 |
| 4 | {0,1,3,2} | 0 | 4 | 7 | 6 | 8 | 4 |

Shortest distances from vertex 0: 0β†’0, 1β†’4, 2β†’7, 3β†’6, 4β†’8

Time Complexity: $O(V^2)$ with adjacency matrix, $O((V+E)\log V)$ with min-heap
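
The trace can be verified end to end with a self-contained version of the routine that returns its distances instead of printing them (`NV` and the `dist[]` out-parameter are our adaptations; the matrix is the one above):

```c
#include <assert.h>

#define NV 5
#define INF 99999

// Dijkstra's algorithm on an adjacency matrix; writes shortest
// distances from src into dist[].
void dijkstra(int graph[NV][NV], int src, int dist[NV]) {
    int visited[NV] = {0};
    for (int i = 0; i < NV; i++) dist[i] = INF;
    dist[src] = 0;

    for (int count = 0; count < NV - 1; count++) {
        int min = INF, u = -1;
        for (int v = 0; v < NV; v++)           // greedy: nearest unvisited vertex
            if (!visited[v] && dist[v] < min) { min = dist[v]; u = v; }
        if (u == -1) break;                    // remaining vertices unreachable
        visited[u] = 1;
        for (int v = 0; v < NV; v++)           // relax edges out of u
            if (!visited[v] && graph[u][v] && dist[u] + graph[u][v] < dist[v])
                dist[v] = dist[u] + graph[u][v];
    }
}
```

On the example matrix this reproduces the final row of the trace: distances 0, 4, 7, 6, 8 from vertex 0.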


Practice Questions β€” Graphs

Q1. Represent the following graph using adjacency matrix and adjacency list:

  A --- B
  |   / |
  |  /  |
  C --- D

Q2. Perform BFS and DFS on the graph from Q1 starting at vertex A.

Q3. What is topological sorting? Can it be applied to undirected graphs? Why/why not?

Q4. Run Dijkstra’s algorithm on a given weighted graph and show the complete trace.

Q5. Compare BFS and DFS β€” data structures used, time complexity, and applications.

Q6. What is a DAG? Give a real-life example of topological sorting.


5 β€” Priority Queues & Heaps


5.1 Priority Queue ADT

A Priority Queue is an ADT where each element has a priority. Elements with higher priority are served before elements with lower priority, regardless of insertion order.

ADT PriorityQueue:
    Operations:
        insert(element, priority)
        extractMax()  / extractMin()
        peek()
        isEmpty()

| Implementation | Insert | Extract Min/Max | Peek |
|---|---|---|---|
| Unsorted Array | $O(1)$ | $O(n)$ | $O(n)$ |
| Sorted Array | $O(n)$ | $O(1)$ | $O(1)$ |
| Binary Heap | $O(\log n)$ | $O(\log n)$ | $O(1)$ |

5.2 Binary Heap

A Binary Heap is a complete binary tree that satisfies the heap property:

  • Max-Heap: Parent β‰₯ Children (root is maximum)
  • Min-Heap: Parent ≀ Children (root is minimum)

Stored as an array (because it’s a complete binary tree):

For node at index $i$ (0-indexed):

  • Parent: $\lfloor(i-1)/2\rfloor$
  • Left child: $2i + 1$
  • Right child: $2i + 2$
Max-Heap:                Array representation:
       90                [90, 80, 70, 50, 60, 30, 20]
      /  \                0   1   2   3   4   5   6
    80    70
   / \   / \
  50  60 30  20

5.3 Heap Operations

Insert (Heapify-Up) β€” $O(\log n)$

  1. Add new element at the end of the array (last position)
  2. Bubble up: Compare with parent, swap if needed, repeat
void insert(int heap[], int* size, int value) {
    heap[*size] = value;
    int i = *size;
    (*size)++;
    
    // Heapify-up
    while (i > 0 && heap[i] > heap[(i-1)/2]) {
        // Swap with parent
        int temp = heap[i];
        heap[i] = heap[(i-1)/2];
        heap[(i-1)/2] = temp;
        i = (i-1)/2;
    }
}

πŸ” Line-by-Line Logic:

  • heap[*size] = value β€” Place the new element at the END of the array. This maintains the complete binary tree structure (filled left to right).
  • (i-1)/2 β€” Parent index formula (0-indexed). For node at index 5: parent = $(5-1)/2 = 2$. Integer division automatically floors.
  • Heapify-up (bubble up): The new element may violate the max-heap property (child > parent). We compare with parent and swap upward until the property is restored or we reach the root.
  • while (i > 0 && heap[i] > heap[(i-1)/2]) β€” Stop when: (1) we reach the root (i == 0), or (2) parent is already larger (heap property satisfied).
  • Why $O(\log n)$? β€” In the worst case, the new element bubbles all the way up to the root. The height of a complete binary tree with $n$ nodes is $\lfloor \log_2 n \rfloor$, so at most $\log n$ swaps.

Extract Max (Heapify-Down) β€” $O(\log n)$

  1. Remove root (max element)
  2. Move last element to root
  3. Bubble down: Compare with children, swap with larger child, repeat
int extractMax(int heap[], int* size) {
    int max = heap[0];
    heap[0] = heap[*size - 1];
    (*size)--;
    
    // Heapify-down
    int i = 0;
    while (1) {
        int largest = i;
        int left = 2*i + 1;
        int right = 2*i + 2;
        
        if (left < *size && heap[left] > heap[largest])
            largest = left;
        if (right < *size && heap[right] > heap[largest])
            largest = right;
        
        if (largest == i) break;
        
        int temp = heap[i];
        heap[i] = heap[largest];
        heap[largest] = temp;
        i = largest;
    }
    return max;
}

πŸ” Line-by-Line Logic:

  • max = heap[0] β€” The root is always the maximum in a max-heap. Save it for return.
  • heap[0] = heap[*size - 1] β€” Move the LAST element to the root. This maintains the complete tree structure but likely violates the heap property (the last element is probably not the largest).
  • Heapify-down (bubble down): Compare the root with its children. Swap with the LARGER child (to maintain max-heap). Repeat until the element settles.
  • Why swap with the LARGER child? β€” If we swapped with the smaller child, the larger child would be greater than its new parent, violating the heap property.
  • left < *size β€” Bounds check: the child index must be within the array. A node might have 0, 1, or 2 children.
  • if (largest == i) break β€” If neither child is larger, the heap property is restored. Stop.
  • i = largest β€” Move down to where we swapped and continue checking.
  • Why $O(\log n)$? β€” The element sinks at most from root to leaf: $\log n$ levels.

Insert 85 into max-heap [90, 80, 70, 50, 60, 30, 20]:

Step 1: Add 85 at end β†’ [90, 80, 70, 50, 60, 30, 20, 85]

         90
        /  \
      80    70
     / \   / \
   50  60 30  20
   /
  85

Step 2: Bubble up β€” 85 > parent 50 β†’ swap
  [90, 80, 70, 85, 60, 30, 20, 50]

Step 3: 85 > parent 80 β†’ swap
  [90, 85, 70, 80, 60, 30, 20, 50]

Step 4: 85 < parent 90 β†’ stop

Final:   90
        /  \
      85    70
     / \   / \
   80  60 30  20
   /
  50
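
The bubble-up trace can be replayed in code; a self-contained version of the insert routine (`heapInsert` is our name for the `insert` shown earlier):

```c
#include <assert.h>

// Max-heap insert with heapify-up.
void heapInsert(int heap[], int* size, int value) {
    int i = (*size)++;
    heap[i] = value;                              // place at the end
    while (i > 0 && heap[i] > heap[(i - 1) / 2]) {  // bubble up past smaller parents
        int temp = heap[i];
        heap[i] = heap[(i - 1) / 2];
        heap[(i - 1) / 2] = temp;
        i = (i - 1) / 2;
    }
}
```

Inserting 85 into [90, 80, 70, 50, 60, 30, 20] yields [90, 85, 70, 80, 60, 30, 20, 50], exactly the steps above.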

5.4 Build Heap β€” O(n) Algorithm

Problem: Given an unsorted array of $n$ elements, convert it into a valid max-heap.

NaΓ―ve approach: Insert elements one by one β†’ $O(n \log n)$

Optimal approach (Floyd’s algorithm): Start from the last non-leaf node and apply heapify-down on each node from bottom to top β†’ $O(n)$

Algorithm:

  1. The last non-leaf node is at index $\lfloor n/2 \rfloor - 1$
  2. For each node from $\lfloor n/2 \rfloor - 1$ down to 0, apply heapify-down
  3. Skip leaf nodes (they are trivially valid heaps)
void heapifyDown(int heap[], int n, int i) {
    int largest = i;
    int left = 2 * i + 1;
    int right = 2 * i + 2;
    
    if (left < n && heap[left] > heap[largest])
        largest = left;
    if (right < n && heap[right] > heap[largest])
        largest = right;
    
    if (largest != i) {
        int temp = heap[i];
        heap[i] = heap[largest];
        heap[largest] = temp;
        heapifyDown(heap, n, largest);  // Recursively fix the affected subtree
    }
}

void buildHeap(int heap[], int n) {
    // Start from last non-leaf node and heapify each
    for (int i = n / 2 - 1; i >= 0; i--) {
        heapifyDown(heap, n, i);
    }
}

πŸ” Line-by-Line Logic:

  • n/2 - 1 β€” In a complete binary tree, all nodes from index $\lfloor n/2 \rfloor$ to $n-1$ are leaves. Leaves are already valid heaps (no children to violate the property). So we start from the last non-leaf.
  • Bottom-up β€” By processing from the bottom, we guarantee that when we heapify node $i$, both its subtrees are already valid heaps. This is the key to correctness.
  • Why $O(n)$ and not $O(n \log n)$? β€” Most nodes are near the bottom of the tree and need few swaps. Specifically:
    • $n/2$ nodes at the bottom need 0 swaps (leaves)
    • $n/4$ nodes need at most 1 swap
    • $n/8$ nodes need at most 2 swaps
    • Total: $\sum_{k=0}^{\log n} \frac{n}{2^{k+1}} \cdot k = O(n)$ (converges by geometric series)

Build max-heap from: [4, 10, 3, 5, 1, 8, 7]

Initial:           4
                 /   \
               10     3
              / \    / \
             5   1  8   7

Last non-leaf = index 2 (value 3)

Step 1: Heapify index 2 (value 3): children are 8, 7. Swap with 8.

            4
          /   \
        10     8
       / \    / \
      5   1  3   7

Step 2: Heapify index 1 (value 10): children are 5, 1. 10 > both β†’ no swap.

Step 3: Heapify index 0 (value 4): children are 10, 8. Swap with 10.

           10
          /   \
         4     8
        / \   / \
       5   1 3   7

Then heapify index 1 (value 4): children are 5, 1. Swap with 5.

           10
          /   \
         5     8
        / \   / \
       4   1 3   7

Final max-heap: [10, 5, 8, 4, 1, 3, 7] βœ”
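
Build-heap is also the first phase of heap sort: build a max-heap in $O(n)$, then repeatedly swap the root (current maximum) into its final position at the end and heapify-down the shrunken prefix. A sketch reusing the `heapifyDown` above (`heapSort` is our name; the notes do not give this routine):

```c
#include <assert.h>

void heapifyDown(int heap[], int n, int i) {
    int largest = i;
    int left = 2 * i + 1, right = 2 * i + 2;
    if (left < n && heap[left] > heap[largest]) largest = left;
    if (right < n && heap[right] > heap[largest]) largest = right;
    if (largest != i) {
        int temp = heap[i]; heap[i] = heap[largest]; heap[largest] = temp;
        heapifyDown(heap, n, largest);
    }
}

// In-place ascending sort: O(n) build + n-1 extractions of O(log n) each.
void heapSort(int a[], int n) {
    for (int i = n / 2 - 1; i >= 0; i--)   // Floyd's bottom-up build
        heapifyDown(a, n, i);
    for (int end = n - 1; end > 0; end--) {
        int temp = a[0]; a[0] = a[end]; a[end] = temp;  // max to its final slot
        heapifyDown(a, end, 0);                          // restore heap on prefix
    }
}
```

Sorting the example array [4, 10, 3, 5, 1, 8, 7] first passes through the max-heap [10, 5, 8, 4, 1, 3, 7] built above.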

Subtree Node Count in a Heap:

For a max-heap stored as an array of $n$ elements, a complete subtree of height $h$ has at most $2^{h+1} - 1$ nodes. The root's two subtrees do not, in general, split the remaining $n - 1$ nodes evenly: because the last level fills left-first, with $h = \lfloor \log_2 n \rfloor$ the left subtree has $L = (2^{h-1} - 1) + \min(n - 2^h + 1,\ 2^{h-1})$ nodes and the right subtree has $n - 1 - L$ (this $L$ is derived in section 5.5). For example, for $n = 5$ the left subtree has 3 nodes and the right only 1.


Practice Questions β€” Heaps

Q1. Build a max-heap from: [4, 10, 3, 5, 1, 8, 7]. Show each step.

Q2. Perform 3 extractMax operations on the heap from Q1.

Q3. What is the difference between a max-heap and a min-heap?

Q4. Why is a binary heap stored as an array instead of using pointers?

Q5. Implement Heap Sort using a max-heap.


5.5 Number of Distinct Max Heaps

Problem: Given $n$ distinct elements, how many distinct max-heaps can be formed?

The root must always be the maximum element. The remaining $n-1$ elements are divided between the left and right subtrees based on the complete binary tree structure.

Formula (Recursive / DP):

\[T(n) = \binom{n-1}{L} \times T(L) \times T(R)\]

Where:

  • $L$ = number of elements in the left subtree
  • $R = n - 1 - L$ = number of elements in the right subtree
  • $\binom{n-1}{L}$ = ways to choose which $L$ elements go to the left subtree
  • $T(L)$, $T(R)$ = number of valid max-heaps for each subtree

Base cases: $T(0) = T(1) = 1$

πŸ” Why This Formula:

  • Root is fixed β€” The maximum element must be the root (heap property), so we have $n-1$ elements to distribute.
  • $L$ is determined by the tree structure β€” In a complete binary tree of size $n$, the left subtree size is fixed. For the last level (which may be partially filled), compute how many nodes go left vs right.
  • $\binom{n-1}{L}$ choices β€” We pick which $L$ of the $n-1$ remaining elements go left (the relative ordering within each subtree is determined by the recursive max-heap constraint, but which elements go where is a choice).
  • Multiply β€” The left subtree can be independently arranged in $T(L)$ ways, the right in $T(R)$ ways, and the choice of which elements go left gives $\binom{n-1}{L}$.

Computing $L$ (left subtree size) for a complete binary tree of $n$ nodes:

\[h = \lfloor \log_2 n \rfloor\]

\[\text{Max nodes in last level} = 2^h, \qquad \text{actual nodes in last level} = n - (2^h - 1)\]

\[L = \left(2^{h-1} - 1\right) + \min\!\left(n - 2^h + 1,\ 2^{h-1}\right)\]

The $2^{h-1} - 1$ term counts the full levels of the left subtree above the last level; the $\min$ term is the left subtree's share of the last level, which fills left-first and holds at most $2^{h-1}$ nodes.

Example: $n = 4$

Complete binary tree shape:

       ●
      / \
     ●   ●
    /
   ●
  • $h = \lfloor \log_2 4 \rfloor = 2$, $L = 2$, $R = 1$
  • $T(4) = \binom{3}{2} \times T(2) \times T(1) = 3 \times 1 \times 1 = 3$

The 3 max-heaps from $\{1, 2, 3, 4\}$:

    4           4           4
   / \         / \         / \
  3   2       2   3       3   1
 /           /           /
1           1           2

Common Values:

| $n$ | Number of Distinct Max Heaps |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 2 |
| 4 | 3 |
| 5 | 8 |
| 6 | 20 |
| 7 | 80 |
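
The formula, the left-subtree size computation, and the table can all be checked with a short program. A sketch for small $n$ only (the counts overflow quickly; `countMaxHeaps`, `binom`, and `floorLog2` are our names):

```c
#include <assert.h>

// Binomial coefficient C(n, k); each partial product is an exact integer.
long long binom(int n, int k) {
    if (k < 0 || k > n) return 0;
    long long r = 1;
    for (int i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}

int floorLog2(int n) {
    int h = 0;
    while ((1 << (h + 1)) <= n) h++;
    return h;
}

// T(n) = C(n-1, L) * T(L) * T(R), where L is fixed by the complete-tree shape.
long long countMaxHeaps(int n) {
    if (n <= 1) return 1;
    int h = floorLog2(n);
    int last = n - ((1 << h) - 1);    // nodes actually in the last level
    int half = 1 << (h - 1);          // left subtree's capacity in the last level
    int L = (half - 1) + (last < half ? last : half);
    int R = n - 1 - L;
    return binom(n - 1, L) * countMaxHeaps(L) * countMaxHeaps(R);
}
```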

6 β€” Hashing


6.1 Introduction

Hashing is a technique that maps data (keys) to fixed-size indices in a table (hash table) using a hash function, enabling $O(1)$ average-case access.

\[\text{index} = h(\text{key})\]

where $h$ is the hash function and index is a valid position in the table.

Key "apple"  β†’  h("apple") = 3   β†’ HashTable[3] = "apple"
Key "banana" β†’  h("banana") = 7  β†’ HashTable[7] = "banana"
Key "cherry" β†’  h("cherry") = 3  β†’ COLLISION! (same index as "apple")

6.2 Hash Functions

A good hash function should be:

  1. Fast to compute
  2. Uniform β€” distributes keys evenly
  3. Deterministic β€” same key always gives same index

Common Hash Functions

1. Division Method:

\[h(k) = k \mod m\]

where $m$ is the table size (preferably a prime number).

m = 10, key = 72 β†’ h(72) = 72 % 10 = 2
m = 10, key = 45 β†’ h(45) = 45 % 10 = 5

2. Multiplication Method:

\[h(k) = \lfloor m \cdot (kA \mod 1) \rfloor\]

where $A$ is a constant ($0 < A < 1$); Knuth suggests $A = (\sqrt{5}-1)/2 \approx 0.6180339887$.

3. Mid-Square Method:

Square the key, take the middle digits as index.

key = 62 β†’ 62Β² = 3844 β†’ middle digits: 84 β†’ index = 84

4. Folding Method:

Split key into parts, add them.

key = 123456 β†’ 12 + 34 + 56 = 102 β†’ h = 102 % table_size
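
The four methods above, sketched for the worked examples. Function names and the digit-extraction choices are ours: mid-square as written assumes a key whose square has four digits, and folding assumes two-digit chunks.

```c
#include <assert.h>

#define M 10   // table size (illustrative)

// 1. Division method: h(k) = k mod m
int divisionHash(int key) { return key % M; }

// 2. Multiplication method: h(k) = floor(m * (k*A mod 1))
int multiplicationHash(int key) {
    double A = 0.6180339887;                    // Knuth's suggested constant
    double frac = key * A - (long)(key * A);    // fractional part of k*A
    return (int)(M * frac);
}

// 3. Mid-square: square the key, keep the middle two digits (e.g. 62^2 = 3844 -> 84)
int midSquareHash(int key) {
    int sq = key * key;
    return (sq / 10) % 100;    // drop the last digit, keep the next two
}

// 4. Folding: split into two-digit parts, add, reduce mod m
int foldingHash(int key) {
    int sum = 0;
    while (key > 0) {
        sum += key % 100;      // take the last two digits
        key /= 100;
    }
    return sum % M;
}
```

All four are deterministic and land in [0, M-1] except mid-square, which here returns the raw middle digits as in the worked example.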

6.3 Collision Resolution

A collision occurs when two different keys hash to the same index: $h(k_1) = h(k_2)$ but $k_1 \neq k_2$.

Collisions are inevitable (Pigeonhole Principle) and must be handled.

1. Separate Chaining (Open Hashing)

Each slot in the hash table stores a linked list of all elements that hash to that index.

Index 0: β†’ [20] β†’ [30] β†’ NULL
Index 1: β†’ [11] β†’ NULL
Index 2: β†’ NULL
Index 3: β†’ [3] β†’ [13] β†’ [23] β†’ NULL
...
#define TABLE_SIZE 10

typedef struct HashNode {
    int key;
    struct HashNode* next;
} HashNode;

HashNode* hashTable[TABLE_SIZE] = {NULL};

int hash(int key) {
    return key % TABLE_SIZE;
}

void insert(int key) {
    int index = hash(key);
    HashNode* newNode = (HashNode*)malloc(sizeof(HashNode));
    newNode->key = key;
    newNode->next = hashTable[index];
    hashTable[index] = newNode;
}

int search(int key) {
    int index = hash(key);
    HashNode* temp = hashTable[index];
    while (temp != NULL) {
        if (temp->key == key) return 1;  // Found
        temp = temp->next;
    }
    return 0;  // Not found
}

πŸ” Line-by-Line Logic:

  • hash(key) = key % TABLE_SIZE β€” The division method. Maps any key to an index in [0, TABLE_SIZE-1]. Using a prime table size reduces clustering.
  • Insert uses head insertion β€” newNode->next = hashTable[index]; hashTable[index] = newNode; β€” Same $O(1)$ insertion as a linked list stack push. The new element becomes the head of the chain.
  • Search traverses the chain β€” Compute the index, then linearly scan the linked list at that index. Average case: $O(1 + \alpha)$ where $\alpha = n/m$ (load factor = number of elements / table size). Worst case: $O(n)$ if all keys hash to the same slot.
  • Why separate chaining works well β€” No clustering problems, no need to resize the table, handles any number of collisions gracefully. Trade-off: extra memory for linked list pointers.
  • Load factor $\alpha$ β€” When $\alpha$ is small (e.g., < 1), most chains are short and performance is near $O(1)$. When $\alpha$ grows large, chains get long and we should resize the table.

2. Open Addressing (Closed Hashing)

All elements are stored within the hash table itself. When a collision occurs, probe for the next available slot.

a) Linear Probing:

\[h(k, i) = (h'(k) + i) \mod m \quad \text{for } i = 0, 1, 2, \ldots\]

If slot is occupied, try the next slot, then the next, etc.

Insert keys 72, 45, 52, 91, 82 into table of size 10 using linear probing:

$h(k) = k \mod 10$

| Key | $h(k)$ | Probe Sequence | Final Index |
|-----|--------|----------------|-------------|
| 72 | 2 | 2 (empty) | 2 |
| 45 | 5 | 5 (empty) | 5 |
| 52 | 2 | 2 (occupied), 3 (empty) | 3 |
| 91 | 1 | 1 (empty) | 1 |
| 82 | 2 | 2 (occupied), 3 (occupied), 4 (empty) | 4 |
Index: 0    1    2    3    4    5    6    7    8    9
      [ ] [91] [72] [52] [82] [45] [ ] [ ] [ ] [ ]

Primary Clustering: Linear probing causes clusters of occupied slots. Once a cluster forms, it grows faster (new keys that hash near the cluster get added to it), degrading performance.

b) Quadratic Probing:

\[h(k, i) = (h'(k) + c_1 \cdot i + c_2 \cdot i^2) \mod m\]

Common simplification: $h(k, i) = (h'(k) + i^2) \mod m$

Probe sequence: $+0, +1, +4, +9, +16, \ldots$ — spreads out more than linear probing.

c) Random Probing:

\[h(k, i) = (h'(k) + r_i) \mod m\]

Where $r_1, r_2, r_3, \ldots$ is a pre-computed pseudo-random permutation of $\{1, 2, \ldots, m-1\}$.

πŸ” How it differs from other probing:

  • Linear probing uses offsets $1, 2, 3, \ldots$ (sequential) → causes primary clustering.
  • Quadratic probing uses offsets $1, 4, 9, \ldots$ (squares) → reduces primary clustering but causes secondary clustering.
  • Random probing uses a pseudo-random sequence of offsets → eliminates both primary and secondary clustering.
  • Key requirement: The random sequence must be a permutation of $\{1, \ldots, m-1\}$ so that every slot is eventually probed (guarantees insertion succeeds if table isn’t full).
  • Not truly random — Both inserter and searcher must use the same pseudo-random sequence (same seed), so the probe sequence is reproducible.

d) Double Hashing:

\[h(k, i) = (h_1(k) + i \cdot h_2(k)) \mod m\]

Uses a second hash function to determine the probe step size. Best distribution among open addressing methods.

h₁(k) = k % 7
h₂(k) = 5 - (k % 5)

For k = 14: h₁ = 0, h₂ = 5 - 4 = 1
  Probes: 0, 1, 2, 3, ...

Comparison

| Method | Clustering | Memory | Implementation |
|--------|------------|--------|----------------|
| Separate Chaining | No clustering | Extra (linked list nodes) | Simple |
| Linear Probing | Primary clustering | No extra | Simple |
| Quadratic Probing | Secondary clustering | No extra | Moderate |
| Double Hashing | Minimal clustering | No extra | Complex |

6.4 Digit Extraction & Other Hash Methods

Digit Extraction: Select specific digits from the key to form the hash value. The chosen digit positions should vary most across keys.

Key = 94521678
If we extract digits at positions 3, 5, 8 (1-indexed): 5, 1, 8 → hash = 518
If table size = 1000, index = 518

When to use: Works well when certain digit positions in keys have more uniform distribution than others (e.g., phone numbers — area codes cluster, but last 4 digits are more varied).

Comparison of All Hash Methods:

| Method | Formula | Best For |
|--------|---------|----------|
| Division | $k \mod m$ | General purpose, simple |
| Multiplication | $\lfloor m(kA \mod 1) \rfloor$ | When $m$ is a power of 2 |
| Mid-Square | Middle digits of $k^2$ | Numeric keys, uniform spread |
| Folding | Sum of key parts | Long numeric keys |
| Digit Extraction | Selected digits of $k$ | Keys with known digit patterns |

6.5 Perfect, Minimal Perfect & Uniform Hashing

Perfect Hashing: A hash function that maps $n$ keys to $m$ slots with NO collisions at all. Requires $m \geq n$ (table size ≥ number of keys).

\[h: S \to \{0, 1, \ldots, m-1\} \quad \text{such that} \quad h(k_i) \neq h(k_j) \text{ for all } i \neq j\]

Types of Perfect Hashing:

| Type | Definition | Table Size | Collisions |
|------|------------|------------|------------|
| Perfect | No collisions for a known key set | $m \geq n$ (may have empty slots) | 0 |
| Minimal Perfect | No collisions AND no wasted space | $m = n$ exactly | 0, no empty slots |
| Uniform | Each slot is equally likely (idealized) | Any | Minimized statistically |

Perfect Hashing β€” When it’s possible:

  • Must know ALL keys in advance (static key set)
  • Used in compilers (keyword lookup), spell checkers, network routing tables
  • FKS scheme (Fredman, KomlΓ³s, SzemerΓ©di): Two-level hashing achieves $O(1)$ worst-case lookup with $O(n)$ space

Minimal Perfect Hashing:

  • Most space-efficient — every slot is used, $m = n$
  • Example: If keywords are {if, else, while, for, return}, a minimal perfect hash maps each to a unique index in [0-4] with no gaps

Uniform Hashing (Simple Uniform Hashing Assumption β€” SUHA):

Each key is equally likely to hash to any of the $m$ slots, independently of other keys:

\[P(h(k) = i) = \frac{1}{m} \quad \text{for all slots } i\]

This is an idealized model used for analysis. Under SUHA:

  • Expected chain length in separate chaining = $\alpha = n/m$ (load factor)
  • Expected number of probes in open addressing:
    • Unsuccessful search: $\frac{1}{1 - \alpha}$
    • Successful search: $\frac{1}{\alpha} \ln \frac{1}{1 - \alpha}$

Load Factor and Performance:

| Load Factor $\alpha$ | Avg. probes (unsuccessful) | Avg. probes (successful) | Status |
|----------------------|----------------------------|--------------------------|--------|
| 0.5 | 2 | 1.39 | Good |
| 0.75 | 4 | 1.85 | Acceptable |
| 0.9 | 10 | 2.56 | Degrading |
| 1.0 | $\infty$ | — | Table full! |

Rule of thumb: Keep $\alpha < 0.75$ for open addressing. Resize (double) the table when load exceeds this threshold.


Practice Questions β€” Hashing

Q1. What is hashing? Why is it useful?

Q2. Insert keys 25, 42, 96, 101, 57, 31, 67 into a hash table of size 7 using the division method. Show the final table for:

  • (a) Separate Chaining
  • (b) Linear Probing
  • (c) Quadratic Probing

Q3. What is a collision? Compare separate chaining and open addressing.

Q4. Explain linear probing, quadratic probing, and double hashing with examples.

Q5. What is the load factor $\alpha$? How does it affect performance?

Q6. Why should the hash table size be a prime number?


Comprehensive Unit V Practice Set

Short Answer

1. Define: binary tree, BST, AVL tree, complete binary tree, full binary tree.

2. What are the three depth-first traversals of a binary tree? Give an example of each.

3. Why is in-order traversal of a BST always sorted?

4. What is the balance factor in an AVL tree? When do rotations occur?

5. What is the difference between a graph and a tree?

6. What is hashing? Name three hash functions.

Tracing

7. Construct a BST by inserting 50, 30, 70, 20, 40, 60, 80. Write all three traversals.

8. Insert 30, 20, 10, 25, 40, 35 into an AVL tree. Show all rotations.

9. Run BFS and DFS on the following graph starting from vertex A:

  A --- B --- E
  |         / |
  C --- D     F

10. Run Dijkstra’s algorithm from vertex S:

  S --2-- A --3-- D
  |       |       |
  6       1       5
  |       |       |
  B --4-- C --2-- E

11. Build a max-heap from [3, 1, 6, 5, 2, 4]. Show each step. Then extract max twice.

12. Insert keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into a hash table of size 11 using:

  • (a) Division method with separate chaining
  • (b) Linear probing

Long Answer / Essay

13. Explain binary tree traversals (in-order, pre-order, post-order) with recursive code and detailed trace on a sample tree.

14. Explain BST insertion and deletion with all three cases. Give code and trace.

15. Explain AVL tree rotations (LL, RR, LR, RL) with diagrams. Why are AVL trees preferred over plain BSTs?

16. Compare BFS and DFS — algorithm, data structure used, time/space complexity, applications. Trace both on the same graph.

17. Explain Dijkstra’s shortest path algorithm. Trace it on a weighted graph with at least 5 vertices.

18. Explain hashing with collision resolution techniques (separate chaining, linear probing, quadratic probing, double hashing). Compare all four methods with advantages and disadvantages.

19. Explain priority queues and binary heaps. Write heap insert and extract-max with code and example.

20. Explain topological sorting. When is it applicable? Give an algorithm and trace on a DAG.


πŸ“ View Complete Solutions for Unit V β†’

← Back to DSA Index