
Unit II β€” Linear Data Structures, Searching & Sorting

Syllabus Coverage: Concept of sequential organization, Arrays as ADT, Storage representation of Array, Array insertion and deletion, Matrix operations using arrays, String operations (Length, Concatenation, Copy, Palindrome, Reverse, Compare, Substring) without using library functions, Searching: Linear and Binary search algorithms and its analysis. Sorting: General concepts and algorithm analysis β€” Bubble sort, Insertion sort, Selection sort, Merge sort, Quick sort, Heap sort, stable and in-place sorting, sorting techniques in linear time β€” Counting sort. (09 Hours)



Glossary β€” Key Terms

| Term | Meaning |
|---|---|
| Array | A collection of elements of the same type stored in contiguous memory locations. |
| Index | The position number used to access an array element (starts at 0 in C). |
| Row-Major Order | 2D array elements stored row by row in memory. |
| Column-Major Order | 2D array elements stored column by column in memory. |
| String | A sequence of characters terminated by a null character ('\0') in C. |
| Linear Search | Sequentially check each element until the target is found or the array ends. |
| Binary Search | Divide a sorted array in half repeatedly to find the target. |
| Stable Sort | A sort that preserves the relative order of records with equal keys. |
| In-Place Sort | A sort that uses $O(1)$ or $O(\log n)$ extra memory (no large auxiliary arrays). |
| Divide & Conquer | A strategy that splits a problem into subproblems, solves each, then combines results. |
| Pivot | The element chosen in Quick Sort around which partitioning occurs. |
| Heap | A complete binary tree satisfying the heap property (max-heap or min-heap). |

1 β€” Arrays


1.1 Sequential Organization

Definition: Sequential organization means data elements are stored in consecutive (contiguous) memory locations. The position of each element is determined by its index.

In sequential organization:

  • Elements are stored one after another in memory
  • The address of any element can be calculated directly from its index
  • Random access is possible (access any element in $O(1)$ time)
  • Insertion and deletion are expensive (require shifting)

The array is the most fundamental sequential data structure.


1.2 Array as an ADT

ADT Array:
    Data:
        A fixed-size, ordered collection of elements of the same type
        Each element is identified by an index (0 to n-1)
    
    Operations:
        create(size)         β†’ Create array of given size
        get(index)           β†’ Return element at index         β€” O(1)
        set(index, value)    β†’ Set element at index to value   β€” O(1)
        insert(index, value) β†’ Insert value at index (shift)   β€” O(n)
        delete(index)        β†’ Delete element at index (shift) β€” O(n)
        search(value)        β†’ Find index of value             β€” O(n)
        length()             β†’ Return number of elements       β€” O(1)
        display()            β†’ Print all elements              β€” O(n)
    
    Error Conditions:
        get/set/insert/delete with invalid index β†’ "Index out of bounds"
        insert on full array β†’ "Array overflow"

1.3 Storage Representation of Arrays

One-Dimensional Array

If an array A of $n$ elements starts at base address $B$, and each element takes $w$ bytes:

\[\text{Address of } A[i] = B + i \times w\]

Example: int A[5] starting at address 1000, sizeof(int) = 4

| Index | Element | Address |
|---|---|---|
| 0 | A[0] | 1000 |
| 1 | A[1] | 1004 |
| 2 | A[2] | 1008 |
| 3 | A[3] | 1012 |
| 4 | A[4] | 1016 |

Address of A[3] = $1000 + 3 \times 4 = 1012$ βœ”

Two-Dimensional Array β€” Row-Major Order

Elements are stored row by row.

For array A[m][n] starting at base address $B$:

\[\text{Address of } A[i][j] = B + (i \times n + j) \times w\]

Example: int A[3][4] at base address 2000

Memory layout (row-major):

A[0][0] A[0][1] A[0][2] A[0][3] | A[1][0] A[1][1] A[1][2] A[1][3] | A[2][0] ...
2000    2004    2008    2012     | 2016    2020    2024    2028     | 2032    ...

Address of A[1][2] = $2000 + (1 \times 4 + 2) \times 4 = 2000 + 24 = 2024$ βœ”

Two-Dimensional Array β€” Column-Major Order

Elements are stored column by column.

\[\text{Address of } A[i][j] = B + (j \times m + i) \times w\]

Example: Same A[3][4] at base 2000 in column-major:

Address of A[1][2] = $2000 + (2 \times 3 + 1) \times 4 = 2000 + 28 = 2028$

C uses Row-Major order. FORTRAN uses Column-Major order. Exam questions often ask you to compute addresses in both orders.

General Formula β€” Arbitrary Lower Bounds

The formulas above assume indexing starts at 0. In many languages (FORTRAN, Pascal) and in exam problems, arrays can start from any index β€” including negative values.

1D Array β€” A[l₁ ... u₁] (lower bound $l_1$, upper bound $u_1$):

\[\text{Address of } A[i] = B + (i - l_1) \times w\]

Number of elements: $n = u_1 - l_1 + 1$

Example: int A[-3 ... 4] starting at base address 500, sizeof(int) = 4

Here $l_1 = -3$, $u_1 = 4$, $n = 4 - (-3) + 1 = 8$ elements.

Address of A[2] = $500 + (2 - (-3)) \times 4 = 500 + 5 \times 4 = 520$

Address of A[-1] = $500 + (-1 - (-3)) \times 4 = 500 + 2 \times 4 = 508$

2D Array β€” A[l₁ ... u₁][lβ‚‚ ... uβ‚‚]:

Let $R = u_1 - l_1 + 1$ (number of rows) and $C = u_2 - l_2 + 1$ (number of columns).

Row-Major:

\[\text{Address of } A[i][j] = B + \big[(i - l_1) \times C + (j - l_2)\big] \times w\]

Column-Major:

\[\text{Address of } A[i][j] = B + \big[(j - l_2) \times R + (i - l_1)\big] \times w\]

Example: int A[2 ... 5][3 ... 6] at base 1000, $w = 2$ bytes.

Here $l_1 = 2, u_1 = 5, l_2 = 3, u_2 = 6$, so $R = 4, C = 4$.

Row-Major: Address of A[3][5]:

\[= 1000 + \big[(3 - 2) \times 4 + (5 - 3)\big] \times 2 = 1000 + (4 + 2) \times 2 = 1012\]

Column-Major: Address of A[3][5]:

\[= 1000 + \big[(5 - 3) \times 4 + (3 - 2)\big] \times 2 = 1000 + (8 + 1) \times 2 = 1018\]

Example (negative bounds): A[-2 ... 2][0 ... 3] at base 400, $w = 4$ bytes.

$l_1 = -2, u_1 = 2, l_2 = 0, u_2 = 3 \Rightarrow R = 5, C = 4$, total = 20 elements.

Row-Major: Address of A[0][2]:

\[= 400 + \big[(0 - (-2)) \times 4 + (2 - 0)\big] \times 4 = 400 + (8 + 2) \times 4 = 440\]

Column-Major: Address of A[0][2]:

\[= 400 + \big[(2 - 0) \times 5 + (0 - (-2))\big] \times 4 = 400 + (10 + 2) \times 4 = 448\]

πŸ” Key Insight: The simplified formulas ($B + (i \times n + j) \times w$) are just special cases where $l_1 = 0$ and $l_2 = 0$. When you substitute $l_1 = l_2 = 0$ into the general formula, the subtraction terms vanish and you get the simplified version. Always use the general formula in exams unless the problem explicitly states 0-based indexing.

Quick Reference β€” All Address Formulas:

| Array Type | Formula |
|---|---|
| 1D $A[l_1 \ldots u_1]$ | $B + (i - l_1) \times w$ |
| 2D Row-Major $A[l_1 \ldots u_1][l_2 \ldots u_2]$ | $B + [(i - l_1) \times C + (j - l_2)] \times w$ |
| 2D Column-Major $A[l_1 \ldots u_1][l_2 \ldots u_2]$ | $B + [(j - l_2) \times R + (i - l_1)] \times w$ |

Where $R = u_1 - l_1 + 1$, $C = u_2 - l_2 + 1$, $w$ = size of each element.


1.4 Array Insertion and Deletion

Insertion at Position pos

// Insert 'value' at position 'pos' in array A of size n
void insert(int A[], int *n, int pos, int value) {
    // Step 1: Shift elements to the right
    for (int i = *n - 1; i >= pos; i--) {
        A[i + 1] = A[i];
    }
    // Step 2: Insert the new element
    A[pos] = value;
    // Step 3: Increase size
    (*n)++;
}

πŸ” Line-by-Line Logic:

  • int *n β€” We pass size as a pointer because insertion changes the array size; the caller needs to see the updated value.
  • for (int i = *n - 1; i >= pos; i--) β€” We shift right-to-left (from the last element toward pos). Why not left-to-right? Because shifting left-to-right would overwrite elements before we move them! Starting from the end ensures each element is safely copied to i+1 before being overwritten.
  • A[i + 1] = A[i] β€” Each element moves one position right, creating a gap at pos.
  • A[pos] = value β€” Once the gap is created, we simply place the new value there.
  • (*n)++ β€” Array now has one more element. The dereferenced pointer update ensures the caller sees the new size.

Time Complexity:

  • Best case (insert at end): $O(1)$
  • Worst case (insert at beginning): $O(n)$ β€” shift all elements
  • Average case: $O(n)$

Deletion at Position pos

// Delete element at position 'pos' from array A of size n
void delete(int A[], int *n, int pos) {
    // Step 1: Shift elements to the left
    for (int i = pos; i < *n - 1; i++) {
        A[i] = A[i + 1];
    }
    // Step 2: Decrease size
    (*n)--;
}

πŸ” Line-by-Line Logic:

  • for (int i = pos; i < *n - 1; i++) β€” We shift left-to-right (opposite of insertion!). Why? Because we’re filling the gap at pos by pulling elements leftward. Starting from pos ensures each position gets its replacement from i+1 before i+1 is itself overwritten.
  • i < *n - 1 β€” We stop at the second-to-last element because A[i] = A[i+1] would access out-of-bounds if i reached *n - 1.
  • A[i] = A[i + 1] β€” Each element slides one position left, overwriting the deleted position.
  • (*n)-- β€” Array shrinks by one. No need to clear the last position β€” it’s simply ignored since size decreased.
  • Notice the symmetry: Insertion shifts right-to-left, deletion shifts left-to-right. This pattern prevents data loss during shifting.

Time Complexity: Same as insertion β€” $O(n)$ worst case.

Example β€” Insert 25 at position 2:

Before: [10, 20, 30, 40, 50]    n = 5
                ↑ pos = 2

Shift:  [10, 20, __, 30, 40, 50]   (shift 30,40,50 right)
Insert: [10, 20, 25, 30, 40, 50]   n = 6

Example β€” Delete element at position 1:

Before: [10, 20, 30, 40, 50]    n = 5
             ↑ pos = 1

Shift:  [10, 30, 40, 50, __]   (shift 30,40,50 left)
After:  [10, 30, 40, 50]       n = 4

1.5 Matrix Operations Using Arrays

Matrix Addition

\[C[i][j] = A[i][j] + B[i][j]\]
void matrixAdd(int A[][MAX], int B[][MAX], int C[][MAX], int m, int n) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            C[i][j] = A[i][j] + B[i][j];
}
// Time: O(m Γ— n)    Space: O(m Γ— n)

πŸ” Why This Logic: Matrix addition is element-wise β€” each C[i][j] depends only on A[i][j] and B[i][j] at the same position. The two nested loops simply visit every cell exactly once. Both matrices must have identical dimensions m Γ— n.

Matrix Multiplication

\[C[i][j] = \sum_{k=0}^{p-1} A[i][k] \times B[k][j]\]

For $A_{m \times p}$ and $B_{p \times n}$, result is $C_{m \times n}$:

void matrixMultiply(int A[][MAX], int B[][MAX], int C[][MAX], 
                    int m, int p, int n) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++) {
            C[i][j] = 0;
            for (int k = 0; k < p; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}
// Time: O(m Γ— n Γ— p)    For square matrices: O(nΒ³)

πŸ” Line-by-Line Logic:

  • C[i][j] = 0 β€” We must initialize before accumulating sums. Without this, C[i][j] might hold garbage values and the += would produce wrong results.
  • Why 3 nested loops? β€” Each C[i][j] is the dot product of row i of A and column j of B. The k loop computes this dot product: A[i][0]*B[0][j] + A[i][1]*B[1][j] + ... + A[i][p-1]*B[p-1][j].
  • k < p β€” p is the shared dimension (columns of A = rows of B). This is why matrix multiplication requires A.cols == B.rows.
  • Why A[i][k] * B[k][j]? β€” We walk across row i of A (varying k) and down column j of B (varying k). The shared index k links corresponding elements.

Matrix Transpose

\[B[j][i] = A[i][j]\]
void transpose(int A[][MAX], int B[][MAX], int m, int n) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            B[j][i] = A[i][j];
}
// Time: O(m Γ— n)

πŸ” Why This Logic: Transpose mirrors a matrix across its main diagonal β€” row i becomes column i. So element at position (i, j) moves to (j, i). That’s why B[j][i] = A[i][j]. Note: if A is m Γ— n, then B must be n Γ— m β€” the dimensions swap. We use a separate output matrix B because doing it in-place is tricky for non-square matrices.

Special Matrix Storage β€” Triangular, Diagonal, Symmetric & Tridiagonal

For an $n \times n$ matrix, storing all $n^2$ elements wastes memory when only a subset are non-zero. Special matrices exploit structure to store only the required elements in a compact 1D array.


1. Lower Triangular Matrix

A matrix where all elements above the main diagonal are zero: $A[i][j] = 0$ for $j > i$.

| 1  0  0  0 |
| 2  3  0  0 |
| 4  5  6  0 |
| 7  8  9 10 |

Non-zero elements: Row $i$ has $i + 1$ elements (0-indexed). Total = $\sum_{i=0}^{n-1}(i+1) = \frac{n(n+1)}{2}$

Row-Major Mapping (store row by row into 1D array):

\[\text{index of } A[i][j] = \frac{i(i+1)}{2} + j \qquad (j \leq i)\]

πŸ” Why? Before row $i$, there are $1 + 2 + \cdots + i = \frac{i(i+1)}{2}$ elements. Within row $i$, element at column $j$ is at offset $j$.

Column-Major Mapping (store column by column):

\[\text{index of } A[i][j] = \frac{j(2n - j + 1)}{2} + (i - j) \qquad (j \leq i)\]

πŸ” Why? Column $j$ in a lower triangular matrix has elements from row $j$ to row $n-1$. Before column $j$, there are $n + (n-1) + \cdots + (n-j+1) = \frac{j(2n - j + 1)}{2}$ elements; adding the offset $i - j$ within column $j$ gives the index.

Example: $4 \times 4$ lower triangular matrix. Find index of $A[2][1]$ in row-major.

\[\text{index} = \frac{2(2+1)}{2} + 1 = 3 + 1 = 4\]

1D array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] β†’ arr[4] = 5 βœ”


2. Upper Triangular Matrix

A matrix where all elements below the main diagonal are zero: $A[i][j] = 0$ for $j < i$.

| 1  2  3  4 |
| 0  5  6  7 |
| 0  0  8  9 |
| 0  0  0 10 |

Non-zero elements: Row $i$ has $n - i$ elements (0-indexed). Total = $\frac{n(n+1)}{2}$

Row-Major Mapping:

\[\text{index of } A[i][j] = in - \frac{i(i-1)}{2} + (j - i) \qquad (j \geq i)\]

πŸ” Why? Before row $i$, there are $n + (n-1) + \cdots + (n-i+1) = in - \frac{i(i-1)}{2}$ elements, and within row $i$ the offset from the diagonal is $j - i$.

Column-Major Mapping:

\[\text{index of } A[i][j] = \frac{j(j+1)}{2} + i \qquad (j \geq i)\]

πŸ” Why? Column $j$ has elements from row $0$ to row $j$. Before column $j$, there are $1 + 2 + \cdots + j = \frac{j(j+1)}{2}$ elements. Within column $j$, element at row $i$ is at offset $i$.

Example: $4 \times 4$ upper triangular. Find index of $A[1][3]$ in row-major.

\[\text{index} = 1 \times 4 - \frac{1(1-1)}{2} + (3 - 1) = 4 + 2 = 6\]

1D array: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] β†’ arr[6] = 7 βœ”

Verification by counting: Row 0 has $n = 4$ elements (indices 0–3); row 1 has 3 elements (indices 4–6). $A[1][3]$ is the last element of row 1 β†’ index 6 β†’ value 7. βœ”

3. Diagonal Matrix

A matrix where all elements outside the main diagonal are zero: $A[i][j] = 0$ for $i \neq j$.

| 5  0  0  0 |
| 0  3  0  0 |
| 0  0  7  0 |
| 0  0  0  2 |

Storage: Only $n$ elements needed. Mapping is trivial:

\[\text{index of } A[i][i] = i\]

πŸ” Only diagonal elements $A[0][0], A[1][1], \ldots, A[n-1][n-1]$ are stored. Any access $A[i][j]$ with $i \neq j$ returns 0 without accessing the array.


4. Symmetric Matrix

A matrix where $A[i][j] = A[j][i]$ for all $i, j$. Store only the lower triangular part (or upper) β€” the other half is a mirror.

Storage: Same as Lower Triangular = $\frac{n(n+1)}{2}$ elements.

\[\text{To access } A[i][j]: \begin{cases} \text{use LT formula if } i \geq j \\ \text{swap: access } A[j][i] \text{ if } i < j \end{cases}\]

5. Tridiagonal Matrix

A matrix where non-zero elements appear only on:

  • Main diagonal ($i = j$)
  • Diagonal above ($j = i + 1$)
  • Diagonal below ($j = i - 1$)
| 1  2  0  0  0 |
| 3  4  5  0  0 |
| 0  6  7  8  0 |
| 0  0  9 10 11 |
| 0  0  0 12 13 |

Non-zero elements: $3n - 2$ (for $n \geq 2$). The first and last rows have 2 elements each; all middle rows have 3.

Mapping to 1D array (row-major):

\[\text{index of } A[i][j] = 2i + j \qquad (\lvert i - j \rvert \leq 1)\]

πŸ” Why $2i + j$? Row 0 contributes 2 elements (at columns 0, 1). Each subsequent row $i$ starts at position $2i$ in the 1D array (2 elements per row for prior rows, minus the offset). The column within the band is $j$, giving offset $j$.

Alternatively (3-band mapping):

\[\text{index of } A[i][j] = 3i + (j - i + 1) - 1 = 2i + j\]

Example: $5 \times 5$ tridiagonal. Find index of $A[2][3]$ and $A[3][2]$.

$A[2][3]$: index = $2 \times 2 + 3 = 7$ β†’ value 8 βœ”

$A[3][2]$: index = $2 \times 3 + 2 = 8$ β†’ value 9 βœ”

Quick Reference β€” Special Matrix Storage:

| Matrix Type | Non-Zero Elements | 1D Size | Row-Major Index Formula |
|---|---|---|---|
| Lower Triangular | $j \leq i$ | $\frac{n(n+1)}{2}$ | $\frac{i(i+1)}{2} + j$ |
| Upper Triangular | $j \geq i$ | $\frac{n(n+1)}{2}$ | $in - \frac{i(i-1)}{2} + (j - i)$ |
| Diagonal | $i = j$ | $n$ | $i$ |
| Symmetric | store lower half | $\frac{n(n+1)}{2}$ | same as Lower Triangular (swap if $i < j$) |
| Tridiagonal | $\lvert i - j \rvert \leq 1$ | $3n - 2$ | $2i + j$ |

All indices are 0-based. For 1-based indexing, adjust formulas accordingly (e.g., Lower Triangular: $\frac{i(i-1)}{2} + j$).


1.6 Finding Minimum & Maximum β€” Tournament Method

Problem: Find both the minimum and maximum elements of an array of $n$ elements.

NaΓ―ve approach: Scan the array twice (once for min, once for max) β†’ $2(n-1)$ comparisons.

Better in practice: A single scan tracking both β€” one pass over the data, but still $2(n-1)$ comparisons in the worst case.

Optimal β€” Tournament Method (Divide & Conquer): Uses only $\lceil 3n/2 \rceil - 2$ comparisons.

NaΓ―ve Algorithm β€” Single Pass:

void findMinMax(int arr[], int n, int *min, int *max) {
    *min = *max = arr[0];
    for (int i = 1; i < n; i++) {
        if (arr[i] < *min) *min = arr[i];
        if (arr[i] > *max) *max = arr[i];
    }
}
// Comparisons: 2(n-1) β€” two comparisons per element

πŸ” Why 2(n-1)? Each element (except the first) is compared against both the current min and max β€” that’s 2 comparisons per element for $n-1$ elements.

Tournament Method (Pairwise Comparison):

The key insight: compare pairs of elements first, then compare the smaller with min and the larger with max. This saves comparisons.

void tournamentMinMax(int arr[], int n, int *min, int *max) {
    int i;
    
    // Handle odd/even n
    if (n % 2 == 0) {
        if (arr[0] < arr[1]) { *min = arr[0]; *max = arr[1]; }
        else                 { *min = arr[1]; *max = arr[0]; }
        i = 2;  // Start from index 2
    } else {
        *min = *max = arr[0];
        i = 1;  // Start from index 1
    }
    
    // Process pairs
    for (; i < n - 1; i += 2) {
        if (arr[i] < arr[i + 1]) {
            if (arr[i] < *min)     *min = arr[i];
            if (arr[i + 1] > *max) *max = arr[i + 1];
        } else {
            if (arr[i + 1] < *min) *min = arr[i + 1];
            if (arr[i] > *max)     *max = arr[i];
        }
    }
}
// Comparisons: ⌈3n/2βŒ‰ - 2

πŸ” Line-by-Line Logic:

  • Pair comparison first β€” Compare arr[i] with arr[i+1] (1 comparison). This tells us which is the candidate for min and which for max.
  • Then compare with running min/max β€” The smaller of the pair only needs to be compared with min (1 comparison). The larger only with max (1 comparison).
  • 3 comparisons per 2 elements β€” Instead of 4 comparisons per 2 elements (naΓ―ve). For $n$ elements: $\lceil 3n/2 \rceil - 2$ total.
  • Odd $n$ handling β€” If odd, initialize min and max with the first element alone, then process remaining elements in pairs.
  • Why is this optimal? β€” It can be proven that $\lceil 3n/2 \rceil - 2$ is the theoretical lower bound for simultaneous min-max finding. This algorithm achieves it.

Find min and max of [3, 5, 1, 8, 2, 7] ($n = 6$, even)

| Step | Pair | Pair comparison | Compare with min/max | min | max |
|---|---|---|---|---|---|
| Init | (3, 5) | 3 < 5 | β€” | 3 | 5 |
| 1 | (1, 8) | 1 < 8 | 1 < 3 βœ”, 8 > 5 βœ” | 1 | 8 |
| 2 | (2, 7) | 2 < 7 | 2 < 1? No; 7 > 8? No | 1 | 8 |

Result: min = 1, max = 8

Comparisons: 1 (init) + 3 + 3 = 7 = $\lceil 3 \times 6/2 \rceil - 2 = 9 - 2 = 7$ βœ”

NaΓ―ve would use $2(6-1) = 10$ comparisons.

Recursive Tournament (Divide & Conquer) Version:

typedef struct { int min; int max; } MinMax;

MinMax tournamentDC(int arr[], int low, int high) {
    MinMax result, left, right;
    
    if (low == high) {  // Single element
        result.min = result.max = arr[low];
        return result;
    }
    if (high == low + 1) {  // Two elements
        if (arr[low] < arr[high]) {
            result.min = arr[low]; result.max = arr[high];
        } else {
            result.min = arr[high]; result.max = arr[low];
        }
        return result;
    }
    
    int mid = (low + high) / 2;
    left = tournamentDC(arr, low, mid);
    right = tournamentDC(arr, mid + 1, high);
    
    result.min = (left.min < right.min) ? left.min : right.min;
    result.max = (left.max > right.max) ? left.max : right.max;
    return result;
}

πŸ” Why β€œTournament”? Think of it like a sports tournament β€” elements are paired off, winners (max candidates) advance up one bracket, losers (min candidates) advance up another. The overall min and max emerge from fewer total β€œmatches” (comparisons).


Practice Questions β€” Arrays

Q1. Define an array as an ADT. List all operations with their time complexities.

Q2. An array A[10] of integers starts at address 500. Each integer takes 4 bytes. Find the address of A[7].

Q3. A 2D array B[5][8] starts at base address 1000, element size 2 bytes. Find the address of B[3][5] in (a) Row-major (b) Column-major order.

Q4. Write a C function to insert an element at a given position in an array. Analyze its time and space complexity.

Q5. Write C code to multiply two matrices. What are the conditions for multiplication to be valid?

Q6. Explain the difference between Row-major and Column-major storage with diagrams.


2 β€” String Operations (Without Library Functions)


2.1 String Basics

In C, a string is an array of characters terminated by the null character '\0'.

char str[] = "Hello";
// Stored as: ['H', 'e', 'l', 'l', 'o', '\0']
// Size = 6 bytes (5 characters + 1 null terminator)

2.2 String Length

int stringLength(char str[]) {
    int len = 0;
    while (str[len] != '\0') {
        len++;
    }
    return len;
}
// Time: O(n)    Space: O(1)

πŸ” Why This Logic: In C, strings have no built-in length field β€” the only way to know where a string ends is to find the null terminator '\0'. So we start a counter at 0 and walk forward character by character. Each time we see a non-null character, we increment. When we hit '\0', the counter holds the length. This is exactly what strlen() does internally.


2.3 String Copy

void stringCopy(char dest[], char src[]) {
    int i = 0;
    while (src[i] != '\0') {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';  // Don't forget the null terminator!
}
// Time: O(n)    Space: O(1) (excluding destination)

πŸ” Line-by-Line Logic:

  • while (src[i] != '\0') β€” We copy characters one-by-one until we reach the source string’s end.
  • dest[i] = src[i] β€” Same index i for both arrays because we want a 1:1 copy.
  • dest[i] = '\0' β€” Critical! The while loop copies data characters but stops before copying '\0'. We must manually add it, otherwise dest won’t be a valid C string and any subsequent string operation on it would read garbage past the end.

2.4 String Concatenation

void stringConcat(char dest[], char src[]) {
    int i = 0, j = 0;
    // Find end of dest
    while (dest[i] != '\0') i++;
    // Copy src to end of dest
    while (src[j] != '\0') {
        dest[i] = src[j];
        i++;
        j++;
    }
    dest[i] = '\0';
}
// Time: O(m + n) where m = len(dest), n = len(src)

πŸ” Line-by-Line Logic:

  • Two-phase approach: First find where dest ends (phase 1), then append src there (phase 2).
  • while (dest[i] != '\0') i++; β€” This walks i to the null terminator position of dest. We need this because we must start writing src after the existing content.
  • Separate indices i and j β€” i tracks position in dest (continuing from end), j tracks position in src (starting from 0). They advance independently.
  • dest[i] = '\0' β€” Again, we must null-terminate the result. The concatenated string needs its own end marker.
  • Time is O(m+n) because we traverse dest once (m steps) then src once (n steps).

2.5 String Reverse

void stringReverse(char str[]) {
    int len = 0;
    while (str[len] != '\0') len++;
    
    int start = 0, end = len - 1;
    while (start < end) {
        // Swap characters
        char temp = str[start];
        str[start] = str[end];
        str[end] = temp;
        start++;
        end--;
    }
}
// Time: O(n)    Space: O(1)

πŸ” Line-by-Line Logic:

  • Two-pointer technique: start begins at the first character, end at the last. They move toward each other, swapping characters along the way.
  • while (start < end) β€” We stop when pointers meet/cross. If the string has odd length, the middle character stays in place (it’s its own mirror). If even, all characters get swapped.
  • Why not start <= end? Because when start == end, they point to the same character β€” swapping it with itself is pointless.
  • In-place reversal β€” We use only one temp variable for swapping, no extra array. This is the most space-efficient approach ($O(1)$).
  • Why find length first? We need end = len - 1 to know where the last character is. Alternatively, you could walk a pointer to the end, but computing length is cleaner.

2.6 String Compare

int stringCompare(char s1[], char s2[]) {
    int i = 0;
    while (s1[i] != '\0' && s2[i] != '\0') {
        if (s1[i] < s2[i]) return -1;   // s1 < s2
        if (s1[i] > s2[i]) return 1;    // s1 > s2
        i++;
    }
    // If we reach here, one or both strings ended
    if (s1[i] == '\0' && s2[i] == '\0') return 0;   // equal
    if (s1[i] == '\0') return -1;   // s1 is shorter
    return 1;                        // s2 is shorter
}
// Time: O(min(m, n))    Space: O(1)

πŸ” Line-by-Line Logic:

  • Lexicographic (dictionary) comparison β€” We compare strings character by character, just like how words are ordered in a dictionary.
  • while (s1[i] != '\0' && s2[i] != '\0') β€” We keep comparing as long as both strings still have characters. The && ensures we don’t read past either string’s end.
  • if (s1[i] < s2[i]) return -1 β€” Characters are compared by ASCII values. E.g., 'a'(97) < 'b'(98), so "apple" < "banana". The moment we find a difference, the result is known β€” no need to check further.
  • Post-loop checks β€” If the loop ends without finding a difference, the strings are identical up to the shorter one’s length. Then:
    • Both ended ('\0') β†’ they’re equal (return 0)
    • s1 ended first β†’ s1 is shorter, so s1 < s2 (e.g., "app" < "apple")
    • s2 ended first β†’ s2 is shorter, so s1 > s2

2.7 Palindrome Check

int isPalindrome(char str[]) {
    int len = 0;
    while (str[len] != '\0') len++;
    
    int start = 0, end = len - 1;
    while (start < end) {
        if (str[start] != str[end])
            return 0;  // Not a palindrome
        start++;
        end--;
    }
    return 1;  // Is a palindrome
}
// Time: O(n)    Space: O(1)

πŸ” Why This Logic:

  • Same two-pointer technique as stringReverse, but instead of swapping, we compare. A palindrome reads the same forwards and backwards, so str[0] must equal str[len-1], str[1] must equal str[len-2], etc.
  • Early exit on mismatch β€” The moment any pair doesn’t match, we immediately return 0. No need to check further pairs β€” one mismatch is enough to prove it’s not a palindrome.
  • We only check half the string β€” The start < end condition means we do at most n/2 comparisons. Checking the full string would be redundant since each comparison covers both a character and its mirror.

Example: "madam"

  • Compare str[0]='m' with str[4]='m' βœ”
  • Compare str[1]='a' with str[3]='a' βœ”
  • Compare str[2]='d' with str[2]='d' βœ”
  • Result: Palindrome!

2.8 Substring Search

Problem: Determine whether string pattern occurs within string text; if so, return the starting index.

int substringSearch(char text[], char pattern[]) {
    int n = 0, m = 0;
    while (text[n] != '\0') n++;      // length of text
    while (pattern[m] != '\0') m++;   // length of pattern
    
    for (int i = 0; i <= n - m; i++) {
        int j;
        for (j = 0; j < m; j++) {
            if (text[i + j] != pattern[j])
                break;
        }
        if (j == m)
            return i;  // Pattern found at index i
    }
    return -1;  // Not found
}
// Time: O(n Γ— m) worst case    Space: O(1)

πŸ” Line-by-Line Logic:

  • i <= n - m β€” We only try starting positions where the pattern can fully fit. If the text has 10 chars and pattern has 3, the last valid starting position is index 7 (positions 7, 8, 9). Starting at index 8 would go out of bounds.
  • The inner j loop β€” For each starting position i, compare text[i+0] with pattern[0], text[i+1] with pattern[1], etc. The offset i+j maps pattern positions to text positions.
  • break on mismatch β€” As soon as any character doesn’t match, there’s no point checking the rest of this alignment β€” move to next starting position.
  • if (j == m) β€” If j made it all the way to m (pattern length) without breaking, ALL characters matched β€” the pattern is found at position i. If j < m, the break triggered, meaning a mismatch occurred.
  • This is the β€œBrute Force” or β€œNaΓ―ve” pattern-matching algorithm. It is $O(n \times m)$ in the worst case (e.g., text = "AAAAAA", pattern = "AAB", where each alignment matches $m-1$ characters before failing). Faster algorithms like KMP achieve $O(n + m)$.

Practice Questions β€” Strings

Q1. Write C functions (without using library functions) for: String length, copy, concatenation, reverse, compare, palindrome check, and substring search.

Q2. What is the null character '\0'? Why is it important in C strings?

Q3. Write a function to count the number of vowels in a string.

Q4. Write a function to check if two strings are anagrams.

Q5. Explain the time complexity of each string operation you implemented.


3 β€” Searching


Linear Search (Sequential Search): Check each element of the array one by one from beginning to end until the target is found or the array is exhausted.

Algorithm

int linearSearch(int A[], int n, int key) {
    for (int i = 0; i < n; i++) {
        if (A[i] == key)
            return i;    // Found at index i
    }
    return -1;           // Not found
}

πŸ” Line-by-Line Logic:

  • for (int i = 0; i < n; i++) β€” Check every element from index 0 to n-1. There’s no shortcut because the array is unsorted β€” the key could be anywhere.
  • if (A[i] == key) return i β€” Return immediately upon finding the key. We don’t need to check further because the question is β€œdoes it exist and where?” Returning the index (not just 1) gives the caller the position.
  • return -1 β€” If the loop completes without finding the key, it’s not in the array. We use -1 as a sentinel value because valid indices are 0 to n-1, so -1 unambiguously means β€œnot found.”
  • No prerequisites β€” Unlike binary search, this works on unsorted, sorted, linked lists, anything. It’s the simplest search possible.

Analysis

| Case | When | Comparisons | Time |
|---|---|---|---|
| Best | Key at index 0 | 1 | $O(1)$ |
| Worst | Key at last index or absent | $n$ | $O(n)$ |
| Average | Key equally likely anywhere | $\frac{n+1}{2}$ | $O(n)$ |

Space Complexity: $O(1)$ β€” no extra space needed.

When to use Linear Search:

  • Array is unsorted (binary search won’t work)
  • Array is small (overhead of sorting not worth it)
  • You need to search only once (no point sorting for a single search)

Binary Search: For a sorted array, repeatedly divide the search interval in half. Compare the target with the middle element and eliminate half the remaining elements each step.

Algorithm (Iterative)

int binarySearch(int A[], int n, int key) {
    int low = 0, high = n - 1;
    
    while (low <= high) {
        int mid = low + (high - low) / 2;  // avoids overflow
        
        if (A[mid] == key)
            return mid;         // Found
        else if (A[mid] < key)
            low = mid + 1;     // Search right half
        else
            high = mid - 1;    // Search left half
    }
    return -1;                  // Not found
}

πŸ” Line-by-Line Logic:

  • low = 0, high = n - 1 β€” These define the search window. Initially, the entire array is our search space.
  • while (low <= high) β€” As long as there’s at least one element to check. When low > high, the search window is empty β€” key doesn’t exist.
  • mid = low + (high - low) / 2 β€” Why not (low + high) / 2? Because if low and high are both large (near INT_MAX), their sum can overflow! The formula low + (high - low)/2 gives the same result without overflow risk. This is a classic bug that even professional code had for decades.
  • if (A[mid] == key) β€” Found it! Return the index.
  • else if (A[mid] < key) β†’ low = mid + 1 β€” The key is larger than the middle element. Since the array is sorted, the key must be in the right half. We set low = mid + 1 (not mid) because we’ve already checked mid and it’s not the answer.
  • else β†’ high = mid - 1 β€” The key is smaller, so search the left half. Again, mid - 1 because mid is ruled out.
  • Each iteration halves the search space β€” That’s why it’s $O(\log n)$. For 1 million elements, at most 20 comparisons!

Algorithm (Recursive)

int binarySearchRec(int A[], int low, int high, int key) {
    if (low > high)
        return -1;             // Base case: not found
    
    int mid = low + (high - low) / 2;
    
    if (A[mid] == key)
        return mid;
    else if (A[mid] < key)
        return binarySearchRec(A, mid + 1, high, key);
    else
        return binarySearchRec(A, low, mid - 1, key);
}

πŸ” Why Recursive Version?

  • Same logic as iterative, but expressed as tail recursion β€” each call replaces the loop iteration.
  • if (low > high) return -1 β€” Base case replaces the while condition. When the search window is empty, we stop.
  • return binarySearchRec(A, mid + 1, high, key) β€” Instead of updating low = mid + 1 and looping, we make a new function call with the updated range.
  • Trade-off: The recursive version uses $O(\log n)$ stack space for the call frames, while the iterative version uses $O(1)$. In practice, iterative is preferred for this reason.
  • Both versions are correct β€” the recursive one is more β€œelegant” and maps directly to the mathematical definition: β€œsearch in [low, mid-1] or [mid+1, high].”

Dry Run Example

Search for key = 23 in sorted array:

A = [2, 5, 8, 12, 16, 23, 38, 56, 72, 91]
     0  1  2   3   4   5   6   7   8   9

Step 1: low=0, high=9, mid=4 → A[4]=16 < 23 → low=5
Step 2: low=5, high=9, mid=7 → A[7]=56 > 23 → high=6
Step 3: low=5, high=6, mid=5 → A[5]=23 == 23 → FOUND at index 5! ✔

Only 3 comparisons vs potentially 6 for linear search!

Analysis

| Case | When | Comparisons | Time |
|---|---|---|---|
| Best | Key is at the middle | 1 | $O(1)$ |
| Worst | Key not present | $\lfloor \log_2 n \rfloor + 1$ | $O(\log n)$ |
| Average | — | $\approx \log_2 n$ | $O(\log n)$ |

Space Complexity:

  • Iterative: $O(1)$
  • Recursive: $O(\log n)$ β€” due to recursion stack

PREREQUISITE: The array MUST be sorted for binary search to work. If the array is unsorted, you must sort it first (costing $O(n \log n)$).


| Feature | Linear Search | Binary Search |
|---|---|---|
| Prerequisite | None | Array must be sorted |
| Best Case | $O(1)$ | $O(1)$ |
| Worst Case | $O(n)$ | $O(\log n)$ |
| Average Case | $O(n)$ | $O(\log n)$ |
| Space | $O(1)$ | $O(1)$ iterative / $O(\log n)$ recursive |
| Data Structure | Array or linked list | Array only (needs random access) |
| Simplicity | Very simple | Slightly complex |
| For $n = 1,000,000$ | Up to 1,000,000 comparisons | At most 20 comparisons |

Practice Questions — Searching

Q1. Write iterative and recursive versions of Binary Search. Trace both for the array [3, 7, 11, 15, 19, 23, 27] searching for key = 19.

Q2. Compare Linear Search and Binary Search in terms of prerequisites, time complexity, and use cases.

Q3. Why can’t Binary Search be used on a linked list efficiently?

Q4. For an array of 1 million elements, how many comparisons does Binary Search need in the worst case?

Q5. Modify Binary Search to return the first occurrence of a duplicate element.


4 — Sorting


4.1 Sorting Concepts — Stable & In-Place

Stable Sort: A sorting algorithm is stable if it preserves the relative order of records with equal keys.

In-Place Sort: A sorting algorithm is in-place if it uses only $O(1)$ (or $O(\log n)$ for recursive ones) extra memory.

Stability Example:

Input: [(Alice, 85), (Bob, 90), (Charlie, 85)]

Stable sort by marks: [(Alice, 85), (Charlie, 85), (Bob, 90)]
→ Alice appears before Charlie (same order as input) ✔

Unstable sort might give: [(Charlie, 85), (Alice, 85), (Bob, 90)]
→ Charlie before Alice (order changed) ✗


4.2 Bubble Sort

Idea: Repeatedly step through the array, compare adjacent elements, and swap them if they are in the wrong order. After each pass, the largest unsorted element “bubbles up” to its correct position.

Algorithm

void bubbleSort(int A[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int swapped = 0;  // Optimization flag
        for (int j = 0; j < n - 1 - i; j++) {
            if (A[j] > A[j + 1]) {
                // Swap
                int temp = A[j];
                A[j] = A[j + 1];
                A[j + 1] = temp;
                swapped = 1;
            }
        }
        if (!swapped) break;  // Already sorted β€” early exit
    }
}

πŸ” Line-by-Line Logic:

  • i < n - 1 β€” We need at most n-1 passes. Why? After each pass, the largest unsorted element reaches its final position. After n-1 passes, n-1 elements are placed, so the last one is automatically correct.
  • j < n - 1 - i β€” This is the key optimization the user asked about! After pass i, the last i elements are already in their final sorted positions (they β€œbubbled up”). So we don’t need to compare them again. In pass 0, we check all n-1 pairs. In pass 1, the last element is sorted, so we check n-2 pairs. In pass i, we check n-1-i pairs. Without -i, the algorithm would still be correct but would do useless comparisons.
  • if (A[j] > A[j + 1]) β†’ swap β€” Compare adjacent elements. If the left one is larger, they’re in wrong order β€” swap them. This is what makes larger elements β€œbubble” rightward.
  • swapped flag β€” If we complete an entire pass without any swaps, the array is already sorted! The break skips remaining passes. This turns best-case (already sorted) from $O(n^2)$ to $O(n)$.
  • Why compare adjacent pairs? β€” Think of it like bubbles rising in water. Each pass pushes the biggest unsorted element to the end, one swap at a time. After pass 0: max is at position n-1. After pass 1: second-max at n-2. And so on.

Dry Run

Sort: [64, 34, 25, 12, 22, 11, 90]

Pass 1: Compare adjacent, swap if needed:

[64, 34, 25, 12, 22, 11, 90]
 64>34 → swap → [34, 64, 25, 12, 22, 11, 90]
 64>25 → swap → [34, 25, 64, 12, 22, 11, 90]
 64>12 → swap → [34, 25, 12, 64, 22, 11, 90]
 64>22 → swap → [34, 25, 12, 22, 64, 11, 90]
 64>11 → swap → [34, 25, 12, 22, 11, 64, 90]
 64<90 → no swap
After Pass 1: [34, 25, 12, 22, 11, 64, 90]  ← 90 is in place

Continue passes until sorted…

Analysis

| Case | Comparisons | Swaps | Time |
|---|---|---|---|
| Best (already sorted, with flag) | $n - 1$ | 0 | $O(n)$ |
| Worst (reverse sorted) | $\frac{n(n-1)}{2}$ | $\frac{n(n-1)}{2}$ | $O(n^2)$ |
| Average | $\frac{n(n-1)}{2}$ | $\frac{n(n-1)}{4}$ | $O(n^2)$ |

Space: $O(1)$ — In-place
Stable: Yes ✔ (equal elements are never swapped)


4.3 Selection Sort

Idea: Find the minimum element in the unsorted portion and swap it with the first unsorted element. Repeat for each position.

Algorithm

void selectionSort(int A[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int minIdx = i;
        for (int j = i + 1; j < n; j++) {
            if (A[j] < A[minIdx])
                minIdx = j;
        }
        // Swap A[i] and A[minIdx]
        int temp = A[i];
        A[i] = A[minIdx];
        A[minIdx] = temp;
    }
}

πŸ” Line-by-Line Logic:

  • i < n - 1 β€” We place n-1 elements; the last one is automatically correct.
  • minIdx = i β€” We assume the current position holds the minimum. Then we scan the rest to see if there’s something smaller.
  • for (int j = i + 1; j < n; j++) β€” Search the unsorted portion (from i+1 to end) for the actual minimum. Why start at i+1? Because positions 0 to i-1 are already sorted from previous passes.
  • if (A[j] < A[minIdx]) minIdx = j β€” We only update the index, not swap yet. Why? Because we want to find the true minimum first, not swap multiple times. This is what makes selection sort do exactly n-1 swaps total β€” one swap per pass, unlike bubble sort which may swap many times per pass.
  • The swap after the inner loop β€” Once we know the minimum’s position, one swap puts it in place. This is why selection sort is preferred when write operations are expensive (e.g., writing to flash memory).
  • Why is it unstable? β€” The swap can move A[i] to a distant position, jumping over equal elements. Example: [5a, 5b, 3] β†’ swap 5a with 3 β†’ [3, 5b, 5a] β€” 5a and 5b changed relative order.

Dry Run

Sort: [64, 25, 12, 22, 11]

Pass 1: Find min in [64,25,12,22,11] → 11 at idx 4. Swap with idx 0.
        [11, 25, 12, 22, 64]

Pass 2: Find min in [25,12,22,64] → 12 at idx 2. Swap with idx 1.
        [11, 12, 25, 22, 64]

Pass 3: Find min in [25,22,64] → 22 at idx 3. Swap with idx 2.
        [11, 12, 22, 25, 64]

Pass 4: Find min in [25,64] → 25 at idx 3. No swap needed.
        [11, 12, 22, 25, 64]  ✔ SORTED

Analysis

| Case | Comparisons | Swaps | Time |
|---|---|---|---|
| All cases | $\frac{n(n-1)}{2}$ | $n - 1$ | $O(n^2)$ |

Space: $O(1)$ — In-place
Stable: No ✗ (swapping can change relative order of equal elements)

Selection Sort does fewer swaps than Bubble Sort — exactly $n-1$ swaps vs up to $\frac{n(n-1)}{2}$. This makes it better when write operations are expensive.


4.4 Insertion Sort

Idea: Build the sorted array one element at a time. Take each element and insert it into its correct position in the already-sorted left portion, shifting elements as needed.

Algorithm

void insertionSort(int A[], int n) {
    for (int i = 1; i < n; i++) {
        int key = A[i];
        int j = i - 1;
        // Shift elements greater than key to the right
        while (j >= 0 && A[j] > key) {
            A[j + 1] = A[j];
            j--;
        }
        A[j + 1] = key;
    }
}

πŸ” Line-by-Line Logic:

  • i = 1 β€” We start from the second element because a single element (index 0) is trivially sorted. The sorted portion initially is just A[0].
  • key = A[i] β€” We save the element to be inserted. This is crucial because the shifting in the while loop will overwrite A[i].
  • j = i - 1 β€” Start comparing from the rightmost element of the sorted portion (just left of key).
  • while (j >= 0 && A[j] > key) β€” Two conditions:
    • j >= 0 β€” Don’t go past the beginning of the array (bounds check).
    • A[j] > key β€” Keep shifting as long as elements are larger than key. Equal elements are NOT shifted (> not >=), which is why insertion sort is stable.
  • A[j + 1] = A[j] β€” Shift element one position right to make room. We don’t need a swap β€” just overwrite A[j+1] because we saved the original value in key.
  • j-- β€” Move left to check the next element.
  • A[j + 1] = key β€” When the while loop ends, j is either -1 (key is smallest) or A[j] <= key. Either way, j+1 is the correct insertion position.
  • Think of it like sorting a hand of cards β€” You pick up one card at a time and slide it into the correct position among the cards you’re already holding.

Dry Run

Sort: [12, 11, 13, 5, 6]

i=1: key=11, compare with 12. 12>11 → shift. Insert 11 at pos 0.
     [11, 12, 13, 5, 6]

i=2: key=13, compare with 12. 12<13 → stop. Already in place.
     [11, 12, 13, 5, 6]

i=3: key=5, compare with 13→shift, 12→shift, 11→shift. Insert at pos 0.
     [5, 11, 12, 13, 6]

i=4: key=6, compare with 13→shift, 12→shift, 11→shift. Insert at pos 1.
     [5, 6, 11, 12, 13]  ✔ SORTED

Analysis

| Case | Comparisons | Shifts | Time |
|---|---|---|---|
| Best (already sorted) | $n - 1$ | 0 | $O(n)$ |
| Worst (reverse sorted) | $\frac{n(n-1)}{2}$ | $\frac{n(n-1)}{2}$ | $O(n^2)$ |
| Average | $\frac{n(n-1)}{4}$ | $\frac{n(n-1)}{4}$ | $O(n^2)$ |

Space: $O(1)$ — In-place
Stable: Yes ✔

Insertion Sort is excellent for:

  • Nearly sorted arrays (best case $O(n)$!)
  • Small arrays ($n < 50$)
  • Online sorting (can sort data as it arrives)

Many practical sorting implementations (e.g., TimSort in Python/Java) use Insertion Sort for small sub-arrays.


4.5 Merge Sort

Idea: Divide and Conquer — divide the array into two halves, recursively sort each half, then merge the two sorted halves into one sorted array.

Algorithm

void merge(int A[], int left, int mid, int right) {
    int n1 = mid - left + 1;   // size of left half
    int n2 = right - mid;       // size of right half
    
    // Create temp arrays
    int L[n1], R[n2];   // C99 variable-length arrays; use malloc for large halves
    for (int i = 0; i < n1; i++) L[i] = A[left + i];
    for (int j = 0; j < n2; j++) R[j] = A[mid + 1 + j];
    
    // Merge back into A[left..right]
    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j])       // <= ensures stability
            A[k++] = L[i++];
        else
            A[k++] = R[j++];
    }
    // Copy remaining elements
    while (i < n1) A[k++] = L[i++];
    while (j < n2) A[k++] = R[j++];
}

void mergeSort(int A[], int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        mergeSort(A, left, mid);        // Sort left half
        mergeSort(A, mid + 1, right);   // Sort right half
        merge(A, left, mid, right);     // Merge sorted halves
    }
}

πŸ” Line-by-Line Logic β€” merge():

  • n1 = mid - left + 1 β€” Size of left half [left..mid]. The +1 is because both endpoints are inclusive.
  • n2 = right - mid β€” Size of right half [mid+1..right]. No +1 needed because right - (mid+1) + 1 = right - mid.
  • Why create temp arrays L[] and R[]? β€” We’re about to overwrite A[left..right] during merging. If we didn’t copy the data first, we’d lose the original values. This is why merge sort needs $O(n)$ extra space.
  • if (L[i] <= R[j]) β€” The <= is critical for stability! When elements are equal, we pick from the left array first. Since left-array elements appeared first in the original array, their relative order is preserved. Using < instead would make it unstable.
  • The main while loop β€” Two-pointer merge: compare the front elements of both halves, take the smaller one. This works because both halves are already sorted (guaranteed by recursion). Think of merging two sorted decks of cards.
  • Remaining-element loops β€” One array might run out before the other. The remaining elements are already sorted and all larger than what’s been placed, so just copy them directly.

Line-by-Line Logic β€” mergeSort():

  • if (left < right) β€” Base case: if left >= right, the sub-array has 0 or 1 element, which is inherently sorted. No work needed.
  • mid = left + (right - left) / 2 β€” Same overflow-safe midpoint formula as binary search.
  • Divide β†’ Conquer β†’ Combine β€” First sort left half, then right half, then merge them. The recursion goes deeper and deeper until we hit single elements, then merges build the sorted result bottom-up.

Dry Run

Sort: [38, 27, 43, 3, 9, 82, 10]

                    [38, 27, 43, 3, 9, 82, 10]
                   /                           \
          [38, 27, 43, 3]              [9, 82, 10]
          /             \              /          \
      [38, 27]      [43, 3]       [9, 82]      [10]
      /     \       /    \        /     \         |
    [38]  [27]   [43]   [3]    [9]   [82]      [10]
      \    /       \    /        \    /           |
     [27, 38]    [3, 43]       [9, 82]         [10]
         \          /              \            /
      [3, 27, 38, 43]          [9, 10, 82]
              \                    /
         [3, 9, 10, 27, 38, 43, 82]  ✔ SORTED

Analysis

| Case | Time | Reason |
|---|---|---|
| Best | $O(n \log n)$ | Always divides and merges |
| Worst | $O(n \log n)$ | Same — not input-dependent |
| Average | $O(n \log n)$ | Same |

Recurrence: $T(n) = 2T(n/2) + O(n)$ → by the Master Theorem: $O(n \log n)$

Space: $O(n)$ — needs auxiliary arrays for merging
Stable: Yes ✔
In-Place: No ✗ (needs $O(n)$ extra space)


4.6 Quick Sort

Idea: Divide and Conquer — choose a pivot element, partition the array so all elements smaller than the pivot are on the left and all larger are on the right, then recursively sort each partition.

Algorithm (Lomuto Partition Scheme)

int partition(int A[], int low, int high) {
    int pivot = A[high];  // Choose last element as pivot
    int i = low - 1;      // Index of smaller element
    
    for (int j = low; j < high; j++) {
        if (A[j] <= pivot) {
            i++;
            // Swap A[i] and A[j]
            int temp = A[i];
            A[i] = A[j];
            A[j] = temp;
        }
    }
    // Place pivot in correct position
    int temp = A[i + 1];
    A[i + 1] = A[high];
    A[high] = temp;
    
    return i + 1;  // Return pivot index
}

void quickSort(int A[], int low, int high) {
    if (low < high) {
        int pi = partition(A, low, high);
        quickSort(A, low, pi - 1);    // Sort left of pivot
        quickSort(A, pi + 1, high);   // Sort right of pivot
    }
}

πŸ” Line-by-Line Logic β€” partition() (Lomuto Scheme):

  • pivot = A[high] β€” We choose the last element as pivot. Other strategies (first, random, median-of-three) exist, but last-element is simplest to implement.
  • i = low - 1 β€” i tracks the boundary between β€œβ‰€ pivot” elements and β€œ> pivot” elements. It starts before low because no elements have been classified yet. Everything to the left of i (inclusive) is ≀ pivot.
  • for (int j = low; j < high; j++) β€” j scans through all elements (except pivot itself). For each element, we decide: is it ≀ pivot or > pivot?
  • if (A[j] <= pivot) { i++; swap(A[i], A[j]); } β€” Found an element that belongs in the β€œsmall” section. Increment i to extend the small section, then swap A[j] into position i. Elements between i+1 and j-1 are all > pivot.
  • Final swap: A[i+1] ↔ A[high] β€” Place the pivot between the two partitions. Position i+1 is the first element > pivot, so swapping puts pivot at the exact boundary. Everything left is ≀ pivot, everything right is > pivot.
  • return i + 1 β€” The pivot’s final index. It will never move again.

Why Lomuto over Hoare? Lomuto is easier to understand and implement. Hoare’s partition is slightly more efficient (fewer swaps on average) but harder to code correctly.

Why worst case is $O(n^2)$: If array is already sorted and we pick the last element as pivot, the pivot is always the largest. Every partition produces sub-arrays of size n-1 and 0. That gives $T(n) = T(n-1) + O(n) = O(n^2)$.

Dry Run

Sort: [10, 80, 30, 90, 40, 50, 70], pivot = 70 (last element)

Partition around pivot 70:
  Elements ≤ 70: [10, 30, 40, 50]
  Pivot: [70]
  Elements > 70: [80, 90]

  Result: [10, 30, 40, 50, 70, 80, 90]
                            ↑
                     Pivot in final position

Recursively sort [10, 30, 40, 50] and [80, 90]
→ [10, 30, 40, 50, 70, 80, 90]  ✔ SORTED

Analysis

| Case | When | Time |
|---|---|---|
| Best | Pivot always splits array into two equal halves | $O(n \log n)$ |
| Worst | Pivot is always the smallest or largest element (already sorted input!) | $O(n^2)$ |
| Average | Random pivots | $O(n \log n)$ |

Space: $O(\log n)$ average (recursion stack), $O(n)$ worst case
Stable: No ✗
In-Place: Yes ✔ (with recursion stack overhead)

Quick Sort’s worst case is $O(n^2)$! This happens when the array is already sorted (or reverse sorted) and the pivot is always the first/last element. Solution: Use randomized pivot selection or median-of-three.

Quick Sort vs Merge Sort:

  • Quick Sort is faster in practice (better cache locality, smaller constant factor)
  • Merge Sort guarantees $O(n \log n)$ always
  • Quick Sort is in-place ($O(\log n)$ space); Merge Sort needs $O(n)$ extra space

4.7 Heap Sort

Idea: Build a Max-Heap from the array, then repeatedly extract the maximum element (root) and place it at the end of the array.

Max-Heap Property: Every parent node is ≥ its children.

Heap Basics (Array Representation)

For a node at index $i$ (0-indexed):

  • Left child: $2i + 1$
  • Right child: $2i + 2$
  • Parent: $\lfloor(i - 1) / 2\rfloor$

Algorithm

void heapify(int A[], int n, int i) {
    int largest = i;
    int left = 2 * i + 1;
    int right = 2 * i + 2;
    
    if (left < n && A[left] > A[largest])
        largest = left;
    if (right < n && A[right] > A[largest])
        largest = right;
    
    if (largest != i) {
        int temp = A[i];
        A[i] = A[largest];
        A[largest] = temp;
        heapify(A, n, largest);  // Recursively fix subtree
    }
}

void heapSort(int A[], int n) {
    // Step 1: Build Max-Heap (from last non-leaf to root)
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(A, n, i);
    
    // Step 2: Extract max one by one
    for (int i = n - 1; i > 0; i--) {
        // Move current root (max) to end
        int temp = A[0];
        A[0] = A[i];
        A[i] = temp;
        // Heapify reduced heap
        heapify(A, i, 0);
    }
}

πŸ” Line-by-Line Logic β€” heapify():

  • largest = i β€” Assume the current node is the largest among itself and its children.
  • left = 2*i + 1, right = 2*i + 2 β€” These formulas come from the array representation of a complete binary tree. For 0-indexed arrays, the children of node i are at 2i+1 and 2i+2.
  • if (left < n && A[left] > A[largest]) β€” Two checks: (1) does the left child exist? (left < n), and (2) is it larger than the current largest? The bounds check prevents array out-of-bounds.
  • if (largest != i) β†’ swap and recurse β€” If a child was larger, swap parent with that child to fix the violation. But now the child’s subtree might be violated, so we recursively heapify downward. This β€œsifting down” continues until the element finds its correct position.

Line-by-Line Logic β€” heapSort():

  • for (int i = n/2 - 1; i >= 0; i--) β€” Why start at n/2 - 1? Because nodes at indices n/2 to n-1 are leaf nodes β€” they have no children, so they trivially satisfy the heap property. We only need to heapify internal nodes. Starting from the last non-leaf and working up ensures that when we heapify node i, its subtrees are already valid heaps.
  • Building the heap is $O(n)$, not $O(n \log n)$! β€” This is a counter-intuitive result. Most nodes are near the bottom (leaves) and need little work. A mathematical proof shows the total work is $O(n)$.
  • A[0] ↔ A[i] β€” The root (maximum element) is swapped to position i (end of unsorted portion). Now A[i] is in its final position.
  • heapify(A, i, 0) β€” The element swapped to root likely violates the heap property. Heapify fixes it. We pass i as the size (not n) because elements after index i are already sorted and excluded from the heap.
  • Why is it $O(n \log n)$? β€” Building heap is $O(n)$, but extracting n elements costs $O(\log n)$ each = $O(n \log n)$ total.

Analysis

| Case | Time |
|---|---|
| Best | $O(n \log n)$ |
| Worst | $O(n \log n)$ |
| Average | $O(n \log n)$ |

Building the heap: $O(n)$
$n$ extractions × heapify: $O(n \log n)$

Space: $O(1)$ — In-place ✔
Stable: No ✗


4.8 Counting Sort (Linear Time)

Idea: Count the occurrences of each distinct value, then reconstruct the sorted array from the counts. Works only when values are in a known, limited range $[0, k]$.

Algorithm

void countingSort(int A[], int n, int k) {
    int count[k + 1];  // count array for values 0 to k (C99 VLA; use malloc for large k)
    int output[n];
    
    // Initialize count array
    for (int i = 0; i <= k; i++) count[i] = 0;
    
    // Count occurrences
    for (int i = 0; i < n; i++) count[A[i]]++;
    
    // Cumulative count (prefix sum)
    for (int i = 1; i <= k; i++) count[i] += count[i - 1];
    
    // Build output array (traverse input in reverse for stability)
    for (int i = n - 1; i >= 0; i--) {
        output[count[A[i]] - 1] = A[i];
        count[A[i]]--;
    }
    
    // Copy output back to A
    for (int i = 0; i < n; i++) A[i] = output[i];
}

πŸ” Line-by-Line Logic:

  • int count[k + 1] β€” We need one counter for each possible value from 0 to k. If max value is 8, we need indices 0–8 = 9 slots.
  • Phase 1 β€” Count occurrences: count[A[i]]++ tallies how many times each value appears. After this, count[v] = number of elements equal to v.
  • Phase 2 β€” Cumulative count (prefix sum): count[i] += count[i - 1] transforms the count array so that count[v] now tells us how many elements are ≀ v. This directly gives us the position where value v should end up in the sorted output.
  • Phase 3 β€” Build output (right-to-left):
    • for (int i = n - 1; i >= 0; i--) β€” Why traverse in reverse? This is the key to stability! When two elements have the same value, the one appearing later in the input (higher index) gets placed at a later position in the output. If we went left-to-right, equal elements would end up in reversed order.
    • output[count[A[i]] - 1] = A[i] β€” count[A[i]] tells us how many elements are ≀ A[i]. So position count[A[i]] - 1 (0-indexed) is where this element belongs.
    • count[A[i]]-- β€” Decrement so the next element with the same value goes one position earlier.
  • This is NOT a comparison-based sort β€” It never compares two elements against each other. That’s how it beats the $\Omega(n \log n)$ lower bound for comparison sorts. The trade-off is it only works for integers in a known range.

Dry Run

Sort: [4, 2, 2, 8, 3, 3, 1], range $k = 8$

Count:       [0, 1, 2, 2, 1, 0, 0, 0, 1]   (idx 0-8)
Cumulative:  [0, 1, 3, 5, 6, 6, 6, 6, 7]

Build output (right to left for stability):
  A[6]=1 → output[0]=1, count[1]=0
  A[5]=3 → output[4]=3, count[3]=4
  A[4]=3 → output[3]=3, count[3]=3
  A[3]=8 → output[6]=8, count[8]=6
  A[2]=2 → output[2]=2, count[2]=2
  A[1]=2 → output[1]=2, count[2]=1
  A[0]=4 → output[5]=4, count[4]=5

Output: [1, 2, 2, 3, 3, 4, 8]  ✔ SORTED

Analysis

| Aspect | Value |
|---|---|
| Time | $O(n + k)$ where $k$ = range of values |
| Space | $O(n + k)$ |
| Stable | Yes ✔ |
| In-Place | No ✗ |

When NOT to use Counting Sort:

  • When $k$ (range) is much larger than $n$ β€” e.g., sorting 10 numbers in range $[0, 10^9]$ wastes huge memory
  • When values are negative (needs modification) or floating-point

4.9 Sorting Algorithms — Master Comparison

| Algorithm | Best | Average | Worst | Space | Stable | In-Place |
|---|---|---|---|---|---|---|
| Bubble Sort | $O(n)$ | $O(n^2)$ | $O(n^2)$ | $O(1)$ | ✔ | ✔ |
| Selection Sort | $O(n^2)$ | $O(n^2)$ | $O(n^2)$ | $O(1)$ | ✗ | ✔ |
| Insertion Sort | $O(n)$ | $O(n^2)$ | $O(n^2)$ | $O(1)$ | ✔ | ✔ |
| Merge Sort | $O(n \log n)$ | $O(n \log n)$ | $O(n \log n)$ | $O(n)$ | ✔ | ✗ |
| Quick Sort | $O(n \log n)$ | $O(n \log n)$ | $O(n^2)$ | $O(\log n)$ | ✗ | ✔ |
| Heap Sort | $O(n \log n)$ | $O(n \log n)$ | $O(n \log n)$ | $O(1)$ | ✗ | ✔ |
| Counting Sort | $O(n+k)$ | $O(n+k)$ | $O(n+k)$ | $O(n+k)$ | ✔ | ✗ |

Quick Memory Aid:

  • $O(n^2)$ sorts (simple): Bubble, Selection, Insertion β€” easy to code, slow for large $n$
  • $O(n \log n)$ sorts (efficient): Merge, Quick, Heap β€” work well for large $n$
  • $O(n+k)$ sorts (linear): Counting sort β€” fastest when $k$ is small
  • Only Merge Sort is both stable AND $O(n \log n)$ guaranteed, but costs $O(n)$ extra space

Practice Questions — Sorting

Q1. Sort the array [5, 2, 8, 1, 9, 3] using (a) Bubble Sort (b) Selection Sort (c) Insertion Sort. Show all passes.

Q2. Trace Merge Sort on the array [12, 11, 13, 5, 6, 7]. Show the divide and merge steps.

Q3. Trace Quick Sort on [10, 7, 8, 9, 1, 5] using the last element as pivot. Show each partition step.

Q4. Build a Max-Heap from the array [4, 10, 3, 5, 1]. Then perform Heap Sort step by step.

Q5. Sort [1, 4, 1, 2, 7, 5, 2] using Counting Sort. Show the count array and output.

Q6. What is a stable sort? Give an example showing why stability matters.

Q7. Why is Quick Sort preferred over Merge Sort in practice despite having $O(n^2)$ worst case?

Q8. Can Bubble Sort’s best case be improved from $O(n^2)$? Explain the optimization using a swap flag.

Q9. Which sorting algorithms are suitable for nearly sorted data? Explain why.

Q10. Compare all sorting algorithms in a table. Under what conditions would you choose each?

Q11. Prove that any comparison-based sorting algorithm has a lower bound of $\Omega(n \log n)$.

Q12. Write C code for Merge Sort and trace its execution for [38, 27, 43, 3].


Comprehensive Unit II Practice Set

Short Answer

1. Define sequential organization. Why do arrays support random access?

2. What is the address formula for A[i][j] in row-major and column-major order?

3. Write C functions for string length and palindrome check without using library functions.

4. What is the difference between stable and unstable sorting?

5. Why is Binary Search $O(\log n)$? Prove it.

Tracing & Analytical

6. An integer array A[4][6] starts at address 2000. Element size = 4 bytes. Find address of A[2][3] in row-major and column-major.

7. Sort [29, 10, 14, 37, 13] using each of the 6 sorting algorithms. Show all intermediate steps.

8. Given an array of $n$ elements where only one element is out of place, which sort would you choose and why?

9. Implement Binary Search recursively. What is its space complexity? How does it differ from iterative version?

10. A company needs to sort 10 million records. The data has a limited range of keys (0 to 999). Which sorting algorithm would you recommend and why?

Long Answer

11. Compare Merge Sort and Quick Sort comprehensively — algorithm, dry run, time/space complexity, stability, and use cases.

12. Explain Heap Sort completely — what is a heap, how to build a max-heap, how the sorting works, with a full example. Analyze its complexity.

13. Explain Counting Sort. Why is it $O(n + k)$? When is it better than comparison-based sorts? When is it impractical?


πŸ“ View Complete Solutions for Unit II β†’

← Back to DSA Index