Unit II – Complete Solutions to All Practice Questions
Table of Contents
Practice Questions – Arrays
Q1. Define an array as an ADT. List all operations with their time complexities.
Array as an Abstract Data Type (ADT):
An Array ADT is a fixed-size, ordered collection of elements of the same data type, where each element is identified by a numerical index.
ADT Array:
Data:
A fixed-size collection of n elements of identical type
Each element is accessed by an index (0 to n-1)
Operations:
create(size) → Allocate array of given size → O(1)
get(index) → Return element at given index → O(1)
set(index, value) → Update element at given index → O(1)
insert(index, value) → Insert value at index (shift right) → O(n)
delete(index) → Remove element at index (shift left) → O(n)
search(value) → Find index of given value → O(n) [linear]
length() → Return current number of elements → O(1)
display() → Print all elements → O(n)
Error Conditions:
get/set with invalid index → "Index out of bounds"
insert on full array → "Array overflow"
delete on empty array → "Array underflow"
Key time complexities explained:
| Operation | Time | Reason |
|---|---|---|
| get(i) / set(i, v) | $O(1)$ | Address computed directly: $B + i \times w$ |
| insert(i, v) | $O(n)$ | Must shift elements right to create a gap |
| delete(i) | $O(n)$ | Must shift elements left to fill the gap |
| search(v) | $O(n)$ | Worst case: scan entire array (linear search) |
The defining characteristic of an array is random access – any element can be accessed in constant time using its index, because elements are stored in contiguous memory.
Q2. An array A[10] of integers starts at address 500. Each integer takes 4 bytes. Find the address of A[7].
Given:
- Base address $B = 500$
- Element size $w = 4$ bytes (integer)
- Index $i = 7$
- Lower bound $l = 0$ (C-style, 0-based indexing)
Formula (1D array):
\[\text{Address of } A[i] = B + (i - l) \times w\]
Calculation:
\[\text{Address of } A[7] = 500 + (7 - 0) \times 4 = 500 + 28 = \boxed{528}\]
Verification by enumeration:
| Element | Address |
|---|---|
| A[0] | 500 |
| A[1] | 504 |
| A[2] | 508 |
| A[3] | 512 |
| A[4] | 516 |
| A[5] | 520 |
| A[6] | 524 |
| A[7] | 528 ✓ |
| A[8] | 532 |
| A[9] | 536 |
Q3. A 2D array B[5][8] starts at base address 1000, element size 2 bytes. Find the address of B[3][5] in (a) Row-major (b) Column-major order.
Given:
- Array: $B[5][8]$ → $m = 5$ rows, $n = 8$ columns
- Base address $B_0 = 1000$
- Element size $w = 2$ bytes
- Target: $B[3][5]$ → $i = 3$, $j = 5$
- Indexing starts at 0 (so $l_1 = 0, l_2 = 0$)
(a) Row-Major Order
In row-major, elements are stored row by row.
Formula:
\[\text{Address} = B_0 + (i \times n + j) \times w\]
Calculation:
\[= 1000 + (3 \times 8 + 5) \times 2 = 1000 + 29 \times 2 = 1000 + 58 = \boxed{1058}\]
Explanation: Before row 3, there are $3 \times 8 = 24$ elements (rows 0, 1, 2 each have 8 elements). Within row 3, the element at column 5 is the 6th element (index 5). Total offset = $(24 + 5) \times 2 = 58$ bytes.
(b) Column-Major Order
In column-major, elements are stored column by column.
Formula:
\[\text{Address} = B_0 + (j \times m + i) \times w\]
Calculation:
\[= 1000 + (5 \times 5 + 3) \times 2 = 1000 + 28 \times 2 = 1000 + 56 = \boxed{1056}\]
Explanation: Before column 5, there are $5 \times 5 = 25$ elements (columns 0–4 each have 5 elements). Within column 5, the element at row 3 is the 4th element (index 3). Total offset = $(25 + 3) \times 2 = 56$ bytes.
Q4. Write a C function to insert an element at a given position in an array. Analyze its time and space complexity.
C Code:
#include <stdio.h>
#define MAX_SIZE 100
// Insert 'value' at position 'pos' in array A of current size n
// Returns 1 on success, 0 on failure
int insertElement(int A[], int *n, int pos, int value) {
// Validate position
if (pos < 0 || pos > *n || *n >= MAX_SIZE) {
printf("Invalid position or array is full.\n");
return 0;
}
// Step 1: Shift elements from position pos to end, one position to the right
for (int i = *n - 1; i >= pos; i--) {
A[i + 1] = A[i];
}
// Step 2: Insert the new element at position pos
A[pos] = value;
// Step 3: Increment the size
(*n)++;
return 1;
}
// Driver
int main() {
int A[MAX_SIZE] = {10, 20, 30, 40, 50};
int n = 5;
printf("Before insertion: ");
for (int i = 0; i < n; i++) printf("%d ", A[i]);
printf("\n");
insertElement(A, &n, 2, 25); // Insert 25 at position 2
printf("After insertion: ");
for (int i = 0; i < n; i++) printf("%d ", A[i]);
printf("\n");
return 0;
}
Output:
Before insertion: 10 20 30 40 50
After insertion: 10 20 25 30 40 50
Step-by-step trace (inserting 25 at position 2):
Original: [10, 20, 30, 40, 50] n = 5
↑ pos = 2
i=4: A[5] = A[4] → [10, 20, 30, 40, 50, 50]
i=3: A[4] = A[3] → [10, 20, 30, 40, 40, 50]
i=2: A[3] = A[2] → [10, 20, 30, 30, 40, 50]
Insert: A[2] = 25 → [10, 20, 25, 30, 40, 50] n = 6
Complexity Analysis:
| Aspect | Complexity | Explanation |
|---|---|---|
| Best Case Time | $O(1)$ | Insert at the end (pos = n): no shifting needed |
| Worst Case Time | $O(n)$ | Insert at position 0: all $n$ elements must shift |
| Average Case Time | $O(n)$ | On average, $n/2$ elements shift |
| Space | $O(1)$ | Only uses a fixed number of extra variables (i, temp) |
Why shifting must go right-to-left: If we shifted left-to-right, A[pos+1] = A[pos] would overwrite A[pos+1] before we moved it, losing data. Going from the end backwards ensures each element is safely copied before being overwritten.
Q5. Write C code to multiply two matrices. What are the conditions for multiplication to be valid?
Condition for valid multiplication:
For matrices $A_{m \times p}$ and $B_{p \times n}$, multiplication is valid only if the number of columns of A equals the number of rows of B (i.e., both are $p$). The result is a matrix $C_{m \times n}$.
\[C[i][j] = \sum_{k=0}^{p-1} A[i][k] \times B[k][j]\]
C Code:
#include <stdio.h>
#define MAX 10
void matrixMultiply(int A[][MAX], int B[][MAX], int C[][MAX],
int m, int p, int n) {
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
C[i][j] = 0; // Initialize accumulator
for (int k = 0; k < p; k++) {
C[i][j] += A[i][k] * B[k][j];
}
}
}
}
void printMatrix(int M[][MAX], int rows, int cols) {
for (int i = 0; i < rows; i++) {
for (int j = 0; j < cols; j++)
printf("%4d ", M[i][j]);
printf("\n");
}
}
int main() {
int A[MAX][MAX] = { {1, 2, 3},
{4, 5, 6} }; // 2×3
int B[MAX][MAX] = { {7, 8},
{9, 10},
{11, 12} }; // 3×2
int C[MAX][MAX];
int m = 2, p = 3, n = 2; // A is 2×3, B is 3×2
matrixMultiply(A, B, C, m, p, n);
printf("Result (2x2):\n");
printMatrix(C, m, n);
return 0;
}
Output:
Result (2x2):
58 64
139 154
Step-by-step for C[0][0]:
\[C[0][0] = A[0][0] \times B[0][0] + A[0][1] \times B[1][0] + A[0][2] \times B[2][0]\]
\[= 1 \times 7 + 2 \times 9 + 3 \times 11 = 7 + 18 + 33 = 58 \quad \checkmark\]
Step-by-step for C[1][1]:
\[C[1][1] = A[1][0] \times B[0][1] + A[1][1] \times B[1][1] + A[1][2] \times B[2][1]\]
\[= 4 \times 8 + 5 \times 10 + 6 \times 12 = 32 + 50 + 72 = 154 \quad \checkmark\]
Time Complexity: $O(m \times n \times p)$. For square matrices ($m = n = p$): $O(n^3)$.
Space Complexity: $O(m \times n)$ for the result matrix.
Q6. Explain the difference between Row-major and Column-major storage with diagrams.
Row-Major Order: Elements are stored row by row in memory. After all elements of row 0, row 1 follows, and so on. Used by C, C++, Java, Python (NumPy default).
Column-Major Order: Elements are stored column by column in memory. After all elements of column 0, column 1 follows, and so on. Used by FORTRAN, MATLAB, R.
Example: Consider a $3 \times 4$ matrix:
Logical View:
Col 0 Col 1 Col 2 Col 3
Row 0 [ a00 a01 a02 a03 ]
Row 1 [ a10 a11 a12 a13 ]
Row 2 [ a20 a21 a22 a23 ]
Row-Major Layout in Memory:
Address: 100  104  108  112  116  120  124  128  132  136  140  144
        ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
        │a00 │a01 │a02 │a03 │a10 │a11 │a12 │a13 │a20 │a21 │a22 │a23 │
        └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
        ├────── Row 0 ──────┤├────── Row 1 ──────┤├────── Row 2 ─────┤
Column-Major Layout in Memory:
Address: 100  104  108  112  116  120  124  128  132  136  140  144
        ┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
        │a00 │a10 │a20 │a01 │a11 │a21 │a02 │a12 │a22 │a03 │a13 │a23 │
        └────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
        ├── Col 0 ──┤├── Col 1 ───┤├── Col 2 ───┤├── Col 3 ──┤
Address Formulas (0-based, $m$ rows, $n$ cols, element size $w$):
| Order | Formula for $A[i][j]$ |
|---|---|
| Row-Major | $B + (i \times n + j) \times w$ |
| Column-Major | $B + (j \times m + i) \times w$ |
Key Differences:
| Aspect | Row-Major | Column-Major |
|---|---|---|
| Storage order | Row by row | Column by column |
| Efficient traversal | Row-wise (i varies slowly) | Column-wise (j varies slowly) |
| Languages | C, C++, Java, Python | FORTRAN, MATLAB, R |
| Cache performance | Better for row-wise access | Better for column-wise access |
Why it matters: Accessing elements in the order theyβre stored in memory (row-wise in C, column-wise in FORTRAN) gives much better cache performance because consecutive memory locations are loaded into cache together. Accessing in the wrong order causes frequent cache misses, slowing down the program significantly.
Practice Questions β Strings
Q1. Write C functions (without using library functions) for: String length, copy, concatenation, reverse, compare, palindrome check, and substring search.
All seven string functions implemented in C:
#include <stdio.h>
// 1. String Length – O(n)
int stringLength(char str[]) {
int len = 0;
while (str[len] != '\0') {
len++;
}
return len;
}
// 2. String Copy – O(n)
void stringCopy(char dest[], char src[]) {
int i = 0;
while (src[i] != '\0') {
dest[i] = src[i];
i++;
}
dest[i] = '\0'; // Null-terminate destination
}
// 3. String Concatenation – O(m + n)
void stringConcat(char dest[], char src[]) {
int i = 0, j = 0;
while (dest[i] != '\0') i++; // Find end of dest
while (src[j] != '\0') { // Append src
dest[i] = src[j];
i++;
j++;
}
dest[i] = '\0'; // Null-terminate result
}
// 4. String Reverse (in-place) – O(n)
void stringReverse(char str[]) {
int len = 0;
while (str[len] != '\0') len++;
int start = 0, end = len - 1;
while (start < end) {
char temp = str[start];
str[start] = str[end];
str[end] = temp;
start++;
end--;
}
}
// 5. String Compare – O(min(m, n))
// Returns: -1 if s1 < s2, 0 if equal, 1 if s1 > s2
int stringCompare(char s1[], char s2[]) {
int i = 0;
while (s1[i] != '\0' && s2[i] != '\0') {
if (s1[i] < s2[i]) return -1;
if (s1[i] > s2[i]) return 1;
i++;
}
if (s1[i] == '\0' && s2[i] == '\0') return 0;
if (s1[i] == '\0') return -1;
return 1;
}
// 6. Palindrome Check – O(n)
// Returns: 1 if palindrome, 0 otherwise
int isPalindrome(char str[]) {
int len = 0;
while (str[len] != '\0') len++;
int start = 0, end = len - 1;
while (start < end) {
if (str[start] != str[end])
return 0;
start++;
end--;
}
return 1;
}
// 7. Substring Search (Brute Force) – O(n × m)
// Returns: index of first occurrence, or -1 if not found
int substringSearch(char text[], char pattern[]) {
int n = 0, m = 0;
while (text[n] != '\0') n++;
while (pattern[m] != '\0') m++;
for (int i = 0; i <= n - m; i++) {
int j;
for (j = 0; j < m; j++) {
if (text[i + j] != pattern[j])
break;
}
if (j == m)
return i; // Found at index i
}
return -1; // Not found
}
Logic summary for each function:
| Function | Technique | Key Idea |
|---|---|---|
| Length | Linear scan | Count characters until '\0' |
| Copy | Index-by-index | Copy each char; manually add '\0' |
| Concat | Two-phase scan | Find end of dest, then append src |
| Reverse | Two-pointer swap | Swap from both ends, moving inward |
| Compare | Character-by-character | Lexicographic comparison via ASCII |
| Palindrome | Two-pointer compare | Same as reverse but compare instead of swap |
| Substring | Sliding window brute force | Try every starting position in text |
Q2. What is the null character '\0'? Why is it important in C strings?
The Null Character '\0':
- It is the character with ASCII value 0 (not the digit '0', which has ASCII value 48).
- It is a sentinel (end marker) for C strings.
- It occupies 1 byte in memory.
Why it is important:
- C has no built-in string length field. Unlike Java or Python, C strings are just arrays of characters with no metadata. The only way to know where a string ends is the `'\0'` marker.
- All standard string functions depend on it. Functions like `strlen()`, `printf("%s")`, `strcpy()`, and `strcmp()` all scan for `'\0'` to know when to stop. Without it, they would read past the string into garbage memory (buffer overrun).
- Memory allocation must account for it. A string of $n$ characters requires $n + 1$ bytes – the extra byte is for `'\0'`.
Example:
char str[] = "Hello";
// Memory: ['H', 'e', 'l', 'l', 'o', '\0']
// sizeof(str) = 6 bytes (5 chars + 1 null)
// strlen(str) = 5 (counts characters, not null)
What happens without '\0':
char str[5] = {'H', 'e', 'l', 'l', 'o'}; // No null terminator!
printf("%s", str); // UNDEFINED BEHAVIOR – prints "Hello" followed by garbage
// until it happens to find a 0 byte in memory
Key point: The null character is what makes an array of characters a string in C. Without it, it's just a character array.
Q3. Write a function to count the number of vowels in a string.
#include <stdio.h>
int countVowels(char str[]) {
int count = 0;
for (int i = 0; str[i] != '\0'; i++) {
char ch = str[i];
// Check for both lowercase and uppercase vowels
if (ch == 'a' || ch == 'e' || ch == 'i' || ch == 'o' || ch == 'u' ||
ch == 'A' || ch == 'E' || ch == 'I' || ch == 'O' || ch == 'U') {
count++;
}
}
return count;
}
int main() {
char str[] = "Hello World";
printf("Vowels in \"%s\": %d\n", str, countVowels(str));
return 0;
}
Output:
Vowels in "Hello World": 3
Trace:
H → not a vowel
e → vowel ✓ (count = 1)
l → not
l → not
o → vowel ✓ (count = 2)
(space) → not
W → not
o → vowel ✓ (count = 3)
r → not
l → not
d → not
Result: 3
Time Complexity: $O(n)$ β single pass through the string.
Space Complexity: $O(1)$ β only a counter variable.
Q4. Write a function to check if two strings are anagrams.
Two strings are anagrams if they contain the same characters with the same frequencies, possibly in a different order. Example: "listen" and "silent" are anagrams.
Approach: Count character frequencies in both strings and compare.
#include <stdio.h>
int areAnagrams(char s1[], char s2[]) {
int count[256] = {0}; // Frequency array for all ASCII characters
int i;
// Count frequencies from s1
for (i = 0; s1[i] != '\0'; i++)
count[(unsigned char)s1[i]]++;
// Subtract frequencies from s2
for (i = 0; s2[i] != '\0'; i++)
count[(unsigned char)s2[i]]--;
// If all counts are 0, strings are anagrams
for (i = 0; i < 256; i++) {
if (count[i] != 0)
return 0; // Not anagrams
}
return 1; // Anagrams
}
int main() {
char s1[] = "listen";
char s2[] = "silent";
if (areAnagrams(s1, s2))
printf("\"%s\" and \"%s\" are anagrams.\n", s1, s2);
else
printf("\"%s\" and \"%s\" are NOT anagrams.\n", s1, s2);
return 0;
}
Output:
"listen" and "silent" are anagrams.
Trace for "listen" and "silent":
After counting s1 ("listen"):
count['l']=1, count['i']=1, count['s']=1, count['t']=1, count['e']=1, count['n']=1
After subtracting s2 ("silent"):
count['s']=0, count['i']=0, count['l']=0, count['e']=0, count['n']=0, count['t']=0
All counts are 0 → Anagrams ✓
Time Complexity: $O(n + m)$ where $n, m$ are lengths of the two strings (plus $O(256) = O(1)$ for the final check).
Space Complexity: $O(1)$ β the frequency array has a fixed size of 256 regardless of input size.
Why this approach? Sorting both strings and comparing would also work ($O(n \log n)$), but the frequency-counting approach is faster at $O(n)$.
Q5. Explain the time complexity of each string operation you implemented.
| Operation | Time Complexity | Space Complexity | Explanation |
|---|---|---|---|
| Length | $O(n)$ | $O(1)$ | Must scan every character to find '\0'. No shortcut exists since C strings have no length metadata. |
| Copy | $O(n)$ | $O(1)$ | Copies each of the $n$ characters one by one, plus the null terminator. |
| Concatenation | $O(m + n)$ | $O(1)$ | Phase 1: traverse dest to find its end ($m$ steps). Phase 2: copy src ($n$ steps). Total = $m + n$. |
| Reverse | $O(n)$ | $O(1)$ | Finds length in $O(n)$, then does $n/2$ swaps. Total work is $O(n)$. In-place using two-pointer technique. |
| Compare | $O(\min(m, n))$ | $O(1)$ | Compares character by character. Stops at the first mismatch or when the shorter string ends. |
| Palindrome | $O(n)$ | $O(1)$ | Finds length in $O(n)$, then at most $n/2$ comparisons. Total = $O(n)$. |
| Substring Search | $O(n \times m)$ worst | $O(1)$ | For each of $O(n)$ starting positions, compares up to $m$ characters. Worst case: text = "AAAA…AB", pattern = "AAB". |
Important observations:
- All operations use $O(1)$ extra space – no auxiliary arrays.
- The $O(n)$ for string length means every function that needs the length internally scans the entire string first.
- Concatenation is $O(m + n)$, not $O(n)$, because finding the end of the destination string also takes time.
- Substring search can be improved to $O(n + m)$ using the KMP (Knuth-Morris-Pratt) algorithm, but the naive approach is the standard syllabus requirement.
Practice Questions β Searching
Q1. Write iterative and recursive versions of Binary Search. Trace both for the array [3, 7, 11, 15, 19, 23, 27] searching for key = 19.
Iterative Binary Search:
int binarySearchIterative(int A[], int n, int key) {
int low = 0, high = n - 1;
while (low <= high) {
int mid = low + (high - low) / 2;
if (A[mid] == key)
return mid;
else if (A[mid] < key)
low = mid + 1;
else
high = mid - 1;
}
return -1;
}
Recursive Binary Search:
int binarySearchRecursive(int A[], int low, int high, int key) {
if (low > high)
return -1;
int mid = low + (high - low) / 2;
if (A[mid] == key)
return mid;
else if (A[mid] < key)
return binarySearchRecursive(A, mid + 1, high, key);
else
return binarySearchRecursive(A, low, mid - 1, key);
}
Trace β Iterative version for [3, 7, 11, 15, 19, 23, 27], key = 19:
Array: [3, 7, 11, 15, 19, 23, 27]
Index: 0 1 2 3 4 5 6
Step 1: low = 0, high = 6
mid = 0 + (6-0)/2 = 3
A[3] = 15 < 19 → low = 3 + 1 = 4
Step 2: low = 4, high = 6
mid = 4 + (6-4)/2 = 5
A[5] = 23 > 19 → high = 5 - 1 = 4
Step 3: low = 4, high = 4
mid = 4 + (4-4)/2 = 4
A[4] = 19 == 19 → FOUND at index 4 ✓
Total comparisons: 3
Trace β Recursive version for the same input:
Call 1: binarySearchRecursive(A, 0, 6, 19)
mid = 3, A[3] = 15 < 19
→ Call binarySearchRecursive(A, 4, 6, 19)
Call 2: binarySearchRecursive(A, 4, 6, 19)
mid = 5, A[5] = 23 > 19
→ Call binarySearchRecursive(A, 4, 4, 19)
Call 3: binarySearchRecursive(A, 4, 4, 19)
mid = 4, A[4] = 19 == 19
→ Return 4
Unwinding: Call 3 returns 4 → Call 2 returns 4 → Call 1 returns 4
Result: FOUND at index 4 ✓
Both versions make exactly 3 comparisons. The iterative version uses $O(1)$ space; the recursive version uses $O(\log n)$ stack space (3 stack frames in this case).
Q2. Compare Linear Search and Binary Search in terms of prerequisites, time complexity, and use cases.
| Feature | Linear Search | Binary Search |
|---|---|---|
| Prerequisite | None – works on any array | Array must be sorted |
| Best Case | $O(1)$ – target at first position | $O(1)$ – target at middle |
| Worst Case | $O(n)$ – target at end or absent | $O(\log n)$ – target absent |
| Average Case | $O(n)$ – $\frac{n+1}{2}$ comparisons | $O(\log n)$ – $\approx \log_2 n$ comparisons |
| Space (iterative) | $O(1)$ | $O(1)$ |
| Space (recursive) | $O(n)$ | $O(\log n)$ |
| Data structure | Array or linked list | Array only (needs random access) |
| Implementation | Very simple (one loop) | Slightly more complex (midpoint logic) |
| For $n = 1{,}000{,}000$ | Up to 1,000,000 comparisons | At most 20 comparisons |
When to use each:
Use Linear Search when:
- The array is unsorted and sorting is not feasible
- The array is small ($n < 20$)
- You need to search only once (sorting + binary search has overhead)
- Data is stored in a linked list (no random access)
Use Binary Search when:
- The array is sorted (or can be sorted once and searched many times)
- The array is large
- Multiple searches will be performed (amortize sorting cost)
- Performance is critical
Break-even analysis: If you need to search a dataset $k$ times, binary search is worthwhile when:
\[O(n \log n) + k \cdot O(\log n) < k \cdot O(n)\]
This simplifies to $k > \frac{n \log n}{n - \log n} \approx \log n$ for large $n$. So even just $\log n$ searches justify sorting first!
Q3. Why canβt Binary Search be used on a linked list efficiently?
Binary search requires random access – the ability to jump directly to any element by its index in $O(1)$ time. This is what makes computing and accessing the middle element efficient.
Why linked lists fail:
- No random access: In a linked list, to access the $k$-th element, you must traverse $k$ nodes from the head. Accessing the middle element takes $O(n/2) = O(n)$ time.
- Finding mid is costly: In an array, `mid = (low + high) / 2` is computed in $O(1)$ and accessed in $O(1)$. In a linked list, finding the middle node requires traversing from the head each time – $O(n)$.
- Each iteration becomes $O(n)$: Binary search does $O(\log n)$ iterations. If each iteration costs $O(n)$ to find the middle, the total becomes $O(n \log n)$ – worse than linear search, which is $O(n)$!
Comparison:
| Operation | Array | Linked List |
|---|---|---|
| Access middle element | $O(1)$ | $O(n)$ |
| Binary search total | $O(\log n)$ | $O(n \log n)$ |
| Linear search total | $O(n)$ | $O(n)$ |
Conclusion: For linked lists, linear search ($O(n)$) is the best option. Binary search on a linked list is theoretically possible but practically pointless since it's slower than linear search. This is why binary search is designed specifically for arrays (or any structure with $O(1)$ random access, like a balanced BST, which effectively performs binary search through its structure).
Q4. For an array of 1 million elements, how many comparisons does Binary Search need in the worst case?
Given: $n = 1{,}000{,}000 = 10^6$
Worst-case comparisons for Binary Search:
\[\text{Comparisons} = \lfloor \log_2 n \rfloor + 1\]
Calculation:
\[\log_2(1{,}000{,}000) = \frac{\ln(10^6)}{\ln 2} = \frac{6 \times \ln 10}{\ln 2} = \frac{6 \times 2.3026}{0.6931} \approx 19.93\]
\[\lfloor 19.93 \rfloor + 1 = 19 + 1 = \boxed{20}\]
Verification by powers of 2:
\[2^{19} = 524{,}288 \qquad 2^{20} = 1{,}048{,}576\]
Since $524{,}288 < 1{,}000{,}000 \leq 1{,}048{,}576$, we need at most 20 comparisons.
Interpretation: Even with one million elements, binary search finds the answer (or determines it doesn't exist) in at most 20 comparisons! Linear search, by contrast, could need up to 1,000,000 comparisons.
| Array Size | Binary Search (worst) | Linear Search (worst) |
|---|---|---|
| 1,000 | 10 | 1,000 |
| 1,000,000 | 20 | 1,000,000 |
| 1,000,000,000 | 30 | 1,000,000,000 |
Q5. Modify Binary Search to return the first occurrence of a duplicate element.
The standard binary search may return any occurrence of the target when there are duplicates. To find the first (leftmost) occurrence, we modify the algorithm to continue searching left even after finding a match.
int binarySearchFirst(int A[], int n, int key) {
int low = 0, high = n - 1;
int result = -1; // Store the answer
while (low <= high) {
int mid = low + (high - low) / 2;
if (A[mid] == key) {
result = mid; // Record this occurrence
high = mid - 1; // Keep searching LEFT for earlier occurrence
}
else if (A[mid] < key)
low = mid + 1;
else
high = mid - 1;
}
return result;
}
Key change: When A[mid] == key, instead of immediately returning, we:
- Save `mid` in `result` (in case this is the first occurrence)
- Set `high = mid - 1` to continue searching the left half for a possibly earlier occurrence
Trace Example:
Array: [2, 4, 4, 4, 8, 8, 10]
Index: 0 1 2 3 4 5 6
Search for key = 4 (first occurrence):
Step 1: low=0, high=6, mid=3 → A[3]=4 == 4
result=3, high=2 (search left)
Step 2: low=0, high=2, mid=1 → A[1]=4 == 4
result=1, high=0 (search left)
Step 3: low=0, high=0, mid=0 → A[0]=2 < 4
low=1
Step 4: low=1, high=0 → low > high → STOP
Return result = 1 ✓ (first occurrence of 4 is at index 1)
Similarly, for the LAST occurrence:
int binarySearchLast(int A[], int n, int key) {
int low = 0, high = n - 1;
int result = -1;
while (low <= high) {
int mid = low + (high - low) / 2;
if (A[mid] == key) {
result = mid;
low = mid + 1; // Keep searching RIGHT for later occurrence
}
else if (A[mid] < key)
low = mid + 1;
else
high = mid - 1;
}
return result;
}
Time Complexity: Still $O(\log n)$ – same number of iterations as standard binary search. We just don't early-exit on the first match.
Practice Questions β Sorting
Q1. Sort the array [5, 2, 8, 1, 9, 3] using (a) Bubble Sort (b) Selection Sort (c) Insertion Sort. Show all passes.
(a) Bubble Sort
Array: [5, 2, 8, 1, 9, 3]
Pass 1: Compare adjacent pairs (indices 0 to 4):
[5, 2, 8, 1, 9, 3] 5 > 2 → swap → [2, 5, 8, 1, 9, 3]
[2, 5, 8, 1, 9, 3] 5 < 8 → no swap
[2, 5, 8, 1, 9, 3] 8 > 1 → swap → [2, 5, 1, 8, 9, 3]
[2, 5, 1, 8, 9, 3] 8 < 9 → no swap
[2, 5, 1, 8, 9, 3] 9 > 3 → swap → [2, 5, 1, 8, 3, 9]
After Pass 1: [2, 5, 1, 8, 3, 9] – 9 is in place
Pass 2: Compare indices 0 to 3:
[2, 5, 1, 8, 3, 9] 2 < 5 → no swap
[2, 5, 1, 8, 3, 9] 5 > 1 → swap → [2, 1, 5, 8, 3, 9]
[2, 1, 5, 8, 3, 9] 5 < 8 → no swap
[2, 1, 5, 8, 3, 9] 8 > 3 → swap → [2, 1, 5, 3, 8, 9]
After Pass 2: [2, 1, 5, 3, 8, 9] – 8, 9 in place
Pass 3: Compare indices 0 to 2:
[2, 1, 5, 3, 8, 9] 2 > 1 → swap → [1, 2, 5, 3, 8, 9]
[1, 2, 5, 3, 8, 9] 2 < 5 → no swap
[1, 2, 5, 3, 8, 9] 5 > 3 → swap → [1, 2, 3, 5, 8, 9]
After Pass 3: [1, 2, 3, 5, 8, 9] – 5, 8, 9 in place
Pass 4: Compare indices 0 to 1:
[1, 2, 3, 5, 8, 9] 1 < 2 → no swap
[1, 2, 3, 5, 8, 9] 2 < 3 → no swap
No swaps occurred → swapped flag = 0 → EARLY EXIT
Final result: [1, 2, 3, 5, 8, 9] ✓
(b) Selection Sort
Array: [5, 2, 8, 1, 9, 3]
Pass 1: Find minimum in [5, 2, 8, 1, 9, 3] → min = 1 at index 3. Swap with index 0.
[1, 2, 8, 5, 9, 3]
 ✓
Pass 2: Find minimum in [2, 8, 5, 9, 3] → min = 2 at index 1. Already in place.
[1, 2, 8, 5, 9, 3]
 ✓  ✓
Pass 3: Find minimum in [8, 5, 9, 3] → min = 3 at index 5. Swap with index 2.
[1, 2, 3, 5, 9, 8]
 ✓  ✓  ✓
Pass 4: Find minimum in [5, 9, 8] → min = 5 at index 3. Already in place.
[1, 2, 3, 5, 9, 8]
 ✓  ✓  ✓  ✓
Pass 5: Find minimum in [9, 8] → min = 8 at index 5. Swap with index 4.
[1, 2, 3, 5, 8, 9]
 ✓  ✓  ✓  ✓  ✓  ✓
Final result: [1, 2, 3, 5, 8, 9] ✓
(c) Insertion Sort
Array: [5, 2, 8, 1, 9, 3]
i = 1: key = 2. Compare with sorted part [5].
2 < 5 → shift 5 right → insert 2 at pos 0
[2, 5, 8, 1, 9, 3] sorted: [2, 5]
i = 2: key = 8. Compare with sorted part [2, 5].
8 > 5 → stop. 8 is already in correct position.
[2, 5, 8, 1, 9, 3] sorted: [2, 5, 8]
i = 3: key = 1. Compare with sorted part [2, 5, 8].
1 < 8 → shift 8 right
1 < 5 → shift 5 right
1 < 2 → shift 2 right
Insert 1 at pos 0
[1, 2, 5, 8, 9, 3] sorted: [1, 2, 5, 8]
i = 4: key = 9. Compare with sorted part [1, 2, 5, 8].
9 > 8 → stop. 9 is already in correct position.
[1, 2, 5, 8, 9, 3] sorted: [1, 2, 5, 8, 9]
i = 5: key = 3. Compare with sorted part [1, 2, 5, 8, 9].
3 < 9 → shift 9 right
3 < 8 → shift 8 right
3 < 5 → shift 5 right
3 > 2 → stop. Insert 3 at pos 2
[1, 2, 3, 5, 8, 9] sorted: [1, 2, 3, 5, 8, 9]
Final result: [1, 2, 3, 5, 8, 9] ✓
Q2. Trace Merge Sort on the array [12, 11, 13, 5, 6, 7]. Show the divide and merge steps.
Array: [12, 11, 13, 5, 6, 7]
Phase 1: Divide (splitting into halves)
Level 0: [12, 11, 13, 5, 6, 7]
/ \
Level 1: [12, 11, 13] [5, 6, 7]
/ \ / \
Level 2: [12] [11, 13] [5] [6, 7]
/ \ / \
Level 3: [11] [13] [6] [7]
Phase 2: Merge (combining sorted sub-arrays bottom-up)
Merge [11] and [13]:
- Compare 11 and 13 → 11 < 13 → take 11, then take 13
- Result: [11, 13]
Merge [12] and [11, 13]:
- Compare 12 and 11 → 11 < 12 → take 11
- Compare 12 and 13 → 12 < 13 → take 12
- Remaining: 13 → take 13
- Result: [11, 12, 13]
Merge [6] and [7]:
- Compare 6 and 7 → 6 < 7 → take 6, then take 7
- Result: [6, 7]
Merge [5] and [6, 7]:
- Compare 5 and 6 → 5 < 6 → take 5
- Remaining: [6, 7] → take 6, then 7
- Result: [5, 6, 7]
Final Merge: [11, 12, 13] and [5, 6, 7]:
L = [11, 12, 13] R = [5, 6, 7]
i=0, j=0
Step 1: L[0]=11 vs R[0]=5 → 5 < 11 → take 5 Output: [5]
Step 2: L[0]=11 vs R[1]=6 → 6 < 11 → take 6 Output: [5, 6]
Step 3: L[0]=11 vs R[2]=7 → 7 < 11 → take 7 Output: [5, 6, 7]
Step 4: R exhausted. Copy remaining L: 11, 12, 13 Output: [5, 6, 7, 11, 12, 13]
Final result: [5, 6, 7, 11, 12, 13] ✓
Summary Tree
[12, 11, 13, 5, 6, 7]
/ \
[12, 11, 13] [5, 6, 7]
/ \ / \
[12] [11, 13] [5] [6, 7]
/ \ / \
[11] [13] [6] [7]
\ / \ /
[11, 13] [6, 7]
\ / \ /
[11, 12, 13] [5, 6, 7]
\ /
[5, 6, 7, 11, 12, 13] ✓
Q3. Trace Quick Sort on [10, 7, 8, 9, 1, 5] using the last element as pivot. Show each partition step.
Array: [10, 7, 8, 9, 1, 5], pivot = last element
Step 1: quickSort(A, 0, 5) → pivot = A[5] = 5
i = -1 (boundary of ≤ pivot section)
j=0: A[0]=10 > 5 → skip i=-1
j=1: A[1]=7 > 5 → skip i=-1
j=2: A[2]=8 > 5 → skip i=-1
j=3: A[3]=9 > 5 → skip i=-1
j=4: A[4]=1 ≤ 5 → i=0, swap A[0]↔A[4] i=0
Array: [1, 7, 8, 9, 10, 5]
Final: swap A[i+1]↔A[high] → swap A[1]↔A[5]
Array: [1, 5, 8, 9, 10, 7]
Pivot 5 is at index 1.
Left partition: [1] (index 0 to 0)
Right partition: [8, 9, 10, 7] (index 2 to 5)
Step 2: quickSort(A, 0, 0) → single element [1]
Base case: low >= high → already sorted. [1] ✓
Step 3: quickSort(A, 2, 5) → sub-array [8, 9, 10, 7], pivot = A[5] = 7
i = 1 (starts at low - 1 = 2 - 1 = 1)
j=2: A[2]=8 > 7 → skip i=1
j=3: A[3]=9 > 7 → skip i=1
j=4: A[4]=10 > 7 → skip i=1
Final: swap A[i+1]↔A[high] → swap A[2]↔A[5]
Array: [1, 5, 7, 9, 10, 8]
Pivot 7 is at index 2.
Left partition: [] (empty, index 2 to 1)
Right partition: [9, 10, 8] (index 3 to 5)
Step 4: quickSort(A, 3, 5) → sub-array [9, 10, 8], pivot = A[5] = 8
i = 2 (starts at low - 1 = 3 - 1 = 2)
j=3: A[3]=9 > 8 → skip i=2
j=4: A[4]=10 > 8 → skip i=2
Final: swap A[i+1]↔A[high] → swap A[3]↔A[5]
Array: [1, 5, 7, 8, 10, 9]
Pivot 8 is at index 3.
Left partition: [] (empty, index 3 to 2)
Right partition: [10, 9] (index 4 to 5)
Step 5: quickSort(A, 4, 5) → sub-array [10, 9], pivot = A[5] = 9
i = 3 (starts at low - 1 = 4 - 1 = 3)
j=4: A[4]=10 > 9 → skip i=3
Final: swap A[i+1]↔A[high] → swap A[4]↔A[5]
Array: [1, 5, 7, 8, 9, 10]
Pivot 9 is at index 4.
Left partition: [] (empty)
Right partition: [10] (single element)
Final sorted array: [1, 5, 7, 8, 9, 10] ✓
Summary of pivots chosen and their final positions:
| Partition | Pivot | Final Position |
|---|---|---|
| [10,7,8,9,1,5] | 5 | index 1 |
| [8,9,10,7] | 7 | index 2 |
| [9,10,8] | 8 | index 3 |
| [10,9] | 9 | index 4 |
Q4. Build a Max-Heap from the array [4, 10, 3, 5, 1]. Then perform Heap Sort step by step.
Array: [4, 10, 3, 5, 1], $n = 5$
Phase 1: Build Max-Heap
Start heapifying from the last non-leaf node: index $\lfloor n/2 \rfloor - 1 = \lfloor 5/2 \rfloor - 1 = 1$.
Initial tree:
4
/ \
10 3
/ \
5 1
i = 1: Heapify node at index 1 (value 10)
- Left child: index 3 (value 5)
- Right child: index 4 (value 1)
- Largest = 10 (parent) → no swap needed
Array: [4, 10, 3, 5, 1] (unchanged)
i = 0: Heapify node at index 0 (value 4)
- Left child: index 1 (value 10)
- Right child: index 2 (value 3)
- Largest = 10 at index 1 → swap A[0] and A[1]
Array: [10, 4, 3, 5, 1]
10
/ \
4 3
/ \
5 1
Now recursively heapify index 1 (value 4):
- Left child: index 3 (value 5)
- Right child: index 4 (value 1)
- Largest = 5 at index 3 → swap A[1] and A[3]
Array: [10, 5, 3, 4, 1]
10
/ \
5 3
/ \
4 1
Recursively heapify index 3 (value 4):
- No children within bounds → stop
Max-Heap built: [10, 5, 3, 4, 1] ✓
Phase 2: Heap Sort (extract max repeatedly)
Extraction 1: Swap root (10) with last element (1). Heap size = 4.
Array: [1, 5, 3, 4, | 10] (10 is sorted)
Heapify index 0 (value 1):
Left=5, Right=3 → largest=5 at index 1 → swap A[0]↔A[1]
Array: [5, 1, 3, 4, | 10]
Heapify index 1 (value 1):
Left=4 → largest=4 at index 3 → swap A[1]↔A[3]
Array: [5, 4, 3, 1, | 10]
Heap: [5, 4, 3, 1]
5
/ \
4 3
/
1
Extraction 2: Swap root (5) with last unsorted (1). Heap size = 3.
Array: [1, 4, 3, | 5, 10] (5, 10 sorted)
Heapify index 0 (value 1):
Left=4, Right=3 → largest=4 at index 1 → swap A[0]↔A[1]
Array: [4, 1, 3, | 5, 10]
Heap: [4, 1, 3]
4
/ \
1 3
Extraction 3: Swap root (4) with last unsorted (3). Heap size = 2.
Array: [3, 1, | 4, 5, 10] (4, 5, 10 sorted)
Heapify index 0 (value 3):
Left=1 → largest=3 (parent) → no swap
Heap: [3, 1]
3
/
1
Extraction 4: Swap root (3) with last unsorted (1). Heap size = 1.
Array: [1, | 3, 4, 5, 10] (3, 4, 5, 10 sorted)
Single element → stop.
Final sorted array: [1, 3, 4, 5, 10] ✓
Q5. Sort [1, 4, 1, 2, 7, 5, 2] using Counting Sort. Show the count array and output.
Array A = [1, 4, 1, 2, 7, 5, 2], $n = 7$, range $k = 7$ (max value)
Step 1: Count occurrences
Scan A and increment count[A[i]]:
A[0]=1 → count[1]++
A[1]=4 → count[4]++
A[2]=1 → count[1]++
A[3]=2 → count[2]++
A[4]=7 → count[7]++
A[5]=5 → count[5]++
A[6]=2 → count[2]++
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Count | 0 | 2 | 2 | 0 | 1 | 1 | 0 | 1 |
Step 2: Cumulative count (prefix sum)
count[i] += count[i-1]
| Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Cumulative | 0 | 2 | 4 | 4 | 5 | 6 | 6 | 7 |
Meaning: count[v] = number of elements ≤ $v$. For example, count[2] = 4 means there are 4 elements with value ≤ 2.
Step 3: Build output array (traverse input right-to-left for stability)
i=6: A[6]=2 → position = count[2]-1 = 4-1 = 3 → output[3]=2, count[2]=3
i=5: A[5]=5 → position = count[5]-1 = 6-1 = 5 → output[5]=5, count[5]=5
i=4: A[4]=7 → position = count[7]-1 = 7-1 = 6 → output[6]=7, count[7]=6
i=3: A[3]=2 → position = count[2]-1 = 3-1 = 2 → output[2]=2, count[2]=2
i=2: A[2]=1 → position = count[1]-1 = 2-1 = 1 → output[1]=1, count[1]=1
i=1: A[1]=4 → position = count[4]-1 = 5-1 = 4 → output[4]=4, count[4]=4
i=0: A[0]=1 → position = count[1]-1 = 1-1 = 0 → output[0]=1, count[1]=0
| Output index | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| Value | 1 | 1 | 2 | 2 | 4 | 5 | 7 |
Final sorted array: [1, 1, 2, 2, 4, 5, 7] ✓
Stability verified: the two 1's — A[0]=1 appeared before A[2]=1 in the input. In the output, A[0]'s 1 is at index 0 and A[2]'s 1 is at index 1 — original order preserved ✓. The same holds for the two 2's.
Q6. What is a stable sort? Give an example showing why stability matters.
Definition: A sorting algorithm is stable if it preserves the relative order of elements that have equal keys. That is, if two elements have the same key value and element A appeared before element B in the input, then A still appears before B in the sorted output.
Example β Student Records sorted by marks:
Input (already sorted alphabetically):
| Name | Marks |
|---|---|
| Alice | 85 |
| Bob | 90 |
| Charlie | 85 |
| Dave | 90 |
Stable sort by marks:
| Name | Marks |
|---|---|
| Alice | 85 |
| Charlie | 85 |
| Bob | 90 |
| Dave | 90 |
Alice still comes before Charlie (both scored 85) ✓
Bob still comes before Dave (both scored 90) ✓
The alphabetical ordering within the same marks is preserved!
Unstable sort by marks (possible output):
| Name | Marks |
|---|---|
| Charlie | 85 |
| Alice | 85 |
| Dave | 90 |
| Bob | 90 |
Charlie before Alice ✗ — the original relative order is broken.
Why stability matters:
- Multi-key sorting: Suppose you first sort by name, then sort by marks. With a stable sort, students with the same marks remain in alphabetical order. With an unstable sort, the alphabetical ordering within a group is destroyed.
- Database operations: When sorting records that are already ordered by a secondary key, stability preserves that secondary ordering.
- Radix sort depends on stability: Radix sort sorts by least significant digit first, using a stable sort at each digit. If the per-digit sort were unstable, the entire algorithm would produce wrong results.
Classification:
| Stable Sorts | Unstable Sorts |
|---|---|
| Bubble Sort | Selection Sort |
| Insertion Sort | Quick Sort |
| Merge Sort | Heap Sort |
| Counting Sort |  |
Q7. Why is Quick Sort preferred over Merge Sort in practice despite having $O(n^2)$ worst case?
Despite its $O(n^2)$ worst case, Quick Sort is often the default choice in practice for the following reasons:
1. Better Cache Performance:
- Quick Sort works in-place on the same array. It accesses elements that are close together in memory, leading to excellent cache locality.
- Merge Sort copies data to temporary arrays during merging. This causes more cache misses because data is spread across different memory locations.
2. Lower Space Overhead:
- Quick Sort: $O(\log n)$ space (recursion stack only)
- Merge Sort: $O(n)$ extra space (auxiliary arrays for merging)
- For large datasets, the $O(n)$ memory overhead of Merge Sort can be significant.
3. Smaller Constant Factor:
- Quick Sort's inner loop is very tight: compare an element against the pivot and occasionally swap. The constant factor hidden in $O(n \log n)$ is smaller than Merge Sort's.
- Merge Sort has overhead from copying to and from temporary arrays.
4. Worst Case is Avoidable:
- The $O(n^2)$ worst case occurs when the pivot is always the smallest or largest element (e.g., sorted array with first/last element as pivot).
- Randomized pivot selection or median-of-three makes the worst case extremely improbable — practically $O(n \log n)$ always.
5. In-Place Operation:
- Quick Sort doesn't need extra arrays. For memory-constrained systems, this is a significant advantage.
6. Practical Benchmarks:
- On real-world data, Quick Sort typically outperforms Merge Sort by roughly 2–3× due to the factors above.
- Java uses dual-pivot Quick Sort for primitive arrays; Python's Timsort is a hybrid of Merge Sort + Insertion Sort (chosen for stability, not raw speed).
When Merge Sort IS preferred:
- When stability is required
- When worst-case $O(n \log n)$ is guaranteed (safety-critical systems)
- When sorting linked lists (merge is easy without extra space; Quick Sort's random access is expensive)
- When sorting data from external storage (merge sort's sequential access pattern suits disk I/O)
Q8. Can Bubble Sortβs worst case be improved from $O(n^2)$? Explain the optimization using a swap flag.
Short answer: The worst case remains $O(n^2)$ — it cannot be improved with a swap flag. The flag only improves the best case from $O(n^2)$ to $O(n)$.
The Swap Flag Optimization:
void bubbleSortOptimized(int A[], int n) {
for (int i = 0; i < n - 1; i++) {
int swapped = 0; // Flag
for (int j = 0; j < n - 1 - i; j++) {
if (A[j] > A[j + 1]) {
int temp = A[j];
A[j] = A[j + 1];
A[j + 1] = temp;
swapped = 1;
}
}
if (!swapped) break; // No swaps → array is sorted → exit early
}
}
How it works: After each pass, if no swaps occurred, the array is already sorted. The flag detects this and exits early.
Best Case (already sorted array):
Array: [1, 2, 3, 4, 5]
Pass 1: Compare all adjacent pairs → no swaps → swapped = 0 → BREAK
Total comparisons: n - 1 = 4 → O(n)
Without the flag, even a sorted array would go through all $n-1$ passes, doing $\frac{n(n-1)}{2}$ comparisons → $O(n^2)$.
Worst Case (reverse sorted array):
Array: [5, 4, 3, 2, 1]
Pass 1: 4 swaps → [4, 3, 2, 1, 5] swapped = 1
Pass 2: 3 swaps → [3, 2, 1, 4, 5] swapped = 1
Pass 3: 2 swaps → [2, 1, 3, 4, 5] swapped = 1
Pass 4: 1 swap → [1, 2, 3, 4, 5] swapped = 1
All passes needed. Total comparisons = 4+3+2+1 = 10 = n(n-1)/2 → O(n²)
The flag never triggers early exit because every pass has at least one swap.
| Case | Without Flag | With Flag |
|---|---|---|
| Best (sorted) | $O(n^2)$ | $O(n)$ ✓ improved |
| Average | $O(n^2)$ | $O(n^2)$ ✗ no change |
| Worst (reverse) | $O(n^2)$ | $O(n^2)$ ✗ no change |
Conclusion: The swap flag optimization helps detect an already-sorted or nearly-sorted array early, but it cannot improve the fundamental worst-case complexity of $O(n^2)$. To achieve better worst-case performance, you need a fundamentally different algorithm like Merge Sort or Heap Sort.
Q9. Which sorting algorithms are suitable for nearly sorted data? Explain why.
Best algorithms for nearly sorted data:
1. Insertion Sort — Best Choice ✓
- Nearly sorted performance: $O(n)$ to $O(n \times k)$, where $k$ is how far each element is from its correct position.
- Why: Insertion sort shifts elements to their correct position. If they're already close to their correct spot, very few shifts are needed. The inner `while` loop terminates quickly.
- Example: [1, 3, 2, 4, 5, 7, 6, 8] — each element is at most 1 position away. Insertion sort handles this in nearly $O(n)$.
2. Bubble Sort (with swap flag) — Good
- Nearly sorted performance: $O(n)$ (detects sorted data in one pass)
- Why: If only a few elements are out of place, most passes will have few or no swaps. The swap flag terminates early once a pass has no swaps.
- Limitation: Not as fast as insertion sort when elements are slightly out of order but still need multiple passes.
3. TimSort (hybrid) — Industry Standard
- Used by Python and Java. It's a hybrid of Merge Sort and Insertion Sort.
- Why: It detects existing sorted βrunsβ in the data and merges them. Nearly sorted data has long runs, so TimSort approaches $O(n)$.
Why these work better on nearly sorted data:
| Algorithm | Nearly Sorted Time | Reason |
|---|---|---|
| Insertion Sort | $O(n)$ | Few shifts per element |
| Bubble Sort (flag) | $O(n)$ | Early termination |
| Merge Sort | $O(n \log n)$ | Cannot exploit existing order |
| Quick Sort | $O(n \log n)$ to $O(n^2)$ | Sorted data is its WORST case (with fixed pivot) |
| Heap Sort | $O(n \log n)$ | Cannot exploit existing order |
| Selection Sort | $O(n^2)$ | Always does all comparisons regardless |
Insertion Sort is the clear winner for nearly sorted data. This is why practical algorithms like TimSort use insertion sort for small or nearly sorted sub-arrays.
Q10. Compare all sorting algorithms in a table. Under what conditions would you choose each?
Master Comparison Table:
| Algorithm | Best | Average | Worst | Space | Stable | In-Place | Method |
|---|---|---|---|---|---|---|---|
| Bubble Sort | $O(n)$ | $O(n^2)$ | $O(n^2)$ | $O(1)$ | ✓ | ✓ | Comparison |
| Selection Sort | $O(n^2)$ | $O(n^2)$ | $O(n^2)$ | $O(1)$ | ✗ | ✓ | Comparison |
| Insertion Sort | $O(n)$ | $O(n^2)$ | $O(n^2)$ | $O(1)$ | ✓ | ✓ | Comparison |
| Merge Sort | $O(n \log n)$ | $O(n \log n)$ | $O(n \log n)$ | $O(n)$ | ✓ | ✗ | Divide & Conquer |
| Quick Sort | $O(n \log n)$ | $O(n \log n)$ | $O(n^2)$ | $O(\log n)$ | ✗ | ✓ | Divide & Conquer |
| Heap Sort | $O(n \log n)$ | $O(n \log n)$ | $O(n \log n)$ | $O(1)$ | ✗ | ✓ | Selection (heap-based) |
| Counting Sort | $O(n+k)$ | $O(n+k)$ | $O(n+k)$ | $O(n+k)$ | ✓ | ✗ | Non-comparison |
When to choose each:
| Scenario | Best Algorithm | Reason |
|---|---|---|
| Small array ($n < 50$) | Insertion Sort | Low overhead, simple, fast for small $n$ |
| Nearly sorted data | Insertion Sort | Nearly $O(n)$ performance |
| General purpose, fast | Quick Sort | Best average performance, good cache behavior |
| Guaranteed $O(n \log n)$ | Merge Sort or Heap Sort | No worst-case degradation |
| Stability required | Merge Sort | The only stable $O(n \log n)$ sort in the table above |
| Memory constrained | Heap Sort | $O(1)$ extra space, guaranteed $O(n \log n)$ |
| Integer keys in small range | Counting Sort | $O(n+k)$ — faster than any comparison sort |
| Writes are expensive (flash memory) | Selection Sort | At most $n - 1$ swaps |
| Linked list | Merge Sort | No random access needed; merge is natural for lists |
| External sorting (data on disk) | Merge Sort | Sequential access pattern suits disk I/O |
| Simple implementation needed | Bubble Sort or Insertion Sort | Easy to code and debug |
| Online sorting (data arrives incrementally) | Insertion Sort | Can sort elements as they arrive |
Q11. Prove that any comparison-based sorting algorithm has a lower bound of $\Omega(n \log n)$.
Theorem: Any comparison-based sorting algorithm must make at least $\Omega(n \log n)$ comparisons in the worst case to sort $n$ elements.
Proof using the Decision Tree model:
Step 1: Model sorting as a decision tree.
Any comparison-based sort can be modeled as a binary decision tree where:
- Each internal node represents a comparison ($a_i \leq a_j$?)
- Each branch represents the outcome (yes/no)
- Each leaf represents a specific permutation (output ordering)
Step 2: Count the leaves.
For $n$ distinct elements, there are $n!$ possible input permutations. Each permutation must lead to a different leaf (because different orderings require different outputs). Therefore:
\[\text{Number of leaves} \geq n!\]
Step 3: Relate tree height to leaves.
A binary tree of height $h$ has at most $2^h$ leaves. Since we need at least $n!$ leaves:
\[2^h \geq n!\]
\[h \geq \log_2(n!)\]
Step 4: Apply Stirling's approximation.
Using Stirling's approximation: $n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n$
\[\log_2(n!) = \sum_{i=1}^{n} \log_2 i\]
A simpler bound: $n! \geq \left(\frac{n}{2}\right)^{n/2}$ (because at least half the terms in the product are $\geq n/2$).
\[h \geq \log_2\left(\left(\frac{n}{2}\right)^{n/2}\right) = \frac{n}{2} \cdot \log_2\left(\frac{n}{2}\right) = \frac{n}{2}(\log_2 n - 1)\]
\[h \geq \frac{n \log_2 n}{2} - \frac{n}{2} = \Omega(n \log n)\]
Alternatively, using the exact form:
\[\log_2(n!) = \log_2 1 + \log_2 2 + \cdots + \log_2 n = \sum_{i=1}^{n} \log_2 i\]
\[\geq \sum_{i=n/2}^{n} \log_2 i \geq \frac{n}{2} \cdot \log_2\left(\frac{n}{2}\right) = \Omega(n \log n)\]
Step 5: Conclusion.
The height $h$ represents the maximum number of comparisons (the worst-case path from root to leaf). Since $h = \Omega(n \log n)$, any comparison-based sort must make $\Omega(n \log n)$ comparisons in the worst case.
\[\boxed{\text{Lower bound for comparison-based sorting} = \Omega(n \log n)}\]
Important implications:
- Merge Sort and Heap Sort are optimal comparison-based sorts since they achieve $O(n \log n)$.
- To sort faster than $O(n \log n)$, you must use non-comparison methods (e.g., Counting Sort, Radix Sort), which exploit special structure in the keys.
Q12. Write C code for Merge Sort and trace its execution for [38, 27, 43, 3].
C Code:
#include <stdio.h>
void merge(int A[], int left, int mid, int right) {
int n1 = mid - left + 1; // Size of left half
int n2 = right - mid; // Size of right half
// Create temporary arrays
int L[n1], R[n2];
for (int i = 0; i < n1; i++) L[i] = A[left + i];
for (int j = 0; j < n2; j++) R[j] = A[mid + 1 + j];
// Merge the two sorted halves back into A
int i = 0, j = 0, k = left;
while (i < n1 && j < n2) {
if (L[i] <= R[j]) // <= for stability
A[k++] = L[i++];
else
A[k++] = R[j++];
}
// Copy remaining elements
while (i < n1) A[k++] = L[i++];
while (j < n2) A[k++] = R[j++];
}
void mergeSort(int A[], int left, int right) {
if (left < right) {
int mid = left + (right - left) / 2;
mergeSort(A, left, mid); // Sort left half
mergeSort(A, mid + 1, right); // Sort right half
merge(A, left, mid, right); // Merge sorted halves
}
}
int main() {
int A[] = {38, 27, 43, 3};
int n = 4;
printf("Before: ");
for (int i = 0; i < n; i++) printf("%d ", A[i]);
printf("\n");
mergeSort(A, 0, n - 1);
printf("After: ");
for (int i = 0; i < n; i++) printf("%d ", A[i]);
printf("\n");
return 0;
}
Output:
Before: 38 27 43 3
After: 3 27 38 43
Detailed Trace for [38, 27, 43, 3]:
mergeSort(A, 0, 3)
├── mid = 1
├── mergeSort(A, 0, 1) → Sort [38, 27]
│   ├── mid = 0
│   ├── mergeSort(A, 0, 0) → [38] → base case (single element)
│   ├── mergeSort(A, 1, 1) → [27] → base case (single element)
│   └── merge(A, 0, 0, 1) → Merge [38] and [27]
│       L = [38], R = [27]
│       38 > 27 → take 27 → A = [27, __, 43, 3]
│       Copy remaining L: 38 → A = [27, 38, 43, 3]
│       ✓ Merged: [27, 38]
│
├── mergeSort(A, 2, 3) → Sort [43, 3]
│   ├── mid = 2
│   ├── mergeSort(A, 2, 2) → [43] → base case (single element)
│   ├── mergeSort(A, 3, 3) → [3] → base case (single element)
│   └── merge(A, 2, 2, 3) → Merge [43] and [3]
│       L = [43], R = [3]
│       43 > 3 → take 3 → A = [27, 38, 3, __]
│       Copy remaining L: 43 → A = [27, 38, 3, 43]
│       ✓ Merged: [3, 43]
│
└── merge(A, 0, 1, 3) → Merge [27, 38] and [3, 43]
    L = [27, 38], R = [3, 43]
    Step 1: L[0]=27 vs R[0]=3  → 3 < 27  → take 3    A = [3, __, __, __]
    Step 2: L[0]=27 vs R[1]=43 → 27 < 43 → take 27   A = [3, 27, __, __]
    Step 3: L[1]=38 vs R[1]=43 → 38 < 43 → take 38   A = [3, 27, 38, __]
    Step 4: L exhausted. Copy remaining R: 43         A = [3, 27, 38, 43]
    ✓ Final merged: [3, 27, 38, 43]
Summary of recursive calls:
[38, 27, 43, 3]
/ \
[38, 27] [43, 3]
/ \ / \
[38] [27] [43] [3]
\ / \ /
[27, 38] [3, 43]
\ /
[3, 27, 38, 43] ✓
Total comparisons: 5 (1 in the first merge + 1 in the second merge + 3 in the final merge).
Comprehensive Unit II Practice Set
Short Answer
Q1. Define sequential organization. Why do arrays support random access?
Sequential Organization: A method of storing data elements in consecutive (contiguous) memory locations. Each element is placed immediately after the previous one in memory, with no gaps.
Why arrays support random access:
Arrays support random access — accessing any element in $O(1)$ time — because:
- Contiguous storage: All elements are stored back-to-back in memory. The address of any element can be calculated directly from the base address: $\text{Address of } A[i] = B + i \times w$.
- Uniform element size: Every element has the same size ($w$ bytes), so the offset from the base is a simple multiplication.
- Direct hardware support: The CPU can compute the address using simple arithmetic (multiply and add), then issue a single memory read — no traversal needed.
Contrast with linked lists: In a linked list, elements are scattered across memory, connected by pointers. To access the $k$-th element, you must follow $k$ pointers from the head — that's $O(k)$ time, not random access.
Q2. What is the address formula for A[i][j] in row-major and column-major order?
General formulas for 2D array A[lβ ... uβ][lβ ... uβ]:
Let:
- $B$ = base address
- $w$ = size of each element in bytes
- $R = u_1 - l_1 + 1$ = number of rows
- $C = u_2 - l_2 + 1$ = number of columns
Row-Major Order (row by row):
\[\boxed{\text{Address of } A[i][j] = B + \big[(i - l_1) \times C + (j - l_2)\big] \times w}\]
Explanation: Skip $(i - l_1)$ complete rows (each of $C$ elements), then skip $(j - l_2)$ elements within the current row.
Column-Major Order (column by column):
\[\boxed{\text{Address of } A[i][j] = B + \big[(j - l_2) \times R + (i - l_1)\big] \times w}\]
Explanation: Skip $(j - l_2)$ complete columns (each of $R$ elements), then skip $(i - l_1)$ elements within the current column.
Simplified formulas (when indexing starts at 0, i.e., $l_1 = l_2 = 0$):
| Order | Formula |
|---|---|
| Row-Major | $B + (i \times n + j) \times w$ |
| Column-Major | $B + (j \times m + i) \times w$ |
where $m$ = number of rows, $n$ = number of columns.
Q3. Write C functions for string length and palindrome check without using library functions.
// String Length — O(n) time, O(1) space
int stringLength(char str[]) {
int len = 0;
while (str[len] != '\0') {
len++;
}
return len;
}
Logic: Start counter at 0, walk through the array until the null terminator '\0' is found. The counter value at that point is the string length.
// Palindrome Check — O(n) time, O(1) space
// Returns: 1 if palindrome, 0 otherwise
int isPalindrome(char str[]) {
// Step 1: Find length
int len = 0;
while (str[len] != '\0') len++;
// Step 2: Compare from both ends
int start = 0, end = len - 1;
while (start < end) {
if (str[start] != str[end])
return 0; // Mismatch → not a palindrome
start++;
end--;
}
return 1; // All pairs matched → palindrome
}
Logic: Use two pointers — start at the beginning and end at the end. Compare characters moving inward. If any pair mismatches, it's not a palindrome. If all pairs match (or the pointers cross), it is.
Example traces:
stringLength("Hello") → 'H','e','l','l','o','\0' → len = 5
isPalindrome("racecar"):
r == r ✓, a == a ✓, c == c ✓, e (middle) → return 1 ✓
isPalindrome("hello"):
h != o → return 0 ✗
Q4. What is the difference between stable and unstable sorting?
| Aspect | Stable Sort | Unstable Sort |
|---|---|---|
| Definition | Preserves relative order of equal elements | May change relative order of equal elements |
| Guarantee | If $A[i] = A[j]$ and $i < j$ in input, then $A[i]$ appears before $A[j]$ in output | No such guarantee |
| Examples | Bubble Sort, Insertion Sort, Merge Sort, Counting Sort | Selection Sort, Quick Sort, Heap Sort |
| Use case | Multi-key sorting, maintaining secondary order | General purpose where stability isn't needed |
Concrete example:
Input: [(apple, 3), (banana, 1), (cherry, 3), (date, 2)]
Sort by the number:
- Stable: [(banana, 1), (date, 2), (apple, 3), (cherry, 3)] — apple before cherry ✓
- Unstable: [(banana, 1), (date, 2), (cherry, 3), (apple, 3)] — cherry before apple ✗
Stability matters when you need to preserve an existing ordering (e.g., sort by score, keeping alphabetical order within same scores).
Q5. Why is Binary Search $O(\log n)$? Prove it.
Intuitive explanation: Binary search halves the search space with every comparison. Starting with $n$ elements:
\[n \to \frac{n}{2} \to \frac{n}{4} \to \frac{n}{8} \to \cdots \to 1\]
The number of halvings needed to go from $n$ to $1$ is $\log_2 n$.
Formal proof using recurrence:
Let $T(n)$ = number of comparisons for binary search on $n$ elements.
Recurrence relation:
\[T(n) = T\left(\frac{n}{2}\right) + 1, \quad T(1) = 1\]
- $T(n/2)$: After one comparison, we recurse on half the array
- $+1$: The one comparison with the middle element
Solving by repeated substitution:
\[T(n) = T\left(\frac{n}{2}\right) + 1\]
\[= T\left(\frac{n}{4}\right) + 1 + 1 = T\left(\frac{n}{4}\right) + 2\]
\[= T\left(\frac{n}{8}\right) + 3\]
\[= T\left(\frac{n}{2^k}\right) + k\]
The recursion stops when $\frac{n}{2^k} = 1$, i.e., $2^k = n$, i.e., $k = \log_2 n$.
\[T(n) = T(1) + \log_2 n = 1 + \log_2 n\]
\[\boxed{T(n) = \lfloor \log_2 n \rfloor + 1 = O(\log n)}\]
Alternative proof using Master Theorem:
$T(n) = 1 \cdot T(n/2) + O(1)$
Here $a = 1$, $b = 2$, $f(n) = O(1)$, and $n^{\log_b a} = n^{\log_2 1} = n^0 = 1$.
Since $f(n) = \Theta(n^{\log_b a}) = \Theta(1)$, by Case 2 of the Master Theorem:
\[T(n) = \Theta(\log n)\]
Tracing & Analytical
Q6. An integer array A[4][6] starts at address 2000. Element size = 4 bytes. Find address of A[2][3] in row-major and column-major.
Given:
- Array: $A[4][6]$ β $m = 4$ rows, $n = 6$ columns
- Base address $B = 2000$
- Element size $w = 4$ bytes
- Target: $A[2][3]$ β $i = 2$, $j = 3$
- Indexing starts at 0 ($l_1 = 0, l_2 = 0$)
Row-Major Order:
\[\text{Address} = B + (i \times n + j) \times w\]
\[= 2000 + (2 \times 6 + 3) \times 4\]
\[= 2000 + (12 + 3) \times 4\]
\[= 2000 + 15 \times 4\]
\[= 2000 + 60\]
\[= \boxed{2060}\]
Explanation: Before row 2, there are $2 \times 6 = 12$ elements (rows 0 and 1, each with 6 elements). Within row 2, the element at column 3 is the 4th element. Total elements before $A[2][3]$ = $12 + 3 = 15$. Offset = $15 \times 4 = 60$ bytes.
Column-Major Order:
\[\text{Address} = B + (j \times m + i) \times w\]
\[= 2000 + (3 \times 4 + 2) \times 4\]
\[= 2000 + (12 + 2) \times 4\]
\[= 2000 + 14 \times 4\]
\[= 2000 + 56\]
\[= \boxed{2056}\]
Explanation: Before column 3, there are $3 \times 4 = 12$ elements (columns 0, 1, 2, each with 4 elements). Within column 3, the element at row 2 is the 3rd element. Total elements before $A[2][3]$ = $12 + 2 = 14$. Offset = $14 \times 4 = 56$ bytes.
Q7. Sort [29, 10, 14, 37, 13] using each of the 6 sorting algorithms. Show all intermediate steps.
(a) Bubble Sort
Initial: [29, 10, 14, 37, 13]
Pass 1 (compare indices 0-3):
29 > 10 → swap → [10, 29, 14, 37, 13]
29 > 14 → swap → [10, 14, 29, 37, 13]
29 < 37 → no swap
37 > 13 → swap → [10, 14, 29, 13, 37]
After: [10, 14, 29, 13, 37] → 37 in place
Pass 2 (compare indices 0-2):
10 < 14 → no swap
14 < 29 → no swap
29 > 13 → swap → [10, 14, 13, 29, 37]
After: [10, 14, 13, 29, 37] → 29 in place
Pass 3 (compare indices 0-1):
10 < 14 → no swap
14 > 13 → swap → [10, 13, 14, 29, 37]
After: [10, 13, 14, 29, 37] → 14 in place
Pass 4 (compare index 0):
10 < 13 → no swap
No swaps → EARLY EXIT
Result: [10, 13, 14, 29, 37] ✓
(b) Selection Sort
Initial: [29, 10, 14, 37, 13]
Pass 1: Find min in [29,10,14,37,13] → 10 at idx 1. Swap with idx 0.
[10, 29, 14, 37, 13]
Pass 2: Find min in [29,14,37,13] → 13 at idx 4. Swap with idx 1.
[10, 13, 14, 37, 29]
Pass 3: Find min in [14,37,29] → 14 at idx 2. Already in place.
[10, 13, 14, 37, 29]
Pass 4: Find min in [37,29] → 29 at idx 4. Swap with idx 3.
[10, 13, 14, 29, 37]
Result: [10, 13, 14, 29, 37] ✓
(c) Insertion Sort
Initial: [29, 10, 14, 37, 13]
i=1: key=10. Compare with [29].
10 < 29 → shift 29 right. Insert 10 at pos 0.
[10, 29, 14, 37, 13] sorted: [10, 29]
i=2: key=14. Compare with [10, 29].
14 < 29 → shift 29 right.
14 > 10 → stop. Insert 14 at pos 1.
[10, 14, 29, 37, 13] sorted: [10, 14, 29]
i=3: key=37. Compare with [10, 14, 29].
37 > 29 → stop. Already in place.
[10, 14, 29, 37, 13] sorted: [10, 14, 29, 37]
i=4: key=13. Compare with [10, 14, 29, 37].
13 < 37 → shift 37 right
13 < 29 → shift 29 right
13 < 14 → shift 14 right
13 > 10 → stop. Insert 13 at pos 1.
[10, 13, 14, 29, 37] sorted: [10, 13, 14, 29, 37]
Result: [10, 13, 14, 29, 37] ✓
(d) Merge Sort
(Splitting with mid = left + (right − left)/2, matching the mergeSort code, so [29, 10, 14, 37, 13] splits at mid = 2.)

         [29, 10, 14, 37, 13]
            /            \
   [29, 10, 14]        [37, 13]
     /       \          /    \
 [29, 10]   [14]     [37]   [13]
  /    \
[29]  [10]

Merge [29] and [10]:
  10 < 29 → [10, 29]
Merge [10, 29] and [14]:
  L=[10, 29], R=[14]
  10 < 14 → take 10
  29 > 14 → take 14
  Remaining: 29
  Result: [10, 14, 29]
Merge [37] and [13]:
  13 < 37 → [13, 37]
Merge [10, 14, 29] and [13, 37]:
  L=[10, 14, 29], R=[13, 37]
  10 < 13 → take 10     [10]
  14 > 13 → take 13     [10, 13]
  14 < 37 → take 14     [10, 13, 14]
  29 < 37 → take 29     [10, 13, 14, 29]
  Remaining: 37         [10, 13, 14, 29, 37]
Result: [10, 13, 14, 29, 37] ✓
(e) Quick Sort (last element as pivot)
quickSort([29, 10, 14, 37, 13], 0, 4) pivot = 13
i = -1
j=0: 29 > 13 → skip
j=1: 10 ≤ 13 → i=0, swap A[0]↔A[1] → [10, 29, 14, 37, 13]
j=2: 14 > 13 → skip
j=3: 37 > 13 → skip
Swap A[i+1]↔A[4] → swap A[1]↔A[4] → [10, 13, 14, 37, 29]
Pivot 13 at index 1.
Left: quickSort([10], 0, 0) → base case [10] ✓
Right: quickSort([14, 37, 29], 2, 4) pivot = 29
i = 1
j=2: 14 ≤ 29 → i=2, swap A[2]↔A[2] → [10, 13, 14, 37, 29]
j=3: 37 > 29 → skip
Swap A[i+1]↔A[4] → swap A[3]↔A[4] → [10, 13, 14, 29, 37]
Pivot 29 at index 3.
Left: quickSort([14], 2, 2) → base case [14] ✓
Right: quickSort([37], 4, 4) → base case [37] ✓
Result: [10, 13, 14, 29, 37] ✓
(f) Heap Sort
Step 1: Build Max-Heap from [29, 10, 14, 37, 13]
Initial tree:
29
/ \
10 14
/ \
37 13
Last non-leaf: index 1 (value 10)
i=1: Heapify(1). Left=37, Right=13. Largest=37 at idx 3. Swap A[1]↔A[3].
[29, 37, 14, 10, 13]
Heapify(3): no children in bounds → stop.
i=0: Heapify(0). Left=37, Right=14. Largest=37 at idx 1. Swap A[0]↔A[1].
[37, 29, 14, 10, 13]
Heapify(1): Left=10, Right=13. Largest=29 (parent) → stop.
Max-Heap: [37, 29, 14, 10, 13]
       37
      /  \
    29    14
   /  \
 10    13
Step 2: Extract max repeatedly
Extract 37: Swap A[0]↔A[4]. Heapify(0) on size 4.
[13, 29, 14, 10, | 37]
Heapify: 13<29 → swap → [29, 13, 14, 10, | 37]
Heapify(1): 13>10 → stop.
Heap: [29, 13, 14, 10]
Extract 29: Swap A[0]↔A[3]. Heapify(0) on size 3.
[10, 13, 14, | 29, 37]
Heapify: 10<14 → swap A[0]↔A[2] → [14, 13, 10, | 29, 37]
Heap: [14, 13, 10]
Extract 14: Swap A[0]↔A[2]. Heapify(0) on size 2.
[10, 13, | 14, 29, 37]
Heapify: 10<13 → swap → [13, 10, | 14, 29, 37]
Heap: [13, 10]
Extract 13: Swap A[0]↔A[1]. Size 1.
[10, | 13, 14, 29, 37]
Result: [10, 13, 14, 29, 37] ✓
Q8. Given an array of $n$ elements where only one element is out of place. Which sort would you choose and why?
Best choice: Insertion Sort
Why:
When only one element is out of place, the array is nearly sorted. Insertion Sort excels in this scenario:
- Insertion sort iterates through the array. For elements already in their correct position, the inner `while` loop does zero shifts (just one comparison each).
- When it reaches the out-of-place element, it shifts it to its correct position (at most $n-1$ shifts for that one element).
- Total work: $O(n)$ — one pass through the array with one element needing multiple shifts.
Example: [1, 2, 3, 9, 4, 5, 6, 7, 8] — 9 is out of place.
i=1: key=2 > 1 → stop (0 shifts)
i=2: key=3 > 2 → stop (0 shifts)
i=3: key=9 > 3 → stop (0 shifts)
i=4: key=4 < 9 → shift 9. 4 > 3 → stop. Insert 4.
[1, 2, 3, 4, 9, 5, 6, 7, 8]
i=5: key=5 < 9 → shift 9. 5 > 4 → stop. Insert 5.
[1, 2, 3, 4, 5, 9, 6, 7, 8]
... and so on, 9 "bubbles" to its correct position with subsequent elements displacing it.
Why not other algorithms?
| Algorithm | Time for 1 element out of place | Problem |
|---|---|---|
| Insertion Sort | $O(n)$ ✓ | Perfect fit |
| Bubble Sort (flag) | $O(n)$ | Also good, but slightly more comparisons |
| Selection Sort | $O(n^2)$ | Always does all comparisons |
| Merge Sort | $O(n \log n)$ | Can't exploit nearly-sorted structure |
| Quick Sort | $O(n \log n)$ avg | Doesn't benefit from near-sortedness |
| Heap Sort | $O(n \log n)$ | Can't exploit nearly-sorted structure |
Q9. Implement Binary Search recursively. What is its space complexity? How does it differ from iterative version?
Recursive Binary Search:
int binarySearchRecursive(int A[], int low, int high, int key) {
// Base case: search space is empty
if (low > high)
return -1;
int mid = low + (high - low) / 2;
if (A[mid] == key)
return mid; // Found!
else if (A[mid] < key)
return binarySearchRecursive(A, mid + 1, high, key); // Search right
else
return binarySearchRecursive(A, low, mid - 1, key); // Search left
}
// Call: binarySearchRecursive(A, 0, n-1, key);
Space Complexity Analysis:
Recursive version: $O(\log n)$
Each recursive call adds a stack frame containing the local variables (low, high, mid, key) and the return address. Since the recursion depth is $\log n$ (the array halves each time), there are at most $\log n$ stack frames simultaneously on the call stack.
Iterative version: $O(1)$
The iterative version uses a while loop with just three variables (low, high, mid). No recursion means no stack frames — constant extra space.
Comparison:
| Aspect | Iterative | Recursive |
|---|---|---|
| Time | $O(\log n)$ | $O(\log n)$ |
| Space | $O(1)$ | $O(\log n)$ — recursion stack |
| Elegance | Less elegant | More mathematically clean |
| Stack overflow risk | None | Possible for very deep recursion (huge $n$) |
| Tail call optimization | N/A | If compiler optimizes tail recursion, space can become $O(1)$ |
| Performance | Slightly faster | Slight function call overhead |
Recommendation: Use the iterative version in practice. It has the same time complexity but uses constant space and avoids function call overhead. The recursive version is useful for understanding the divide-and-conquer paradigm.
Q10. A company needs to sort 10 million records. The data has a limited range of keys (0 to 999). Which sorting algorithm would you recommend and why?
Recommended: Counting Sort ✓
Why:
Given:
- $n = 10{,}000{,}000$ (10 million records)
- $k = 999$ (key range 0 to 999)
Counting Sort complexity: $O(n + k) = O(10{,}000{,}000 + 999) \approx O(n)$
Comparison-based sorts: $O(n \log n) = O(10{,}000{,}000 \times \log_2 10{,}000{,}000) \approx O(10{,}000{,}000 \times 23.25) \approx O(232{,}500{,}000)$
Counting Sort: $O(10{,}000{,}000 + 999) \approx O(10{,}001{,}000)$
Speedup: $\frac{232{,}500{,}000}{10{,}001{,}000} \approx 23\times$ faster!
Why Counting Sort is ideal here:
- Key range is small: $k = 999 \ll n = 10^7$. The condition $k = O(n)$ is satisfied, so counting sort runs in linear time.
- Extra space is manageable: We need a count array of size 1000 and an output array of size $n$. The count array is trivially small (1000 integers = ~4 KB). The output array needs $O(n)$ space, which is the same as the input.
- Stability: Counting sort is stable, which is important if the records have secondary fields that should maintain their relative order.
- Simple implementation: No complex divide-and-conquer or tree structures needed.
Why NOT other algorithms?
| Algorithm | Time for this case | Issue |
|---|---|---|
| Counting Sort | $O(n) \approx 10^7$ ✓ | Best choice |
| Quick Sort | $O(n \log n) \approx 2.3 \times 10^8$ | ~23× slower |
| Merge Sort | $O(n \log n) \approx 2.3 \times 10^8$ | Also needs $O(n)$ extra space |
| Heap Sort | $O(n \log n) \approx 2.3 \times 10^8$ | Poor cache performance on large data |
| Radix Sort | $O(d \times (n + b))$, $b$ = digit base | Also good; uses counting sort as a stable subroutine |
Counting sort wins because $k \ll n$, making it effectively linear time. If the key range were $k = 10^9$ (much larger than $n$), counting sort would be impractical (too much memory), and Quick Sort or Merge Sort would be better choices.
Long Answer
Q11. Compare Merge Sort and Quick Sort comprehensively β algorithm, dry run, time/space complexity, stability, and use cases.
Merge Sort
Algorithm: Divide and Conquer: split the array into halves, recursively sort each half, then merge the two sorted halves.
void merge(int A[], int left, int mid, int right) {
int n1 = mid - left + 1, n2 = right - mid;
int L[n1], R[n2];
for (int i = 0; i < n1; i++) L[i] = A[left + i];
for (int j = 0; j < n2; j++) R[j] = A[mid + 1 + j];
int i = 0, j = 0, k = left;
while (i < n1 && j < n2)
A[k++] = (L[i] <= R[j]) ? L[i++] : R[j++];
while (i < n1) A[k++] = L[i++];
while (j < n2) A[k++] = R[j++];
}
void mergeSort(int A[], int left, int right) {
if (left < right) {
int mid = left + (right - left) / 2;
mergeSort(A, left, mid);
mergeSort(A, mid + 1, right);
merge(A, left, mid, right);
}
}
Dry Run on [5, 3, 8, 1]:
[5, 3, 8, 1] → split → [5, 3] and [8, 1]
[5, 3] → split → [5] and [3] → merge → [3, 5]
[8, 1] → split → [8] and [1] → merge → [1, 8]
Merge [3, 5] and [1, 8]:
1 < 3 → [1]
3 < 8 → [1, 3]
5 < 8 → [1, 3, 5]
Remaining: 8 → [1, 3, 5, 8] ✓
Quick Sort
Algorithm: Divide and Conquer: choose a pivot, partition the elements into ≤ pivot and > pivot groups, then recursively sort each group.
int partition(int A[], int low, int high) {
int pivot = A[high], i = low - 1;
for (int j = low; j < high; j++) {
if (A[j] <= pivot) {
i++;
int temp = A[i]; A[i] = A[j]; A[j] = temp;
}
}
int temp = A[i+1]; A[i+1] = A[high]; A[high] = temp;
return i + 1;
}
void quickSort(int A[], int low, int high) {
if (low < high) {
int pi = partition(A, low, high);
quickSort(A, low, pi - 1);
quickSort(A, pi + 1, high);
}
}
Dry Run on [5, 3, 8, 1], pivot = A[high] = 1:
Partition with pivot = 1 (A[3]):
i = -1
j=0: 5>1 skip; j=1: 3>1 skip; j=2: 8>1 skip
Swap A[0] ↔ A[3] → [1, 3, 8, 5], pivot at index 0
Left: [] (empty)
Right: [3, 8, 5], pivot = 5
j=1: 3 ≤ 5 → i=1, swap A[1] ↔ A[1] (no change)
j=2: 8 > 5 skip
Swap A[2] ↔ A[3] → [1, 3, 5, 8], pivot at index 2
Left: [3] ✓ Right: [8] ✓
Result: [1, 3, 5, 8] ✓
Comprehensive Comparison
| Aspect | Merge Sort | Quick Sort |
|---|---|---|
| Strategy | Divide → Recurse → Merge | Partition → Recurse |
| When work happens | During merge (combining) | During partition (dividing) |
| Best Case | $O(n \log n)$ | $O(n \log n)$ |
| Average Case | $O(n \log n)$ | $O(n \log n)$ |
| Worst Case | $O(n \log n)$ ✓ | $O(n^2)$ ✗ |
| Space | $O(n)$ (auxiliary arrays) | $O(\log n)$ (recursion stack only) |
| Stable | Yes ✓ | No ✗ |
| In-Place | No ✗ | Yes ✓ |
| Cache Performance | Poor (copies to temp arrays) | Excellent (in-place, sequential) |
| Parallelizable | Yes | Yes (but less balanced partitions) |
| Adaptive | No (same time regardless) | No (unless modified) |
Recurrences:
| Β | Merge Sort | Quick Sort |
|---|---|---|
| Best/Avg | $T(n) = 2T(n/2) + O(n)$ | $T(n) = 2T(n/2) + O(n)$ |
| Worst | Same → $O(n \log n)$ | $T(n) = T(n-1) + O(n) \Rightarrow O(n^2)$ |
Use Cases:
| Scenario | Better Choice | Reason |
|---|---|---|
| General in-memory sorting | Quick Sort | Faster in practice (cache, constant factor) |
| Stability is needed | Merge Sort | Only stable $O(n \log n)$ sort |
| Worst-case guarantee needed | Merge Sort | Always $O(n \log n)$ |
| Memory is limited | Quick Sort | $O(\log n)$ vs $O(n)$ space |
| Sorting linked lists | Merge Sort | No random access needed |
| External sorting (disk) | Merge Sort | Sequential access pattern |
| Large arrays of primitives | Quick Sort | Best practical performance |
Q12. Explain Heap Sort completely β what is a heap, how to build a max-heap, how the sorting works, with a full example. Analyze its complexity.
What is a Heap?
A heap is a complete binary tree that satisfies the heap property:
- Max-Heap: Every parent node is ≥ both its children. The root holds the maximum.
- Min-Heap: Every parent node is ≤ both its children. The root holds the minimum.
Complete binary tree: All levels are fully filled except possibly the last, which is filled left to right.
Array representation: A heap is stored efficiently in an array (0-indexed):
- Left child of node $i$: $2i + 1$
- Right child of node $i$: $2i + 2$
- Parent of node $i$: $\lfloor(i-1)/2\rfloor$
Heapify β Fixing a Single Violation
heapify(A, n, i) assumes the subtrees rooted at left(i) and right(i) are already valid max-heaps, but A[i] might violate the max-heap property. It fixes this by "sifting down" the element.
void heapify(int A[], int n, int i) {
int largest = i;
int left = 2 * i + 1;
int right = 2 * i + 2;
if (left < n && A[left] > A[largest])
largest = left;
if (right < n && A[right] > A[largest])
largest = right;
if (largest != i) {
int temp = A[i]; A[i] = A[largest]; A[largest] = temp;
heapify(A, n, largest); // Recursively fix the affected subtree
}
}
Time: $O(\log n)$, since at most one swap happens per level and the tree has $\log n$ levels.
Building the Max-Heap
// Start from the last non-leaf node and heapify each node bottom-up
for (int i = n / 2 - 1; i >= 0; i--)
heapify(A, n, i);
Why start at n/2 - 1? Nodes at indices $\lfloor n/2 \rfloor$ to $n-1$ are leaves: they have no children and trivially satisfy the heap property.
Time for build: $O(n)$ (not $O(n \log n)$!), because most nodes are near the bottom and need little work.
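The $O(n)$ bound can be made precise: heapifying a node at height $h$ costs $O(h)$, and at most $\lceil n/2^{h+1} \rceil$ nodes sit at height $h$, so the total build cost is

\[\sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil \cdot O(h) = O\!\left(n \sum_{h=0}^{\infty} \frac{h}{2^h}\right) = O(2n) = O(n)\]

using the identity $\sum_{h=0}^{\infty} h/2^h = 2$: the few expensive nodes near the root are outweighed by the many cheap nodes near the leaves.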
Heap Sort Algorithm
void heapSort(int A[], int n) {
// Phase 1: Build Max-Heap
for (int i = n / 2 - 1; i >= 0; i--)
heapify(A, n, i);
// Phase 2: Extract max repeatedly
for (int i = n - 1; i > 0; i--) {
int temp = A[0]; A[0] = A[i]; A[i] = temp; // Move max to end
heapify(A, i, 0); // Restore heap on reduced array
}
}
How it works:
- Build a max-heap: the largest element is at A[0] (the root).
- Swap A[0] (max) with A[n-1] (last). Now the largest is at its final sorted position.
- Reduce the heap size by 1 and heapify the root to restore the max-heap.
- Repeat until the heap has size 1.
Full Example: Sort [7, 3, 5, 1, 9, 2]
Phase 1: Build Max-Heap
Initial array: [7, 3, 5, 1, 9, 2] (n = 6)
Tree: 7
/ \
3 5
/ \ /
1 9 2
Last non-leaf: index 2 (n/2 - 1 = 2)
i=2: Heapify(2). Node=5, Left=2. 5 > 2 → no swap.
[7, 3, 5, 1, 9, 2]
i=1: Heapify(1). Node=3, Left=1, Right=9. Largest=9 at idx 4.
Swap A[1] ↔ A[4] → [7, 9, 5, 1, 3, 2]
Heapify(4): no children → stop.
i=0: Heapify(0). Node=7, Left=9, Right=5. Largest=9 at idx 1.
Swap A[0] ↔ A[1] → [9, 7, 5, 1, 3, 2]
Heapify(1): Left=1, Right=3. Largest=7 (parent) → stop.
Max-Heap: [9, 7, 5, 1, 3, 2]
9
/ \
7 5
/ \ /
1 3 2
Phase 2: Sort by extracting max
Step 1: Swap A[0]=9 ↔ A[5]=2. Heapify(0, size=5).
[2, 7, 5, 1, 3, | 9]
Heapify: 2<7 → swap(0,1) → [7, 2, 5, 1, 3, | 9]
Heapify(1): 2<3 → swap(1,4) → [7, 3, 5, 1, 2, | 9]
Heap: [7, 3, 5, 1, 2]
Step 2: Swap A[0]=7 ↔ A[4]=2. Heapify(0, size=4).
[2, 3, 5, 1, | 7, 9]
Heapify: 2<5 → swap(0,2) → [5, 3, 2, 1, | 7, 9]
Heap: [5, 3, 2, 1]
Step 3: Swap A[0]=5 ↔ A[3]=1. Heapify(0, size=3).
[1, 3, 2, | 5, 7, 9]
Heapify: 1<3 → swap(0,1) → [3, 1, 2, | 5, 7, 9]
Heap: [3, 1, 2]
Step 4: Swap A[0]=3 ↔ A[2]=2. Heapify(0, size=2).
[2, 1, | 3, 5, 7, 9]
Heapify: 2>1 → no swap.
Heap: [2, 1]
Step 5: Swap A[0]=2 ↔ A[1]=1. Size=1 → done.
[1, | 2, 3, 5, 7, 9]
Result: [1, 2, 3, 5, 7, 9] ✓
Complexity Analysis
| Aspect | Value | Explanation |
|---|---|---|
| Build heap | $O(n)$ | Most nodes are near the bottom; total work sums to $O(n)$ |
| Extract phase | $O(n \log n)$ | $n-1$ extractions, each requiring $O(\log n)$ heapify |
| Total Time | $O(n \log n)$ | Dominated by extraction phase |
| Best Case | $O(n \log n)$ | Same regardless of input |
| Worst Case | $O(n \log n)$ | Same β no degradation |
| Average Case | $O(n \log n)$ | Same |
| Space | $O(1)$ | In-place: only swaps within the array |
| Stable | No | Swapping root to end can change relative order of equal elements |
Advantages:
- Guaranteed $O(n \log n)$: no worst-case degradation like Quick Sort
- In-place: $O(1)$ extra space, unlike Merge Sort's $O(n)$
- Combines the best of Merge Sort (guaranteed performance) and Quick Sort (in-place)
Disadvantages:
- Not stable: can't preserve the relative order of equal elements
- Poor cache performance: parent-child accesses jump around in memory
- Slower in practice than Quick Sort due to cache misses and a larger constant factor
Q13. Explain Counting Sort. Why is it $O(n + k)$? When is it better than comparison-based sorts? When is it impractical?
Algorithm
Counting Sort works by counting occurrences of each distinct value, then using those counts to place elements directly in their correct positions.
void countingSort(int A[], int n, int k) {
int count[k + 1]; // Count array for values 0..k
int output[n]; // Output array
// Phase 1: Initialize counts to 0
for (int i = 0; i <= k; i++) count[i] = 0;
// Phase 2: Count occurrences of each value
for (int i = 0; i < n; i++) count[A[i]]++;
// Phase 3: Compute cumulative (prefix) sums
for (int i = 1; i <= k; i++) count[i] += count[i - 1];
// Phase 4: Build output array (right-to-left for stability)
for (int i = n - 1; i >= 0; i--) {
output[count[A[i]] - 1] = A[i];
count[A[i]]--;
}
// Phase 5: Copy output back to original array
for (int i = 0; i < n; i++) A[i] = output[i];
}
Step-by-Step Explanation
Example: Sort [4, 2, 2, 8, 3, 3, 1], $k = 8$
Phase 1 – Initialize: count = [0, 0, 0, 0, 0, 0, 0, 0, 0]
Phase 2 – Count occurrences:
A[0]=4 → count[4]++ → count = [0, 0, 0, 0, 1, 0, 0, 0, 0]
...after all elements: count = [0, 1, 2, 2, 1, 0, 0, 0, 1]
(0 zeros, 1 one, 2 twos, 2 threes, 1 four, 0 fives/sixes/sevens, 1 eight)
Phase 3 – Prefix sums:
count[i] += count[i-1]:
count = [0, 1, 3, 5, 6, 6, 6, 6, 7]
Meaning: count[v] = number of elements ≤ v
≤0: 0 elements, ≤1: 1 element, ≤2: 3 elements, ..., ≤8: 7 elements
Phase 4 – Place elements (right-to-left), on [4, 2, 2, 8, 3, 3, 1]:
After prefix sums: count = [0, 1, 3, 5, 6, 6, 6, 6, 7]
i=6: A[6]=1 → pos = count[1]-1 = 0 → output[0]=1, count[1]=0
i=5: A[5]=3 → pos = count[3]-1 = 4 → output[4]=3, count[3]=4
i=4: A[4]=3 → pos = count[3]-1 = 3 → output[3]=3, count[3]=3
i=3: A[3]=8 → pos = count[8]-1 = 6 → output[6]=8, count[8]=6
i=2: A[2]=2 → pos = count[2]-1 = 2 → output[2]=2, count[2]=2
i=1: A[1]=2 → pos = count[2]-1 = 1 → output[1]=2, count[2]=1
i=0: A[0]=4 → pos = count[4]-1 = 5 → output[5]=4, count[4]=5
Output: [1, 2, 2, 3, 3, 4, 8] ✓
Why is it $O(n + k)$?
| Phase | Operations | Time |
|---|---|---|
| Initialize count array | $k + 1$ writes | $O(k)$ |
| Count occurrences | $n$ reads/writes | $O(n)$ |
| Compute prefix sums | $k$ additions | $O(k)$ |
| Build output | $n$ reads/writes | $O(n)$ |
| Copy back | $n$ copies | $O(n)$ |
| Total | Β | $O(n + k)$ |
Key insight: There are no nested loops. Each phase scans either the input array ($O(n)$) or the count array ($O(k)$) once. The sum is $O(n + k)$.
It is linear when $k = O(n)$ β then $O(n + k) = O(n)$.
When is Counting Sort better than comparison-based sorts?
Counting Sort wins when $k = O(n)$:
| Scenario | Comparison Sort | Counting Sort | Winner |
|---|---|---|---|
| $n = 10^6$, $k = 1000$ | $O(n \log n) \approx 2 \times 10^7$ | $O(n + k) \approx 10^6$ | Counting Sort (20×) |
| $n = 10^6$, $k = 10^6$ | $O(n \log n) \approx 2 \times 10^7$ | $O(n + k) = 2 \times 10^6$ | Counting Sort (10×) |
| $n = 10^6$, $k = 10^9$ | $O(n \log n) \approx 2 \times 10^7$ | $O(n + k) \approx 10^9$ | Comparison Sort |
Rule of thumb: Use counting sort when $k \leq n$ (or $k = O(n)$).
Specific scenarios where counting sort shines:
- Sorting exam scores (0–100) for a class of 1000 students: $k = 100$, $n = 1000$
- Sorting ages (0–150) of a population: $k = 150$, $n$ can be millions
- Sorting characters by ASCII value (0–127): $k = 127$
- As a subroutine in Radix Sort (sorting digit by digit)
When is Counting Sort impractical?
- Large key range ($k \gg n$):
- Sorting 100 numbers in range $[0, 10^9]$: the count array would need $10^9$ entries (4 GB memory!) for just 100 numbers. A comparison sort at $O(n \log n) = O(100 \times 7) = 700$ operations is vastly better.
- Floating-point keys:
- Counting sort requires integer keys (or keys that can be mapped to a finite integer range). Floating-point numbers have too many possible values (effectively infinite precision).
- Negative keys (without modification):
- The basic algorithm assumes keys in $[0, k]$. Negative keys require offsetting all values by the minimum, adding complexity.
- Keys are complex objects:
- If keys are strings, composite objects, or non-numeric, counting sort doesn't directly apply. You'd need a mapping to integers first.
- Memory constraints:
- Even when $k$ is reasonable, the extra space $O(n + k)$ might be unacceptable in memory-constrained environments. The output array alone is $O(n)$.
Summary:
\[\text{Use Counting Sort when: } k = O(n) \text{ and keys are non-negative integers}\] \[\text{Use Comparison Sort when: } k \gg n \text{ or keys are non-integer}\]