DIRSTUDY | Data Structures

Arrays & Memory Addressing

1. 2D Array Address Calculation

A 2D array physically exists as a contiguous 1D block of memory. To map 2D coordinates A[i][j] to a 1D memory address, the compiler uses either Row-Major Order (storing row by row) or Column-Major Order (storing column by column).

Row-Major Order Formula

To find the address of element A[i][j] in a matrix with N columns:

$$\text{Address} = \text{Base} + W \times [(i - L_r) \times N + (j - L_c)]$$

Base: Starting address of the array.
W: Size of each element (e.g., 2 bytes for int).
N: Total number of columns.
$L_r, L_c$: Lower bounds of row/col (Usually 0 in C/C++).

Column-Major Order Formula

To find the address of element A[i][j] in a matrix with M rows:

$$\text{Address} = \text{Base} + W \times [(j - L_c) \times M + (i - L_r)]$$

Example Trace: Calculate address of A[3][4] in a 5x5 matrix (M=5). Base = 1000, W = 2. Assume 0-based indexing ($L_r=0, L_c=0$).

Add = 1000 + 2 * [4 * 5 + 3]
Add = 1000 + 2 * [23] = 1046

Trees & Linked Lists

2. Binary Search Trees (BST) & AVL Trees

Binary Search Tree (BST): A binary tree where the left child contains only nodes with values less than the parent node, and the right child only contains nodes with values greater than the parent node.

The Problem with BSTs & Introduction to AVL

While the average time complexity for searching a BST is $O(\log n)$, if we insert data that is already sorted (e.g., 1, 2, 3, 4, 5), the tree becomes completely right-skewed. It essentially degrades into a Linked List, making the worst-case search time $O(n)$.

AVL Trees solve this by enforcing a strict Height-Balanced Property. The Balance Factor (Height of Left Subtree - Height of Right Subtree) of EVERY node must be strictly -1, 0, or +1. If an insertion violates this, the tree performs Rotations to self-balance.

LL Rotation (Right Rotation): Used when a node is inserted into the left sub-tree of the left child of an unbalanced node. The unbalanced node is pulled down to the right.

RR Rotation (Left Rotation): Used when a node is inserted into the right sub-tree of the right child. The unbalanced node is pulled down to the left.

LR Rotation: A double rotation. First, a Left Rotation on the left child, followed by a Right Rotation on the unbalanced root.

RL Rotation: A double rotation. First, a Right Rotation on the right child, followed by a Left Rotation on the unbalanced root.

3. Singly Linked List (SLL) Structure & Advantages

Unlike arrays which demand contiguous blocks of memory, a Singly Linked List utilizes Dynamic Memory Allocation. Memory is assigned bit-by-bit at runtime using pointers, scattering nodes across the RAM.

Node Structure: Contains a Data field to hold the value, and a Next pointer field to store the explicit memory address of the subsequent node. The final node points to NULL.
Head Pointer: A special external pointer that holds the address of the very first node. If Head is NULL, the list is empty.

Advantages Over Arrays:

No Memory Wastage: Arrays require pre-declaring size (e.g., int arr[100]). If you use only 5 slots, 95 are wasted. SLLs only consume memory for exactly what they store.
O(1) Insertions: Inserting a node at the front (or between known nodes) requires simply updating two pointer addresses. Arrays require an $O(n)$ operation to shift every subsequent element down one slot.

Stacks & Queues

4. In-Depth Applications of Stack & Infix to Prefix Evaluation

Why use a Stack (LIFO)?

A stack operates on the Last-In, First-Out principle. This makes it mathematically perfect for reversing sequences or remembering "where we came from".

Call Stack (Recursion): When function A calls function B, the OS suspends A and pushes its state/memory onto the stack. When B finishes (pops off), the OS resumes A exactly where it left off.
Syntax Parsing: Compilers read code character by character. When seeing an open bracket {, it is pushed to the stack. A closing bracket } pops it, ensuring balance.
Depth-First Search (DFS): Graph traversal algorithms use stacks to explore as deep as possible before backtracking.

Step-by-Step: Infix to Prefix

Convert the expression: (A - B) * (D / E)

Rule: Process inner parentheses first. Move the operator before the operands.

Process (A - B)
→ - A B
Process (D / E)
→ / D E
Substitute these partial prefixes back into the main equation:
[- A B] * [/ D E]
Now treat the brackets as single operands. Move the main * operator to the very front:

* - A B / D E

5. Why Circular Queues over Linear Queues?

In a standard Array-based Linear Queue, elements are inserted at the Rear and removed from the Front. If you enqueue 5 items and dequeue 3, the Front pointer moves forward, leaving 3 empty memory slots at the beginning of the array. However, if the Rear pointer has reached the end of the array (N-1), the queue reports an "Overflow" error, preventing new insertions, even though memory is completely available at the front.

A Circular Queue solves this memory wastage mathematically by linking the last index back to the 0th index using the modulo operator: Rear = (Rear + 1) % MaxSize. It allows infinite reuse of dequeued slots.

Sorting & Searching

6. Selection Sort: Algorithm, Trace & Complexity Proof

Selection Sort divides the array into a sorted and an unsorted part. It repeatedly scans the unsorted part to select the absolute minimum element, and swaps it with the first unsorted element.

Algorithm: SelectionSort(A, n)

For i = 0 to n-2 do:
    // Assume current element is minimum
    min_idx = i
    
    // Scan the rest of the array
    For j = i+1 to n-1 do:
        If A[j] < A[min_idx] then
            min_idx = j
        End If
    End For
    
    // Swap found minimum with element at i
    If min_idx != i then
        Swap A[i] with A[min_idx]
    End If
End For

Time Complexity Proof

Best, Average & Worst Case: $O(n^2)$

Why is Best Case $O(n^2)$? Unlike Bubble or Insertion sort, Selection Sort is "dumb". Even if you pass an array that is already perfectly sorted (e.g., [1,2,3,4]), the algorithm has no way of knowing. The outer loop runs $n$ times, and the inner loop always forcefully scans the remaining $(n-i)$ elements to verify if a smaller number exists. The total comparisons are always $\frac{n(n-1)}{2}$, making it strictly $O(n^2)$ in all scenarios.

Array Trace Example: Sort [64, 25, 12, 22, 11]

Pass (i)	Sub-array being searched	Minimum Found	Swap Action	Array State
Initial	-	-	-	[64, 25, 12, 22, 11]
i = 0	64, 25, 12, 22, 11	11	Swap 64 and 11	11, 25, 12, 22, 64
i = 1	25, 12, 22, 64	12	Swap 25 and 12	11, 12, 25, 22, 64
i = 2	25, 22, 64	22	Swap 25 and 22	11, 12, 22, 25, 64
i = 3	25, 64	25	No swap needed	11, 12, 22, 25, 64

Graph Theory & Traversals

7. Graph Traversals: BFS vs DFS

Traversing a graph means visiting every single vertex exactly once. Because graphs can contain cycles (loops), we must keep track of "Visited" nodes to prevent infinite loops.

Breadth-First Search (BFS)

Explores the graph level by level. It visits all direct neighbors of the starting node first, then moves to the neighbors of the neighbors.

Data Structure Used: Queue (FIFO)
Logic: Enqueue the start node. Dequeue it, mark it visited, and Enqueue all its unvisited adjacent nodes. Repeat until Queue is empty.
Applications: Finding the shortest path in unweighted graphs, peer-to-peer networking (BitTorrent), GPS navigation systems finding nearby locations.

Depth-First Search (DFS)

Explores the graph by going as deep as possible along each branch before backtracking. It follows a path until it hits a dead-end.

Data Structure Used: Stack (LIFO) (usually via Recursion)
Logic: Push the start node. Pop it, mark visited, and Push all unvisited neighbors to the stack. Repeat.
Applications: Solving mazes, topological sorting, finding Strongly Connected Components, detecting cycles in a graph.

8. Fundamental Graph Definitions

Isolated Vertex

A vertex with a degree of exactly 0. Mathematically, it does not belong to any edge set $E$.

Pendant Vertex

A vertex with a degree of exactly 1. Also known as a "leaf node" in tree terminologies.

Directed Graph

A graph where edges have a direction. Degree is split into In-Degree (arrows arriving) and Out-Degree (arrows leaving).

Hashing Techniques

9. Hash Collisions, Load Factor & Resolution Techniques

Hashing aims to map huge keys (like 10-digit phone numbers) into small array indices (like 0 to 99) using a mathematical hash function, allowing $O(1)$ constant time lookup. A Hash Collision occurs when the hash function computes the exact same array index for two completely different keys.

Collision Resolution: Open Addressing

Instead of attaching a linked list, we keep all data strictly inside the hash table array. If a collision happens at index H, we probe (search) for the next empty slot using a mathematical sequence.

Linear Probing: Check slots sequentially: H+1, H+2, H+3.... This causes "Primary Clustering" where blocks of filled slots merge together, slowing down searches.
Quadratic Probing: Check slots squarely: H+1², H+2², H+3².... This eliminates primary clustering by jumping further away.
Double Hashing: Uses a second, entirely different hash function to determine the jump step size. Reduces all clustering issues.

Load Factor ($\alpha$) & Rehashing

The load factor measures the density of the hash table.

$$ \alpha = \frac{\text{Number of elements stored (n)}}{\text{Total size of Hash Table (m)}} $$

Significance: As $\alpha$ increases towards 1 (meaning the table is getting full), collisions increase exponentially. The average search time degrades from $O(1)$ to $O(n)$.

Rehashing: To maintain performance, dynamic hash tables set a threshold (usually $\alpha = 0.75$). If the load factor exceeds this, a new array double the size is allocated, and all existing elements are mathematically rehashed into the new, sparser array.