Welcome to local search – a completely different way of thinking about problem-solving! Unlike the systematic search methods we’ve seen (BFS, DFS, A*), local search is all about improving what you have rather than building paths from scratch.
PART 1: TWO TYPES OF PROBLEMS – A FUNDAMENTAL DISTINCTION
Type 1: Planning Problems (What We’ve Seen So Far)
We want a PATH to the solution
Characteristics:
- The sequence of actions matters
- Need to know HOW to get there
- Usually want optimal path (cheapest, shortest, fastest)
Examples:
- Route finding: “How do I get from Arad to Bucharest?” (the path itself matters)
- 8-puzzle: “What sequence of moves solves the puzzle?”
- Chess: “What sequence of moves leads to checkmate?”
Formulation: Incremental – build solution step by step
Type 2: Identification Problems (Local Search Territory)
We just want the GOAL STATE itself
Characteristics:
- The path doesn’t matter at all
- Only the final configuration matters
- Usually want optimal goal (best configuration)
Examples:
- N-Queens: “Arrange 8 queens so none attack each other” (who cares HOW you placed them?)
- Circuit layout: “Design the optimal chip layout” (only the final design matters)
- Scheduling: “Create the best class schedule” (don’t care how you arrived at it)
- Protein folding: “Find the lowest-energy configuration” (just need the final shape)
Formulation: Complete-state – start with a complete configuration and improve it
PART 2: WHAT IS LOCAL SEARCH?
The Core Idea
Imagine you’re hiking in the mountains at night. You can’t see the whole landscape, but you can feel the ground beneath your feet. You want to reach the highest peak. What do you do?
- Feel around you
- Take a step in the direction that feels uphill
- Repeat until you can’t go any higher
That’s local search!
Key Characteristics
- Single current state – Only keep track of where you are now
- No path memory – Don’t remember how you got there
- Move to neighbors – Only consider states similar to current
- Very memory efficient – O(1) space complexity!
- No backtracking – Once you move, you never go back
When to Use Local Search
✅ The goal state itself is the solution (path irrelevant)
✅ State space is huge (systematic search impossible)
✅ Need a “good enough” solution quickly
✅ Memory is limited
✅ Problem is an optimization task
❌ Need guaranteed optimal solution
❌ Need the actual path to solution
❌ Problem has few solutions (might not find them)
PART 3: THE N-QUEENS PROBLEM – PERFECT EXAMPLE
Problem Definition
Place N queens on an N×N chessboard so that no two queens attack each other.
Attacks happen if:
- Same row
- Same column
- Same diagonal
Two Ways to Formulate
Incremental (Traditional Search):
- Start with empty board
- Add queens one by one
- Path = sequence of placements
- Need to backtrack when stuck
Complete-state (Local Search):
- Start with all queens on board (one per column)
- Move queens to reduce conflicts
- Path irrelevant – just want final arrangement
Why Local Search Works for N-Queens
- Many solutions – For 8-queens, there are 92 solutions
- Solutions well-distributed – Scattered throughout state space
- Obvious improvement metric – Number of attacking pairs
- Simple moves – Move one queen in its column
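The "number of attacking pairs" metric is easy to code up. A minimal sketch, assuming the one-queen-per-column encoding used throughout these notes (`board[i]` is the row of the queen in column `i`, so two queens can never share a column):

```python
def attacking_pairs(board):
    """Count pairs of queens that attack each other.

    board[i] is the row of the queen in column i, so same-column
    attacks are impossible by construction; only rows and diagonals
    need checking.
    """
    n = len(board)
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if board[i] == board[j]:                 # same row
                pairs += 1
            elif abs(board[i] - board[j]) == j - i:  # same diagonal
                pairs += 1
    return pairs
```

A solved board such as `[1,5,8,6,3,7,2,4]` scores 0, while a board with every queen on the same row scores n(n-1)/2 pairs.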
PART 4: HILL CLIMBING – THE FOUNDATION
The Basic Algorithm
def hill_climbing(initial_state, evaluate, get_neighbors):
    current = initial_state
    current_value = evaluate(current)
    while True:
        # Find the best neighbor
        neighbors = get_neighbors(current)
        best_neighbor = None
        best_value = current_value
        for neighbor in neighbors:
            value = evaluate(neighbor)
            if value > best_value:  # Looking for maximum
                best_value = value
                best_neighbor = neighbor
        # If no better neighbor, we're done
        if best_neighbor is None:
            return current, current_value
        # Move to the best neighbor
        current = best_neighbor
        current_value = best_value
Analogy: Finding the Highest Peak
Imagine you’re in a mountain range at night:
- Current state: Where you’re standing
- Evaluate: Your altitude
- Neighbors: Where you can step to
- Move: Step to highest neighboring point
- Terminate: When all neighbors are lower
Hill Climbing on 8-Queens
State Representation:
- One queen in each column (positions 1-8)
- Example: [4, 2, 7, 3, 6, 8, 5, 1] means:
- Col 1: queen at row 4
- Col 2: queen at row 2
- etc.
Evaluation Function:
- Count number of attacking pairs
- Lower is better (0 = solution)
- Or use negative: -attacks (higher is better)
Neighbors:
- Move one queen to a different row in its column
- For 8-queens: 7 alternatives × 8 columns = 56 neighbors
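The 56-neighbor move set just described can be generated directly. This sketch keeps the same list-of-rows encoding, with rows numbered 1..n as in the examples:

```python
def get_neighbors(board):
    """All boards reachable by moving one queen within its own column."""
    n = len(board)
    neighbors = []
    for col in range(n):
        for row in range(1, n + 1):  # rows are 1..n as in the examples
            if row != board[col]:
                neighbor = list(board)
                neighbor[col] = row
                neighbors.append(neighbor)
    return neighbors

# For 8 queens: 8 columns x 7 alternative rows = 56 neighbors
```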
Step-by-Step Example
Start: [5,2,7,3,6,8,5,1] with 2 attacking pairs
(the queens in columns 1 and 7 share row 5, and the queens in columns 1 and 3 share a diagonal)
Check all 56 neighbors. Moving queen 1 alone gives, for example:
- Move queen 1 to row 1: [1,2,7,3,6,8,5,1] → 2 attacks (no better)
- Move queen 1 to row 3: [3,2,7,3,6,8,5,1] → 3 attacks (worse)
- Move queen 1 to row 7: [7,2,7,3,6,8,5,1] → 1 attack (better!)
- Move queen 1 to row 4: [4,2,7,3,6,8,5,1] → 0 attacks (a solution!)
...
Best neighbor: [4,2,7,3,6,8,5,1] with 0 attacks
Move there. In this lucky case a single step reaches a goal; usually you move to the best neighbor and repeat.
PART 5: HILL CLIMBING VARIANTS
1. Simple Hill Climbing
Always move to the first better neighbor found
- Fast (don’t check all neighbors)
- Might miss the best direction
- Like taking the first path that goes up
2. Steepest-Ascent Hill Climbing
Check ALL neighbors, move to the absolute best
- More thorough
- More expensive per step
- What we implemented above
3. Stochastic Hill Climbing
Choose randomly from uphill moves, weighted by improvement
import random

def stochastic_hill_climbing(current, evaluate, get_neighbors):
    while True:
        # Get all neighbors that are better
        neighbors = get_neighbors(current)
        better_neighbors = []
        weights = []
        current_val = evaluate(current)
        for n in neighbors:
            val = evaluate(n)
            if val > current_val:
                better_neighbors.append(n)
                weights.append(val - current_val)  # Improvement amount
        if not better_neighbors:
            return current
        # Choose randomly, weighted by improvement
        chosen = random.choices(better_neighbors, weights=weights)[0]
        current = chosen
4. First-Choice Hill Climbing
Generate neighbors randomly until finding a better one
- Good when state space has many neighbors
- Avoids generating all neighbors
def first_choice_hill_climbing(current, evaluate, generate_random_neighbor,
                               max_attempts=100):
    current_val = evaluate(current)
    while True:
        # Keep trying random neighbors until one improves on the current state
        for _ in range(max_attempts):
            neighbor = generate_random_neighbor(current)
            neighbor_val = evaluate(neighbor)
            if neighbor_val > current_val:
                current, current_val = neighbor, neighbor_val
                break  # Found a better one - keep climbing from there
        else:
            # No better neighbor after max_attempts tries: we're stuck
            return current
5. Random-Restart Hill Climbing
The big idea: if you get stuck, start over somewhere else!
def random_restart_hill_climbing(generate_random_state, evaluate,
                                 get_neighbors, num_restarts):
    best_state = None
    best_value = float("-inf")
    for _ in range(num_restarts):
        # Start from a random initial state
        start = generate_random_state()
        final_state, final_value = hill_climbing(start, evaluate, get_neighbors)
        # Keep track of the best we've seen
        if final_value > best_value:
            best_state = final_state
            best_value = final_value
    return best_state, best_value
PART 6: THE PROBLEMS WITH HILL CLIMBING
Problem 1: Local Optima
            global max
               /\
              /  \         local max
             /    \           /\
            /      \         /  \
           /        \       /    \
      ____/          \_____/      \____
                              ↑
                            Start (climbs to the local max,
                            never sees the higher peak)
What happens:
- Hill climbing goes up to the first peak it finds
- Gets stuck at local maximum
- Can’t see that there’s a higher peak elsewhere
Real-world analogy:
You’re hiking at night. You reach a small hill and can’t go higher from there. But there’s a mountain nearby you never saw!
Problem 2: Plateaus
         _____________   ← plateau (all neighbors look the same)
        /             \
       /               \
  ____/                 \____
What happens:
- All neighbors have same value
- No direction to move
- Like being on a flat plain
Types of plateaus:
- Flat local optimum: Truly stuck
- Shoulder: Can move sideways to eventually go up again
- Ridge: Narrow path upward surrounded by down slopes
Problem 3: Ridges
        ↑
        │
   ←────┼────→
        │
        ↓
(the only available moves are up/down/left/right;
the ridge itself climbs diagonally)
What happens:
- All immediate moves go down
- But going diagonally would go up
- Problem: diagonal isn’t a “neighbor” in your definition
Problem 4: Evaluation Function Issues
The evaluation function might not accurately reflect progress:
- In 8-queens, reducing attacks doesn’t always lead to solution
- Can get stuck in states where any move increases attacks
- Need to allow “sideways moves” (no improvement) occasionally
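Allowing those sideways moves is a small change to steepest-ascent hill climbing: accept an equal-valued neighbor, but only up to a budget so a flat local optimum can't trap you forever. A sketch, where `max_sideways` and the reset-on-improvement behaviour are choices of this sketch rather than a fixed standard:

```python
def hill_climbing_sideways(initial_state, evaluate, get_neighbors,
                           max_sideways=100):
    """Steepest ascent that also accepts equal-valued (sideways) moves."""
    current = initial_state
    current_value = evaluate(current)
    sideways = 0
    while True:
        neighbors = get_neighbors(current)
        best = max(neighbors, key=evaluate)
        best_value = evaluate(best)
        if best_value > current_value:
            current, current_value = best, best_value
            sideways = 0  # real progress resets the sideways budget
        elif best_value == current_value and sideways < max_sideways:
            current = best  # sideways step across a plateau
            sideways += 1
        else:
            return current, current_value
```

On a shoulder, the sideways steps let the search drift across the flat region; on a true flat maximum, the budget eventually runs out and the search stops.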
PART 7: THE 8-QUEENS EXPERIMENT – EYE-OPENING RESULTS
Experiment 1: No Sideways Moves
Success rate: 14% (0.14)
Average moves when succeeding: 4
Average moves when stuck: 3
Expected total moves needed:
3(1-0.14)/0.14 + 4 = 3(0.86)/0.14 + 4
= 3(6.14) + 4 = 18.42 + 4 ≈ 22 moves
Interpretation:
- Only works 14% of the time
- When it works, it’s fast (4 moves)
- But mostly fails quickly (3 moves then stuck)
Experiment 2: With 100 Sideways Moves Allowed
Success rate: 94% (0.94)
Average moves when succeeding: 21
Average moves when stuck: 65
Expected total moves needed:
65(1-0.94)/0.94 + 21 = 65(0.06)/0.94 + 21
= 65(0.064) + 21 = 4.16 + 21 ≈ 25 moves
Interpretation:
- Much higher success rate (94% vs 14%)
- Takes longer when succeeding (21 vs 4 moves)
- When fails, explores much longer (65 moves)
- Overall expected moves only slightly higher (25 vs 22)
Key Insight: Allowing sideways moves dramatically improves success rate with modest increase in runtime!
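The expected-cost formula used in both experiments — failure cost times the expected number of failed attempts, (1-p)/p, plus the cost of the one success — can be checked directly:

```python
def expected_moves(p_success, moves_on_success, moves_on_failure):
    """Expected total moves for restart-until-success.

    Each attempt succeeds with probability p, so we expect
    (1 - p) / p failed attempts before the successful one.
    """
    expected_failures = (1 - p_success) / p_success
    return moves_on_failure * expected_failures + moves_on_success

print(round(expected_moves(0.14, 4, 3)))    # ≈ 22, matching Experiment 1
print(round(expected_moves(0.94, 21, 65)))  # ≈ 25, matching Experiment 2
```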
PART 8: MIN-CONFLICTS HEURISTIC – THE N-QUEENS SECRET SAUCE
The Algorithm
import random

def min_conflicts_nqueens(n, max_steps):
    # Initialize: one queen per column, random rows
    current = [random.randint(1, n) for _ in range(n)]
    for step in range(max_steps):
        if count_conflicts(current) == 0:
            return current  # Solution found!
        # Pick a random conflicted queen
        conflicted_queens = get_conflicted_queens(current)
        queen = random.choice(conflicted_queens)
        # Move it to the row in its column with the fewest conflicts
        best_row = find_min_conflict_row(current, queen)
        current[queen] = best_row
    return None  # Failed within max_steps
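The pseudocode above leans on three undefined helpers. Here is one plausible way to fill them in, all built on a single conflicts-per-queen primitive; this is a sketch, not the only reasonable implementation:

```python
import random

def conflicts_for(current, col, row):
    """Conflicts queen (col, row) would have against every other queen."""
    count = 0
    for other in range(len(current)):
        if other == col:
            continue
        if (current[other] == row
                or abs(current[other] - row) == abs(other - col)):
            count += 1
    return count

def count_conflicts(current):
    # Each conflicting pair is seen from both queens, so halve the sum
    return sum(conflicts_for(current, c, current[c])
               for c in range(len(current))) // 2

def get_conflicted_queens(current):
    return [c for c in range(len(current))
            if conflicts_for(current, c, current[c]) > 0]

def find_min_conflict_row(current, col):
    rows = range(1, len(current) + 1)
    best = min(conflicts_for(current, col, r) for r in rows)
    # Break ties randomly to avoid deterministic cycles
    return random.choice([r for r in rows
                          if conflicts_for(current, col, r) == best])
```

The random tie-breaking in `find_min_conflict_row` is the "randomization" listed below: without it, the search can oscillate between the same few boards.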
Why It Works So Well
- Informed moves: Not random – moves to minimize conflicts
- Focus on problems: Only move queens that are attacked
- Randomization: Avoids getting trapped in cycles
- Restart strategy: If no solution in ~100 steps, start over
Performance for Large N
| N | Runtime (rough) |
|---|---|
| 100 | < 1 second |
| 1,000 | ~ 1 minute |
| 1,000,000 | ~ 1 hour (for specialized implementations) |
Remarkable: For 1 million queens, the state space is astronomically huge, but min-conflicts finds a solution in reasonable time!
PART 9: TABU SEARCH – AVOIDING CYCLES
The Problem with Hill Climbing
Hill climbing can get stuck in cycles:
A → B → C → A → B → C → A ...
The Tabu Solution
Maintain a tabu list of recently visited states and forbid returning to them.
def tabu_search(initial_state, evaluate, get_neighbors, max_iterations, tabu_size):
    current = initial_state
    best_state = current
    best_value = evaluate(current)
    tabu_list = []  # Recently visited states
    for _ in range(max_iterations):
        # Get all neighbors
        neighbors = get_neighbors(current)
        # Filter out tabu states
        admissible = [n for n in neighbors if n not in tabu_list]
        if not admissible:
            # If every neighbor is tabu, clear the list
            tabu_list = []
            admissible = neighbors
        # Find the best admissible neighbor
        best_neighbor = None
        best_neighbor_val = float("-inf")
        for n in admissible:
            val = evaluate(n)
            if val > best_neighbor_val:
                best_neighbor_val = val
                best_neighbor = n
        # Move to the best neighbor and mark it tabu
        current = best_neighbor
        tabu_list.append(current)
        # Keep the tabu list from growing too large
        if len(tabu_list) > tabu_size:
            tabu_list.pop(0)
        # Update global best
        if best_neighbor_val > best_value:
            best_value = best_neighbor_val
            best_state = best_neighbor
    return best_state, best_value
Aspiration Criteria
Sometimes a tabu move might be REALLY good. We can override the tabu restriction if:
def is_admissible(move, tabu_list, evaluate, best_value):
    if move not in tabu_list:
        return True
    if evaluate(move) > best_value:
        return True  # Aspiration: this would be a new global best!
    return False
Real Application: Political Districting
Tabu search has been used to create fair voting districts:
Constraints:
- Districts must be contiguous
- Equal population (± β%)
- Respect natural boundaries
- Socio-economic homogeneity
- Preserve communities of interest
Moves:
- Give: Transfer a unit from one district to another
- Swap: Exchange units between adjacent districts
Tabu elements: Units recently transferred become tabu
Result: Fair, unbiased districting plans
PART 10: SIMULATED ANNEALING – ESCAPING LOCAL OPTIMA
The Metallurgy Analogy
In metallurgy, annealing is:
- Heat metal to high temperature (atoms move freely)
- Slowly cool it down (atoms settle into low-energy configuration)
- Result: Strong, stable metal with minimal internal stress
The Algorithm Analogy
- High temperature: Willing to make “bad” moves (explore)
- Cooling down: Gradually become less willing to accept bad moves
- Low temperature: Only accept good moves (exploit)
The Math
Probability of accepting a worse move:
P(accept) = e^(ΔE / T)
where:
ΔE = new_value - current_value (negative if worse)
T = current temperature
Key insight: Even if ΔE is negative (worse move), probability > 0 when T > 0
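Plugging numbers into the formula makes the exploration-to-exploitation shift concrete. For a move that is worse by 10 (an arbitrary illustrative ΔE):

```python
import math

def accept_probability(delta, T):
    """P(accept) = e^(delta / T); delta < 0 for a worse move."""
    if delta >= 0:
        return 1.0  # better (or equal) moves are always taken
    return math.exp(delta / T)

for T in (100, 10, 1):
    print(f"T={T:3}: P(accept delta=-10) = {accept_probability(-10, T):.3f}")
```

At T=100 the bad move is accepted about 90% of the time (e^-0.1 ≈ 0.905); at T=1 it is accepted almost never (e^-10 ≈ 0.00005). The same bad move goes from "usually allowed" to "effectively forbidden" purely by cooling.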
Implementation
import math
import random

def simulated_annealing(initial_state, evaluate, get_neighbors,
                        schedule, max_iterations):
    current = initial_state
    current_val = evaluate(current)
    best_state = current
    best_val = current_val
    for iteration in range(max_iterations):
        T = schedule(iteration)  # Temperature at this iteration
        if T == 0:
            break
        # Pick a random neighbor
        neighbor = random.choice(get_neighbors(current))
        neighbor_val = evaluate(neighbor)
        # Should we move?
        delta = neighbor_val - current_val
        if delta > 0:
            # Better move - always accept
            current = neighbor
            current_val = neighbor_val
            if current_val > best_val:
                best_val = current_val
                best_state = current
        else:
            # Worse move - accept with probability e^(delta/T)
            # Note: delta is negative, so e^(negative/T) < 1
            if random.random() < math.exp(delta / T):
                current = neighbor
                current_val = neighbor_val
    return best_state, best_val

def linear_schedule(iteration, initial_temp=100, final_temp=0.1, max_iter=10000):
    """Linear cooling schedule"""
    return initial_temp - (initial_temp - final_temp) * iteration / max_iter

def exponential_schedule(iteration, initial_temp=100, alpha=0.95):
    """Exponential cooling"""
    return initial_temp * (alpha ** iteration)
Temperature Schedule Examples
Temperature
    ↑
100 |•
    |••
    | ••
    |  •••
    |    ••••
    |        •••••••
  0 +------------------→ Time
     (hot early: explore; cool late: exploit)
Traveling Salesman Problem with SA
Problem: Visit N cities, return to start, minimize distance
State representation: Permutation of cities [1,4,2,3,6,5]
Neighbor generation: Swap two cities (except first)
[1,4,2,3,6,5] → [1,4,3,2,6,5] (swap positions 3 and 4)
Evaluation: Total tour distance
Initial temperature: Set so ~80% of worse moves accepted initially
Cooling: Reduce temperature gradually (e.g., T’ = 0.95 × T)
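The TSP ingredients above fit in a few lines. A sketch, working with any distance matrix `dist` where `dist[a][b]` is the a→b distance:

```python
import random

def tour_length(tour, dist):
    """Total length of the closed tour (returns to the start city)."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def swap_neighbor(tour):
    """Swap two random cities, keeping the first city fixed."""
    i, j = random.sample(range(1, len(tour)), 2)
    neighbor = list(tour)
    neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
    return neighbor
```

These two functions plug straight into the `evaluate` and neighbor-generation slots of the simulated annealing code (negate `tour_length` if your annealer maximizes).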
SA Properties
| Property | Value |
|---|---|
| Complete? | Yes (if cooled infinitely slowly) |
| Optimal? | Yes (if cooled infinitely slowly) |
| Practical? | Good solutions in reasonable time |
| Reproducible? | No (random) |
The Catch: To guarantee optimality, you must run forever! But in practice, we get good solutions quickly.
PART 11: LOCAL BEAM SEARCH – KEEPING MULTIPLE OPTIONS
The Idea
Instead of one current state, keep k states at all times.
def local_beam_search(initial_states, evaluate, get_neighbors, k,
                      max_iterations, is_goal):
    # Start with k initial states
    current_states = initial_states[:k]
    for _ in range(max_iterations):
        # Generate all neighbors of all current states
        all_neighbors = []
        for state in current_states:
            all_neighbors.extend(get_neighbors(state))
        # Keep only the k best
        all_neighbors.sort(key=evaluate, reverse=True)
        current_states = all_neighbors[:k]
        # Check whether any of them is a goal
        for state in current_states:
            if is_goal(state):
                return state
    # Out of iterations: return the best state we have
    return max(current_states, key=evaluate)
Stochastic Beam Search
Instead of taking the k best, select k states with probability proportional to fitness:
import random

def stochastic_beam_search(current_states, evaluate, get_neighbors, k):
    # Generate all neighbors
    all_neighbors = []
    for state in current_states:
        all_neighbors.extend(get_neighbors(state))
    # Evaluate all
    values = [evaluate(s) for s in all_neighbors]
    # Convert to probabilities (higher value = higher probability);
    # shift values so they're all positive before normalizing
    min_val = min(values)
    if min_val < 0:
        values = [v - min_val + 1 for v in values]
    total = sum(values)
    probs = [v / total for v in values]
    # Select k states randomly, weighted by probability
    chosen_indices = random.choices(range(len(all_neighbors)),
                                    weights=probs, k=k)
    return [all_neighbors[i] for i in chosen_indices]
PART 12: GENETIC ALGORITHMS – EVOLUTIONARY SEARCH
The Biological Inspiration
Nature evolves through:
- Population – Many individuals
- Selection – Fittest survive and reproduce
- Crossover – Combine traits from two parents
- Mutation – Random changes
- Generation – Repeat
The Genetic Algorithm Framework
import random

def genetic_algorithm(population_size, fitness_func, crossover_func,
                      mutation_func, generations, mutation_rate):
    # Initialize population (generate_initial_population and is_goal
    # are problem-specific helpers)
    population = generate_initial_population(population_size)
    for generation in range(generations):
        # Evaluate fitness
        fitness_scores = [fitness_func(ind) for ind in population]
        # Check for a solution
        best_idx = max(range(len(population)), key=lambda i: fitness_scores[i])
        if is_goal(population[best_idx]):
            return population[best_idx]
        # Create the new population
        new_population = []
        while len(new_population) < population_size:
            # Select parents (tournament selection)
            parent1 = tournament_select(population, fitness_scores)
            parent2 = tournament_select(population, fitness_scores)
            # Crossover (with probability)
            if random.random() < 0.8:  # 80% crossover rate
                child1, child2 = crossover_func(parent1, parent2)
            else:
                child1, child2 = parent1, parent2
            # Mutate
            if random.random() < mutation_rate:
                child1 = mutation_func(child1)
            if random.random() < mutation_rate:
                child2 = mutation_func(child2)
            new_population.extend([child1, child2])
        population = new_population[:population_size]
    # Return the best found
    return max(population, key=fitness_func)

def tournament_select(population, fitness_scores, tournament_size=3):
    """Select the best from a random tournament"""
    indices = random.sample(range(len(population)), tournament_size)
    best_idx = max(indices, key=lambda i: fitness_scores[i])
    return population[best_idx]
Key Components
1. Representation (The Chromosome)
How to encode a solution?
Bit strings:
Individual: [0,1,1,0,1,0,0,1]
Gene: Each bit
Chromosome: Entire string
Permutations (for TSP):
Individual: [3,5,1,4,2,6] # Order of cities
Real values:
Individual: [0.34, 1.56, -2.89, 4.01] # Parameters
Rules/Lists:
Individual: [R1, R3, R2, R5, R4] # Sequence of rules
2. Fitness Function (The Environment)
Must accurately measure how good a solution is.
For N-Queens:
def fitness_nqueens(individual):
    # individual = [row1, row2, ..., rown], one entry per column
    conflicts = 0
    n = len(individual)
    # Check each pair of queens
    for i in range(n):
        for j in range(i + 1, n):
            # Same row?
            if individual[i] == individual[j]:
                conflicts += 1
            # Same diagonal?
            if abs(individual[i] - individual[j]) == abs(i - j):
                conflicts += 1
    # More conflicts = lower fitness
    return 1 / (conflicts + 1)  # Add 1 to avoid division by zero
3. Selection (Reproduction)
How to choose parents?
Roulette Wheel Selection:
import random

def roulette_select(population, fitness_scores):
    total_fitness = sum(fitness_scores)
    pick = random.uniform(0, total_fitness)
    current = 0
    for i, fitness in enumerate(fitness_scores):
        current += fitness
        if current > pick:
            return population[i]
    return population[-1]  # Guard against floating-point rounding
Tournament Selection (shown above):
- Pick k individuals randomly
- Choose the best among them
- Simple and effective
Rank Selection:
- Sort by fitness
- Selection probability based on rank, not raw fitness
- Prevents domination by super-individuals
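Rank selection as just described can be sketched like this; the linear rank weighting (worst = 1 up to best = N) is one common choice among several:

```python
import random

def rank_select(population, fitness_scores):
    """Select one individual with probability proportional to its rank.

    The worst individual gets rank 1 and the best gets rank N, so even
    a runaway 'super-individual' can claim only N / (sum of ranks) of
    the probability mass instead of dominating outright.
    """
    order = sorted(range(len(population)), key=lambda i: fitness_scores[i])
    ranks = [0] * len(population)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    chosen = random.choices(range(len(population)), weights=ranks)[0]
    return population[chosen]
```

With fitnesses [1, 100, 10] the three individuals are picked in proportion 1 : 3 : 2 — the middle individual's 100x fitness advantage becomes only a modest rank advantage.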
4. Crossover (Recombination)
One-Point Crossover (for bit strings):
Parent1: 1011|0010
Parent2: 1100|1101
             ↑ cut point
Child1:  1011|1101
Child2:  1100|0010
Two-Point Crossover:
Parent1: 10|1100|10
Parent2: 11|0011|01
           ↑    ↑ cut points
Child1:  10|0011|10
Child2:  11|1100|01
Uniform Crossover:
Each bit chosen randomly from either parent:
Parent1: 1 0 1 1 0 0 1 0
Parent2: 1 1 0 0 1 1 0 1
Mask:    0 1 0 1 1 0 1 0   (0 = from P1, 1 = from P2)
Child:   1 1 1 0 1 0 0 0
Order Crossover (for permutations):
Parent1: [3 5 7 | 2 1 6 | 4 8]
Parent2: [2 5 7 | 6 8 1 | 3 4]

Child:   [5 7 8 | 2 1 6 | 3 4]

(Tricky! Keep Parent1's middle segment, then fill the remaining slots
with Parent2's cities in order, wrapping around after the second cut
and skipping cities already used - this preserves relative order.)
5. Mutation (Random Changes)
Bit flip mutation:
Before: [1,0,1,1,0,1,0,0]
           ↑ flip this bit
After:  [1,1,1,1,0,1,0,0]
Swap mutation (for permutations):
Before: [3,5,1,4,2,6]
           ↑     ↑ swap
After:  [3,2,1,4,5,6]
Gaussian mutation (for real values):
Before: [0.34, 1.56, -2.89, 4.01]
               ↑ add N(0, 0.1)
After:  [0.34, 1.62, -2.89, 4.01]
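The three mutation styles above, sketched in code; the per-gene rates used as defaults here are illustrative, not standard values:

```python
import random

def bit_flip_mutation(bits, rate=0.01):
    """Flip each bit independently with the given probability."""
    return [1 - b if random.random() < rate else b for b in bits]

def swap_mutation(perm):
    """Swap two random positions - keeps a permutation a permutation."""
    i, j = random.sample(range(len(perm)), 2)
    mutated = list(perm)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

def gaussian_mutation(values, sigma=0.1, rate=0.25):
    """Add N(0, sigma) noise to each real-valued gene with some probability."""
    return [v + random.gauss(0, sigma) if random.random() < rate else v
            for v in values]
```

Note that swap mutation never loses or duplicates an element, which is exactly why it is the right operator for permutation encodings like TSP tours.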
A Complete Simple Example
Problem: Evolve a 32-bit string to all 1’s
import random

def evolve_all_ones():
    # Parameters
    POP_SIZE = 100
    GENS = 1000
    MUTATION_RATE = 0.01
    # Initialize random population
    population = []
    for _ in range(POP_SIZE):
        individual = [random.randint(0, 1) for _ in range(32)]
        population.append(individual)
    for gen in range(GENS):
        # Fitness = number of 1's
        fitness = [sum(ind) for ind in population]
        # Check for solution
        best_fitness = max(fitness)
        if best_fitness == 32:
            best_idx = fitness.index(best_fitness)
            return population[best_idx]
        print(f"Gen {gen}: Best fitness = {best_fitness}")
        # Create new population
        new_pop = []
        while len(new_pop) < POP_SIZE:
            # Tournament selection (tournament_select defined earlier)
            p1 = tournament_select(population, fitness)
            p2 = tournament_select(population, fitness)
            # Crossover (one-point)
            if random.random() < 0.8:
                point = random.randint(1, 31)  # any interior cut point
                c1 = p1[:point] + p2[point:]
                c2 = p2[:point] + p1[point:]
            else:
                c1, c2 = p1[:], p2[:]
            # Mutation
            for i in range(32):
                if random.random() < MUTATION_RATE:
                    c1[i] = 1 - c1[i]  # Flip bit
                if random.random() < MUTATION_RATE:
                    c2[i] = 1 - c2[i]
            new_pop.extend([c1, c2])
        population = new_pop[:POP_SIZE]
    # Return best found
    fitness = [sum(ind) for ind in population]
    best_idx = fitness.index(max(fitness))
    return population[best_idx]
When to Use Genetic Algorithms
✅ Problem has many interacting parameters
✅ Fitness landscape is rough (many local optima)
✅ You can define a good fitness function
✅ You don’t need guaranteed optimal solution
✅ Problem is similar to one solved with GAs before
✅ You want to explore a wide range of solutions
❌ You need the absolute optimal solution
❌ Fitness evaluation is extremely expensive
❌ Problem has known efficient algorithm
❌ Solution must be explainable
GA Advantages
- Conceptually simple – Easy to understand and implement
- Modular – Separates problem from search mechanism
- Parallelizable – Population can be evaluated in parallel
- Multi-objective – Can optimize multiple criteria
- No gradient needed – Works with non-differentiable functions
- Always improves – Keeps best solution found
- Escapes local optima – Population maintains diversity
GA Disadvantages
- Parameter sensitive – Population size, mutation rate, etc.
- No guarantees – May not find optimal solution
- Premature convergence – Population becomes too uniform
- Computationally expensive – Many evaluations needed
- Hard to analyze – Theoretical properties complex
PART 13: COMPARISON OF LOCAL SEARCH METHODS
| Method | Memory | Completeness | Optimality | Speed | When to Use |
|---|---|---|---|---|---|
| Hill Climbing | O(1) | No | No | Very Fast | Simple optimization, quick approximation |
| Random Restart HC | O(1) | Yes* | No | Fast | Many local optima, need guarantee |
| Tabu Search | O(k) | Yes* | No | Medium | Cycles are a problem |
| Simulated Annealing | O(1) | Yes* | Yes* | Medium | Need good solution, can tune time |
| Beam Search | O(k) | No | No | Fast | Want multiple alternatives |
| Genetic Algorithm | O(pop) | No | No | Slow | Complex search space, many interactions |
*With infinite time/resources
PART 14: GRADIENT DESCENT – CONNECTION TO MACHINE LEARNING
The Math Connection
Local search isn’t just for discrete problems – continuous optimization uses gradient descent:
def gradient_descent(initial_params, compute_gradient, learning_rate, max_iterations):
    # Assumes params support vector arithmetic (e.g. a numpy array)
    params = initial_params
    for _ in range(max_iterations):
        gradient = compute_gradient(params)
        params = params - learning_rate * gradient
    return params
For Linear Regression
We want to minimize:
J(θ) = ½ ∑(hθ(x) - y)²
where hθ(x) = θ₀ + θ₁x₁ + ... + θₙxₙ
Gradient:
∂J/∂θⱼ = ∑(hθ(x) - y) × xⱼ
Update rule:
θⱼ := θⱼ - α × ∂J/∂θⱼ
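To tie the pieces together, here is the update rule run on a tiny one-feature regression in pure Python; the data and learning rate are made up for illustration (the data comes from y = 2x + 1, so θ should converge near (1, 2)):

```python
def fit_line(xs, ys, alpha=0.01, iterations=5000):
    """Gradient descent on J(theta) = 1/2 * sum((theta0 + theta1*x - y)^2)."""
    t0, t1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient: dJ/dtheta_j = sum over points of (h(x) - y) * x_j
        # (x_0 is the constant 1 attached to the intercept term)
        g0 = sum((t0 + t1 * x - y) for x, y in zip(xs, ys))
        g1 = sum((t0 + t1 * x - y) * x for x, y in zip(xs, ys))
        t0 -= alpha * g0  # theta_j := theta_j - alpha * dJ/dtheta_j
        t1 -= alpha * g1
    return t0, t1

# Data generated from y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
```

Each iteration is exactly one "move to a neighbor": the gradient picks the direction, and the learning rate α controls the step size, playing the role of the neighborhood radius.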
Why Gradient Descent is Local Search
- Current state: Parameter vector θ
- Neighbors: θ ± small changes
- Evaluation: Cost function J(θ)
- Move direction: Negative gradient (steepest descent)
Connection to Hill Climbing
| Hill Climbing | Gradient Descent |
|---|---|
| Discrete states | Continuous parameters |
| Check all neighbors | Compute gradient |
| Move to best neighbor | Move in gradient direction |
| Stuck at local max | Stuck at local min |
PART 15: KEY TAKEAWAYS FOR YOUR MIDTERM
The Big Ideas
- Two problem types: Planning (need path) vs. Identification (just want goal)
- Local search trades guarantees for efficiency:
- Sacrifices completeness and optimality
- Gains ability to handle huge state spaces
- Hill climbing is the foundation:
- Always move to better neighbor
- Problems: local optima, plateaus, ridges
- Min-conflicts heuristic is powerful:
- Focuses on variables in conflict
- Makes informed moves
- Works amazingly well for N-Queens
- Escaping local optima requires tricks:
- Random restarts
- Sideways moves
- Tabu lists
- Annealing (accepting worse moves)
- Genetic algorithms use population:
- Selection, crossover, mutation
- Mimic biological evolution
- Good for complex optimization
Memory Aids
- Hill climbing = “Always go up”
- Random restart = “If stuck, start over”
- Tabu search = “Don’t go back to recent places”
- Simulated annealing = “Sometimes go down, especially early”
- Beam search = “Keep multiple possibilities”
- Genetic algorithms = “Let them evolve”
When to Use Each
Simple problem, need quick answer?
→ Hill climbing
Problem has many local optima?
→ Random restart HC
Getting stuck in cycles?
→ Tabu search
Need really good solution, have time?
→ Simulated annealing
Multiple good solutions might exist?
→ Beam search
Complex, poorly understood problem?
→ Genetic algorithm
PRACTICE PROBLEMS
Problem 1
You’re solving 1000-queens. Which algorithm would you use and why?
Answer: Min-conflicts hill climbing with random restart. It works extremely well for large N-Queens in practice, typically finding solutions in roughly linear time.
Problem 2
Your hill climbing implementation keeps getting stuck. What three things could you try?
Answer:
- Add random restart
- Allow sideways moves
- Use simulated annealing to accept occasional downhill moves
Problem 3
Why does crossover help in genetic algorithms?
Answer: It combines good “building blocks” from different parents. If one parent has good first half and another has good second half, crossover can create an individual with both good halves.
Problem 4
Your genetic algorithm’s population becomes all identical after a few generations. What’s wrong?
Answer: Premature convergence! Solutions:
- Increase mutation rate
- Increase population size
- Use different selection method (less aggressive)
- Add diversity preservation mechanism
Problem 5
When would you use tabu search instead of simulated annealing?
Answer: When cycles are the main problem (like in scheduling with repeating patterns) and you can afford to store visited states.
Local search is all about understanding the trade-offs between exploration and exploitation, and knowing which technique fits which problem. Practice tracing through these algorithms on small examples to build intuition.
