Welcome to local search – a completely different way of thinking about problem-solving! Unlike the systematic search methods we’ve seen (BFS, DFS, A*), local search is all about improving what you have rather than building paths from scratch.
PART 1: TWO TYPES OF PROBLEMS – A FUNDAMENTAL DISTINCTION
Type 1: Planning Problems (What We’ve Seen So Far)
We want a PATH to the solution
Characteristics:
- The sequence of actions matters
- Need to know HOW to get there
- Usually want optimal path (cheapest, shortest, fastest)
Examples:
- Route finding: “How do I get from Arad to Bucharest?” (the path itself matters)
- 8-puzzle: “What sequence of moves solves the puzzle?”
- Chess: “What sequence of moves leads to checkmate?”
Formulation: Incremental – build solution step by step
Type 2: Identification Problems (Local Search Territory)
We just want the GOAL STATE itself
Characteristics:
- The path doesn’t matter at all
- Only the final configuration matters
- Usually want optimal goal (best configuration)
Examples:
- N-Queens: “Arrange 8 queens so none attack each other” (who cares HOW you placed them?)
- Circuit layout: “Design the optimal chip layout” (only the final design matters)
- Scheduling: “Create the best class schedule” (don’t care how you arrived at it)
- Protein folding: “Find the lowest-energy configuration” (just need the final shape)
Formulation: Complete-state – start with a complete configuration and improve it
PART 2: WHAT IS LOCAL SEARCH?
The Core Idea
Imagine you’re hiking in the mountains at night. You can’t see the whole landscape, but you can feel the ground beneath your feet. You want to reach the highest peak. What do you do?
- Feel around you
- Take a step in the direction that feels uphill
- Repeat until you can’t go any higher
That’s local search!
Key Characteristics
- Single current state – Only keep track of where you are now
- No path memory – Don’t remember how you got there
- Move to neighbors – Only consider states similar to current
- Very memory efficient – O(1) space complexity!
- No backtracking – Once you move, you never go back
When to Use Local Search
✅ The goal state itself is the solution (path irrelevant)
✅ State space is huge (systematic search impossible)
✅ Need a “good enough” solution quickly
✅ Memory is limited
✅ Problem is an optimization task
❌ Need guaranteed optimal solution
❌ Need the actual path to solution
❌ Problem has few solutions (might not find them)
PART 3: THE N-QUEENS PROBLEM – PERFECT EXAMPLE
Problem Definition
Place N queens on an N×N chessboard so that no two queens attack each other.
Attacks happen if:
- Same row
- Same column
- Same diagonal
Two Ways to Formulate
Incremental (Traditional Search):
- Start with empty board
- Add queens one by one
- Path = sequence of placements
- Need to backtrack when stuck
Complete-state (Local Search):
- Start with all queens on board (one per column)
- Move queens to reduce conflicts
- Path irrelevant – just want final arrangement
Why Local Search Works for N-Queens
- Many solutions – For 8-queens, there are 92 solutions
- Solutions well-distributed – Scattered throughout state space
- Obvious improvement metric – Number of attacking pairs
- Simple moves – Move one queen in its column
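The "number of attacking pairs" metric is easy to code up. A minimal sketch, assuming the one-queen-per-column encoding used throughout these notes (`board[i]` is the row of the queen in column `i`, so two queens can never share a column):

```python
def attacking_pairs(board):
    """Count pairs of queens that attack each other.

    board[i] is the row of the queen in column i, so same-column
    attacks are impossible by construction; only rows and diagonals
    need checking.
    """
    n = len(board)
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if board[i] == board[j]:                 # same row
                pairs += 1
            elif abs(board[i] - board[j]) == j - i:  # same diagonal
                pairs += 1
    return pairs
```

A solved board such as `[1,5,8,6,3,7,2,4]` scores 0, while a board with every queen on the same row scores n(n-1)/2 pairs.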
PART 4: HILL CLIMBING – THE FOUNDATION
The Basic Algorithm
def hill_climbing(initial_state, evaluate, get_neighbors):
    current = initial_state
    current_value = evaluate(current)
    while True:
        # Find the best neighbor
        neighbors = get_neighbors(current)
        best_neighbor = None
        best_value = current_value
        for neighbor in neighbors:
            value = evaluate(neighbor)
            if value > best_value:  # Looking for maximum
                best_value = value
                best_neighbor = neighbor
        # If no better neighbor, we're done
        if best_neighbor is None:
            return current, current_value
        # Move to the best neighbor
        current = best_neighbor
        current_value = best_value
Analogy: Finding the Highest Peak
Imagine you’re in a mountain range at night:
- Current state: Where you’re standing
- Evaluate: Your altitude
- Neighbors: Where you can step to
- Move: Step to highest neighboring point
- Terminate: When all neighbors are lower
Hill Climbing on 8-Queens
State Representation:
- One queen in each column (positions 1-8)
- Example: [4, 2, 7, 3, 6, 8, 5, 1] means:
- Col 1: queen at row 4
- Col 2: queen at row 2
- etc.
Evaluation Function:
- Count number of attacking pairs
- Lower is better (0 = solution)
- Or use negative: -attacks (higher is better)
Neighbors:
- Move one queen to a different row in its column
- For 8-queens: 7 alternatives × 8 columns = 56 neighbors
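The 56-neighbor move set just described can be generated directly. This sketch keeps the same list-of-rows encoding, with rows numbered 1..n as in the examples:

```python
def get_neighbors(board):
    """All boards reachable by moving one queen within its own column."""
    n = len(board)
    neighbors = []
    for col in range(n):
        for row in range(1, n + 1):  # rows are 1..n as in the examples
            if row != board[col]:
                neighbor = list(board)
                neighbor[col] = row
                neighbors.append(neighbor)
    return neighbors

# For 8 queens: 8 columns x 7 alternative rows = 56 neighbors
```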
Step-by-Step Example
Start: [5,2,7,3,6,8,5,1] with 2 attacking pairs
(the queens in columns 1 and 7 share row 5, and the queens in columns 1 and 3 share a diagonal)
Check all 56 neighbors. Moving queen 1 alone gives, for example:
- Move queen 1 to row 1: [1,2,7,3,6,8,5,1] → 2 attacks (no better)
- Move queen 1 to row 3: [3,2,7,3,6,8,5,1] → 3 attacks (worse)
- Move queen 1 to row 7: [7,2,7,3,6,8,5,1] → 1 attack (better!)
- Move queen 1 to row 4: [4,2,7,3,6,8,5,1] → 0 attacks (a solution!)
...
Best neighbor: [4,2,7,3,6,8,5,1] with 0 attacks
Move there. In this lucky case a single step reaches a goal; usually you move to the best neighbor and repeat.
PART 5: HILL CLIMBING VARIANTS
1. Simple Hill Climbing
Always move to the first better neighbor found
- Fast (don’t check all neighbors)
- Might miss the best direction
- Like taking the first path that goes up
2. Steepest-Ascent Hill Climbing
Check ALL neighbors, move to the absolute best
- More thorough
- More expensive per step
- What we implemented above
3. Stochastic Hill Climbing
Choose randomly from uphill moves, weighted by improvement
import random

def stochastic_hill_climbing(current, evaluate, get_neighbors):
    while True:
        # Get all neighbors that are better
        neighbors = get_neighbors(current)
        better_neighbors = []
        weights = []
        current_val = evaluate(current)
        for n in neighbors:
            val = evaluate(n)
            if val > current_val:
                better_neighbors.append(n)
                weights.append(val - current_val)  # Improvement amount
        if not better_neighbors:
            return current
        # Choose randomly, weighted by improvement
        chosen = random.choices(better_neighbors, weights=weights)[0]
        current = chosen
4. First-Choice Hill Climbing
Generate neighbors randomly until finding a better one
- Good when state space has many neighbors
- Avoids generating all neighbors
def first_choice_hill_climbing(current, evaluate, generate_random_neighbor,
                               max_attempts=100):
    current_val = evaluate(current)
    while True:
        # Keep trying random neighbors until one improves on the current state
        for _ in range(max_attempts):
            neighbor = generate_random_neighbor(current)
            neighbor_val = evaluate(neighbor)
            if neighbor_val > current_val:
                current, current_val = neighbor, neighbor_val
                break  # Found a better one - keep climbing from there
        else:
            # No better neighbor after max_attempts tries: we're stuck
            return current
5. Random-Restart Hill Climbing
The big idea: if you get stuck, start over somewhere else!
def random_restart_hill_climbing(generate_random_state, evaluate,
                                 get_neighbors, num_restarts):
    best_state = None
    best_value = float("-inf")
    for _ in range(num_restarts):
        # Start from a random initial state
        start = generate_random_state()
        final_state, final_value = hill_climbing(start, evaluate, get_neighbors)
        # Keep track of the best we've seen
        if final_value > best_value:
            best_state = final_state
            best_value = final_value
    return best_state, best_value
PART 6: THE PROBLEMS WITH HILL CLIMBING
Problem 1: Local Optima
            global max
               /\
              /  \         local max
             /    \           /\
            /      \         /  \
           /        \       /    \
      ____/          \_____/      \____
                              ↑
                            Start (climbs to the local max,
                            never sees the higher peak)
What happens:
- Hill climbing goes up to the first peak it finds
- Gets stuck at local maximum
- Can’t see that there’s a higher peak elsewhere
Real-world analogy:
You’re hiking at night. You reach a small hill and can’t go higher from there. But there’s a mountain nearby you never saw!
Problem 2: Plateaus
         _____________   ← plateau (all neighbors look the same)
        /             \
       /               \
  ____/                 \____
What happens:
- All neighbors have same value
- No direction to move
- Like being on a flat plain
Types of plateaus:
- Flat local optimum: Truly stuck
- Shoulder: Can move sideways to eventually go up again
- Ridge: Narrow path upward surrounded by down slopes
Problem 3: Ridges
        ↑
        │
   ←────┼────→
        │
        ↓
(the only available moves are up/down/left/right;
the ridge itself climbs diagonally)
What happens:
- All immediate moves go down
- But going diagonally would go up
- Problem: diagonal isn’t a “neighbor” in your definition
Problem 4: Evaluation Function Issues
The evaluation function might not accurately reflect progress:
- In 8-queens, reducing attacks doesn’t always lead to solution
- Can get stuck in states where any move increases attacks
- Need to allow “sideways moves” (no improvement) occasionally
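Allowing those sideways moves is a small change to steepest-ascent hill climbing: accept an equal-valued neighbor, but only up to a budget so a flat local optimum can't trap you forever. A sketch, where `max_sideways` and the reset-on-improvement behaviour are choices of this sketch rather than a fixed standard:

```python
def hill_climbing_sideways(initial_state, evaluate, get_neighbors,
                           max_sideways=100):
    """Steepest ascent that also accepts equal-valued (sideways) moves."""
    current = initial_state
    current_value = evaluate(current)
    sideways = 0
    while True:
        neighbors = get_neighbors(current)
        best = max(neighbors, key=evaluate)
        best_value = evaluate(best)
        if best_value > current_value:
            current, current_value = best, best_value
            sideways = 0  # real progress resets the sideways budget
        elif best_value == current_value and sideways < max_sideways:
            current = best  # sideways step across a plateau
            sideways += 1
        else:
            return current, current_value
```

On a shoulder, the sideways steps let the search drift across the flat region; on a true flat maximum, the budget eventually runs out and the search stops.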
PART 7: THE 8-QUEENS EXPERIMENT – EYE-OPENING RESULTS
Experiment 1: No Sideways Moves
Success rate: 14% (0.14)
Average moves when succeeding: 4
Average moves when stuck: 3
Expected total moves needed:
3(1-0.14)/0.14 + 4 = 3(0.86)/0.14 + 4
= 3(6.14) + 4 = 18.42 + 4 ≈ 22 moves
Interpretation:
- Only works 14% of the time
- When it works, it’s fast (4 moves)
- But mostly fails quickly (3 moves then stuck)
Experiment 2: With 100 Sideways Moves Allowed
Success rate: 94% (0.94)
Average moves when succeeding: 21
Average moves when stuck: 65
Expected total moves needed:
65(1-0.94)/0.94 + 21 = 65(0.06)/0.94 + 21
= 65(0.064) + 21 = 4.16 + 21 ≈ 25 moves
Interpretation:
- Much higher success rate (94% vs 14%)
- Takes longer when succeeding (21 vs 4 moves)
- When fails, explores much longer (65 moves)
- Overall expected moves only slightly higher (25 vs 22)
Key Insight: Allowing sideways moves dramatically improves success rate with modest increase in runtime!
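The expected-cost formula used in both experiments — failure cost times the expected number of failed attempts, (1-p)/p, plus the cost of the one success — can be checked directly:

```python
def expected_moves(p_success, moves_on_success, moves_on_failure):
    """Expected total moves for restart-until-success.

    Each attempt succeeds with probability p, so we expect
    (1 - p) / p failed attempts before the successful one.
    """
    expected_failures = (1 - p_success) / p_success
    return moves_on_failure * expected_failures + moves_on_success

print(round(expected_moves(0.14, 4, 3)))    # ≈ 22, matching Experiment 1
print(round(expected_moves(0.94, 21, 65)))  # ≈ 25, matching Experiment 2
```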
PART 8: MIN-CONFLICTS HEURISTIC – THE N-QUEENS SECRET SAUCE
The Algorithm
import random

def min_conflicts_nqueens(n, max_steps):
    # Initialize: one queen per column, random rows
    current = [random.randint(1, n) for _ in range(n)]
    for step in range(max_steps):
        if count_conflicts(current) == 0:
            return current  # Solution found!
        # Pick a random conflicted queen
        conflicted_queens = get_conflicted_queens(current)
        queen = random.choice(conflicted_queens)
        # Move it to the row in its column with the fewest conflicts
        best_row = find_min_conflict_row(current, queen)
        current[queen] = best_row
    return None  # Failed within max_steps
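The pseudocode above leans on three undefined helpers. Here is one plausible way to fill them in, all built on a single conflicts-per-queen primitive; this is a sketch, not the only reasonable implementation:

```python
import random

def conflicts_for(current, col, row):
    """Conflicts queen (col, row) would have against every other queen."""
    count = 0
    for other in range(len(current)):
        if other == col:
            continue
        if (current[other] == row
                or abs(current[other] - row) == abs(other - col)):
            count += 1
    return count

def count_conflicts(current):
    # Each conflicting pair is seen from both queens, so halve the sum
    return sum(conflicts_for(current, c, current[c])
               for c in range(len(current))) // 2

def get_conflicted_queens(current):
    return [c for c in range(len(current))
            if conflicts_for(current, c, current[c]) > 0]

def find_min_conflict_row(current, col):
    rows = range(1, len(current) + 1)
    best = min(conflicts_for(current, col, r) for r in rows)
    # Break ties randomly to avoid deterministic cycles
    return random.choice([r for r in rows
                          if conflicts_for(current, col, r) == best])
```

The random tie-breaking in `find_min_conflict_row` is the "randomization" listed below: without it, the search can oscillate between the same few boards.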
Why It Works So Well
- Informed moves: Not random – moves to minimize conflicts
- Focus on problems: Only move queens that are attacked
- Randomization: Avoids getting trapped in cycles
- Restart strategy: If no solution in ~100 steps, start over
Performance for Large N
| N | Runtime (rough) |
|---|---|
| 100 | < 1 second |
| 1,000 | ~ 1 minute |
| 1,000,000 | ~ 1 hour (for specialized implementations) |
Remarkable: For 1 million queens, the state space is astronomically huge, but min-conflicts finds a solution in reasonable time!
PART 9: TABU SEARCH – AVOIDING CYCLES
The Problem with Hill Climbing
Hill climbing can get stuck in cycles:
A → B → C → A → B → C → A ...
The Tabu Solution
Maintain a tabu list of recently visited states and forbid returning to them.
def tabu_search(initial_state, evaluate, get_neighbors, max_iterations, tabu_size):
    current = initial_state
    best_state = current
    best_value = evaluate(current)
    tabu_list = []  # Recently visited states
    for _ in range(max_iterations):
        # Get all neighbors
        neighbors = get_neighbors(current)
        # Filter out tabu states
        admissible = [n for n in neighbors if n not in tabu_list]
        if not admissible:
            # If every neighbor is tabu, clear the list
            tabu_list = []
            admissible = neighbors
        # Find the best admissible neighbor
        best_neighbor = None
        best_neighbor_val = float("-inf")
        for n in admissible:
            val = evaluate(n)
            if val > best_neighbor_val:
                best_neighbor_val = val
                best_neighbor = n
        # Move to the best neighbor and mark it tabu
        current = best_neighbor
        tabu_list.append(current)
        # Keep the tabu list from growing too large
        if len(tabu_list) > tabu_size:
            tabu_list.pop(0)
        # Update global best
        if best_neighbor_val > best_value:
            best_value = best_neighbor_val
            best_state = best_neighbor
    return best_state, best_value
Aspiration Criteria
Sometimes a tabu move might be REALLY good. We can override the tabu restriction if:
def is_admissible(move, tabu_list, evaluate, best_value):
    if move not in tabu_list:
        return True
    if evaluate(move) > best_value:
        return True  # Aspiration: this would be a new global best!
    return False
Real Application: Political Districting
Tabu search has been used to create fair voting districts:
Constraints:
- Districts must be contiguous
- Equal population (± β%)
- Respect natural boundaries
- Socio-economic homogeneity
- Preserve communities of interest
Moves:
- Give: Transfer a unit from one district to another
- Swap: Exchange units between adjacent districts
Tabu elements: Units recently transferred become tabu
Result: Fair, unbiased districting plans
PART 10: SIMULATED ANNEALING – ESCAPING LOCAL OPTIMA
The Metallurgy Analogy
In metallurgy, annealing is:
- Heat metal to high temperature (atoms move freely)
- Slowly cool it down (atoms settle into low-energy configuration)
- Result: Strong, stable metal with minimal internal stress
The Algorithm Analogy
- High temperature: Willing to make “bad” moves (explore)
- Cooling down: Gradually become less willing to accept bad moves
- Low temperature: Only accept good moves (exploit)
The Math
Probability of accepting a worse move:
P(accept) = e^(ΔE / T)
where:
ΔE = new_value - current_value (negative if worse)
T = current temperature
Key insight: Even if ΔE is negative (worse move), probability > 0 when T > 0
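Plugging numbers into the formula makes the exploration-to-exploitation shift concrete. For a move that is worse by 10 (an arbitrary illustrative ΔE):

```python
import math

def accept_probability(delta, T):
    """P(accept) = e^(delta / T); delta < 0 for a worse move."""
    if delta >= 0:
        return 1.0  # better (or equal) moves are always taken
    return math.exp(delta / T)

for T in (100, 10, 1):
    print(f"T={T:3}: P(accept delta=-10) = {accept_probability(-10, T):.3f}")
```

At T=100 the bad move is accepted about 90% of the time (e^-0.1 ≈ 0.905); at T=1 it is accepted almost never (e^-10 ≈ 0.00005). The same bad move goes from "usually allowed" to "effectively forbidden" purely by cooling.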
Implementation
import math
import random

def simulated_annealing(initial_state, evaluate, get_neighbors,
                        schedule, max_iterations):
    current = initial_state
    current_val = evaluate(current)
    best_state = current
    best_val = current_val
    for iteration in range(max_iterations):
        T = schedule(iteration)  # Temperature at this iteration
        if T == 0:
            break
        # Pick a random neighbor
        neighbor = random.choice(get_neighbors(current))
        neighbor_val = evaluate(neighbor)
        # Should we move?
        delta = neighbor_val - current_val
        if delta > 0:
            # Better move - always accept
            current = neighbor
            current_val = neighbor_val
            if current_val > best_val:
                best_val = current_val
                best_state = current
        else:
            # Worse move - accept with probability e^(delta/T)
            # Note: delta is negative, so e^(negative/T) < 1
            if random.random() < math.exp(delta / T):
                current = neighbor
                current_val = neighbor_val
    return best_state, best_val

def linear_schedule(iteration, initial_temp=100, final_temp=0.1, max_iter=10000):
    """Linear cooling schedule"""
    return initial_temp - (initial_temp - final_temp) * iteration / max_iter

def exponential_schedule(iteration, initial_temp=100, alpha=0.95):
    """Exponential cooling"""
    return initial_temp * (alpha ** iteration)
Temperature Schedule Examples
Temperature
    ↑
100 |•
    |••
    | ••
    |  •••
    |    ••••
    |        •••••••
  0 +------------------→ Time
     (hot early: explore; cool late: exploit)
Traveling Salesman Problem with SA
Problem: Visit N cities, return to start, minimize distance
State representation: Permutation of cities [1,4,2,3,6,5]
Neighbor generation: Swap two cities (except first)
[1,4,2,3,6,5] → [1,4,3,2,6,5] (swap positions 3 and 4)
Evaluation: Total tour distance
Initial temperature: Set so ~80% of worse moves accepted initially
Cooling: Reduce temperature gradually (e.g., T’ = 0.95 × T)
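The TSP ingredients above fit in a few lines. A sketch, working with any distance matrix `dist` where `dist[a][b]` is the a→b distance:

```python
import random

def tour_length(tour, dist):
    """Total length of the closed tour (returns to the start city)."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def swap_neighbor(tour):
    """Swap two random cities, keeping the first city fixed."""
    i, j = random.sample(range(1, len(tour)), 2)
    neighbor = list(tour)
    neighbor[i], neighbor[j] = neighbor[j], neighbor[i]
    return neighbor
```

These two functions plug straight into the `evaluate` and neighbor-generation slots of the simulated annealing code (negate `tour_length` if your annealer maximizes).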
SA Properties
| Property | Value |
|---|---|
| Complete? | Yes (if cooled infinitely slowly) |
| Optimal? | Yes (if cooled infinitely slowly) |
| Practical? | Good solutions in reasonable time |
| Reproducible? | No (random) |
The Catch: To guarantee optimality, you must run forever! But in practice, we get good solutions quickly.
PART 11: LOCAL BEAM SEARCH – KEEPING MULTIPLE OPTIONS
The Idea
Instead of one current state, keep k states at all times.
def local_beam_search(initial_states, evaluate, get_neighbors, k,
                      max_iterations, is_goal):
    # Start with k initial states
    current_states = initial_states[:k]
    for _ in range(max_iterations):
        # Generate all neighbors of all current states
        all_neighbors = []
        for state in current_states:
            all_neighbors.extend(get_neighbors(state))
        # Keep only the k best
        all_neighbors.sort(key=evaluate, reverse=True)
        current_states = all_neighbors[:k]
        # Check whether any of them is a goal
        for state in current_states:
            if is_goal(state):
                return state
    # Out of iterations: return the best state we have
    return max(current_states, key=evaluate)
Stochastic Beam Search
Instead of taking the k best, select k states with probability proportional to fitness:
import random

def stochastic_beam_search(current_states, evaluate, get_neighbors, k):
    # Generate all neighbors
    all_neighbors = []
    for state in current_states:
        all_neighbors.extend(get_neighbors(state))
    # Evaluate all
    values = [evaluate(s) for s in all_neighbors]
    # Convert to probabilities (higher value = higher probability);
    # shift values so they're all positive before normalizing
    min_val = min(values)
    if min_val < 0:
        values = [v - min_val + 1 for v in values]
    total = sum(values)
    probs = [v / total for v in values]
    # Select k states randomly, weighted by probability
    chosen_indices = random.choices(range(len(all_neighbors)),
                                    weights=probs, k=k)
    return [all_neighbors[i] for i in chosen_indices]
PART 12: GENETIC ALGORITHMS – EVOLUTIONARY SEARCH
The Biological Inspiration
Nature evolves through:
- Population – Many individuals
- Selection – Fittest survive and reproduce
- Crossover – Combine traits from two parents
- Mutation – Random changes
- Generation – Repeat
The Genetic Algorithm Framework
import random

def genetic_algorithm(population_size, fitness_func, crossover_func,
                      mutation_func, generations, mutation_rate):
    # Initialize population (generate_initial_population and is_goal
    # are problem-specific helpers)
    population = generate_initial_population(population_size)
    for generation in range(generations):
        # Evaluate fitness
        fitness_scores = [fitness_func(ind) for ind in population]
        # Check for a solution
        best_idx = max(range(len(population)), key=lambda i: fitness_scores[i])
        if is_goal(population[best_idx]):
            return population[best_idx]
        # Create the new population
        new_population = []
        while len(new_population) < population_size:
            # Select parents (tournament selection)
            parent1 = tournament_select(population, fitness_scores)
            parent2 = tournament_select(population, fitness_scores)
            # Crossover (with probability)
            if random.random() < 0.8:  # 80% crossover rate
                child1, child2 = crossover_func(parent1, parent2)
            else:
                child1, child2 = parent1, parent2
            # Mutate
            if random.random() < mutation_rate:
                child1 = mutation_func(child1)
            if random.random() < mutation_rate:
                child2 = mutation_func(child2)
            new_population.extend([child1, child2])
        population = new_population[:population_size]
    # Return the best found
    return max(population, key=fitness_func)

def tournament_select(population, fitness_scores, tournament_size=3):
    """Select the best from a random tournament"""
    indices = random.sample(range(len(population)), tournament_size)
    best_idx = max(indices, key=lambda i: fitness_scores[i])
    return population[best_idx]
Key Components
1. Representation (The Chromosome)
How to encode a solution?
Bit strings:
Individual: [0,1,1,0,1,0,0,1]
Gene: Each bit
Chromosome: Entire string
Permutations (for TSP):
Individual: [3,5,1,4,2,6] # Order of cities
Real values:
Individual: [0.34, 1.56, -2.89, 4.01] # Parameters
Rules/Lists:
Individual: [R1, R3, R2, R5, R4] # Sequence of rules
2. Fitness Function (The Environment)
Must accurately measure how good a solution is.
For N-Queens:
def fitness_nqueens(individual):
    # individual = [row1, row2, ..., rown], one entry per column
    conflicts = 0
    n = len(individual)
    # Check each pair of queens
    for i in range(n):
        for j in range(i + 1, n):
            # Same row?
            if individual[i] == individual[j]:
                conflicts += 1
            # Same diagonal?
            if abs(individual[i] - individual[j]) == abs(i - j):
                conflicts += 1
    # More conflicts = lower fitness
    return 1 / (conflicts + 1)  # Add 1 to avoid division by zero
3. Selection (Reproduction)
How to choose parents?
Roulette Wheel Selection:
import random

def roulette_select(population, fitness_scores):
    total_fitness = sum(fitness_scores)
    pick = random.uniform(0, total_fitness)
    current = 0
    for i, fitness in enumerate(fitness_scores):
        current += fitness
        if current > pick:
            return population[i]
    return population[-1]  # Guard against floating-point rounding
Tournament Selection (shown above):
- Pick k individuals randomly
- Choose the best among them
- Simple and effective
Rank Selection:
- Sort by fitness
- Selection probability based on rank, not raw fitness
- Prevents domination by super-individuals
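Rank selection as just described can be sketched like this; the linear rank weighting (worst = 1 up to best = N) is one common choice among several:

```python
import random

def rank_select(population, fitness_scores):
    """Select one individual with probability proportional to its rank.

    The worst individual gets rank 1 and the best gets rank N, so even
    a runaway 'super-individual' can claim only N / (sum of ranks) of
    the probability mass instead of dominating outright.
    """
    order = sorted(range(len(population)), key=lambda i: fitness_scores[i])
    ranks = [0] * len(population)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    chosen = random.choices(range(len(population)), weights=ranks)[0]
    return population[chosen]
```

With fitnesses [1, 100, 10] the three individuals are picked in proportion 1 : 3 : 2 — the middle individual's 100x fitness advantage becomes only a modest rank advantage.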
4. Crossover (Recombination)
One-Point Crossover (for bit strings):
Parent1: 1011|0010
Parent2: 1100|1101
             ↑ cut point
Child1:  1011|1101
Child2:  1100|0010
Two-Point Crossover:
Parent1: 10|1100|10
Parent2: 11|0011|01
           ↑    ↑ cut points
Child1:  10|0011|10
Child2:  11|1100|01
Uniform Crossover:
Each bit chosen randomly from either parent:
Parent1: 1 0 1 1 0 0 1 0
Parent2: 1 1 0 0 1 1 0 1
Mask:    0 1 0 1 1 0 1 0   (0 = from P1, 1 = from P2)
Child:   1 1 1 0 1 0 0 0
Order Crossover (for permutations):
Parent1: [3 5 7 | 2 1 6 | 4 8]
Parent2: [2 5 7 | 6 8 1 | 3 4]

Child:   [5 7 8 | 2 1 6 | 3 4]

(Tricky! Keep Parent1's middle segment, then fill the remaining slots
with Parent2's cities in order, wrapping around after the second cut
and skipping cities already used - this preserves relative order.)
5. Mutation (Random Changes)
Bit flip mutation:
Before: [1,0,1,1,0,1,0,0]
           ↑ flip this bit
After:  [1,1,1,1,0,1,0,0]
Swap mutation (for permutations):
Before: [3,5,1,4,2,6]
           ↑     ↑ swap
After:  [3,2,1,4,5,6]
Gaussian mutation (for real values):
Before: [0.34, 1.56, -2.89, 4.01]
               ↑ add N(0, 0.1)
After:  [0.34, 1.62, -2.89, 4.01]
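The three mutation styles above, sketched in code; the per-gene rates used as defaults here are illustrative, not standard values:

```python
import random

def bit_flip_mutation(bits, rate=0.01):
    """Flip each bit independently with the given probability."""
    return [1 - b if random.random() < rate else b for b in bits]

def swap_mutation(perm):
    """Swap two random positions - keeps a permutation a permutation."""
    i, j = random.sample(range(len(perm)), 2)
    mutated = list(perm)
    mutated[i], mutated[j] = mutated[j], mutated[i]
    return mutated

def gaussian_mutation(values, sigma=0.1, rate=0.25):
    """Add N(0, sigma) noise to each real-valued gene with some probability."""
    return [v + random.gauss(0, sigma) if random.random() < rate else v
            for v in values]
```

Note that swap mutation never loses or duplicates an element, which is exactly why it is the right operator for permutation encodings like TSP tours.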
A Complete Simple Example
Problem: Evolve a 32-bit string to all 1’s
import random

def evolve_all_ones():
    # Parameters
    POP_SIZE = 100
    GENS = 1000
    MUTATION_RATE = 0.01
    # Initialize random population
    population = []
    for _ in range(POP_SIZE):
        individual = [random.randint(0, 1) for _ in range(32)]
        population.append(individual)
    for gen in range(GENS):
        # Fitness = number of 1's
        fitness = [sum(ind) for ind in population]
        # Check for solution
        best_fitness = max(fitness)
        if best_fitness == 32:
            best_idx = fitness.index(best_fitness)
            return population[best_idx]
        print(f"Gen {gen}: Best fitness = {best_fitness}")
        # Create new population
        new_pop = []
        while len(new_pop) < POP_SIZE:
            # Tournament selection (tournament_select defined earlier)
            p1 = tournament_select(population, fitness)
            p2 = tournament_select(population, fitness)
            # Crossover (one-point)
            if random.random() < 0.8:
                point = random.randint(1, 31)  # any interior cut point
                c1 = p1[:point] + p2[point:]
                c2 = p2[:point] + p1[point:]
            else:
                c1, c2 = p1[:], p2[:]
            # Mutation
            for i in range(32):
                if random.random() < MUTATION_RATE:
                    c1[i] = 1 - c1[i]  # Flip bit
                if random.random() < MUTATION_RATE:
                    c2[i] = 1 - c2[i]
            new_pop.extend([c1, c2])
        population = new_pop[:POP_SIZE]
    # Return best found
    fitness = [sum(ind) for ind in population]
    best_idx = fitness.index(max(fitness))
    return population[best_idx]
When to Use Genetic Algorithms
✅ Problem has many interacting parameters
✅ Fitness landscape is rough (many local optima)
✅ You can define a good fitness function
✅ You don’t need guaranteed optimal solution
✅ Problem is similar to one solved with GAs before
✅ You want to explore a wide range of solutions
❌ You need the absolute optimal solution
❌ Fitness evaluation is extremely expensive
❌ Problem has known efficient algorithm
❌ Solution must be explainable
GA Advantages
- Conceptually simple – Easy to understand and implement
- Modular – Separates problem from search mechanism
- Parallelizable – Population can be evaluated in parallel
- Multi-objective – Can optimize multiple criteria
- No gradient needed – Works with non-differentiable functions
- Always improves – Keeps best solution found
- Escapes local optima – Population maintains diversity
GA Disadvantages
- Parameter sensitive – Population size, mutation rate, etc.
- No guarantees – May not find optimal solution
- Premature convergence – Population becomes too uniform
- Computationally expensive – Many evaluations needed
- Hard to analyze – Theoretical properties complex
PART 13: COMPARISON OF LOCAL SEARCH METHODS
| Method | Memory | Completeness | Optimality | Speed | When to Use |
|---|---|---|---|---|---|
| Hill Climbing | O(1) | No | No | Very Fast | Simple optimization, quick approximation |
| Random Restart HC | O(1) | Yes* | No | Fast | Many local optima, need guarantee |
| Tabu Search | O(k) | Yes* | No | Medium | Cycles are a problem |
| Simulated Annealing | O(1) | Yes* | Yes* | Medium | Need good solution, can tune time |
| Beam Search | O(k) | No | No | Fast | Want multiple alternatives |
| Genetic Algorithm | O(pop) | No | No | Slow | Complex search space, many interactions |
*With infinite time/resources
PART 14: GRADIENT DESCENT – CONNECTION TO MACHINE LEARNING
The Math Connection
Local search isn’t just for discrete problems – continuous optimization uses gradient descent:
def gradient_descent(initial_params, compute_gradient, learning_rate, max_iterations):
    # Assumes params support vector arithmetic (e.g. a numpy array)
    params = initial_params
    for _ in range(max_iterations):
        gradient = compute_gradient(params)
        params = params - learning_rate * gradient
    return params
For Linear Regression
We want to minimize:
J(θ) = ½ ∑(hθ(x) - y)²
where hθ(x) = θ₀ + θ₁x₁ + ... + θₙxₙ
Gradient:
∂J/∂θⱼ = ∑(hθ(x) - y) × xⱼ
Update rule:
θⱼ := θⱼ - α × ∂J/∂θⱼ
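To tie the pieces together, here is the update rule run on a tiny one-feature regression in pure Python; the data and learning rate are made up for illustration (the data comes from y = 2x + 1, so θ should converge near (1, 2)):

```python
def fit_line(xs, ys, alpha=0.01, iterations=5000):
    """Gradient descent on J(theta) = 1/2 * sum((theta0 + theta1*x - y)^2)."""
    t0, t1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient: dJ/dtheta_j = sum over points of (h(x) - y) * x_j
        # (x_0 is the constant 1 attached to the intercept term)
        g0 = sum((t0 + t1 * x - y) for x, y in zip(xs, ys))
        g1 = sum((t0 + t1 * x - y) * x for x, y in zip(xs, ys))
        t0 -= alpha * g0  # theta_j := theta_j - alpha * dJ/dtheta_j
        t1 -= alpha * g1
    return t0, t1

# Data generated from y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
```

Each iteration is exactly one "move to a neighbor": the gradient picks the direction, and the learning rate α controls the step size, playing the role of the neighborhood radius.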
Why Gradient Descent is Local Search
- Current state: Parameter vector θ
- Neighbors: θ ± small changes
- Evaluation: Cost function J(θ)
- Move direction: Negative gradient (steepest descent)
Connection to Hill Climbing
| Hill Climbing | Gradient Descent |
|---|---|
| Discrete states | Continuous parameters |
| Check all neighbors | Compute gradient |
| Move to best neighbor | Move in gradient direction |
| Stuck at local max | Stuck at local min |
PART 15: KEY TAKEAWAYS FOR YOUR MIDTERM
The Big Ideas
- Two problem types: Planning (need path) vs. Identification (just want goal)
- Local search trades guarantees for efficiency:
- Sacrifices completeness and optimality
- Gains ability to handle huge state spaces
- Hill climbing is the foundation:
- Always move to better neighbor
- Problems: local optima, plateaus, ridges
- Min-conflicts heuristic is powerful:
- Focuses on variables in conflict
- Makes informed moves
- Works amazingly well for N-Queens
- Escaping local optima requires tricks:
- Random restarts
- Sideways moves
- Tabu lists
- Annealing (accepting worse moves)
- Genetic algorithms use population:
- Selection, crossover, mutation
- Mimic biological evolution
- Good for complex optimization
Memory Aids
- Hill climbing = “Always go up”
- Random restart = “If stuck, start over”
- Tabu search = “Don’t go back to recent places”
- Simulated annealing = “Sometimes go down, especially early”
- Beam search = “Keep multiple possibilities”
- Genetic algorithms = “Let them evolve”
When to Use Each
Simple problem, need quick answer?
→ Hill climbing
Problem has many local optima?
→ Random restart HC
Getting stuck in cycles?
→ Tabu search
Need really good solution, have time?
→ Simulated annealing
Multiple good solutions might exist?
→ Beam search
Complex, poorly understood problem?
→ Genetic algorithm
PRACTICE PROBLEMS
Problem 1
You’re solving 1000-queens. Which algorithm would you use and why?
Answer: Min-conflicts hill climbing with random restart. It works extremely well for large N-Queens in practice, typically finding solutions in roughly linear time.
Problem 2
Your hill climbing implementation keeps getting stuck. What three things could you try?
Answer:
- Add random restart
- Allow sideways moves
- Use simulated annealing to accept occasional downhill moves
Problem 3
Why does crossover help in genetic algorithms?
Answer: It combines good “building blocks” from different parents. If one parent has good first half and another has good second half, crossover can create an individual with both good halves.
Problem 4
Your genetic algorithm’s population becomes all identical after a few generations. What’s wrong?
Answer: Premature convergence! Solutions:
- Increase mutation rate
- Increase population size
- Use different selection method (less aggressive)
- Add diversity preservation mechanism
Problem 5
When would you use tabu search instead of simulated annealing?
Answer: When cycles are the main problem (like in scheduling with repeating patterns) and you can afford to store visited states.
Local search is all about understanding the trade-offs between exploration and exploitation, and knowing which technique fits which problem. Practice tracing through these algorithms on small examples to build intuition.
