Optimizing Nash Equilibrium With Rust: A Performance Boost
Hey guys! Today, we're diving deep into a fascinating challenge: how to seriously speed up Nash equilibrium computation. This is a big deal, especially when dealing with complex scenarios. We're talking about taking a process that currently takes hours and shrinking it down to minutes. How? By strategically moving some key logic from Python to Rust. Let's break it down!
The Objective: Supercharge Nash Equilibrium Computation
Our main goal here is crystal clear: optimize the performance of Nash equilibrium computation. Currently, it's a bit of a time hog, taking anywhere from 1 to 4 hours for just 12 scenarios on a pretty beefy 48-vCPU machine. That's a lot of waiting around! The culprit? The heuristic agent logic, which is currently chilling in Python. The constant back-and-forth between Python and Rust during simulations is creating a major bottleneck. We need to fix this, and fast!
Background: Unmasking the Performance Bottleneck
To really understand the problem, let's dig into the details. Right now, the Nash equilibrium computation process (as highlighted in issue #85) is taking way too long. We're talking hours, not minutes. Performance profiling has revealed the core issue: the Python heuristic agent logic. This logic is getting hammered repeatedly during simulations, and each call involves a costly trip across the Python/Rust boundary. These trips add up, big time!
Current Architecture Bottleneck: The Python/Rust Shuffle
The problem lies in how our simulation loop is structured. We're constantly hopping between Python and Rust – roughly 50 to 100 times per episode. Let's look at some code to illustrate this:
# payoff_evaluator_rust.py lines 97-129
while not done:
    # Rust → Python: Get observations
    observations = []
    for agent_id in range(num_agents):
        obs = game.get_observation(agent_id)  # Boundary crossing!
        
    # Pure Python: Compute actions
    actions = []
    for agent_id in range(num_agents):
        action = _heuristic_action(theta, obs, agent_id, rng)  # Python logic
        
    # Python → Rust: Step game
    rewards, done, info = game.step(actions)  # Boundary crossing!
See those comments, "Boundary crossing!"? That's where the trouble is. Each time we call game.get_observation() or game.step(), we're jumping between Rust and Python. When you factor in 2000 simulations, 16 payoff matrix entries, and about 75 steps per episode, we're looking at a whopping 2.4 million boundary crossings per Double Oracle iteration. No wonder things are slow!
Proposed Optimization Strategy: A Three-Phase Attack
To tackle this performance bottleneck, we're proposing a three-phase optimization strategy. Each phase builds on the previous one, progressively boosting our speed. Let's dive into each phase:
Phase 1: Rust Heuristic Agent (Highest Impact)
This is where we expect to see the biggest gains. By moving the heuristic agent logic directly into Rust, we can eliminate those costly Python/Rust boundary crossings within the simulation loop. Think of it as cutting out the middleman – way more efficient!
Expected speedup: 5-10x (That means potentially reducing computation time from 1-4 hours down to a much more manageable 10-30 minutes!)
The core idea is to reimplement the heuristic agent's decision-making process in Rust. This involves creating a HeuristicAgent struct in Rust and porting the logic from the existing Python _heuristic_action function. Here’s a glimpse of what this might look like in Rust:
// bucket-brigade-core/src/agents/heuristic.rs
pub struct HeuristicAgent {
    /// Agent strategy parameters [honesty, work_tendency, neighbor_help, ...]
    pub params: [f64; 10],
}
impl HeuristicAgent {
    pub fn select_action(
        &self, 
        obs: &Observation, 
        agent_id: usize, 
        rng: &mut impl Rng
    ) -> Action {
        let work_tendency = self.params[1];
        let own_house_priority = self.params[3];
        let rest_reward_bias = self.params[8];
        
        // Implement heuristic decision logic in Rust
        if rng.gen::<f64>() < work_tendency * (1.0 - rest_reward_bias) {
            // Work - choose which house
            let owned_house = agent_id % 10;
            
            if obs.houses[owned_house] == HouseState::Burning 
               && rng.gen::<f64>() < own_house_priority {
                Action { house: owned_house, mode: WorkMode::Work }
            } else {
                // Choose burning house
                let burning: Vec<_> = obs.houses.iter()
                    .enumerate()
                    .filter(|(_, &h)| h == HouseState::Burning)
                    .map(|(i, _)| i)
                    .collect();
                    
                let house = if !burning.is_empty() {
                    burning[rng.gen_range(0..burning.len())]
                } else {
                    owned_house
                };
                
                Action { house, mode: WorkMode::Work }
            }
        } else {
            // Rest
            Action {
                house: agent_id % 10, 
                mode: WorkMode::Rest 
            }
        }
    }
}
// Expose to Python via PyO3
#[pyfunction]
fn run_heuristic_episode(
    scenario: PyScenario,
    theta_focal: Vec<f64>,
    theta_opponents: Vec<f64>,
    seed: u64,
) -> PyResult<f64> {
    // Create agents
    let focal_agent = HeuristicAgent { params: theta_focal.try_into().unwrap() };
    let opponent_agent = HeuristicAgent { params: theta_opponents.try_into().unwrap() };
    
    // Create game
    let mut game = BucketBrigade::new(scenario.into(), seed);
    let mut rng = Pcg64::seed_from_u64(seed);
    
    // Run episode entirely in Rust
    let mut episode_reward = 0.0;
    let mut done = false;
    let mut step_count = 0;
    
    while !done && step_count < 100 {
        // Get observations (no Python boundary!)
        let observations: Vec<_> = (0..scenario.num_agents)
            .map(|id| game.get_observation(id))
            .collect();
        
        // Get actions (all in Rust!)
        let actions: Vec<_> = (0..scenario.num_agents)
            .map(|id| {
                let agent = if id == 0 { &focal_agent } else { &opponent_agent };
                agent.select_action(&observations[id], id, &mut rng)
            })
            .collect();
        
        // Step game (all in Rust!)
        let (rewards, is_done, _) = game.step(&actions);
        episode_reward += rewards[0];
        done = is_done;
        step_count += 1;
    }
    
    Ok(episode_reward)
}
This Rust code will handle the agent's decision-making process, keeping everything nice and speedy within Rust. We'll also use PyO3 to expose this functionality to Python.
On the Python side, things become much simpler:
def _run_rust_simulation(args):
    """Run entire simulation in Rust - no boundary crossings."""
    theta_focal, theta_opponents, python_scenario, seed = args
    rust_scenario = _convert_scenario_to_rust(python_scenario)
    
    # Single Rust call for entire episode
    return core.run_heuristic_episode(
        rust_scenario, 
        theta_focal.tolist(), 
        theta_opponents.tolist(), 
        seed
    )
Now, instead of a bunch of back-and-forth, we have a single call to Rust to run the entire episode. This is a huge win for performance!
Phase 2: Parallel Payoff Matrix (Easy Win)
Next up, we're going to tackle the payoff matrix computation. Currently, this is done sequentially, meaning one entry at a time. We can significantly speed this up by evaluating all entries in parallel.
Expected additional speedup: 2-4x (This could bring our total computation time down to 5-15 minutes!)
Currently, the payoff matrix is calculated in a loop, like this:
# double_oracle.py line 163
payoff_matrix = self.evaluator.evaluate_payoff_matrix(strategy_pool)
# payoff_evaluator_rust.py lines 204-227
for i in range(K):
    for j in range(K):
        payoff_matrix[i, j] = self.evaluate_symmetric_payoff(...)
Our optimization involves evaluating all K×K entries simultaneously using Python's multiprocessing library:
def evaluate_payoff_matrix(self, strategy_pool: list[np.ndarray]) -> np.ndarray:
    """Compute payoff matrix with parallel evaluation of all entries."""
    K = len(strategy_pool)
    
    # Create all (i, j, theta_i, theta_j) tuples
    tasks = [
        (i, j, strategy_pool[i], strategy_pool[j])
        for i in range(K) for j in range(K)
    ]
    
    # Evaluate all K×K entries in parallel
    with Pool(processes=self.num_workers) as pool:
        results = pool.starmap(self._evaluate_single_entry, tasks)
    
    # Reshape to matrix
    payoff_matrix = np.zeros((K, K))
    for idx, (i, j, _, _) in enumerate(tasks):
        payoff_matrix[i, j] = results[idx]
    
    return payoff_matrix
def _evaluate_single_entry(self, i, j, theta_i, theta_j):
    """Evaluate single payoff matrix entry."""
    return self.evaluate_symmetric_payoff(theta_i, theta_j)
We create a pool of worker processes and use pool.starmap to distribute the workload. This lets us calculate all the payoff matrix entries concurrently, leading to a significant speed boost.
Important Note: We'll need to carefully tune the number of worker processes to avoid overwhelming our system, especially since evaluate_symmetric_payoff already uses multiprocessing internally. We don't want to oversubscribe resources!
Phase 3: GPU/CUDA (Only if Needed)
This phase is the heavy hitter, and we'll only consider it if Phases 1 and 2 don't quite get us where we need to be. GPU acceleration can be incredibly powerful, but it also comes with its own set of challenges and overhead. In this case, we anticipate that the returns may diminish compared to the effort required.
Expected additional speedup: 1.5-2x (But remember, this comes with a higher complexity cost.)
While GPUs are amazing for parallel processing, they might not be the perfect fit for this particular problem. Here's why:
- Game logic is inherently sequential: The game unfolds step by step, limiting the amount of parallelism we can exploit.
 - Small state space: We're dealing with only 10 houses and 4 agents, which means the GPU might not be fully utilized.
 - GPU memory transfer overhead: Moving data back and forth between the CPU and GPU can eat into our performance gains.
 - Significant rewrite required: Implementing GPU acceleration would likely involve using libraries like cupy, numba, or even CUDA directly, which means a substantial amount of code rewriting.
 
That being said, there are potential areas where GPUs could help:
- Batch simulations: We could run thousands of simulations simultaneously on the GPU.
 - Vectorized fire spread computation: If fire spread becomes a bottleneck, we could potentially parallelize it on the GPU.
 - Parallel random number generation: Generating random numbers in parallel is something GPUs excel at.
 
If we do decide to go down this road, it would involve carefully analyzing the bottlenecks and identifying specific kernels that can benefit from GPU acceleration.
Expected Overall Performance: A Dramatic Improvement
Let's put it all together and see what kind of performance gains we're anticipating:
| Phase | Implementation | Time for 12 scenarios | Speedup | 
|---|---|---|---|
| Current | Python heuristic | 1-4 hours | 1x | 
| Phase 1 | Rust heuristic | 10-30 minutes | 5-10x | 
| Phase 2 | + Parallel matrix | 5-15 minutes | 2-4x more | 
| Phase 3 | + GPU (optional) | 3-10 minutes | 1.5-2x more | 
As you can see, we're expecting a massive improvement in performance. The biggest jump comes from Phase 1 (moving the heuristic agent to Rust), which should give us a 5-10x speedup. Phase 2 (parallelizing the payoff matrix) adds another significant boost. Phase 3 (GPU acceleration) is more of a question mark, but it could potentially shave off some extra time if needed.
Recommendation: We strongly recommend implementing Phase 1 first. It gives us the most bang for our buck, delivering 80% of the performance improvement with only 20% of the effort. It's the low-hanging fruit we should definitely grab!
Implementation Plan: Step-by-Step to Success
To make sure we stay on track, we've laid out a detailed implementation plan with clear steps and milestones. Let's take a look:
Step 1: Rust Heuristic Agent Module
- [ ] Create 
bucket-brigade-core/src/agents/module: This is where our Rust agent code will live. - [ ] Implement 
HeuristicAgentstruct with strategy parameters: We'll define the data structure to hold the agent's parameters. - [ ] Port Python 
_heuristic_actionlogic to Rustselect_actionmethod: This is the heart of the operation – translating the Python logic to Rust. - [ ] Add unit tests for heuristic agent behavior: We need to make sure our Rust agent is acting as expected.
 
Step 2: Rust Episode Runner
- [ ] Implement 
run_heuristic_episodefunction in Rust: This function will run an entire episode within Rust, avoiding Python callbacks. - [ ] Handle episode loop entirely in Rust (no Python callbacks): We want to keep everything in Rust for maximum speed.
 - [ ] Expose function to Python via PyO3 
#[pyfunction]: This allows us to call the Rust function from Python. - [ ] Add integration tests comparing Rust vs Python episode results: We need to verify that the Rust and Python implementations produce the same results.
 
Step 3: Python Integration
- [ ] Update 
payoff_evaluator_rust.pyto use new Rust episode runner: We'll modify the Python code to call our new Rust function. - [ ] Simplify 
_run_rust_simulationto single Rust call: We'll streamline the Python code to make it cleaner and more efficient. - [ ] Remove old 
_heuristic_actionPython implementation: We don't need the old Python logic anymore. - [ ] Update tests and benchmarks: We need to make sure everything still works after the changes.
 
Step 4: Benchmark and Validate
- [ ] Run benchmark on single scenario (before/after): We'll measure the performance improvement on a single scenario.
 - [ ] Validate Nash equilibrium results match (within Monte Carlo variance): We need to ensure that our changes don't affect the accuracy of the Nash equilibrium computation.
 - [ ] Test on all 12 scenarios: We'll run the tests on all scenarios to get a comprehensive view of the performance improvement.
 - [ ] Document performance improvements: We'll record the results of our benchmarking and validation efforts.
 
Step 5: Phase 2 (Parallel Matrix)
- [ ] Implement parallel payoff matrix evaluation: We'll use Python's multiprocessing library to parallelize the payoff matrix calculation.
 - [ ] Tune worker count to avoid oversubscription: We'll experiment with different worker counts to find the optimal setting.
 - [ ] Benchmark and validate: We'll measure the performance improvement and ensure that the results are accurate.
 - [ ] Update documentation: We'll document the changes we made and the performance improvements we achieved.
 
Acceptance Criteria: How We Define Success
To ensure we've achieved our goals, we've established clear acceptance criteria:
- [ ] Rust heuristic agent implementation matches Python behavior (validated via tests): The Rust agent should behave identically to the Python agent.
 - [ ] Nash equilibrium results are statistically equivalent (within MC variance): Our changes should not significantly alter the Nash equilibrium results.
 - [ ] Performance improves by at least 5x on 12-scenario benchmark: We want to see a substantial speedup in the computation time.
 - [ ] All existing tests pass: We don't want to break anything in the process.
 - [ ] Documentation updated with new architecture: We need to keep the documentation up-to-date.
 
Technical Considerations: Diving into the Details
Let's talk about some of the technical aspects of this optimization effort:
Rust Dependencies
We'll need to add a couple of dependencies to our bucket-brigade-core/Cargo.toml file:
[dependencies]
rand = "0.8"
rand_pcg = "0.3"
These dependencies provide us with random number generation capabilities, which are essential for the heuristic agent.
Python/Rust Interface
We'll use PyO3's #[pyfunction] macro to expose the run_heuristic_episode function to Python. This will allow us to call the Rust function from Python code. The function signature will look something like this:
run_heuristic_episode(scenario, theta_focal, theta_opponents, seed) -> float
It takes the scenario, agent parameters, and a random seed as input and returns the episode reward.
Testing Strategy
We'll employ a multi-faceted testing strategy to ensure the quality and correctness of our changes:
- Unit tests (Rust): We'll write unit tests to verify the behavior of the heuristic agent's action selection logic.
 - Integration tests (Python): We'll compare the results of running episodes in Rust and Python to ensure consistency.
 - Property tests: We'll use property-based testing to verify the statistical equivalence of Nash equilibrium results.
 - Benchmark tests: We'll measure the performance improvement using benchmark tests.
 
Risks and Mitigations: Planning for the Unexpected
Like any software project, this optimization effort comes with potential risks. Let's identify the key risks and outline our mitigation strategies:
Risk: Rust implementation differs from Python logic
- Mitigation: We'll implement comprehensive integration tests and validate the results on known scenarios.
 
Risk: Parallel matrix causes resource contention
- Mitigation: We'll carefully tune the worker count and consider adaptive parallelism if needed.
 
Risk: Breaking changes to API
- Mitigation: We'll initially keep the old API and deprecate it gracefully.
 
Related Work: Standing on the Shoulders of Giants
This optimization effort builds upon previous work and addresses existing issues. Here are some relevant links:
- Issue #85: Current Nash equilibrium computation (provides benchmark baseline)
 bucket_brigade/equilibrium/payoff_evaluator_rust.py: Python heuristic to portbucket-brigade-core/src/game.rs: Rust game engine (already fast)bucket_brigade/agents/heuristic.py: Original Python heuristic agent
References: Learning from the Experts
We'll be drawing on various resources to guide our implementation, including:
- PyO3 documentation: https://pyo3.rs
 - Rust multiprocessing patterns
 - Nash equilibrium algorithm benchmarks
 
Conclusion: A Faster Future for Nash Equilibrium Computation
By strategically moving the heuristic agent logic to Rust and parallelizing the payoff matrix computation, we can significantly speed up Nash equilibrium computation. This will not only save us time but also enable us to tackle more complex scenarios. Let's get to work and make it happen!
Type: Performance optimization
Priority: Medium (not critical, but significant speedup potential)
Complexity: Medium (Rust development + Python integration)
Estimated effort: 2-3 days for Phase 1, 1 day for Phase 2
Blocked by: Issue #85 (provides benchmark baseline)