Learnings from a project inception

Agile is simple but not easy

When I was learning about Agile and people around me were saying it is simple but not easy, my initial reaction was: “yeah, right, what can be hard about harnessing a few common-sense principles?” After some time in this space, I was once again reminded why I was wrong.

Agile is simple: the principles are simple, the frameworks are relatively lightweight, and come on, it’s just common sense - isn’t it? Yes, it is. But this common sense is constantly challenged by the human factor, by all those people with great ideas and good intentions thinking about the same thing - in a different way. That is what makes it really hard to master.

So, I was facilitating an activity today focused on creating a Definition of Done and a Definition of Ready for my new team. The session was supposed to take 30 minutes. I wanted to come up with a fresh way to run it - I had done my homework and read a nice chapter about helping a team build a Definition of Done. In fact, while reading that chapter I thought: “wow, that is bulletproof”. I couldn’t have been more wrong.

After over 2 hours of very vivid brainstorming, the team came up with both definitions. In between, I felt like we were going in circles, opening closed doors and throwing in more and more ideas that increased the complexity of our definitions. On the other hand, I’m not sure the confidence we put in them grew along with that complexity.

Could we have had a simple definition that would get us going in the first sprint? I don’t know. After 2 hours, most of the energy in the room was gone, some people were still arguing about semantics, and we had to cut the discussion.

Learnings from today:

  • Clearly express the expected outcome of a workshop you facilitate
  • Set up and communicate a timebox for an exercise - duh!
  • It’s Agile - start small and improve your definitions over time. Finding the best one on day 1 makes no sense.
  • Even the most bulletproof ideas have to be practiced
  • Don’t worry, clear your head, talk to people about how they found your facilitation, and do better next time
  • Get a coach or a mentor - it helps a lot when you can challenge your ideas with someone experienced (even after the fact)

Working with Pools in Python

Problem - long-running single-threaded simulations

I was writing a simulation for 6.00.2x - Introduction to Computational Thinking and Data Science by MITx on edX. One of the problem sets required a simulation to be run a few hundred times to collect enough samples, and it took 4-5 minutes to get results for a few hundred trials.

The base version of the simulation was written in Python 2.7 and executed in a single thread.

I started to explore how to use the Python multiprocessing library to utilise all of my CPUs and get the results faster.

Base solution

I started by writing a function that runs a single simulation.

def runSimulation(numViruses, maxPop, maxBirthProb, clearProb,
                  resistances, mutProb, numStepsBeforeDrug, numStepsAfterDrug):
    
    # Lots of stuff happening in here - removed for brevity
    return (numViruses, numResistantViruses)

I call this function in a sequential manner:

def simulate(numTrials):
    # Setting up simulation environment - removed for brevity 
    for t in range(numTrials):
        total, resistant = runSimulation(num_viruses,
                                         max_pop,
                                         max_birth_prob,
                                         clear_prob,
                                         resistances,
                                         mut_prob,
                                         num_steps_before_drug,
                                         num_steps_after_drug)
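        # Collecting results from each trial - removed for brevity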
    
    # Plotting results - removed for brevity

Possible solution - using Pool from multiprocessing library

I started exploring the multiprocessing library and its support for process pools.

A sample from the Python doco:

from multiprocessing import Pool
import time

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes

    result = pool.apply_async(f, (10,))   # evaluate "f(10)" asynchronously in a single process
    print result.get(timeout=1)           # prints "100" unless your computer is *very* slow

    print pool.map(f, range(10))          # prints "[0, 1, 4,..., 81]"

    it = pool.imap(f, range(10))
    print it.next()                       # prints "0"
    print it.next()                       # prints "1"
    print it.next(timeout=1)              # prints "4" unless your computer is *very* slow

    result = pool.apply_async(time.sleep, (10,))
    print result.get(timeout=1)           # raises multiprocessing.TimeoutError

The Pool method map seemed to be a good choice - it takes a function as its first argument and an iterable as its second, and executes the function for each item in the iterable. It blocks until all results are ready and returns them as a list, preserving the input order.

Mapping arguments

My function accepted 8 arguments and I wanted exactly N runs with the same arguments to be performed. I also wanted my parallel execution pattern to be universal - there were a few more exercises to go, and I had to write separate simulation functions for other scenarios.

Simple wrapper

First things first, we’ll need an import:

from multiprocessing import Pool 

Next, I created a wrapper that helps pass a packed argument list to individual simulation runs in a positional manner - map passes only a single argument to the target function, so the wrapper unpacks it:

def runSimulationWrapper(args):
    return runSimulation(*args)
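
As a side note: on Python 3.3+ there is Pool.starmap, which unpacks argument tuples itself, so no wrapper is needed. That is outside my Python 2.7 setup, so this is just a sketch of the idea:

from multiprocessing import Pool

def add(a, b):
    return a + b

if __name__ == '__main__':
    pool = Pool(4)
    try:
        # starmap unpacks each tuple into positional arguments (Python 3.3+)
        print(pool.starmap(add, [(1, 2), (3, 4), (5, 6)]))   # prints "[3, 7, 11]"
    finally:
        pool.close()
        pool.join()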

Then I created a simple function that sets up the parallel execution:

def runParallel(args, num_trials):
    # Repeat the same packed argument list once per trial
    p_args = [args for i in range(num_trials)]
    pool = Pool(8)
    try:
        # map blocks until all simulation runs have completed
        return pool.map(runSimulationWrapper, p_args)
    finally:
        pool.close()
        pool.join()
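
The pool size above is hardcoded to 8. If you want the code to adapt to whatever machine it runs on, multiprocessing can report the CPU count for you - a minimal sketch:

import multiprocessing

if __name__ == '__main__':
    # Size the pool to the number of CPUs reported by the OS
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    try:
        print pool.map(abs, range(-4, 4))   # prints "[4, 3, 2, 1, 0, 1, 2, 3]"
    finally:
        pool.close()
        pool.join()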

I modified my main simulate function to use the parallel execution function above:


def simulate(numTrials):
    # Setting up simulation environment - removed for brevity 
    
    # Packing arguments
    args = [num_viruses,
            max_pop,
            max_birth_prob,
            clear_prob,
            resistances,
            mut_prob,
            num_steps_before_drug,
            num_steps_after_drug]

    results = runParallel(args, numTrials)
    
    # Plotting results - removed for brevity
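
Since each run returns a (total, resistant) tuple, the list coming back from runParallel can be split into two series for plotting. A tiny standalone sketch with hypothetical values:

results = [(120, 10), (118, 12), (125, 9)]   # hypothetical per-trial outputs
totals, resistants = zip(*results)
print totals        # prints "(120, 118, 125)"
print resistants    # prints "(10, 12, 9)"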

Executing simulations

Initially, for my sequential approach, I just dropped a call to the simulate function at the bottom of my file and used VSCode or IDLE to run it.

After switching my execution model to parallel, my code just hung. I spent a few hours checking different samples of using the multiprocessing library - some of them ran nicely while others hung - until I finally found the problem.

As stated in Programming guidelines - Windows, we need to make sure that the main module can be safely imported by protecting the “entry point” of the program.

So, instead of:

simulate(100)

Do this *

if __name__ == '__main__':
    simulate(100)
    

* and read the docos before spending hours debugging solutions from the internet
