How parallel for loops work

Parallel for loops are a way to take a group of independent tasks and compute more than one at a time. Matlab implements parallel for loops with a master Matlab process that assigns each iteration of the for loop to a worker Matlab process in a pool of workers created for this purpose. This example shows how to set up and use a parallel for loop that can use processors on more than one machine.
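
As a minimal sketch of what "independent" means here, the loop below computes each element from its own index only, so the iterations can run in any order on any worker. The variable names are illustrative and do not come from the example script. If no pool is open, parfor may start a default pool or run the iterations in the calling process, depending on the installation and preferences.

```matlab
% Each iteration depends only on its own index k, so the iterations
% are independent and parfor may hand them to workers in any order.
N = 8;
squares = zeros(1, N);      % preallocate the output vector
parfor k = 1:N
    squares(k) = k^2;       % no iteration reads another iteration's result
end
disp(squares);
```

A loop in which iteration k needed the value computed in iteration k-1 would not qualify, and Matlab would refuse to run it as a parfor loop.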

The spectral processing example

We will use the spec function from the spectral processing example to demonstrate how this can be done. The complete file can be found on Flux at

/scratch/data/examples/matlab/ovarian/spec_parfor_current.m

For this example, we begin by setting variables to contain the file names and precreating some data structures to hold the final results. The first command from spec_parfor_current.m that pertains to the parfor setup is

% If not inside a PBS job, use 4 processors
if isempty(getenv('PBS_NP'))
    NP = 4;
else
    NP = str2double(getenv('PBS_NP'));
end

That bit of code reads from the environment the number of processors that PBS assigned to this job. If there is no environment variable – that is, this is not being run from within PBS – then it defaults to setting the number of processors to four. This method can be used to create a Matlab script that can be run interactively outside of PBS or within it.
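
A slightly more defensive variant is sketched below. It relies on str2double returning NaN for an empty or non-numeric string, so a single check covers both the case where PBS_NP is unset and the case where it holds something unexpected. The name defaultNP is introduced here for illustration only; this is one possible approach, not the code in spec_parfor_current.m.

```matlab
% str2double returns NaN for an empty or non-numeric string, so one
% check covers both "not in a PBS job" and a malformed PBS_NP value.
defaultNP = 4;                          % same fallback as the code above
NP = str2double(getenv('PBS_NP'));
if isnan(NP) || NP < 1 || NP ~= floor(NP)
    NP = defaultNP;
end
fprintf('Using %d processors.\n', NP);
```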

There are three cluster profiles that are defined at startup of Matlab 2015a and subsequent versions: local, current, and flux. When Matlab is run from within PBS, the current profile is the correct one to use, and it defines the nodes in the current job to be members of the cluster.

We initialize the pool of workers with the parpool command, as in

% Initialize the pool to use
myPool = parpool('current', NP);
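
When working interactively, a pool may already be open, and calling parpool again in that situation raises an error. The sketch below, which assumes the site-defined current profile described above and uses a placeholder value for NP, checks for an existing pool with gcp('nocreate') and only creates one if none is found.

```matlab
% Placeholder; in the example script NP comes from PBS_NP.
NP = 4;

% Reuse an open pool if there is one; otherwise create it.
myPool = gcp('nocreate');       % returns an empty object if no pool is open
if isempty(myPool)
    myPool = parpool('current', NP);
end
fprintf('Pool has %d workers.\n', myPool.NumWorkers);
```

Note that a reused pool may have a different number of workers than NP, so code that depends on the pool size should read myPool.NumWorkers.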

The next bit of code in spec_parfor_current.m starts a timer and then runs the actual parallel for loop, which uses the pool just created.

tic
parfor k = 1:N
   data_file = [ data_repository files{k} ];
   Y(:,k) = spec(data_file, k);
end
stop_time = toc;

We print the timing information (not shown here), and finally, we shut down the workers in the pool and delete the pool object.

% Shut down the parallel pool
delete(myPool);

Note that the delete command takes the name of the pool object we created with parpool. You should always delete your pools before you exit.
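
If the parallel work lives inside a function, one way to make sure the pool is deleted even when an error interrupts the work is an onCleanup object. The following is a sketch under that assumption; the function name run_with_pool and the loop body are hypothetical stand-ins, and the current profile is the site-defined one described above.

```matlab
function run_with_pool(NP)
% Sketch: open a pool and guarantee that it is deleted when this
% function exits, whether normally or because of an error.
    myPool = parpool('current', NP);
    cleanupPool = onCleanup(@() delete(myPool));

    parfor k = 1:NP
        fprintf('Iteration %d\n', k);   % placeholder for the real work
    end
end   % cleanupPool goes out of scope here, which deletes the pool
```

Calling run_with_pool(4) would open a four-worker pool, run the loop, and close the pool on the way out.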

Timing example using the blackjack simulation

This example shows how to run timings to measure whether, and by how much, performance improves as processors are added. It also shows how to check that the worker pool was created and exit with a message if it was not. To make this suitable to run outside of a PBS job, you would have to replace the line that reads PBS_NP with the mechanism used in the spectral processing example.

%%%%  We get from the environment the number of processors
NP = str2double(getenv('PBS_NP'));

%%%%  Create the pool for parfor to use
thePool = parpool('current', NP);

%%%%  That worked, right?  If not, print a message and exit
if isempty(thePool)
    % error() would abort the script before exit could run, so print
    % the message to stderr and exit with a nonzero status instead
    fprintf(2, ['This example requires a parallel pool. ' ...
                'Manually start a pool using the parpool command or set ' ...
                'your parallel preferences to automatically start a pool.\n']);
    exit(1)
end

%%%%  Some parameters
numHands = 2000;
numPlayers = 6;
poolSize = thePool.NumWorkers;

%%%%  Precreate and initialize our results vector
t1 = zeros(1, poolSize);

%%%%  Run simulation to see decreased time with increased processors
fprintf('Simulating each player playing %d hands.\n', numHands);
for n = 2:poolSize
    tic;
        pctdemo_aux_parforbench(numHands, poolSize*numPlayers, n);
    t1(n) = toc;
    fprintf('%d workers simulated %d players in %3.2f seconds.\n', ...
            n, poolSize*numPlayers, t1(n));
end

%%%%  Run one simulation many times to get average performance and std dev
numIter = 50;
t2 = zeros(1, numIter);
for i = 1:numIter
    tic;
        pctdemo_aux_parforbench(numHands, poolSize*numPlayers, poolSize);
    t2(i) = toc;
    if mod(i,20) == 0
        fprintf('Benchmark has run %d out of %d times.\n', i, numIter);
    end
end
[muhat, sigmahat, muci] = normfit(t2)
fprintf('\n\nMean:    %8.4f\nStdDev:  %8.4f\n', muhat, sigmahat);
fprintf('\n95%% CI for the mean\nLower:   %8.4f\nUpper:   %8.4f\n\n', ...
        muci(1), muci(2));

%%%%  Delete the pool explicitly to prevent future problems
delete(thePool);
exit
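
To turn the timings in t1 into an answer to "by how much", one option is to compute the speedup relative to the smallest worker count that was measured. The helper below is a hypothetical addition, not part of the original script. It assumes the layout used above, where t1(n) holds the elapsed time with n workers and t1(1) is left at zero because the timing loop starts at 2, and it assumes the pool has at least two workers.

```matlab
function speedup = relative_speedup(t1)
% Sketch: speedup of each run relative to the two-worker run.
% t1(n) is the elapsed time with n workers; t1(1) is unused.
    base = t1(2);                          % two-worker time is the baseline
    speedup = zeros(size(t1));             % speedup(1) stays zero
    speedup(2:end) = base ./ t1(2:end);    % values above 1 mean faster runs
end
```

It would be called after the first timing loop and before the pool is deleted, for example as speedup = relative_speedup(t1). A value of 2 at position n means the run with n workers took half as long as the run with two workers.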