How parallel for loops work
Parallel for loops are a way to take a group of independent tasks and compute more than one at a time. Matlab implements parallel for loops by using a master Matlab process that takes each of the possible steps in a for loop and assigns it to a worker Matlab process in a pool of workers that has been created for this purpose. This example shows how to set up and use a parallel for loop that is capable of using processors on more than one machine.
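To make "independent tasks" concrete, here is a minimal sketch, not taken from the example files, that contrasts a loop whose steps can be handed to workers in any order with one whose steps cannot. The variable names are illustrative only. If no pool or toolbox is available, parfor simply runs the iterations serially.

```matlab
N = 8;

% Independent steps: iteration k needs only k, so parfor may run them in any order
squares = zeros(1, N);
parfor k = 1:N
    squares(k) = k^2;
end

% Dependent steps: iteration k needs the result of iteration k-1,
% so this must stay an ordinary for loop
running = zeros(1, N);
running(1) = 1;
for k = 2:N
    running(k) = running(k-1) + k;
end

disp(squares);
disp(running);
```

Only the first loop is a candidate for parfor; Matlab would refuse to run the second one as a parfor loop because of the dependence on running(k-1).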
The spectral processing example
We will use the spec function from the spectral processing example to demonstrate how this can be done. The complete file can be found on Flux at
/scratch/data/examples/matlab/ovarian/spec_parfor_current.m
For this example, we begin by setting variables to contain the file names and precreating some data structures to hold the final results. The first command from spec_parfor_current.m that pertains to the parfor setup is
% If not inside a PBS job, use 4 processors
if isempty(getenv('PBS_NP'))
    NP = 4;
else
    NP = str2double(getenv('PBS_NP'));
end
That bit of code reads from the environment the number of processors that PBS assigned to this job. If the environment variable is not set (that is, the script is not being run from within PBS), then it defaults to setting the number of processors to four. This method can be used to create a Matlab script that can be run either interactively outside of PBS or within it.
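If PBS_NP might be set but malformed, a slightly more defensive variant of the check above may be useful. The following is only a sketch: the fallback value of 4 comes from the example, while the extra validation is an assumption about what a script might want and is not part of spec_parfor_current.m.

```matlab
% Read PBS_NP; getenv returns '' when the variable is not set
npText = getenv('PBS_NP');
NP = str2double(npText);   % str2double returns NaN for '' or non-numeric text

% Assumed policy: fall back to 4 processors on any unusable value
if ~isfinite(NP) || NP < 1
    NP = 4;
end
NP = floor(NP);            % a worker count should be a whole number

fprintf('Using %d processors.\n', NP);
```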
There are three cluster profiles that are defined at startup of Matlab 2015a and subsequent versions: local, current, and flux. When Matlab is run from within PBS, the current profile is the correct one to use, and it defines the nodes in the current job to be members of the cluster.
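To check which profiles a given Matlab session actually sees, something like the following sketch could be used. It assumes the Parallel Computing Toolbox is installed and relies on parallel.clusterProfiles, which returns the profile names and the default profile. Falling back to the default profile when current is missing is an assumption made for illustration, not site policy.

```matlab
% Requires the Parallel Computing Toolbox
[allProfiles, defaultProfile] = parallel.clusterProfiles;
fprintf('Available profiles: %s\n', strjoin(allProfiles, ', '));

% Prefer the 'current' profile when it is defined, as it is inside a PBS job
% on Flux; otherwise fall back to whatever this installation uses by default
if ismember('current', allProfiles)
    profileName = 'current';
else
    profileName = defaultProfile;
end
fprintf('Selected profile: %s\n', profileName);
```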
We initialize the pool of workers with the parpool command, as in
% Initialize the pool to use
myPool = parpool('current', NP);
The next bit of code in spec_parfor_current.m starts a timer and then runs the actual parallel for loop, which uses the pool just created.
tic
parfor k = 1:N
    data_file = [ data_repository files{k} ];
    Y(:,k) = spec(data_file, k);
end
stop_time = toc;
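The loop above depends on the data files and on spec, so it cannot be run on its own. As a self-contained sketch of the same pattern (preallocate the result, let each iteration fill exactly one column, and time the whole loop), the following replaces spec with a hypothetical stand-in that computes the magnitude spectrum of a synthetic sine wave. The sizes N and M are arbitrary illustrative values.

```matlab
N = 6;              % number of "files" (illustrative)
M = 16;             % samples per signal (illustrative)
n = (0:M-1)';       % sample index, sent unchanged to every worker
Y = zeros(M, N);    % preallocate; each iteration fills one column

tic
parfor k = 1:N
    % Stand-in for spec(data_file, k): magnitude spectrum of a sine wave
    signal = sin(2*pi*k*n/M);
    Y(:,k) = abs(fft(signal));
end
stop_time = toc;

fprintf('Processed %d signals in %.3f seconds.\n', N, stop_time);
```

Because iteration k writes only to Y(:,k), no two iterations touch the same column, which is what lets the iterations be distributed across workers.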
We print the timing information (not shown here), and finally, we shut down the workers in the pool and delete the pool object.
% Shut down the parallel pool
delete(myPool);
Note that the delete command takes the pool object we created with parpool. You should always delete your pools before you exit.
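If a script might end without reaching its delete call, or the variable holding the pool is no longer in scope, one possible safeguard is to look up whatever pool is still running and delete it. This is a sketch that assumes the Parallel Computing Toolbox; the variable name leftoverPool is illustrative.

```matlab
% Requires the Parallel Computing Toolbox
% gcp('nocreate') returns the running pool, or empty if there is none,
% without starting a new pool as a side effect
leftoverPool = gcp('nocreate');
if ~isempty(leftoverPool)
    delete(leftoverPool);
end
```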
Timing example using the blackjack simulation
This example shows how to run some timings to measure whether, and by how much, performance increases as processors are added. It also shows how to check that the worker pool was created and exit with a message if it was not. The script reads PBS_NP directly, so to make it suitable to run outside of a PBS job, you would have to replace that step with the fallback mechanism used in the spectral processing example.
%%%% We get from the environment the number of processors
NP = str2double(getenv('PBS_NP'));

%%%% Create the pool for parfor to use
thePool = parpool('current', NP);

%%%% That worked, right?  If not, print a message and exit
if isempty(thePool)
    fprintf(2, ['This example requires a parallel pool. ' ...
        'Manually start a pool using the parpool command or set ' ...
        'your parallel preferences to automatically start a pool.\n']);
    exit(1)
end

%%%% Some parameters
numHands = 2000;
numPlayers = 6;
poolSize = thePool.NumWorkers;

%%%% Precreate and initialize our results vector
t1 = zeros(1, poolSize);

%%%% Run simulation to see decreased time with increased processors
fprintf('Simulating each player playing %d hands.\n', numHands);
for n = 2:poolSize
    tic;
    pctdemo_aux_parforbench(numHands, poolSize*numPlayers, n);
    t1(n) = toc;
    fprintf('%d workers simulated %d players in %3.2f seconds.\n', ...
        n, poolSize*numPlayers, t1(n));
end

%%%% Run one simulation many times to get average performance and std dev
numIter = 50;
t2 = zeros(1, numIter);
for i = 1:numIter
    tic;
    pctdemo_aux_parforbench(numHands, poolSize*numPlayers, poolSize);
    t2(i) = toc;
    if mod(i,20) == 0
        fprintf('Benchmark has run %d out of %d times.\n', i, numIter);
    end
end

[muhat, sigmahat, muci] = normfit(t2);
fprintf('\n\nMean: %8.4f\nStdDev: %8.4f\n', muhat, sigmahat);
fprintf('\n95%% CI for the mean\nLower: %8.4f\nUpper: %8.4f\n\n', ...
    muci(1), muci(2));

%%%% Delete the pool explicitly to prevent future problems
delete(thePool);
exit
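The t1 vector collected by the script holds one wall-clock time per worker count, starting at two workers. One way to turn such timings into a statement about scaling is to compute speedup and parallel efficiency relative to the two-worker run. The sketch below uses placeholder timings that were invented purely to make the arithmetic visible; they are not measurements from Flux or from the blackjack benchmark.

```matlab
% Placeholder timings in seconds, NOT real measurements.
% t1(n) is the time with n workers; t1(1) is unused, as in the script above.
t1 = [0 12 9 8];

workers    = 2:numel(t1);
speedup    = t1(2) ./ t1(workers);        % relative to the 2-worker run
efficiency = speedup ./ (workers / 2);    % 1.0 would be ideal scaling

for idx = 1:numel(workers)
    fprintf('%d workers: speedup %.2f, efficiency %.2f\n', ...
        workers(idx), speedup(idx), efficiency(idx));
end
```

An efficiency well below 1 as workers are added suggests that communication or startup overhead is consuming the benefit of the extra processors.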