rhodricusack edited this page Mar 25, 2012 · 1 revision

Note March 2011: local parallel processing for aa version 4 is coming soon.

Parallel processing with aa



AA can run multiple parts of your analysis in parallel. It uses coarse-grained parallelism: different instances of modules execute simultaneously, but no attempt is made to subdivide a single module. A single-subject analysis should speed up by around a factor of 2, and multi-subject analyses by a factor of 5-10. The precise speed increase depends on the number of jobs you are allocated, which is determined by the memory, processor and Matlab license load on the Linux system.

How to use it

Starting

aa version 4 has entirely new facilities for parallel processing. To run through the Condor grid computing system, in your User Script add the line:

  aap.options.processwhere='condor';

Scheduling

Multiple modules are run simultaneously where possible. Within a module, there is no parallel execution. Part of the AA module definition specifies whether a module is run once per study, once per subject, or once per session. This affects parallel scheduling as shown in the table.

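| Domain  | When run in parallel                                                  | Benefit                              |
|---------|-----------------------------------------------------------------------|--------------------------------------|
| Session | Always                                                                | Any time there are multiple sessions |
| Subject | When multiple subjects are being processed                            | Any time there are multiple subjects |
| Study   | If multiple study-level stages are marked as executing simultaneously | Not in standard recipes at present   |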
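
As a rough illustration of the table, the Python sketch below counts how many independent jobs a single stage yields for each domain. It is not part of aa: the function `jobs_for_stage`, the subject names and the session names are all hypothetical, and a real run is further limited by how many workers you are allocated.

```python
def jobs_for_stage(domain, subjects):
    """Return the independent jobs one stage yields for a given domain.

    subjects maps a subject name to its list of session names.
    (Hypothetical helper for illustration; not an aa function.)
    """
    if domain == "study":
        # One job covering the whole study.
        return [("study",)]
    if domain == "subject":
        # One job per subject.
        return [(subj,) for subj in subjects]
    if domain == "session":
        # One job per session of every subject.
        return [(subj, sess)
                for subj, sessions in subjects.items()
                for sess in sessions]
    raise ValueError(f"unknown domain: {domain}")


# Made-up study: two subjects with two sessions each.
subjects = {"sub01": ["run1", "run2"], "sub02": ["run1", "run2"]}
for domain in ("study", "subject", "session"):
    print(domain, len(jobs_for_stage(domain, subjects)))
```

With two subjects of two sessions each, a session-level stage offers four jobs that could run side by side, a subject-level stage two, and a study-level stage only one.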

Dependencies

Most processing stages wait for the previous stage to complete before executing. However, some stages can execute before this. For example, realignment and tsdiffana can both execute together as soon as the dicom-to-nifti conversion of the EPIs is complete.

In aa version 4, the order of parallel processing, and which items may execute simultaneously, is calculated using the data streams. Where one module takes data from another, it must wait for it to complete. Otherwise, there is no interaction, and no need for it to wait.
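
The idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not aa's actual implementation: each stage simply lists the streams it reads and writes, each input is connected to the most recent earlier stage that wrote that stream, and the stage names and the helpers `connect_streams` and `parallel_waves` are hypothetical.

```python
def connect_streams(stages):
    """Map each stage to the set of stages it must wait for.

    stages is an ordered list of (name, input_streams, output_streams).
    (Hypothetical helper for illustration; not an aa function.)
    """
    latest_producer = {}  # stream name -> most recent stage that wrote it
    waits_for = {}
    for name, inputs, outputs in stages:
        waits_for[name] = {latest_producer[s] for s in inputs
                           if s in latest_producer}
        for s in outputs:
            latest_producer[s] = name
    return waits_for


def parallel_waves(waits_for):
    """Group stages into waves; every stage in a wave can run at once."""
    done, waves = set(), []
    while len(done) < len(waits_for):
        ready = sorted(name for name, deps in waits_for.items()
                       if name not in done and deps <= done)
        if not ready:
            raise ValueError("circular dependency between stages")
        waves.append(ready)
        done.update(ready)
    return waves


# Illustrative pipeline (names are assumptions, not real module definitions).
stages = [
    ("convert_epis", [], ["epi"]),
    ("tsdiffana", ["epi"], []),
    ("realign", ["epi"], ["epi"]),
    ("smooth", ["epi"], ["epi"]),
]
waits_for = connect_streams(stages)
print(parallel_waves(waits_for))
```

In this toy pipeline, `tsdiffana` and `realign` both read `epi` from the conversion stage and so land in the same wave, while `smooth` reads the `epi` written by `realign` and has to wait for it.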

You do not need to specify "tobecompletedfirst" fields in the XML or your user script, as each module automatically connects to the previous one that exported the relevant data type. Where you wish one set of code to execute before another, we recommend the use of branches in the XML (see "branched analyses").

However, if you want different input streams to a module to come from different earlier stages (say, to compare the EPIs before and after realignment), you can qualify the stream names with the name of the module that produced them. Instead of

<inputstreams>
  <stream><name>epi</name></stream>
</inputstreams>
use lines like this:
<inputstreams>
  <stream><name>aamod_realign.epi</name></stream>
</inputstreams>

Again, tasks with the same dependencies can be run in parallel. More importantly, the exact form of the dependency depends on the domain of each of the stages:

  • If a stage is executed once per study, it will wait for all subjects/sessions of the stage it depends on to complete.
  • If it is executed once per subject, each subject will be executed as soon as all of that subject's sessions have completed in the stage it depends on.
  • If it is executed once per session, each session will be executed as soon as that session has completed in the stage it depends on.
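
These three rules can be expressed as a small readiness check. The Python sketch below is illustrative only and assumes the upstream stage runs once per session, so its progress is a set of completed (subject, session) pairs. The function `is_ready` and the subject and session names are hypothetical, not part of aa.

```python
def is_ready(domain, job, upstream_done, subjects):
    """Decide whether a job may start, given upstream progress.

    upstream_done is the set of (subject, session) pairs already completed
    by a session-level upstream stage (an assumption of this sketch).
    subjects maps a subject name to its list of session names.
    """
    if domain == "study":
        # Needs every session of every subject.
        needed = {(subj, sess)
                  for subj, sessions in subjects.items()
                  for sess in sessions}
    elif domain == "subject":
        # Needs every session of this one subject.
        (subj,) = job
        needed = {(subj, sess) for sess in subjects[subj]}
    elif domain == "session":
        # Needs only the matching session.
        needed = {job}
    else:
        raise ValueError(f"unknown domain: {domain}")
    return needed <= upstream_done


subjects = {"sub01": ["run1", "run2"], "sub02": ["run1", "run2"]}
# Upstream has finished both sessions of sub01 but only one of sub02.
upstream_done = {("sub01", "run1"), ("sub01", "run2"), ("sub02", "run1")}

print(is_ready("session", ("sub02", "run1"), upstream_done, subjects))
print(is_ready("subject", ("sub01",), upstream_done, subjects))
print(is_ready("subject", ("sub02",), upstream_done, subjects))
print(is_ready("study", ("study",), upstream_done, subjects))
```

Here the session-level job and the `sub01` subject-level job can start immediately, whereas `sub02` and the study-level job are still blocked by the one outstanding session.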

Good practice

Getting optimal performance

You will get the best performance if your worker jobs are distributed across machines, and if those machines have low load. If you already have many SPM jobs open and there is a limited number of Matlab licenses (one is needed for each machine you are using), your new workers will be restricted to the machines you are already using. This makes it more likely that your workers will be allocated to the same machine, and that this machine will not be the least loaded one available. You will generally get better performance if you clear out your old jobs with the following command before starting SPM to run a parallel job:

 closeallmyspms

Executing your own scripts in parallel

You may wrap up your own code as an AA module, which has a low overhead (approximately 10 additional lines). AA will then happily schedule it to run in parallel.

When writing modules, it is good practice, where possible, to make them execute at the session level rather than the subject level, as this allows greater parallelism. For this reason, aamod_smooth and aamod_normwrite have been modified to run once per session rather than once per subject.
