xcpEngine containers (Extra Info)

All atlases, software dependencies, and scripts are included in the xcpEngine Docker/Singularity image. These instructions are not needed if you are using ``xcpengine-docker`` or ``xcpengine-singularity``; they are here in case you need to run the containers manually.

Using xcpEngine with Singularity

The easiest way to get started with xcpEngine on an HPC system is to build a Singularity image from the xcpEngine release on Docker Hub::

$ singularity build xcpEngine.simg docker://pennbbl/xcpengine:latest
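
For reproducibility, you may want to pin a specific release rather than ``latest``. The tag below is only an illustration; check the pennbbl/xcpengine page on Docker Hub for the tags that actually exist::

$ singularity build xcpEngine-v100.simg docker://pennbbl/xcpengine:v1.0.0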

The only potentially tricky part about using a Singularity image is the need to bind directories from your host operating system so they can be accessed from inside the container. Suppose there is a /data directory that is shared across your cluster as an NFS mount, all of your data is stored in /data/study, and your cohort file and design file are there as well. Inside the container, these files exist relative to the bind point, which means they need to be specified like so::

$ singularity run \
    -B /data:/home/user/data \
    xcpEngine.simg \
    -c /home/user/data/study/my_cohort_rel_container.csv \
    -d /home/user/data/study/my_design.dsn \
    -o /home/user/data/study/output \
    -i $TMPDIR
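
If you are unsure whether the bind is correct, you can list the study directory from inside the container before launching a full run; this sanity check assumes your data really live at /data/study on the host::

$ singularity exec -B /data:/home/user/data xcpEngine.simg \
    ls /home/user/data/study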

The above command will work as long as your cohort file points to the data as it is seen by the container. Specifically, the paths in my_cohort_rel_container.csv would all need to start with /home/user/data instead of /data. If you would rather keep the paths in your cohort file as they appear on the host OS, you need to specify a relative root when you run the container::

$ singularity run \
    -B /data:/home/user/data \
    xcpEngine.simg \
    -c /home/user/data/study/my_cohort_host_paths.csv \
    -d /home/user/data/study/my_design.dsn \
    -o /home/user/data/study/output \
    -r /home/user \
    -i $TMPDIR

Here, the paths in my_cohort_host_paths.csv all start with /data; xcpEngine prepends the -r value to each cohort path, so the files resolve to /home/user/data/... inside the container.
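
To make the difference concrete, here is what the two cohort files might contain; the subject ID and image file name are made up for illustration::

$ head -n 2 my_cohort_rel_container.csv
id0,img
sub-01,/home/user/data/study/fmriprep/sub-01/func/sub-01_task-rest_preproc.nii.gz

$ head -n 2 my_cohort_host_paths.csv
id0,img
sub-01,/data/study/fmriprep/sub-01/func/sub-01_task-rest_preproc.nii.gz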

NOTE: Singularity typically mounts the host’s /tmp as /tmp in the container. This is useful in the case where you are running xcpEngine using a queueing system and want to write intermediate files to the locally-mounted scratch space provided in a $TMPDIR variable specific to the job. If you want to use a different temporary directory, be sure that it’s accessible from inside the container and provide the container-bound path to it.
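
For example, if your scheduler provides per-job scratch space in a $TMPDIR that is not under /tmp, you can bind it over the container's /tmp and pass the container-side path to -i, reusing the paths from the example above::

$ singularity run \
    -B /data:/home/user/data \
    -B ${TMPDIR}:/tmp \
    xcpEngine.simg \
    -c /home/user/data/study/my_cohort_rel_container.csv \
    -d /home/user/data/study/my_design.dsn \
    -o /home/user/data/study/output \
    -i /tmp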

Using xcpEngine with Docker

Using Docker is almost identical to using Singularity, with Docker's -v arguments taking the place of Singularity's -B. Here is an example::

$ docker run --rm -it \
    -v /data:/data \
    -v /tmp:/tmp \
    pennbbl/xcpengine:latest \
    -c /data/study/my_cohort_host_paths.csv \
    -d /data/study/my_design.dsn \
    -o /data/study/output \
    -i $TMPDIR

Mounting directories is simpler with Docker than with Singularity: because this example binds /data and /tmp to the same paths inside the container, the cohort file can use host paths directly and no -r flag is needed.
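
You can also check that your data are visible inside the container before a full run by overriding the image's entrypoint; this assumes ls is on the image's PATH, which holds for standard Ubuntu-based images::

$ docker run --rm --entrypoint ls \
    -v /data:/data \
    pennbbl/xcpengine:latest /data/study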

Using SGE to parallelize across subjects

By running xcpEngine from a container, you lose the ability to submit jobs to the cluster directly from xcpEngine. Here is one way to split your cohort file and submit a qsub job for each line. Note that we are using my_cohort_rel_container.csv, which means we don't need to specify an -r flag. If your cohort file uses paths relative to the host's file system, you will need to specify -r::

#!/bin/bash
FULL_COHORT=/data/study/my_cohort_rel_container.csv
# Number of subjects = number of lines minus the header
NJOBS=$(( $(wc -l < ${FULL_COHORT}) - 1 ))

if [[ ${NJOBS} -lt 1 ]]; then
    exit 0
fi

cat << EOF > xcpParallel.sh
#!/bin/bash
#$ -V
#$ -t 1-${NJOBS}

# Adjust these so they work on your system
SNGL=/share/apps/singularity/2.5.1/bin/singularity
SIMG=/data/containers/xcpEngine.simg
FULL_COHORT=${FULL_COHORT}

# Create a temp cohort file with 1 line
HEADER=\$(head -n 1 \$FULL_COHORT)
LINE_NUM=\$( expr \$SGE_TASK_ID + 1 )
LINE=\$(awk "NR==\$LINE_NUM" \$FULL_COHORT)
TEMP_COHORT=\${FULL_COHORT}.\${SGE_TASK_ID}.csv
echo \$HEADER > \$TEMP_COHORT
echo \$LINE >> \$TEMP_COHORT

# Run xcpEngine on the single-subject cohort
\$SNGL run -B /data:/home/user/data \$SIMG \\
  -c /home/user\${TEMP_COHORT} \\
  -d /home/user/data/study/my_design.dsn \\
  -o /home/user/data/study/output \\
  -i \$TMPDIR

EOF
qsub xcpParallel.sh

You will need to collate group-level outputs after batching subjects with the ${XCPEDIR}/utils/combineOutput script, provided in utils.
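
A minimal sketch of that collation step, run through the container so that ${XCPEDIR} is defined. The -p/-f/-o flags and the quality-file glob are assumptions based on patterns used elsewhere in the xcpEngine documentation; verify them against utils/combineOutput on your installation::

$ singularity exec -B /data:/home/user/data xcpEngine.simg \
    bash -c '${XCPEDIR}/utils/combineOutput \
      -p /home/user/data/study/output \
      -f "*quality.csv" \
      -o XCP_QAVARS.csv'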

Using SLURM to parallelize across subjects

By running xcpEngine from a container, you lose the ability to submit jobs to the cluster directly from xcpEngine. Here is a way to split your cohort file and submit an sbatch job for each line. Note that we are using my_cohort_rel_host.csv, which means we need to specify an -r flag. If your cohort file uses paths relative to the container, you don't need to specify -r::

#!/bin/bash
# Adjust these so they work on your system
FULL_COHORT=/data/study/my_cohort_rel_host.csv
# Number of subjects = number of lines minus the header
NJOBS=$(( $(wc -l < ${FULL_COHORT}) - 1 ))
HEADER="$(head -n 1 $FULL_COHORT)"
SIMG=/data/containers/xcpEngine.simg
# Memory, CPU count, and walltime depend on the design file and your
# dataset; the values below are placeholders that you must adjust
XCP_MEM=8G
XCP_C=1
XCP_TIME=24:00:00

if [[ ${NJOBS} -lt 1 ]]; then
    exit 0
fi

cat << EOF > xcpParallel.sh
#!/bin/bash -l
#SBATCH --array 1-${NJOBS}
#SBATCH --job-name xcp_engine
#SBATCH --mem $XCP_MEM
#SBATCH -c $XCP_C
#SBATCH --time $XCP_TIME
# Note: --workdir was renamed to --chdir in newer Slurm releases;
# use whichever flag your cluster's Slurm version accepts
#SBATCH --workdir /my_working_directory
#SBATCH --output /my_working_directory/logs/slurm-%A_%a.out


LINE_NUM=\$( expr \$SLURM_ARRAY_TASK_ID + 1 )
LINE=\$(awk "NR==\$LINE_NUM" $FULL_COHORT)
TEMP_COHORT=${FULL_COHORT}.\${SLURM_ARRAY_TASK_ID}.csv
echo $HEADER > \$TEMP_COHORT
echo \$LINE >> \$TEMP_COHORT

# Run xcpEngine on the single-subject cohort. The bind and the -r flag
# match the Singularity example above: host paths in the cohort start
# with /data, which the container sees under /home/user/data.
singularity run -B /data:/home/user/data $SIMG \\
  -c /home/user\${TEMP_COHORT} \\
  -d /home/user/data/study/my_design.dsn \\
  -o /home/user/data/study/output \\
  -r /home/user \\
  -i \$TMPDIR

EOF
sbatch xcpParallel.sh

Keep in mind that, in addition to the directories and settings you need to adjust as noted in the script above, the logs directory must already exist inside your working directory (see /my_working_directory/logs) and the TMPDIR variable must be defined (see $TMPDIR). You will need to collate group-level outputs after batching subjects with the ${XCPEDIR}/utils/combineOutput script, provided in utils.
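
Both requirements can be handled once, up front, with something like the following; adjust the scratch path to whatever your cluster provides::

$ mkdir -p /my_working_directory/logs
$ export TMPDIR=/path/to/scratch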

Using the bundled software

All the neuroimaging software used by xcpEngine is available inside the Singularity image. Suppose you couldn’t get FSL 5.0.11 to run on your host OS. You could access the bundled copy from an interactive shell::

$ singularity shell -B /data:/home/user/data xcpEngine.simg
Singularity: Invoking an interactive shell within container...

Singularity xcpEngine.simg:~> flirt -version
FLIRT version 6.0

Singularity xcpEngine.simg:~> antsRegistration --version
ANTs Version: 2.2.0.dev815-g0740f
Compiled: Jun 27 2017 17:39:25

This can be useful on a system where you don’t have current compilers or root permissions.
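
You can also run a single bundled tool non-interactively with singularity exec; the fslmaths call below is only an illustration, and the image and data paths follow the earlier examples::

$ singularity exec -B /data:/home/user/data xcpEngine.simg \
    fslmaths /home/user/data/study/input.nii.gz -bin \
    /home/user/data/study/mask.nii.gz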