Computation node¶
How to send jobs to CALMODULIN¶
We will connect via SSH as follows:
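A minimal example, assuming the hostname shown in the welcome banner below and a placeholder user name:

```bash
# Log in to the Calmodulin node (replace 'user' with your own account)
ssh user@u036898.lc.ehu.es
```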
This will log us into the system:
*************************************************
_ _ _ _
| | | | | |(_)
____ _____| | ____ ___ __| |_ _| | _ ____
/ ___|____ | || \ / _ \ / _ | | | | || | _ \
( (___/ ___ | || | | | |_| ( (_| | |_| | || | | | |
\____)_____|\_)_|_|_|\___/ \____|____/ \_)_|_| |_|
*************************************************
Welcome user
Date: jue nov 18 20:11:59 CET 2021
Hostname: u036898.lc.ehu.es
CPU Model: Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz
Number of procs: 48
Total Memory: 791009840 kB
Free Memory: 743464560 kB
cluster related issues --> aritz.leonardo@ehu.eus
check the wiki for more information
Specifications¶
Compute Node | # nodes | Processor | # of cores | memory (GB) | Accelerator |
---|---|---|---|---|---|
u036898 | 1 | Intel Xeon Gold 6240R | 48 | 800 | 1x NVIDIA A40 |
Working Spaces¶
Ideally, each researcher would create a user directory in each of these spaces:
Role | mount point | Size |
---|---|---|
scratch | /scratch | 900 GB |
storage | /bigdisk | 7.3 TB |
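A minimal sketch of how each user could create their own directories, assuming the mount points listed above:

```bash
# Create per-user working directories on both spaces
mkdir -p /scratch/$USER
mkdir -p /bigdisk/$USER
```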
Sending Jobs¶
The resource manager is SLURM, and you can find more information on its use on this page.
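As a quick reference, jobs are handled with the standard SLURM commands (the script name `job.sl` is just a placeholder):

```bash
sbatch job.sl        # submit a batch script to the queue
squeue -u $USER      # check the state of your jobs
scancel <jobid>      # cancel a job
```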
QoS and partitions¶
QoS/Partition | Priority | MaxWall | MaxNodesPU | MaxJobsPU | MaxSubmitPU | MaxTRES |
---|---|---|---|---|---|---|
batch | - | INFINITY | 1 | | | |
The columns mean the following:
MaxWall
: Maximum time that a job can run.

MaxNodesPU
: The maximum number of nodes that a job can request.
Batch scripts¶
Calmodulin: Job MPI
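A minimal MPI-only sketch, following the same pattern as the hybrid script below (the module and binary names are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load program/program_version
# Launch one MPI rank per requested task
srun binary < input
```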
Calmodulin: Job OpenMP
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load program/program_version
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
binary < input
Calmodulin: Hybrid job (MPI+OpenMP)
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=4
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load program/program_version
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun binary < input
Calmodulin: Job with 1 GPU
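A minimal sketch requesting the node's A40 GPU, following the pattern of the GPU scripts further below (the module and binary names are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load program/program_version
# Single task with access to the requested GPU
srun binary < input
```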
Toolchains¶
A toolchain is a coherent set of tools used to compile and install applications and to provide their runtime.
It defines a working environment at the level of versions, compilers, and libraries.
The table below indicates which components are loaded into the environment when a module associated with a toolchain is loaded:
Toolchain | Description |
---|---|
foss | GCC, OpenMPI, OpenBLAS/LAPACK, ScaLAPACK(/BLACS), FFTW |
fosscuda | GCC, OpenMPI, CUDA, OpenBLAS/LAPACK, ScaLAPACK(/BLACS), FFTW |
gompi | GCC, OpenMPI |
gompic | GCC, OpenMPI, CUDA |
gcccuda | GCC, CUDA |
intel | icc, ifort, imkl, impi |
iimpi | icc, ifort, impi |
For example, if we loaded `foss/2021a`, we would see that the following modules have been loaded:
$ module list
Currently Loaded Modules:
1) GCCcore/10.3.0 6) XZ/.5.2.5-GCCcore-10.3.0 (H) 11) libevent/.2.1.12-GCCcore-10.3.0 (H) 16) OpenBLAS/0.3.15-GCC-10.3.0 21) foss/2021a
2) zlib/.1.2.11-GCCcore-10.3.0 (H) 7) libxml2/.2.9.10-GCCcore-10.3.0 (H) 12) UCX/.1.10.0-GCCcore-10.3.0 (H) 17) FlexiBLAS/3.0.4-GCC-10.3.0
3) binutils/.2.36.1-GCCcore-10.3.0 (H) 8) libpciaccess/.0.16-GCCcore-10.3.0 (H) 13) libfabric/.1.12.1-GCCcore-10.3.0 (H) 18) gompi/2021a
4) GCC/10.3.0 9) hwloc/.2.4.1-GCCcore-10.3.0 (H) 14) PMIx/.3.2.3-GCCcore-10.3.0 (H) 19) FFTW/3.3.9-gompi-2021a
5) numactl/.2.0.14-GCCcore-10.3.0 (H) 10) OpenSSL/.1.1 (H) 15) OpenMPI/4.1.1-GCC-10.3.0 20) ScaLAPACK/2.1.0-gompi-2021a-fb
Where:
H: Hidden Module
As we can see, among other dependencies, the environment has been loaded with:
- A compiler: `GCCcore/10.3.0`, `GCC/10.3.0`
- A particular implementation of MPI: `OpenMPI/4.1.1-GCC-10.3.0`
- Scientific libraries: `OpenBLAS/0.3.15-GCC-10.3.0`, `ScaLAPACK/2.1.0-gompi-2021a-fb`, `FFTW/3.3.9-gompi-2021a`
The other modules are, as a general rule, dependencies of these main modules.
Also note that some toolchains may have another as a dependency; in this case, `gompi/2021a` would be a subtoolchain of `foss/2021a`.
Compilers¶
Generally, on general-purpose x86_64 machines, two families of compilers are used:
- GNU: the open-source compilers of the GNU project.
- Intel: Intel's proprietary (and free of charge since 2021) compilers. They are especially interesting on platforms with Intel processors, since they generate machine code optimized for those processors.
Here is how to invoke these compilers:
Compiler | C | FORTRAN | C++ | MPI C | MPI FORTRAN | MPI C++ |
---|---|---|---|---|---|---|
GNU | gcc | gfortran | g++ | mpicc | mpif90 | mpicxx |
intel | icc | ifort | icpc | mpiicc | mpiifort | mpiicpc |
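For example, a serial and an MPI C program could be compiled as follows (the file names are placeholders):

```bash
# GNU toolchain
gcc -O2 -o hello hello.c
mpicc -O2 -o hello_mpi hello_mpi.c

# Intel toolchain
icc -O2 -o hello hello.c
mpiicc -O2 -o hello_mpi hello_mpi.c
```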
Software¶
QuantumESPRESSO¶
Version | CPU | GPU |
---|---|---|
QuantumESPRESSO/6.8-intel-2021 | ✔ | |
Calmodulin: QuantumESPRESSO
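A minimal sketch for a `pw.x` run, assuming the module listed above (the input/output file names and core counts are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=QE_JOB
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load QuantumESPRESSO/6.8-intel-2021
# Run pw.x with one MPI rank per requested task
srun pw.x -input scf.in > scf.out
```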
GROMACS¶
Version | CPU | GPU |
---|---|---|
GROMACS/2016.4-fosscuda-2020b-PLUMED-2.4.0 | ✔ | ✔ |
GROMACS/2021-foss-2020b | ✔ | |
GROMACS/2021.2-fosscuda-2020b | ✔ | ✔ |
GROMACS/2021.3-foss-2021a-CUDA-11.3.1 | ✔ | ✔ |
GROMACS/2021.3-fosscuda-2020b-PLUMED-2.7.2 | ✔ | ✔ |
Here you can find the documentation for each GROMACS version, which is useful for launching jobs efficiently; note that the 2016 and 2021 versions have different launch options.
Version | Documentation |
---|---|
2016 | https://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html |
2021 | https://manual.gromacs.org/documentation/2021/user-guide/mdrun-performance.html |
Calmodulin: GROMACS
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load GROMACS/<version>
srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -s input.tpr
Calmodulin: GROMACS with GPU (for versions older than 2020)
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load GROMACS/<version>
srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -nb auto -bonded auto -pme auto -gpu_id 0 -s input.tpr
Calmodulin: GROMACS with GPU
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load GROMACS/<version>
srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -nb auto -gpu_id 0 -s input.tpr
In the case of GROMACS, what usually works best is choosing a balanced number of MPI processes and threads. The table below shows combinations that use all 48 cores of the machine (that is, the entire machine) or 24 cores (that is, half of the machine).
Total number of cores | MPI processes | Threads per process |
---|---|---|
48 | 8 (`--ntasks-per-node=8`) | 6 (`--cpus-per-task=6`) |
48 | 6 (`--ntasks-per-node=6`) | 8 (`--cpus-per-task=8`) |
48 | 12 (`--ntasks-per-node=12`) | 4 (`--cpus-per-task=4`) |
48 | 4 (`--ntasks-per-node=4`) | 12 (`--cpus-per-task=12`) |
24 | 6 (`--ntasks-per-node=6`) | 4 (`--cpus-per-task=4`) |
24 | 4 (`--ntasks-per-node=4`) | 6 (`--cpus-per-task=6`) |
As we can see, the product of the number of MPI processes and the number of threads per process is equal to the number of cores that we wish to use.
This is how the script would look for the first of the cases that we show in the table:
GROMACS 48 cores and 1 GPU: 8 MPI processes and 6 OpenMP threads per process
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=6
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load GROMACS/<version>
srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -nb auto -gpu_id 0 -s input.tpr
It is also convenient to use the environment variables that SLURM makes available to us:
Environment variable | Value |
---|---|
SLURM_CPUS_PER_TASK | Threads per process. Equals the value of `--cpus-per-task=` in the batch script. |
SLURM_NTASKS_PER_NODE | MPI processes per node. Equals the value of `--ntasks-per-node=` in the batch script. |
NAMD¶
Version | CPU | GPU |
---|---|---|
NAMD/2.14-intel-2021a-mpi | ✔ | |
NAMD/2.14-fosscuda-2020b | ✔ | ✔ |
Calmodulin: NAMD
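A minimal CPU-only sketch, assuming the MPI build listed above (NAMD/2.14-intel-2021a-mpi) and a placeholder configuration file:

```bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=NAMD_JOB
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load NAMD/2.14-intel-2021a-mpi
# MPI build: launch namd2 with one rank per requested task
srun namd2 mysim.conf
```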
Calmodulin: NAMD with GPU
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=NAMD_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load NAMD/<version>
namd2 +ppn $SLURM_NTASKS +p $SLURM_NTASKS +devices $CUDA_VISIBLE_DEVICES +idlepoll mysim.conf
AlphaFold¶
AlphaFold needs some genetic databases to run:
- BFD
- MGnify
- PDB70
- PDB (structures in the mmCIF format)
- PDB seqres – only for AlphaFold-Multimer
- Uniclust30
- UniProt – only for AlphaFold-Multimer
- UniRef90
These databases occupy a total of 2.2 TB and are located at:
To launch AlphaFold jobs you can use this template script:
Calmodulin: AlphaFold
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=AlphaFold_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load AlphaFold/<version>
run_alphafold.sh -d $ALPHAFOLD_DATA_DIR -o ./output -f input.fasta -t 2021-05-12
The `run_alphafold.sh` script makes it easy to run AlphaFold. These are the options that should be used to start a calculation:
usage() {
echo ""
echo "Please make sure all required parameters are given"
echo "Usage: $0 <OPTIONS>"
echo "Required Parameters:"
echo "-d <data_dir> Path to directory of supporting data"
echo "-o <output_dir> Path to a directory that will store the results."
echo "-f <fasta_path> Path to a FASTA file containing sequence. If a FASTA file contains multiple sequences, then it will be folded as a multimer"
echo "-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets"
echo "Optional Parameters:"
echo "-g <use_gpu> Enable NVIDIA runtime to run with GPUs (default: true)"
echo "-n <openmm_threads> OpenMM threads (default: all available cores)"
echo "-a <gpu_devices> Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)"
echo "-m <model_preset> Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multi
mer model (default: 'monomer')"
echo "-c <db_preset> Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (
default: 'full_dbs')"
echo "-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have
changed (default: 'false')"
echo "-l <is_prokaryote> Optional for multimer system, not used by the single chain system. A boolean specifying true where the target complex is from a proka
ryote, and false where it is not, or where the origin is unknown. This value determine the pairing method for the MSA (default: 'None')"
echo "-b <benchmark> Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time
required for inferencing many proteins (default: 'false')"
echo ""
exit 1
}
For example, if we wanted to fold the amino acid chain corresponding to a wildtype calmodulin sequence, we could use this file with the sequence:
The amino acid sequence was obtained from here and the link to the file is this one.
The script would then look like this:
Calmodulin: AlphaFold
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=AlphaFold_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load AlphaFold/2.1.1-fosscuda-2020b
run_alphafold.sh -d $ALPHAFOLD_DATA_DIR -o ./output -f calmodulin.fasta -t 2021-05-12
Once the process starts, one of the first steps is a multiple sequence alignment (MSA) with `JackHMMER` and `HHBlits`. This step reads a large part of the downloaded databases in `$ALPHAFOLD_DATA_DIR`, and since the disk is slow it slows the computation down considerably. Soon we will acquire an SSD or M.2 disk (if possible) with enough capacity to store the databases and make reading faster.