
Computation node

How to send jobs to CALMODULIN

We connect via SSH with either of the following commands:

$ ssh username@calmodulin.lc.ehu.es
$ ssh username@u036898.lc.ehu.es

Either of these logs us into the system, which greets us with:

  ************************************************* 
             _                  _       _  _        
            | |                | |     | |(_)       
  ____ _____| | ____   ___   __| |_   _| | _ ____   
 / ___|____ | ||    \ / _ \ / _  | | | | || |  _ \  
( (___/ ___ | || | | | |_| ( (_| | |_| | || | | | | 
 \____)_____|\_)_|_|_|\___/ \____|____/ \_)_|_| |_| 

  ************************************************* 


 Welcome user

 Date: jue nov 18 20:11:59 CET 2021

 Hostname:     u036898.lc.ehu.es
 CPU Model:    Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz

 Number of procs:      48
 Total Memory:         791009840 kB
 Free Memory:          743464560 kB

 cluster related issues --> aritz.leonardo@ehu.eus 
 check the wiki for more information

Specifications

Compute Node # nodes Processor # of cores memory (GB) Accelerator
u036898 1 Intel Xeon Gold 6240R 48 800 1x NVIDIA A40

Working Spaces

Ideally, each researcher would create a user directory in each of these spaces:

Role Mount point Size
scratch /scratch 900 GB
storage (almacenamiento) /bigdisk 7.3 TB
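A one-time setup along these lines would create those per-user directories (a sketch: the loop only touches mount points that actually exist and are writable, so it is safe to run anywhere):

```shell
# Sketch: create a personal directory in each working space.
# /scratch and /bigdisk are the mount points from the table above.
USER=${USER:-$(id -un)}
for space in /scratch /bigdisk; do
    if [ -d "$space" ] && [ -w "$space" ]; then
        mkdir -p "$space/$USER"
        echo "created $space/$USER"
    else
        echo "skipping $space (not available here)"
    fi
done
```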

Sending Jobs

The resource manager is SLURM; you can find more information on its use on this page.
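A minimal submit cycle looks like this (a sketch; the job name and script file are illustrative):

```shell
# Sketch: write a minimal batch script, then hand it to SLURM with sbatch.
cat > hello.slurm <<'EOF'
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=hello
#SBATCH --ntasks=1
#SBATCH --output=%x-%j.out
srun hostname
EOF
echo "wrote hello.slurm"

# On the node you would then run (not executed here):
#   sbatch hello.slurm   # submit the job; prints the assigned job id
#   squeue -u $USER      # check its state in the queue
#   scancel <jobid>      # cancel it if necessary
```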

QoS and partitions

QoS/Partition Priority MaxWall MaxNodesPU MaxJobsPU MaxSubmitPU MaxTRES
batch - INFINITY 1

The columns mean the following:

  • MaxWall: Maximum time that a job can run.
  • MaxNodesPU: Maximum number of nodes that a single user's jobs can use at once (PU = per user).
  • MaxJobsPU: Maximum number of jobs a user can have running at once.
  • MaxSubmitPU: Maximum number of jobs a user can have submitted (queued or running) at once.
  • MaxTRES: Maximum trackable resources (TRES) that a job can request.

Batch scripts

Calmodulin: MPI job
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load program/program_version

srun binary < input
Calmodulin: OpenMP job
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load program/program_version

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

binary < input
Calmodulin: Hybrid job (MPI+OpenMP)
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=4
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load program/program_version

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun binary < input
Calmodulin: Job with 1 GPU
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load program/program_version

srun binary < input

Toolchains

A toolchain is a coherent set of tools used to compile, install, and run an application.

Defining one pins down a working environment in terms of versions, compilers, and libraries.

The table below describes the components that are loaded into the environment when we load the module associated with a toolchain:

Toolchain Description
foss GCC, OpenMPI, OpenBLAS/LAPACK, ScaLAPACK(/BLACS), FFTW
fosscuda GCC, OpenMPI, CUDA, OpenBLAS/LAPACK, ScaLAPACK(/BLACS), FFTW
gompi GCC, OpenMPI
gompic GCC, OpenMPI, CUDA
gcccuda GCC, CUDA
intel icc, ifort, imkl, impi
iimpi icc, ifort, impi

For example, if we load foss/2021a:

$ module load foss/2021a

We would see that the following modules have been loaded:

$ module list

Currently Loaded Modules:
  1) GCCcore/10.3.0                        6) XZ/.5.2.5-GCCcore-10.3.0          (H)  11) libevent/.2.1.12-GCCcore-10.3.0  (H)  16) OpenBLAS/0.3.15-GCC-10.3.0      21) foss/2021a
  2) zlib/.1.2.11-GCCcore-10.3.0     (H)   7) libxml2/.2.9.10-GCCcore-10.3.0    (H)  12) UCX/.1.10.0-GCCcore-10.3.0       (H)  17) FlexiBLAS/3.0.4-GCC-10.3.0
  3) binutils/.2.36.1-GCCcore-10.3.0 (H)   8) libpciaccess/.0.16-GCCcore-10.3.0 (H)  13) libfabric/.1.12.1-GCCcore-10.3.0 (H)  18) gompi/2021a
  4) GCC/10.3.0                            9) hwloc/.2.4.1-GCCcore-10.3.0       (H)  14) PMIx/.3.2.3-GCCcore-10.3.0       (H)  19) FFTW/3.3.9-gompi-2021a
  5) numactl/.2.0.14-GCCcore-10.3.0  (H)  10) OpenSSL/.1.1                      (H)  15) OpenMPI/4.1.1-GCC-10.3.0              20) ScaLAPACK/2.1.0-gompi-2021a-fb

  Where:
   H:  Hidden Module

As we can see, among other dependencies, the environment has been loaded with:

  • A compiler: GCCcore/10.3.0, GCC/10.3.0
  • A particular implementation of MPI: OpenMPI/4.1.1-GCC-10.3.0
  • Scientific libraries: OpenBLAS/0.3.15-GCC-10.3.0, FlexiBLAS/3.0.4-GCC-10.3.0, ScaLAPACK/2.1.0-gompi-2021a-fb, FFTW/3.3.9-gompi-2021a

The other modules are, as a general rule, dependencies of these main modules.

Also note that some toolchains may have another as a dependency. In this case gompi/2021a would be a subtoolchain of foss/2021a.

Compilers

Generally, on general-purpose computers with x86_64 architecture, two families of compilers are used:

  • GNU: the open source compilers of the GNU initiative.
  • Intel: Intel’s proprietary (and free as of 2021) compilers. They are especially interesting on platforms with Intel processors, since they generate optimized machine code for their own processors.

Here’s a list of how to call these compilers.

Compiler C FORTRAN C++ MPI C MPI FORTRAN MPI C++
GNU gcc gfortran g++ mpicc mpif90 mpicxx
intel icc ifort icpc mpiicc mpiifort mpiicpc

Software

QuantumESPRESSO

Version CPU GPU
QuantumESPRESSO/6.8-intel-2021
Calmodulin: QuantumESPRESSO
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=QuantumESPRESSO_JOB
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load QuantumESPRESSO/<version>

srun pw.x < input.in

GROMACS

Version CPU GPU
GROMACS/2016.4-fosscuda-2020b-PLUMED-2.4.0
GROMACS/2021-foss-2020b
GROMACS/2021.2-fosscuda-2020b
GROMACS/2021.3-foss-2021a-CUDA-11.3.1
GROMACS/2021.3-fosscuda-2020b-PLUMED-2.7.2

Here you can find the documentation for each GROMACS version on how to launch jobs efficiently. It is worth consulting, since the 2016 and 2021 series have different launch options.

Version Documentation
2016 https://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
2021 https://manual.gromacs.org/documentation/2021/user-guide/mdrun-performance.html
Calmodulin: GROMACS
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load GROMACS/<version>

srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -s input.tpr
Calmodulin: GROMACS with GPU (for versions older than 2020)
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load GROMACS/<version>

srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -nb auto -bonded auto -pme auto -gpu_id 0 -s input.tpr
Calmodulin: GROMACS with GPU
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load GROMACS/<version>

srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -nb auto -gpu_id 0 -s input.tpr

In the case of GROMACS, what usually works best is choosing a balanced number of processes and threads. The table below shows combinations that use all 48 cores of the machine (that is, the whole node) or 24 cores (half the node).

Total cores MPI processes Threads per process
48 8 (--ntasks-per-node=8) 6 (--cpus-per-task=6)
48 6 (--ntasks-per-node=6) 8 (--cpus-per-task=8)
48 12 (--ntasks-per-node=12) 4 (--cpus-per-task=4)
48 4 (--ntasks-per-node=4) 12 (--cpus-per-task=12)
24 6 (--ntasks-per-node=6) 4 (--cpus-per-task=4)
24 4 (--ntasks-per-node=4) 6 (--cpus-per-task=6)

As we can see, the product of the number of MPI processes and the number of threads per process is equal to the number of cores that we wish to use.
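That constraint can be checked with a line of shell arithmetic (a sketch mirroring the first row of the table):

```shell
# Sketch: 8 MPI ranks x 6 OpenMP threads per rank must cover 48 cores.
ntasks_per_node=8      # --ntasks-per-node=8
cpus_per_task=6        # --cpus-per-task=6
total_cores=$(( ntasks_per_node * cpus_per_task ))
echo "$total_cores"    # prints 48, the whole node
```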

This is how the script would look for the first of the cases that we show in the table:

GROMACS 48 cores and 1 GPU: 8 MPI processes and 6 OpenMP threads per process
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=6
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load GROMACS/<version>

srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -nb auto -gpu_id 0 -s input.tpr

It is also convenient to use the environment variables that SLURM makes available to us:

Environment variable Value
SLURM_CPUS_PER_TASK Threads per process. Equals the value of --cpus-per-task= in the batch script.
SLURM_NTASKS_PER_NODE MPI processes per node. Equals the value of --ntasks-per-node= in the batch script.
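In a script, those variables can be consumed with safe fallbacks (a sketch; outside a SLURM job both default to 1 here, so the same lines also work interactively):

```shell
# Sketch: derive launch parameters from SLURM's environment, with
# fallbacks so the same lines also run outside of a job.
ntasks=${SLURM_NTASKS_PER_NODE:-1}
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "MPI processes per node: $ntasks"
echo "OpenMP threads per process: $OMP_NUM_THREADS"
```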

NAMD

Version CPU GPU
NAMD/2.14-intel-2021a-mpi
NAMD/2.14-fosscuda-2020b
Calmodulin: NAMD
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=NAMD_JOB
#SBATCH --cpus-per-task=1
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load NAMD/<version>

srun namd2 mysim.conf
Calmodulin: NAMD with GPU
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=NAMD_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load NAMD/<version>

namd2 +ppn $SLURM_NTASKS +p $SLURM_NTASKS +devices $CUDA_VISIBLE_DEVICES +idlepoll mysim.conf

AlphaFold

AlphaFold needs some genetic databases to run:

  • BFD
  • MGnify
  • PDB70
  • PDB (structures in the mmCIF format)
  • PDB seqres – only for AlphaFold-Multimer
  • Uniclust30
  • UniProt – only for AlphaFold-Multimer
  • UniRef90

These databases occupy a total of 2.2 TB and are located at:

/bigdisk/AlphaFold/DATA

To launch AlphaFold jobs you can use this template script:

Calmodulin: AlphaFold
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=AlphaFold_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load AlphaFold/<version>

run_alphafold.sh -d $ALPHAFOLD_DATA_DIR -o ./output -f input.fasta -t 2021-05-12

The run_alphafold.sh script makes it easy to run AlphaFold. These are the options that should be used to start a calculation:

usage() {
        echo ""
        echo "Please make sure all required parameters are given"
        echo "Usage: $0 <OPTIONS>"
        echo "Required Parameters:"
        echo "-d <data_dir>         Path to directory of supporting data"
        echo "-o <output_dir>       Path to a directory that will store the results."
        echo "-f <fasta_path>       Path to a FASTA file containing sequence. If a FASTA file contains multiple sequences, then it will be folded as a multimer"
        echo "-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets"
        echo "Optional Parameters:"
        echo "-g <use_gpu>          Enable NVIDIA runtime to run with GPUs (default: true)"
        echo "-n <openmm_threads>   OpenMM threads (default: all available cores)"
        echo "-a <gpu_devices>      Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)"
        echo "-m <model_preset>     Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multimer model (default: 'monomer')"
        echo "-c <db_preset>        Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (default: 'full_dbs')"
        echo "-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have changed (default: 'false')"
        echo "-l <is_prokaryote>    Optional for multimer system, not used by the single chain system. A boolean specifying true where the target complex is from a prokaryote, and false where it is not, or where the origin is unknown. This value determine the pairing method for the MSA (default: 'None')"
        echo "-b <benchmark>        Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time required for inferencing many proteins (default: 'false')"
        echo ""
        exit 1
}

For example, if we wanted to fold the chain of amino acids corresponding to a wild-type sequence of calmodulin, we could use this file with the sequence:

The amino acid sequence was obtained from here, and the link to the file is this one.

and the script would look like this:

Calmodulin: AlphaFold
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=AlphaFold_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

module load AlphaFold/2.1.1-fosscuda-2020b

run_alphafold.sh -d $ALPHAFOLD_DATA_DIR -o ./output -f calmodulin.fasta -t 2021-05-12

Once the process starts, one of the first steps is an MSA (multiple sequence alignment) with JackHMMER and HHblits. This step reads a large part of the databases stored under $ALPHAFOLD_DATA_DIR, and since that disk is slow it slows the computation down considerably. Soon we will acquire an SSD or M.2 disk (if possible) with enough capacity to hold the databases and speed up reading.

System manager