Computation node¶
How to send jobs to CALMODULIN¶
We will connect via SSH as follows:
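A minimal example, assuming the hostname shown in the welcome banner below and a placeholder user name:

```bash
# Log in to the Calmodulin node (replace 'user' with your own account)
ssh user@u036898.lc.ehu.es
```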
This will log us into the system:
*************************************************
_ _ _ _
| | | | | |(_)
____ _____| | ____ ___ __| |_ _| | _ ____
/ ___|____ | || \ / _ \ / _ | | | | || | _ \
( (___/ ___ | || | | | |_| ( (_| | |_| | || | | | |
\____)_____|\_)_|_|_|\___/ \____|____/ \_)_|_| |_|
*************************************************
Welcome user
Date: jue nov 18 20:11:59 CET 2021
Hostname: u036898.lc.ehu.es
CPU Model: Intel(R) Xeon(R) Gold 6240R CPU @ 2.40GHz
Number of procs: 48
Total Memory: 791009840 kB
Free Memory: 743464560 kB
cluster related issues --> aritz.leonardo@ehu.eus
check the wiki for more information
Specifications¶
Compute Node | # nodes | Processor | # of cores | memory (GB) | Accelerator |
---|---|---|---|---|---|
u036898 | 1 | Intel Xeon Gold 6240R | 48 | 800 | 1x NVIDIA A40 |
Working Spaces¶
Ideally, each researcher would create a user directory in each of these spaces:
Role | mount point | Size |
---|---|---|
scratch | /scratch | 900 GB |
storage | /bigdisk | 7.3 TB |
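A minimal sketch of how each user could create their own directories, assuming the mount points listed above:

```bash
# Create per-user working directories on both spaces
mkdir -p /scratch/$USER
mkdir -p /bigdisk/$USER
```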
Sending Jobs¶
The resource manager is SLURM, and you can find more information on its use on this page.
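As a quick reference, jobs are handled with the standard SLURM commands (the script name `job.sl` is just a placeholder):

```bash
sbatch job.sl        # submit a batch script to the queue
squeue -u $USER      # check the state of your jobs
scancel <jobid>      # cancel a job
```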
QoS and partitions¶
QoS/Partition | Priority | MaxWall | MaxNodesPU | MaxJobsPU | MaxSubmitPU | MaxTRES |
---|---|---|---|---|---|---|
batch | - | INFINITY | 1 | | | |
The columns mean the following:
MaxWall
: Maximum time that a job can run.

MaxNodesPU
: The maximum number of nodes that a job can request.
Batch scripts¶
Calmodulin: Job MPI
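A minimal MPI-only sketch, following the same pattern as the hybrid script below (the module and binary names are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load program/program_version
# Launch one MPI rank per requested task
srun binary < input
```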
Calmodulin: Job OpenMP
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load program/program_version
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
binary < input
Calmodulin: Hybrid job (MPI+OpenMP)
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=4
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load program/program_version
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun binary < input
Calmodulin: Job with 1 GPU
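A minimal sketch requesting the node's A40 GPU, following the pattern of the GPU scripts further below (the module and binary names are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load program/program_version
# Single task with access to the requested GPU
srun binary < input
```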
Toolchains¶
A toolchain is a coherent set of tools used to compile and install applications and to provide their runtime.
It defines a working environment at the level of versions, compilers, and libraries.
The table below indicates which components are loaded into the environment when a module associated with a toolchain is loaded:
Toolchain | Description |
---|---|
foss | GCC, OpenMPI, OpenBLAS/LAPACK, ScaLAPACK(/BLACS), FFTW |
fosscuda | GCC, OpenMPI, CUDA, OpenBLAS/LAPACK, ScaLAPACK(/BLACS), FFTW |
gompi | GCC, OpenMPI |
gompic | GCC, OpenMPI, CUDA |
gcccuda | GCC, CUDA |
intel | icc, ifort, imkl, impi |
iimpi | icc, ifort, impi |
For example, if we loaded `foss/2021a`, we would see that the following modules have been loaded:
$ module list
Currently Loaded Modules:
1) GCCcore/10.3.0 6) XZ/.5.2.5-GCCcore-10.3.0 (H) 11) libevent/.2.1.12-GCCcore-10.3.0 (H) 16) OpenBLAS/0.3.15-GCC-10.3.0 21) foss/2021a
2) zlib/.1.2.11-GCCcore-10.3.0 (H) 7) libxml2/.2.9.10-GCCcore-10.3.0 (H) 12) UCX/.1.10.0-GCCcore-10.3.0 (H) 17) FlexiBLAS/3.0.4-GCC-10.3.0
3) binutils/.2.36.1-GCCcore-10.3.0 (H) 8) libpciaccess/.0.16-GCCcore-10.3.0 (H) 13) libfabric/.1.12.1-GCCcore-10.3.0 (H) 18) gompi/2021a
4) GCC/10.3.0 9) hwloc/.2.4.1-GCCcore-10.3.0 (H) 14) PMIx/.3.2.3-GCCcore-10.3.0 (H) 19) FFTW/3.3.9-gompi-2021a
5) numactl/.2.0.14-GCCcore-10.3.0 (H) 10) OpenSSL/.1.1 (H) 15) OpenMPI/4.1.1-GCC-10.3.0 20) ScaLAPACK/2.1.0-gompi-2021a-fb
Where:
H: Hidden Module
As we can see, among other dependencies, the environment has been loaded with:
- A compiler: `GCCcore/10.3.0`, `GCC/10.3.0`
- A particular implementation of MPI: `OpenMPI/4.1.1-GCC-10.3.0`
- Scientific libraries: `OpenBLAS/0.3.15-GCC-10.3.0`, `ScaLAPACK/2.1.0-gompi-2021a-fb`, `FFTW/3.3.9-gompi-2021a`
The other modules are, as a general rule, dependencies of these main modules.
Also note that some toolchains may have another as a dependency; in this case, `gompi/2021a` would be a subtoolchain of `foss/2021a`.
Compilers¶
Generally, on general-purpose x86_64 machines, two families of compilers are used:
- GNU: the open-source compilers of the GNU project.
- Intel: Intel's proprietary (and free of charge since 2021) compilers. They are especially interesting on platforms with Intel processors, since they generate machine code optimized for those processors.
Here is how to invoke these compilers:
Compiler | C | FORTRAN | C++ | MPI C | MPI FORTRAN | MPI C++ |
---|---|---|---|---|---|---|
GNU | gcc | gfortran | g++ | mpicc | mpif90 | mpicxx |
intel | icc | ifort | icpc | mpiicc | mpiifort | mpiicpc |
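For example, a serial and an MPI C program could be compiled as follows (the file names are placeholders):

```bash
# GNU toolchain
gcc -O2 -o hello hello.c
mpicc -O2 -o hello_mpi hello_mpi.c

# Intel toolchain
icc -O2 -o hello hello.c
mpiicc -O2 -o hello_mpi hello_mpi.c
```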
Software¶
QuantumESPRESSO¶
Version | CPU | GPU |
---|---|---|
QuantumESPRESSO/6.8-intel-2021 | ✔ | |
Calmodulin: QuantumESPRESSO
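A minimal sketch for a `pw.x` run, assuming the module listed above (the input/output file names and core counts are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=QE_JOB
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load QuantumESPRESSO/6.8-intel-2021
# Run pw.x with one MPI rank per requested task
srun pw.x -input scf.in > scf.out
```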
GROMACS¶
Version | CPU | GPU |
---|---|---|
GROMACS/2016.4-fosscuda-2020b-PLUMED-2.4.0 | ✔ | ✔ |
GROMACS/2021-foss-2020b | ✔ | |
GROMACS/2021.2-fosscuda-2020b | ✔ | ✔ |
GROMACS/2021.3-foss-2021a-CUDA-11.3.1 | ✔ | ✔ |
GROMACS/2021.3-fosscuda-2020b-PLUMED-2.7.2 | ✔ | ✔ |
Here you can find the documentation for each GROMACS version, which is useful for launching jobs efficiently; note that the 2016 and 2021 versions have different launch options.
Version | Documentation |
---|---|
2016 | https://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html |
2021 | https://manual.gromacs.org/documentation/2021/user-guide/mdrun-performance.html |
Calmodulin: GROMACS
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load GROMACS/<version>
srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -s input.tpr
Calmodulin: GROMACS with GPU (for versions older than 2020)
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load GROMACS/<version>
srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -nb auto -bonded auto -pme auto -gpu_id 0 -s input.tpr
Calmodulin: GROMACS with GPU
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load GROMACS/<version>
srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -nb auto -gpu_id 0 -s input.tpr
In the case of GROMACS, what usually works best is choosing a balanced number of MPI processes and threads. The table below shows combinations that use all 48 cores of the machine (that is, the entire machine) or 24 cores (that is, half of the machine).
Total number of cores | MPI processes | Threads per process |
---|---|---|
48 | 8 (`--ntasks-per-node=8`) | 6 (`--cpus-per-task=6`) |
48 | 6 (`--ntasks-per-node=6`) | 8 (`--cpus-per-task=8`) |
48 | 12 (`--ntasks-per-node=12`) | 4 (`--cpus-per-task=4`) |
48 | 4 (`--ntasks-per-node=4`) | 12 (`--cpus-per-task=12`) |
24 | 6 (`--ntasks-per-node=6`) | 4 (`--cpus-per-task=4`) |
24 | 4 (`--ntasks-per-node=4`) | 6 (`--cpus-per-task=6`) |
As we can see, the product of the number of MPI processes and the number of threads per process is equal to the number of cores that we wish to use.
This is how the script would look for the first of the cases that we show in the table:
GROMACS 48 cores and 1 GPU: 8 MPI processes and 6 OpenMP threads per process
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=GROMACS_JOB
#SBATCH --cpus-per-task=6
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load GROMACS/<version>
srun gmx_mpi mdrun -ntomp $SLURM_CPUS_PER_TASK -nb auto -gpu_id 0 -s input.tpr
It is also convenient to use the environment variables that SLURM makes available to us:
Environment variable | Value |
---|---|
SLURM_CPUS_PER_TASK | Threads per process. Equals the value of `--cpus-per-task=` in the batch script. |
SLURM_NTASKS_PER_NODE | MPI processes per node. Equals the value of `--ntasks-per-node=` in the batch script. |
NAMD¶
Version | CPU | GPU |
---|---|---|
NAMD/2.14-intel-2021a-mpi | ✔ | |
NAMD/2.14-fosscuda-2020b | ✔ | ✔ |
Calmodulin: NAMD
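A minimal CPU-only sketch, assuming the MPI build listed above (NAMD/2.14-intel-2021a-mpi) and a placeholder configuration file:

```bash
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=NAMD_JOB
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load NAMD/2.14-intel-2021a-mpi
# MPI build: launch namd2 with one rank per requested task
srun namd2 mysim.conf
```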
Calmodulin: NAMD with GPU
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=NAMD_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load NAMD/<version>
namd2 +ppn $SLURM_NTASKS +p $SLURM_NTASKS +devices $CUDA_VISIBLE_DEVICES +idlepoll mysim.conf
AlphaFold¶
AlphaFold needs some genetic databases to run:
- BFD
- MGnify
- PDB70
- PDB (structures in the mmCIF format)
- PDB seqres – only for AlphaFold-Multimer
- Uniclust30
- UniProt – only for AlphaFold-Multimer
- UniRef90
These databases occupy a total of 2.2 TB and are located at:
To launch AlphaFold jobs you can use this template script:
Calmodulin: AlphaFold
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=AlphaFold_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load AlphaFold/<version>
run_alphafold.sh -d $ALPHAFOLD_DATA_DIR -o ./output -f input.fasta -t 2021-05-12
The `run_alphafold.sh` script makes it easy to run AlphaFold. These are the options that should be used to start a calculation:
usage() {
echo ""
echo "Please make sure all required parameters are given"
echo "Usage: $0 <OPTIONS>"
echo "Required Parameters:"
echo "-d <data_dir> Path to directory of supporting data"
echo "-o <output_dir> Path to a directory that will store the results."
echo "-f <fasta_path> Path to a FASTA file containing sequence. If a FASTA file contains multiple sequences, then it will be folded as a multimer"
echo "-t <max_template_date> Maximum template release date to consider (ISO-8601 format - i.e. YYYY-MM-DD). Important if folding historical test sets"
echo "Optional Parameters:"
echo "-g <use_gpu> Enable NVIDIA runtime to run with GPUs (default: true)"
echo "-n <openmm_threads> OpenMM threads (default: all available cores)"
echo "-a <gpu_devices> Comma separated list of devices to pass to 'CUDA_VISIBLE_DEVICES' (default: 0)"
echo "-m <model_preset> Choose preset model configuration - the monomer model, the monomer model with extra ensembling, monomer model with pTM head, or multi
mer model (default: 'monomer')"
echo "-c <db_preset> Choose preset MSA database configuration - smaller genetic database config (reduced_dbs) or full genetic database config (full_dbs) (
default: 'full_dbs')"
echo "-p <use_precomputed_msas> Whether to read MSAs that have been written to disk. WARNING: This will not check if the sequence, database or configuration have
changed (default: 'false')"
echo "-l <is_prokaryote> Optional for multimer system, not used by the single chain system. A boolean specifying true where the target complex is from a proka
ryote, and false where it is not, or where the origin is unknown. This value determine the pairing method for the MSA (default: 'None')"
echo "-b <benchmark> Run multiple JAX model evaluations to obtain a timing that excludes the compilation time, which should be more indicative of the time
required for inferencing many proteins (default: 'false')"
echo ""
exit 1
}
For example, if we wanted to fold the amino acid chain corresponding to a wildtype calmodulin sequence, we could use this file with the sequence:
The amino acid sequence was obtained from here and the link to the file is this one.
The script would then look like this:
Calmodulin: AlphaFold
#!/bin/bash
#SBATCH --partition=batch
#SBATCH --job-name=AlphaFold_JOB
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:A40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
module load AlphaFold/2.1.1-fosscuda-2020b
run_alphafold.sh -d $ALPHAFOLD_DATA_DIR -o ./output -f calmodulin.fasta -t 2021-05-12
Once the process starts, one of the first steps is a multiple sequence alignment (MSA) with `JackHMMER` and `HHBlits`. This step reads a large part of the downloaded databases in `$ALPHAFOLD_DATA_DIR`, and since the disk is slow it slows the computation down considerably. Soon we will acquire an SSD or M.2 disk (if possible) with enough capacity to store the databases and make reading faster.