Commit da1a20ed authored by Giuseppe Fiameni's avatar Giuseppe Fiameni
Browse files


parent 061a20cc
Pipeline #138 canceled with stages
# intro-to-slurm
Simple submission script
\ No newline at end of file
## Creating a job
A job consists in two parts: resource requests and job steps. Resource requests consist in a number of CPUs, computing expected duration, amounts of RAM or disk space, etc. Job steps describe tasks that must be done, software which must be run.
The typical way of creating a job is to write a submission script. A submission script is a shell script, e.g. a Bash script, whose comments, if they are prefixed with SBATCH, are understood by Slurm as parameters describing resource requests and other submissions options. You can get the complete list of parameters from the sbatch manpage man sbatch.
The SBATCH directives must appear at the top of the submission file, before any other line except for the very first line which should be the shebang (e.g. #!/bin/bash).
The script itself is a job step. Other job steps are created with the **srun** command.
For instance, the following script, hypothetically named,
#SBATCH --job-name=test
#SBATCH --output=res.txt
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
srun hostname
srun sleep 60
would request one CPU for 10 minutes, along with 100 MB of RAM, in the default queue. When started, the job would run a first job step srun hostname, which will launch the UNIX command hostname on the node on which the requested CPU was allocated. Then, a second job step will start the sleep command. Note that the --job-name parameter allows giving a meaningful name to the job and the --output parameter defines the file to which the output of the job must be sent.
Once the submission script is written properly, you need to submit it to slurm through the sbatch command, which, upon success, responds with the jobid attributed to the job. (The dollar sign below is the shell prompt)
$ sbatch
sbatch: Submitted batch job 99999999
Simple submission script
The job then enters the queue in the PENDING state. Once resources become available and the job has highest priority, an allocation is created for it and it goes to the RUNNING state. If the job completes correctly, it goes to the COMPLETED state, otherwise, it is set to the FAILED state.
Interestingly, you can get near-realtime information about your running program (memory consumption, etc.) with the sstat command, by introducing sstat -j jobid. You can select what you want sstat to output with the --format parameter. Refer to the manpage for more information man sstat.
Upon completion, the output file contains the result of the commands run in the script file. In the above example, you can see it with cat res.txt command.
This example illustrates a serial job which runs a single CPU on a single node. It does not take advantage of multi-processor nodes or the multiple compute nodes available with a cluster. The next sections explain how to create parallel jobs.
Going parallel
There are several ways a parallel job, one whose tasks are run simultaneously, can be created:
* by running a multi-process program (SPMD paradigm, e.g. with MPI)
* by running a multithreaded program (shared memory paradigm, e.g. with OpenMP or pthreads)
* by running several instances of a single-threaded program (so-called embarrassingly parallel paradigm or a job array)
* by running one master program controlling several slave programs (master/slave paradigm)
In the Slurm context, a task is to be understood as a process. So a multi-process program is made of several tasks. By contrast, a multithreaded program is composed of only one task, which uses several CPUs.
Tasks are requested/created with the --ntasks option, while CPUs, for the multithreaded programs, are requested with the --cpus-per-task option. Tasks cannot be split across several compute nodes, so requesting several CPUs with the --cpus-per-task option will ensure all CPUs are allocated on the same compute node. By contrast, requesting the same amount of CPUs with the --ntasks option may lead to several CPUs
More submission script examples
Here are some quick sample submission scripts. For more detailed information, make sure to have a look at the Slurm FAQ and to follow our training sessions. There is also an interactive Script Generation Wizard you can use to help you in submission scripts creation.
Message passing example (MPI)
#SBATCH --job-name=test_mpi
#SBATCH --output=res_mpi.txt
#SBATCH --ntasks=4
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100
module load OpenMPI
srun hello.mpi
Request four cores on the cluster for 10 minutes, using 100 MB of RAM per core. Assuming hello.mpi was compiled with MPI support, srun will create four instances of it, on the nodes allocated by Slurm.
You can try the above example by downloading the example hello world program from Wikipedia (name it for instance wiki_mpi_example.c), and compiling it with
module load openmpi
mpicc wiki_mpi_example.c -o hello.mpi
The res_mpi.txt file should contain something like
0: We have 4 processors
0: Hello 1! Processor 1 reporting for duty
0: Hello 2! Processor 2 reporting for duty
0: Hello 3! Processor 3 reporting for duty
Shared me
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment