# intro-to-slurm

## Creating a job

A job consists of two parts: resource requests and job steps. Resource requests specify a number of CPUs, an expected computing duration, amounts of RAM or disk space, etc. Job steps describe the tasks that must be done and the software that must be run.

The typical way of creating a job is to write a submission script. A submission script is a shell script, e.g. a Bash script, whose comments, if they are prefixed with `SBATCH`, are understood by Slurm as parameters describing resource requests and other submission options. You can get the complete list of parameters from the sbatch manpage (`man sbatch`).

**Important**

The SBATCH directives must appear at the top of the submission file, before any other line except for the very first line which should be the shebang (e.g. `#!/bin/bash`).

The script itself is a job step. Other job steps are created with the `srun` command.

For instance, the following script, hypothetically named `submit.sh`,

```
#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=res.out
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

srun hostname
srun sleep 60
```

would request one CPU for 10 minutes, along with 100 MB of RAM, in the default queue. When started, the job runs a first job step, `srun hostname`, which launches the UNIX command `hostname` on the node on which the requested CPU was allocated. Then, a second job step starts the `sleep` command. Note that the `--job-name` parameter allows giving a meaningful name to the job and the `--output` parameter defines the file to which the output of the job must be sent.

Once the submission script is written properly, you need to submit it to Slurm through the `sbatch` command which, upon success, responds with the job id attributed to the job. (The dollar sign below is the shell prompt.)

```
$ sbatch submit.sh
sbatch: Submitted batch job 99999999
```

The job then enters the queue in the **PENDING** state. Once resources become available and the job has highest priority, an allocation is created for it and it goes to the **RUNNING** state. If the job completes correctly, it goes to the **COMPLETED** state, otherwise, it is set to the **FAILED** state.
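
For instance, you can check the current state of the job with the `squeue` command and, once it has finished, with the `sacct` command (using the job id from the example above; `sacct` assumes accounting is enabled on the cluster):

```
$ squeue -j 99999999    # state of a pending or running job
$ sacct -j 99999999     # state of a finished job
```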

Interestingly, you can get near-realtime information about your running program (memory consumption, etc.) with the `sstat` command, by running `sstat -j jobid`. You can select what you want `sstat` to output with the `--format` parameter. Refer to the manpage (`man sstat`) for more information.
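
For example, to display the average CPU time and memory usage of the running job from the example above (the field names come from the sstat manpage):

```
$ sstat -j 99999999 --format=JobID,AveCPU,AveRSS,MaxRSS
```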

Upon completion, the output file contains the result of the commands run in the script file. In the above example, you can see it with the `cat res.out` command.
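
For the example above, `res.out` would simply contain the name of the node on which the job ran (the node name shown here is made up):

```
$ cat res.out
node042
```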

This example illustrates a serial job which uses a single CPU on a single node. It does not take advantage of multi-processor nodes or of the multiple compute nodes available in a cluster. The next sections explain how to create parallel jobs.

## Going parallel

There are several ways a parallel job, one whose tasks are run simultaneously, can be created:

* by running a multi-process program (SPMD paradigm, e.g. with MPI)
* by running a multithreaded program (shared memory paradigm, e.g. with OpenMP or pthreads)
* by running several instances of a single-threaded program (so-called embarrassingly parallel paradigm or a job array; see the sketch at the end of this document)
* by running one master program controlling several slave programs (master/slave paradigm)

In the Slurm context, a task is to be understood as a process. So a multi-process program is made of several tasks. By contrast, a multithreaded program is composed of only one task, which uses several CPUs.

Tasks are requested/created with the `--ntasks` option, while CPUs, for the multithreaded programs, are requested with the `--cpus-per-task` option. Tasks cannot be split across several compute nodes, so requesting several CPUs with the `--cpus-per-task` option ensures all CPUs are allocated on the same compute node. By contrast, requesting the same number of CPUs with the `--ntasks` option may lead to CPUs being allocated on several, distinct, compute nodes (see the shared-memory example below).

## More submission script examples

Here are some quick sample submission scripts. 

**Message passing example (MPI)**

```
#!/bin/bash

#SBATCH --job-name=test_mpi
#SBATCH --output=res_mpi.txt

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=36
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

#SBATCH --account=train_ceudat19
#SBATCH --partition=gll_usr_prod

# #SBATCH --reservation=s_tra_eudat


module load autoload intelmpi
srun hello_mpi_world

```

Request four nodes with 36 tasks each (144 tasks in total) for 10 minutes, using 100 MB of RAM per CPU. Assuming `hello_mpi_world` was compiled with MPI support, `srun` will create one instance of it per task, on the nodes allocated by Slurm.

You can try the above example by using the example hello world program (`hello_mpi_world.c`) and compiling it with:

```
module load autoload intelmpi
mpicc hello_mpi_world.c -o hello_mpi_world
```

The `res_mpi.txt` file should contain something like:

```
We have 144 processors
Processor 1 reporting for duty
Processor 2 reporting for duty
...
Processor 143 reporting for duty
```
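
**Shared memory example (OpenMP)**

A minimal sketch of the shared-memory paradigm discussed above, assuming a hypothetical OpenMP program `hello_omp_world` (compiled, e.g., with `gcc -fopenmp`). The script requests eight CPUs on a single node for one multithreaded task:

```
#!/bin/bash
#
#SBATCH --job-name=test_omp
#SBATCH --output=res_omp.txt
#
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

# one OpenMP thread per allocated CPU
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./hello_omp_world
```

Because the CPUs are requested with `--cpus-per-task`, they are guaranteed to be allocated on the same compute node.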
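
**Embarrassingly parallel example (job array)**

A minimal sketch of the job array paradigm mentioned earlier; the program `process_file` and its `input_*.dat` files are hypothetical. Slurm runs eight independent copies of the script, each with its own value of the `SLURM_ARRAY_TASK_ID` environment variable (the `%a` in the output file name expands to that index):

```
#!/bin/bash
#
#SBATCH --job-name=test_array
#SBATCH --output=res_array_%a.txt
#
#SBATCH --ntasks=1
#SBATCH --array=1-8
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

# each array task processes its own input file, input_1.dat ... input_8.dat
srun ./process_file input_${SLURM_ARRAY_TASK_ID}.dat
```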