Running your first job¶
This tutorial walks you through submitting and running your first SLURM job
on the HPC cluster.
By the end of this page, you will be able to:
- Create a job script
- Submit it to SLURM
- Monitor its execution
- Retrieve the output
Step 1 — Prepare a working directory¶
Move to your project work directory (recommended):
cd /work/<project_name>
mkdir first-job
cd first-job
Alternatively, if your environment defines it:
cd $WORK
This keeps job files organized and avoids cluttering your HOME directory.
Step 2 — Create a simple job script¶
Create a file called hello.slurm:
nano hello.slurm
Paste the following content:
#!/bin/bash
#SBATCH --job-name=hello-world
#SBATCH --output=hello.out
#SBATCH --error=hello.err
#SBATCH --time=00:05:00
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
echo "Job started on $(date)"
echo "Running on node: $(hostname)"
echo "Working directory: $(pwd)"
sleep 10
echo "Job finished on $(date)"
Save and exit the editor.
Step 3 — Submit the job¶
Submit the job to SLURM:
sbatch hello.slurm
You should see output similar to:
Submitted batch job 123456
The number returned is your job ID.
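If you plan to script around submission, sbatch can print just the job ID. A minimal sketch (the --parsable flag is standard sbatch; hello.slurm is the script from Step 2):

```shell
# --parsable makes sbatch print only the job ID, with no
# "Submitted batch job" prefix, so it can be captured in a variable.
jobid=$(sbatch --parsable hello.slurm)
echo "Submitted job $jobid"
```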
Step 4 — Monitor the job¶
Check the job status:
squeue -u $USER
Possible states:
- PD — pending (waiting for resources)
- R — running
- CD — completed
A very short job may finish before it ever appears in the running state.
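Once a job has left the queue, squeue no longer shows it. If job accounting is enabled on the cluster, sacct reports on finished jobs instead:

```shell
# Show state, elapsed time, and exit code for a finished job.
# Replace 123456 with your own job ID.
sacct -j 123456 --format=JobID,JobName,State,Elapsed,ExitCode
```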
Step 5 — Check job output¶
Once the job has completed, list the files:
ls -l
You should see:
- hello.out — standard output
- hello.err — standard error (empty if no errors occurred)
View the output:
cat hello.out
Understanding job output files¶
- Standard output (--output): contains everything the job prints to the terminal (stdout)
- Standard error (--error): contains error messages (stderr)
Always check both files when debugging a job.
Using SCRATCH for computations¶
For real workloads, use SCRATCH (or SCRATCH_LOCAL) for temporary data and heavy I/O.
Example pattern:
#!/bin/bash
#SBATCH --job-name=example-scratch
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
SCRATCH_DIR="$SCRATCH/$SLURM_JOB_ID"
mkdir -p "$SCRATCH_DIR"
cp /work/<project_name>/input_data.dat "$SCRATCH_DIR"
cd "$SCRATCH_DIR"
# Run your computation here
cp results.dat /work/<project_name>/
rm -rf "$SCRATCH_DIR"
This improves performance and reduces load on persistent filesystems.
Always copy important results to WORK before job completion.
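One robustness tweak on the pattern above: an EXIT trap ensures results are copied back and scratch is removed even if the computation exits early. A sketch, assuming $SCRATCH and $WORK as before (it falls back to /tmp and the current directory so the pattern can be tried outside a job):

```shell
#!/bin/bash
# Fall back to local paths when not running under SLURM, so this
# sketch can be tried anywhere; on the cluster, $SCRATCH and $WORK apply.
SCRATCH_DIR="${SCRATCH:-/tmp}/job-${SLURM_JOB_ID:-$$}"
WORK_DIR="${WORK:-$PWD}"
mkdir -p "$SCRATCH_DIR"

cleanup() {
    # Runs on any exit: copy results back, then remove the scratch area.
    cp "$SCRATCH_DIR"/results.dat "$WORK_DIR"/ 2>/dev/null
    rm -rf "$SCRATCH_DIR"
}
trap cleanup EXIT

cd "$SCRATCH_DIR"
echo "computation output" > results.dat   # stand-in for the real computation
```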
Example: requesting GPUs¶
If your workload requires GPUs, request them explicitly:
#!/bin/bash
#SBATCH --job-name=gpu-example
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G
nvidia-smi
Always request only the resources you actually need.
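Inside the job it is worth confirming what was actually allocated. A small sketch (SLURM commonly exports CUDA_VISIBLE_DEVICES for GPU jobs; the nvidia-smi call is guarded so the snippet also runs on nodes without GPUs):

```shell
# Print the GPU indices SLURM assigned to this job (if any).
echo "Allocated GPUs: ${CUDA_VISIBLE_DEVICES:-none}"

# Query name and total memory of the visible GPUs;
# skip silently when no NVIDIA driver is present.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv
fi
```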
Common issues¶
Job stays in PENDING (PD)¶
Possible reasons:
- Requested resources are not currently available
- Walltime is too long
- Partition limits have been reached
Use:
squeue -j <job_id> -o "%i %t %r"
to see the reason.
Job fails immediately¶
Check:
- Output files (.out and .err)
- Requested resources
- Script syntax
Make sure the script was submitted with sbatch rather than run directly with bash — the #SBATCH directives are plain comments to the shell and are only honored by SLURM.
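A quick way to catch shell syntax errors before submitting is bash -n, which parses a script without executing it (the #SBATCH lines are ordinary comments to bash, so they are ignored):

```shell
# Write a minimal job script, then parse it without running it.
# check.slurm is a throwaway example name.
cat > check.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=syntax-check
echo "hello"
EOF

bash -n check.slurm && echo "syntax OK"
```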
Cleaning up¶
After jobs complete:
- Remove unnecessary output files
- Clean temporary directories
- Keep WORK organized
Good housekeeping improves overall system efficiency.
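For the first item above, find can list deletion candidates before anything is removed. A sketch (the *.out pattern and the 30-day threshold are examples, not site policy):

```shell
# List .out files in the current directory older than 30 days.
# Review the list first; append -delete only once you are sure.
find . -maxdepth 1 -name "*.out" -mtime +30
```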
Next steps¶
- Learn how to monitor and debug jobs