Slurm is the job scheduling system on the Princeton HPC machines and most clusters we use. Very useful information about Slurm can be found in the following guide by Princeton Research Computing:
For a description of where to submit calculations on the various supercomputers, please refer to 📁Filesystems.
Write the Slurm files below as submit.job and submit them with sbatch submit.job. You can check your queued jobs with squeue -u <NetID>. To cancel a job, run scancel <JobID>. To cancel all of your jobs, run scancel -u <NetID>. If you prefer to run an interactive session, you can use salloc as described in the Princeton Research Computing KnowledgeBase article; a typical workflow is shown below.
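For reference, a typical job-management workflow looks like this (replace <NetID> and <JobID> with your own values; the salloc resource flags are only an example):
sbatch submit.job # submit the job script
squeue -u <NetID> # check your queued and running jobs
scancel <JobID> # cancel a single job
scancel -u <NetID> # cancel all of your jobs
salloc --nodes=1 --ntasks=1 --time=00:30:00 # request an interactive session (example resources)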
Tiger3
Python
The following is a typical submit.job file for a serial Python calculation.
#!/bin/bash
#SBATCH --job-name=python # create a short name for your job
#SBATCH --nodes=1 # node count
#SBATCH --ntasks-per-node=1 # total number of tasks per node
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=8G # memory (up to 1 TB per node)
#SBATCH --time=00:10:00 # total run time limit (HH:MM:SS)
#SBATCH --account=rosengroup
source ~/.bashrc
module purge
module load anaconda3/2024.10
conda activate cms
python job.py > job.out
VASP
To run VASP, we modify the Slurm submission script so that it loads the necessary modules and runs the VASP executable via the srun command. Looking for an example to run? Check out the VASP tutorials. The submit script for the Si example is reproduced below.
#!/bin/bash
#SBATCH --job-name=vasp # create a short name for your job
#SBATCH --nodes=1 # node count
#SBATCH --ntasks-per-node=112 # total number of tasks per node
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=512G # memory (up to 1 TB per node)
#SBATCH --time=00:10:00 # total run time limit (HH:MM:SS)
#SBATCH --account=rosengroup
source ~/.bashrc
module purge
module load vasp/6.5.1
srun vasp_std > vasp.out # or vasp_gam for 1x1x1 kpoints
Running a VASP calculation via ASE works essentially the same way as running VASP directly, except that you now call a Python script and define how VASP is launched by setting the ASE_VASP_COMMAND environment variable, as described in the ASE documentation. Looking for an example ASE calculation to run? Refer to 🦾Using ASE to run VASP.
#!/bin/bash
#SBATCH --job-name=vasp # create a short name for your job
#SBATCH --nodes=1 # node count
#SBATCH --ntasks-per-node=112 # total number of tasks per node
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=512G # memory (up to 1 TB per node)
#SBATCH --time=00:10:00 # total run time limit (HH:MM:SS)
#SBATCH --account=rosengroup
source ~/.bashrc
module purge
module load anaconda3/2024.10
module load vasp/6.5.1
conda activate cms
export ASE_VASP_COMMAND="srun vasp_std" # or "srun vasp_gam" for 1x1x1 kpts
python job.py > vasp.out
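The job.py called above is an ordinary ASE script. A minimal sketch, assuming ASE is installed in the cms environment and your pseudopotential path (e.g. VASP_PP_PATH) is configured, might look like the following; refer to 🦾Using ASE to run VASP for a full example.
from ase.build import bulk
from ase.calculators.vasp import Vasp
# Illustrative static calculation on bulk Si; the VASP settings are placeholders,
# so choose values appropriate for your own system.
atoms = bulk("Si")
atoms.calc = Vasp(xc="pbe", encut=400, kpts=(4, 4, 4))
print(atoms.get_potential_energy()) # ASE launches VASP via ASE_VASP_COMMAND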
Quacc w/ VASP
The following is a submit script to run VASP with quacc. Looking for an example quacc calculation to run? Refer to 🤖Automation with Quacc.
#!/bin/bash
#SBATCH --job-name=vasp # create a short name for your job
#SBATCH --nodes=1 # node count
#SBATCH --ntasks-per-node=112 # total number of tasks per node
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=512G # memory (up to 1 TB per node)
#SBATCH --time=00:10:00 # total run time limit (HH:MM:SS)
#SBATCH --account=rosengroup
source ~/.bashrc
module purge
module load anaconda3/2024.10
module load vasp/6.5.1
conda activate cms
export QUACC_VASP_PARALLEL_CMD="srun -N 1 --ntasks-per-node 112" # should typically match the Slurm directives above
python job.py
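Here, job.py is a quacc workflow script. As a rough sketch, assuming quacc is installed in the cms environment and configured for VASP (the recipe shown is just one possibility), it might look like the following; see 🤖Automation with Quacc for a full walkthrough.
from ase.build import bulk
from quacc.recipes.vasp.core import static_job
# Illustrative single-point calculation on bulk Si using a quacc VASP recipe.
atoms = bulk("Si")
result = static_job(atoms) # returns a summary dictionary when run without a workflow engine
print(result)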
Della
Submitting jobs on Della works essentially the same way as on Tiger, except that you should not use the --account=rosengroup flag since we do not currently have a special account on Della. The CPU nodes on Della are also different, so you may need to modify the number of cores you request. There are also GPU nodes, which require adding --gres=gpu:1 to your submission script to request one GPU.
Python
CPU Tasks
Submitting a CPU-based serial Python job works exactly the same way as on Tiger; simply remove the --account flag. Note that the different CPU architectures on Della have different numbers of cores per node. Refer to the "Hardware Configuration" section of the Della documentation for details. You can use the --constraint flag to make sure you land on a specific type of hardware if desired, as illustrated after the script below.
#!/bin/bash
#SBATCH --job-name=python # create a short name for your job
#SBATCH --nodes=1 # node count
#SBATCH --ntasks-per-node=1 # total number of tasks per node
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=4G # memory
#SBATCH --time=00:10:00 # total run time limit (HH:MM:SS)
source ~/.bashrc
module purge
module load anaconda3/2024.10
conda activate cms
python job.py > job.out
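For example, to target a particular CPU architecture you could add a line like the following to the script above (the constraint name here is only illustrative; check the Della documentation for the currently valid options):
#SBATCH --constraint=cascade # example constraint name; see the Della docs for valid options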
GPU Tasks
Submitting a GPU-based Python job (e.g. for ML) requires adding the --gres flag. Note that the different GPU nodes have different CPU architectures and therefore different appropriate values for --ntasks-per-node. See the "Hardware Configuration" section of the Della documentation for details.
#!/bin/bash
#SBATCH --job-name=python # create a short name for your job
#SBATCH --gres=gpu:1 # number of GPUs per node
#SBATCH --nodes=1 # CPU node count
#SBATCH --ntasks-per-node=20 # total number of tasks per node
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=36G # memory
#SBATCH --time=00:10:00 # total run time limit (HH:MM:SS)
source ~/.bashrc
module purge
module load anaconda3/2024.10
conda activate cms
python job.py > job.out
VASP
The GPU version of VASP can be run as follows:
#!/bin/bash
#SBATCH --job-name=vasp # create a short name for your job
#SBATCH --gres=gpu:1 # number of GPUs per node
#SBATCH --nodes=1 # CPU node count
#SBATCH --ntasks-per-node=20 # total number of tasks per node
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=36G # memory
#SBATCH --time=00:10:00 # total run time limit (HH:MM:SS)
source ~/.bashrc
module purge
module load vasp/6.5.1_gpu
srun vasp_std > vasp.out # or vasp_gam for 1x1x1 kpoints
Neuronic
Submitting jobs on Neuronic works much like on the other Slurm clusters on campus. The main differences are that some modules may be different or absent and that you will want to specify the number of GPUs via the --gres flag.
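If a module named in the scripts below is absent on Neuronic, you can check what is installed with the module command, for example:
module avail anaconda3 # list the Anaconda versions available on this cluster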
Python
CPU Tasks
The following is a typical submission script for a CPU-based Python job.
#!/bin/bash
#SBATCH --job-name=python # create a short name for your job
#SBATCH --nodes=1 # CPU node count
#SBATCH --ntasks-per-node=1 # total number of tasks per node
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=4G # memory
#SBATCH --time=00:10:00 # total run time limit (HH:MM:SS)
source ~/.bashrc
module purge
module load anaconda3/2024.02
conda activate cms
python job.py > job.out
GPU Tasks
The following is a typical submission script for a GPU-based Python job (e.g. for ML).
#!/bin/bash
#SBATCH --job-name=python # create a short name for your job
#SBATCH --gres=gpu:1 # number of GPUs per node
#SBATCH --nodes=1 # CPU node count
#SBATCH --ntasks-per-node=1 # total number of tasks per node
#SBATCH --cpus-per-task=1 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem=4G # memory
#SBATCH --time=00:10:00 # total run time limit (HH:MM:SS)