Slurm is the job scheduling system on the Princeton HPC machines and most clusters we use. A very useful introduction to Slurm can be found in the following guide by Princeton Research Computing:
Write the Slurm files below as submit.job and submit them with sbatch submit.job. You can check your queued jobs with squeue -u <NetID>. To cancel a job, run scancel <JobID>; to cancel all of your jobs, run scancel -u <NetID>. If you prefer to run an interactive session, you can use salloc as described in the Princeton Research Computing KnowledgeBase article. The following is a typical submit.job file for a serial Python calculation.
#!/bin/bash
#SBATCH --job-name=python        # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks
#SBATCH --cpus-per-task=1        # cpu-cores per task
#SBATCH --mem-per-cpu=4G         # memory per cpu-core
#SBATCH --time=01:00:00          # total run time limit (HH:MM:SS)
#SBATCH --account=rosengroup     # group account to charge
source ~/.bashrc
module purge
module load anaconda3/2024.10
conda activate cms
python job.py > job.out
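Putting the commands from the first paragraph together, a typical submit/monitor/cancel cycle looks like the following (the salloc resource values are only illustrative; adjust them to your job):
sbatch submit.job                             # submit the batch job
squeue -u <NetID>                             # check your queued and running jobs
scancel <JobID>                               # cancel a single job
scancel -u <NetID>                            # cancel all of your jobs
salloc --nodes=1 --ntasks=1 --time=01:00:00   # or request an interactive session instead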
To run VASP, we modify the Slurm submission script so that we load the necessary modules and run the VASP executable via the srun
command. Looking for an example to run? Check out the VASP tutorials.
#!/bin/bash
source ~/.bashrc
module purge
module load vasp/6.5.0
srun vasp_std > vasp.out
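Note that srun launches VASP across the MPI resources requested in the #SBATCH header of the script, so the parallelization is controlled by directives along these lines (the counts here are only illustrative; match them to the node type you are using):
#SBATCH --nodes=1                # number of nodes
#SBATCH --ntasks-per-node=112    # MPI tasks per node, e.g. one per core on a 112-core node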
Running a VASP calculation via ASE works essentially the same way as running VASP directly except now you call a Python script and define the VASP parallelization flags by setting the ASE_VASP_COMMAND
environment variable as defined in the ASE documentation. Looking for an example ASE calculation to run? Refer to 🦾Using ASE to run VASP.
#!/bin/bash
source ~/.bashrc
module purge
module load anaconda3/2024.10
module load vasp/6.5.0
conda activate cms
export ASE_VASP_COMMAND="srun vasp_std"
python job.py > vasp.out
If you are instead driving VASP through quacc, the srun parallelization command is set with the QUACC_VASP_PARALLEL_CMD environment variable rather than ASE_VASP_COMMAND:
#!/bin/bash
source ~/.bashrc
module purge
module load anaconda3/2024.10
module load vasp/6.5.0
conda activate cms
export QUACC_VASP_PARALLEL_CMD="srun -N 1 --ntasks-per-node 112"
python job.py
Submitting jobs on Della works essentially the same way as on Tiger, except that you should not use the --account=rosengroup flag since we do not currently have a special account on Della. The CPU nodes on Della are also different, so you may need to modify the number of cores you request if running on a CPU. There are also GPU nodes, which require adding --gres=gpu:1 to your submission script to request one GPU.
Submitting a CPU-based serial Python job works exactly the same way as on Tiger; simply remove the --account flag. Note that different CPU architectures on Della have different numbers of cores per node. Refer to the "Hardware Configuration" section of the Della documentation for details. You can use the --constraint flag to make sure you land on a specific type of hardware if desired; an example directive is sketched after the script below.
#!/bin/bash
source ~/.bashrc
module purge
module load anaconda3/2024.10
conda activate cms
python job.py > job.out
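For example, to land on a particular CPU architecture with the --constraint flag, add a directive along these lines (the feature name cascade is only a placeholder; use a name from the Della "Hardware Configuration" table):
#SBATCH --constraint=cascade     # placeholder feature name; choose from the Della hardware table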
Submitting a GPU-based Python job (e.g. for ML) requires adding the --gres flag. Note that different GPUs have different CPU architectures and therefore different values to use for --ntasks-per-node. See the "Hardware Configuration" section of the Della documentation for details; the GPU-related directives are sketched after the script below.
#!/bin/bash
source ~/.bashrc
module purge
module load anaconda3/2024.10
conda activate cms
python job.py > job.out
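As noted above, the GPU itself is requested with the --gres flag; a minimal sketch of the relevant #SBATCH directives for a single GPU might look like this (the task count is illustrative and depends on the node type):
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --ntasks-per-node=1      # illustrative; see the Della hardware table for your GPU node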
The GPU version of VASP can be run as follows:
#!/bin/bash
source ~/.bashrc
module purge
module load vasp/6.5.0_gpu
srun vasp_std > vasp.out
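As with the GPU Python job, the script needs a GPU request in its #SBATCH header; since the GPU build of VASP runs one MPI rank per GPU, a hedged sketch of the relevant directives is:
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --ntasks-per-node=1      # the GPU build of VASP uses one MPI rank per GPU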
Submitting jobs on Neuronic works much the same way as on the other Slurm clusters on campus. The main differences are that some of the modules may be different or absent, and that you will want to specify the number of GPUs via the --gres flag.
The following is a typical submission script for a CPU-based Python job.
#!/bin/bash
source ~/.bashrc
module purge
module load anaconda3/2024.02
conda activate cms
python job.py > job.out
The following is a typical submission script for a GPU-based Python job (e.g. for ML).
#!/bin/bash
source ~/.bashrc
module purge
module load anaconda3/2024.02
conda activate cms
python job.py > job.out
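As mentioned above, the number of GPUs on Neuronic is requested with the --gres flag; for example, a single-GPU job would add:
#SBATCH --gres=gpu:1             # request one GPU (increase the count for multi-GPU jobs)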