Parsl

Overview

Parsl is a parallel scripting library with a powerful feature referred to as the pilot job model: it can run many calculations within a single Slurm allocation. For instance, if you requested 10 nodes in a single Slurm allocation, you could run 10 simultaneous one-node VASP jobs.
To install Parsl, run pip install parsl.

Local Parallelization

Consider the following toy example. Normally, you would expect this calculation to take 10 seconds in total, since two sequential 5-second add calls are made. We will pretend that add is a surrogate for some compute-heavy task (e.g. a DFT calculation).
import time

def add(a, b):
    time.sleep(5)
    return a + b

def workflow(a, b):
    output1 = add(a, b)
    output2 = add(a, b)
    return output1 + output2

result = workflow(1, 2)
print(result)
With Parsl, the above code could be run in only 5 seconds by parallelizing the work over two CPU cores. This is achieved by decorating the compute task with the @python_app decorator.
First, load the default Parsl configuration in an IPython console or Jupyter Notebook.
import parsl

parsl.load()
Then, in the same session run the modified code block:
import time
from parsl import python_app

@python_app
def add(a, b):
    time.sleep(5)
    return a + b

def workflow(a, b):
    output1 = add(a, b)
    output2 = add(a, b)
    return output1.result() + output2.result()

result = workflow(1, 2)
print(result)
The code will now finish in 5 seconds instead of 10. Note that add(a, b) now returns a Future object rather than the actual result. To fetch the result, call .result() on the Future. Be aware that this call blocks: Python waits until the value is available before continuing, so be careful where you call it.
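Parsl's AppFuture implements the same interface as Python's standard concurrent.futures.Future, so the usual submit-then-gather pattern applies: launch all tasks first, and only call .result() once everything is in flight. A minimal sketch of that pattern using only the standard library (the sleep is a stand-in for real work, and the 1-second duration is arbitrary):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def add(a, b):
    time.sleep(1)  # stand-in for compute-heavy work
    return a + b

with ThreadPoolExecutor() as pool:
    # Submit both tasks before touching any results so they run concurrently
    futures = [pool.submit(add, 1, 2), pool.submit(add, 3, 4)]
    # .result() blocks, but by now both tasks are already running
    results = [f.result() for f in futures]

print(results)  # [3, 7]
```

If instead you called .result() immediately after each submit, the two tasks would run back-to-back and you would lose the parallel speedup.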

Slurm Parallelization

Now we will extend the above example to one where the compute tasks wrapped by @python_app are run in a Slurm job rather than on the login node.
First, we must load a new configuration in an IPython console or Jupyter Notebook. The example below works on Tiger; only the variables before the Config() call need to be modified. In this example, each @python_app function runs on 1 CPU core of a node that has 112 CPU cores. We start by requesting 0 Slurm allocations and allow at most 1 active Slurm allocation at a time, with a walltime of 10 minutes. If you're running on another machine like Della, you will likely need to change cores_per_node and remove the account keyword argument.
import parsl
from parsl.config import Config
from parsl.dataflow.dependency_resolvers import DEEP_DEPENDENCY_RESOLVER
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

cores_per_job = 1
cores_per_node = 112
nodes_per_allocation = 1
min_allocations = 0
max_allocations = 1
walltime = "00:10:00"

config = Config(
    dependency_resolver=DEEP_DEPENDENCY_RESOLVER,
    strategy="htex_auto_scale",
    executors=[
        HighThroughputExecutor(
            label="cms_htex",
            max_workers_per_node=cores_per_node,
            cores_per_worker=cores_per_job,
            provider=SlurmProvider(
                account="rosengroup",
                worker_init="source ~/.bashrc && module load anaconda3/2024.10 && conda activate cms",
                walltime=walltime,
                nodes_per_block=nodes_per_allocation,
                init_blocks=0,
                min_blocks=min_allocations,
                max_blocks=max_allocations,
                launcher=SrunLauncher(),
                cmd_timeout=60,
            ),
        )
    ],
    initialize_logging=False,
)

parsl.load(config)
Now we once again run our workflow in the same session as the loaded Parsl configuration. You will see a single Slurm allocation get queued, and once it starts, the workflow will finish in 5 seconds instead of 10.
from parsl import python_app

@python_app
def add(a, b):
    import time

    time.sleep(5)
    return a + b

def workflow(a, b):
    output1 = add(a, b)
    output2 = add(a, b)
    return output1.result() + output2.result()

result = workflow(1, 2)
print(result)
In practice, the @python_app-decorated function is typically something very compute-intensive (e.g. a DFT calculation). The power of Parsl in this scenario is that you can request a single "wide" Slurm allocation of many nodes and run numerous jobs within that larger allocation, continually pulling in new work as @python_app-decorated functions are called. Parsl also automatically tracks task dependencies: if you were to pass output1 into add(output1, 2), Parsl would know not to run that add() task until the task producing output1 has finished.