PumasCluster

This is the documentation for PumasCluster, also known as "Pumas for Linux Clusters".

PumasCluster Administrator Documentation

Installation steps

1. Prerequisites

Bash

Your cluster needs to have Bash available at /bin/bash.
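
As a quick, optional sanity check, you can confirm that Bash is present and executable at that path:

$ test -x /bin/bash && echo "found /bin/bash" || echo "/bin/bash is missing"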

PumasCluster installation tarball

To install PumasCluster, you will need the PumasCluster installation tarball. PumasAI Support will send you the tarball via a private link.

License Key

PumasAI Support will send you a license key via email or support ticket. The license key is a string of the form AAAA-AAAA-AAAA-AAAA. There is no whitespace in the license key string. Your license key is specific to you and your Linux cluster.
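
If you want to sanity-check that you copied the key without stray whitespace or missing characters, a small grep can help. The pattern below is only a sketch: it assumes each of the four groups is exactly four letters or digits, matching the AAAA-AAAA-AAAA-AAAA form above. Substitute your actual key for the placeholder:

$ echo "AAAA-AAAA-AAAA-AAAA" | grep -E '^[A-Za-z0-9]{4}(-[A-Za-z0-9]{4}){3}$' && echo "key format looks OK"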

2. Download and extract tarball

Download the tarball from the private link that PumasAI Support sent you. Let /path/to/my/download/Pumas.tar.gz denote the location to which you have saved the tarball.

Once you have downloaded the tarball, you need to extract it. If your cluster consists of multiple nodes, you need to extract the tarball to a folder on a shared filesystem that all nodes can access.
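
If you are not sure whether a candidate folder is on a shared filesystem, one simple check is to compare what df reports for that path on two different nodes. In the sketch below, node001 and node002 are placeholders for two of your cluster's node names, and /path/to/candidate/folder is the folder you are considering:

$ ssh node001 'df -h /path/to/candidate/folder'
$ ssh node002 'df -h /path/to/candidate/folder'

If both nodes report the same network filesystem (for example, the same NFS export), the folder is shared.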

Let ${PUMAS_INSTALL_PATH:?} denote the folder into which you want to extract the tarball. We will use the ${PUMAS_INSTALL_PATH:?} variable throughout the remainder of this document.

For example, you can extract the tarball as follows.

First, become root (assuming that root permissions are required in order to create and write to ${PUMAS_INSTALL_PATH:?}):

$ sudo -s

Now, as root:

# mkdir -p "${PUMAS_INSTALL_PATH:?}"

# tar xf "/path/to/my/download/Pumas.tar.gz" -C "${PUMAS_INSTALL_PATH:?}"

# rm "/path/to/my/download/Pumas.tar.gz"

The ${PUMAS_INSTALL_PATH:?} directory needs to be globally-readable (recursively). Note that directories also need the execute (search) bit so that other users can descend into them; the capital X below adds it for directories (and for files that are already executable) without marking ordinary files executable:

# chmod -R ugo+rX "${PUMAS_INSTALL_PATH:?}"

Make sure that ${PUMAS_INSTALL_PATH:?} is not globally-writable:

# chmod -R go-w "${PUMAS_INSTALL_PATH:?}"

Finally, exit your root session:

# exit

3. Environment variables and PATH

You need to set the PUMAS_INSTALL_PATH environment variable to the installation path you chose for your cluster. For example, something like this:

export PUMAS_INSTALL_PATH="paste_your_PUMAS_INSTALL_PATH_here"

The PUMAS_INSTALL_PATH environment variable needs to be defined for all users that will be using Pumas.

You also need to add ${PUMAS_INSTALL_PATH:?}/bin to the PATH. For example, something like this:

export PATH="${PUMAS_INSTALL_PATH:?}/bin:${PATH:?}"

The ${PUMAS_INSTALL_PATH:?}/bin directory needs to be in the PATH for all users that will be using Pumas.

Do NOT add ${PUMAS_INSTALL_PATH:?} to the PATH. You only need to add ${PUMAS_INSTALL_PATH:?}/bin to the PATH.
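
One way to make both settings available to every user is a system-wide profile script. The sketch below assumes your distribution sources files in /etc/profile.d for login shells; if your site manages user environments differently (for example, with environment modules), adapt accordingly. As root, create a file /etc/profile.d/pumas.sh with the following contents:

export PUMAS_INSTALL_PATH="paste_your_PUMAS_INSTALL_PATH_here"
export PATH="${PUMAS_INSTALL_PATH:?}/bin:${PATH:?}"

Users will pick up these settings the next time they log in.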

4. Set up license

First, become root.

$ sudo -s

Now, as root, start up Pumas:

# pumas

Side note: the absolute path of the pumas launch script is ${PUMAS_INSTALL_PATH:?}/bin/pumas. But, because you added ${PUMAS_INSTALL_PATH:?}/bin to the PATH in the previous step, you can just run pumas.

Wait a minute while Pumas starts up.

You will be prompted for your license key. Paste your license key into the prompt, and then press Enter.

You will finally arrive at a Julia prompt that looks like this: julia>

Exit Julia by typing exit() and pressing Enter.

Currently, you are still root. Go into the license directory and make the license files globally writable (so that all users can write to them):

# cd "${PUMAS_INSTALL_PATH:?}/license/Pumas-2.0"
# ls
# chmod 666 "${PUMAS_INSTALL_PATH:?}/license/Pumas-2.0/license.key"
# chmod 666 "${PUMAS_INSTALL_PATH:?}/license/Pumas-2.0/LicenseSpringLog.txt"

Now, exit your root session:

# exit

Now, as your regular user (non-root), launch Pumas:

$ pumas

Pumas will print some information about your license, and then you should arrive at a Julia prompt (julia>). Exit Julia by typing exit() and pressing Enter.

Other notes

  1. All nodes on your Linux cluster need to run the exact same Linux kernel version (a quick way to check this is sketched after this list).
  2. Every time you upgrade the Linux kernel on your cluster, you will need to re-enter your license key. To do so, start Pumas as your regular (non-root) user by running pumas. When Pumas starts up, you will be prompted to re-enter your license key; paste it in and press Enter.
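
One scheduler-agnostic way to compare kernel versions is to run uname -r on each node over SSH. In the sketch below, node001 and node002 are placeholders for your actual node names; every line of output should show the same kernel version:

$ for node in node001 node002; do ssh "$node" uname -r; done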

PumasCluster User Documentation: Slurm example

This is a brief tutorial for using PumasCluster. This tutorial uses the Slurm scheduler as an example.

Tutorial

First, create your Slurm allocation by asking the Slurm scheduler to give you a new allocation:

# Print the queue:
squeue

# Create a new Slurm allocation with 2 nodes:
salloc -N 2

# Print the queue again.
# You should now see your currently-active allocation.
squeue

Next, start Pumas.

# Start Pumas:
pumas

Once Pumas has started, you will arrive at a Julia prompt that looks like this: julia>

Now, in your Julia session, load the PumasClusterUtilities package:

julia> import PumasClusterUtilities

Now, get the list of all compute nodes in your currently active Slurm allocation:

julia> allocated_nodes = readlines(`scontrol show hostnames`)

Now, still in Julia, start worker processes on the nodes:

julia> PumasClusterUtilities.addprocs_pumas_slurm()

Now, still in Julia, you can load the Distributed library, load the Pumas package on all workers, and run your Pumas code:

julia> using Distributed: Distributed, @everywhere

# Print the number of workers:

julia> Distributed.nworkers()

# Print the worker ID for each worker:

julia> Distributed.workers()

# Print the hostname of each worker.
# Note: gethostname is provided by the Sockets standard library, so load it on all workers first:

julia> @everywhere import Sockets

julia> @everywhere println(Sockets.gethostname())
# Expected output: the host name of each allocated node (workers on the same node print the same name).

# Load the Pumas package.
# Make sure to use `@everywhere`, which will load the Pumas package on all workers.

julia> @everywhere import Pumas

# Now you can run your Pumas code here.


# When you are done, exit Julia:

julia> exit()

You have now exited Julia and are back in your Bash session:

# Print the queue, and find the ID of your currently-active allocation.
squeue

# End your allocation.
scancel MY_ALLOCATION_ID

# Print the queue again.
# The allocation that you just ended should no longer be listed.
squeue
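
If the queue is long and it is hard to spot your own allocation, you can restrict squeue to your user name; the JOBID column is the allocation ID to pass to scancel:

# Show only your own jobs:
squeue -u "$USER"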