Cluster setups
This section provides instructions on how to compile and run Entity on some of the most widely used clusters. While the main libraries we rely on, Kokkos and ADIOS2, can be built in-tree (i.e., together with the code when you launch the compiler), we nonetheless recommend pre-installing them separately (if they are not already installed on the cluster) and using them as external dependencies, since that significantly cuts down the compilation time.
Contribute!
If you don't see the cluster you are running the code on here, please be kind to those who will come after us and contribute instructions for that specific cluster. Entity is only as strong as the community supporting it, and by contributing a few sentences, you may have an immense effect in the long run.
Stellar cluster at Princeton University has 6 nodes with 2 NVIDIA A100 GPUs (Ampere 8.0 microarchitecture) each and 128-core AMD EPYC Rome CPUs (Zen2 microarchitecture).
Installing the dependencies
The most straightforward way to set things up on the Stellar cluster is to use spack as described here. After downloading spack and initializing its shell environment, load the modules to be used during compilation:
module load gcc-toolset/10
module load cudatoolkit/12.5
module load openmpi/gcc-toolset-10/4.1.0
Then manually add the following two entries to ~/.spack/packages.yaml:
packages:
  cuda:
    buildable: false
    externals:
    - spec: cuda@12.5
      prefix: /usr/local/cuda-12.5
  openmpi:
    buildable: false
    externals:
    - spec: openmpi@4.1.0
      prefix: /usr/local/openmpi/4.1.0/gcc-toolset-10
Then run spack compiler add and spack external find. Since the login nodes (on which all the libraries will be compiled) are different from the compute nodes on Stellar, you will need to allow spack to compile for non-native CPU architectures by running:
spack config add concretizer:targets:host_compatible:false
Now we can install the three libraries we will need: HDF5, ADIOS2, and Kokkos. First create and activate a new environment:
spack env create entity-env
spack env activate entity-env
To install the packages within the spack environment, run the following commands:
spack add hdf5 +mpi +cxx target=zen2
spack add adios2 +hdf5 +pic target=zen2
spack add kokkos +cuda +wrapper cuda_arch=80 +pic +aggressive_vectorization target=zen2
You might want to first run these commands with spec instead of add to make sure spack recognizes the correct cuda and openmpi (they should be marked as [e] and should point to the local directories specified above). After adding the packages, launch the installer with spack install and wait until all installations are done.
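The [e] check described above can be scripted. The following is a minimal sketch, not part of Entity or spack: check_externals is a hypothetical helper that reads the text printed by spack spec on standard input and reports whether each named package appears on a line starting with [e]. It assumes that the status marker is the first token on the line and that dependencies are printed as ^name@version, which may differ between spack versions (some versions only print the markers when spec is called with -I).

```shell
#!/bin/bash
# Hypothetical helper (not part of spack or Entity): reads `spack spec` output
# from stdin and checks that each package given as an argument is marked [e].
# Assumes lines of the form "[e]  ^name@version ...".
check_externals() {
    local spec_output pkg status=0
    spec_output="$(cat)"
    for pkg in "$@"; do
        if printf '%s\n' "$spec_output" | grep -Eq "^\[e\][[:space:]]+\^?${pkg}@"; then
            echo "ok: ${pkg} is external"
        else
            echo "WARNING: ${pkg} is not marked [e]"
            status=1
        fi
    done
    return "$status"
}

# Intended usage on the cluster (commented out here because it needs spack):
# spack spec adios2 +hdf5 +pic target=zen2 | check_externals openmpi
# spack spec kokkos +cuda +wrapper cuda_arch=80 +pic +aggressive_vectorization target=zen2 | check_externals cuda
```

The function returns a non-zero status if any of the listed packages is not external, so it can also be used as a guard before spack install.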
Compiling & running the code
To compile the code, first activate the environment (if it is not already active), then load all the necessary libraries manually, using both module load and spack load:
module load gcc-toolset/10
module load cudatoolkit/12.5
module load openmpi/gcc-toolset-10/4.1.0
spack load gcc cuda openmpi kokkos adios2
During the compilation, passing any -D Kokkos_*** or -D ADIOS2_*** flags is not necessary, while -D mpi=ON/OFF is still needed, since in theory the code can also be compiled without MPI.
To run the code, the submit script should look something like this:
#!/bin/bash
#SBATCH -n 4          # (1)
#SBATCH -t 00:30:00
#SBATCH -J entity-run
#SBATCH --gres=gpu:2  # (2)
#SBATCH --gpus=4      # (3)
# .. other sbatch directives
module load gcc-toolset/10
module load cudatoolkit/12.5
module load openmpi/gcc-toolset-10/4.1.0
. <HOME>/spack/share/spack/setup-env.sh
spack env activate entity-env
spack load gcc cuda openmpi kokkos adios2
srun entity.xc -input <INPUT>.toml
(1) total number of tasks (GPUs)
(2) requesting nodes with 2 GPUs per node
(3) total number of GPUs
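The three annotated directives are related: with one MPI task per GPU, the number of nodes Slurm has to allocate is the total number of tasks divided by the number of GPUs per node, rounded up. A small sketch of that arithmetic (nodes_needed is a hypothetical helper name, used only for illustration):

```shell
#!/bin/bash
# Hypothetical helper: number of nodes needed for a given number of tasks,
# assuming one task per GPU and a fixed number of GPUs per node.
nodes_needed() {
    local ntasks="$1" gpus_per_node="$2"
    # integer ceiling division
    echo $(( (ntasks + gpus_per_node - 1) / gpus_per_node ))
}

# Values from the Stellar script above: -n 4 with --gres=gpu:2
echo "nodes: $(nodes_needed 4 2)"
```

For the values in the script above this gives 2 nodes, i.e., 4 GPUs in total, which matches the --gpus=4 directive.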
Last updated: 4/28/2025
Zaratan cluster at the University of Maryland has 20 nodes, each with 4 NVIDIA A100 GPUs (Ampere 8.0) and a 128-core AMD EPYC (Zen2) CPU, as well as 8 nodes with NVIDIA H100 GPUs (Hopper 9.0) and Intel Xeon Platinum 8468 (Sapphire Rapids) CPUs. Below, we describe how to run the code on the A100 nodes. For the H100 nodes the procedure is similar, except that different flags need to be specified when installing the Kokkos library (you might also need to manually specify target=<CPUARCH>, as the login nodes have a different microarchitecture than the H100 compute nodes).
Installing the dependencies
We will rely on spack to compile on Zaratan. But first of all, the correct compiler should be loaded:
module load gcc/11.3.0
Then add the following entries to ~/.spack/packages.yaml:
packages:
  cuda:
    buildable: false
    externals:
    - prefix: /cvmfs/hpcsw.umd.edu/spack-software/2023.11.20/linux-rhel8-x86_64/gcc-11.3.0/cuda-12.3.0-fvfg7yyq63nunqvkn7a5fzh6e77quxty
      spec: cuda@12.3
    - modules:
      - cuda/12.3
      spec: cuda@12.3
  cmake:
    buildable: false
    externals:
    - prefix: /usr
      spec: cmake@3.26.5
spack env create entity-env
spack env activate entity-env
spack add hdf5 +mpi +cxx
spack add adios2 +hdf5 +pic
spack add kokkos +cuda +wrapper cuda_arch=80 +pic +aggressive_vectorization
spack add openmpi +cuda
spack install
Now, to load the packages within the environment, do:
spack load hdf5 adios2 kokkos openmpi
Compiling & running the code
Compilation of the code is performed as usual, and there is no need for any additional -D Kokkos_*** or -D ADIOS2_*** flags. The batch script for submitting the job should look like this:
#!/bin/bash
#SBATCH -p gpu
#SBATCH -t 00:30:00
#SBATCH -n 1
#SBATCH -c 1
#SBATCH --gpus=a100_1g.5gb:1
#SBATCH --output=test.out
#SBATCH --error=test.err
module load gcc/11.3.0
. <HOME>/spack/share/spack/setup-env.sh
spack env activate entity-env
spack load hdf5 kokkos adios2 cuda openmpi
mpirun ./entity.xc -input <INPUT>.toml 
Last updated: 4/29/2025
Rusty cluster at Flatiron Institute has 36 nodes with 4 NVIDIA A100-80GB GPUs each and 36 nodes with 4 NVIDIA A100-40GB GPUs each, all with 64-core Icelake CPUs. It also has 18 nodes with 8 NVIDIA H100-80GB GPUs each and the same CPU architecture. We will use nodes with A100 GPUs for the example below.
Installing the dependencies
The most straightforward way to set things up on the Rusty cluster is to use spack as described here. After downloading spack and initializing its shell environment, start an interactive session to make compilation faster:
srun -C a100 -p gpu -N1 -n1 -c32 --gpus-per-task=1 --pty bash -i
Next, you can create a virtual environment and activate it:
spack env create entity-env
spack env activate entity-env
Then load the modules to be used during compilation, add the compiler to spack, and find the external libraries:
module purge
ml modules/2.4-20250724 gcc/13.3.0 cuda/12.5.1 openmpi/cuda-4.1.8 hdf5/mpi-1.12.3
spack compiler add
spack external find
spack external find cuda
spack external find openmpi
spack external find hdf5
To check how spack resolves a given library, run spack spec [library].
Now we can install 2 libraries we will need: Kokkos and ADIOS2.
spack add kokkos +cuda +wrapper cuda_arch=80 +pic +aggressive_vectorization
spack add adios2 +hdf5 +pic
spack install
Note: if you wish to install these from the login nodes, you need to allow compilation for non-native architectures and specify the target architecture (linux-rocky8-icelake for the A100 GPU nodes on Rusty):
spack config add concretizer:targets:host_compatible:false
spack add kokkos +cuda +wrapper cuda_arch=80 +pic +aggressive_vectorization target=linux-rocky8-icelake 
spack add adios2 +hdf5 +pic target=linux-rocky8-icelake 
You might want to first run these commands with spec instead of add to make sure spack recognizes the correct gcc, cuda, openmpi, and hdf5 (they should be marked as [e] and should point to a local directory specified above).
Compiling & running the code
To compile the code, load the modules and activate the spack environment:
module purge
ml modules/2.4-20250724 gcc/13.3.0 cuda/12.5.1 openmpi/cuda-4.1.8 hdf5/mpi-1.12.3
spack env activate entity-env
During the compilation, passing any -D Kokkos_*** or -D ADIOS2_*** flags is not necessary, while -D mpi=ON/OFF is still needed, since in theory the code can also be compiled without MPI.
To run the code, the submit script should look something like this:
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gpus-per-task=1
#SBATCH --cpus-per-task=16
#SBATCH --ntasks-per-node=4
#SBATCH --nodes=1     # (*)
#SBATCH --gres=gpu:4
#SBATCH --constraint=a100-80gb
#SBATCH --time=00:30:00
# .. other sbatch directives
module purge
ml modules/2.4-20250724 gcc/13.3.0 cuda/12.5.1 openmpi/cuda-4.1.8 hdf5/mpi-1.12.3
. <HOME>/spack/share/spack/setup-env.sh
spack env activate entity-env
export LD_PRELOAD=/mnt/sw/fi/cephtweaks/lib/libcephtweaks.so
export CEPHTWEAKS_LAZYIO=1
srun entity.xc -input <INPUT>.toml
(*) total number of nodes
Last updated: 10/21/2025
Vista cluster is part of the TACC research center. It consists of 600 Grace Hopper nodes, each hosting an H100 GPU and a 72-core Grace CPU.
Installing the dependencies
Vista does not require any specific modules to be installed. Before compiling, the following modules should be loaded:
module load nvidia/24.7
module load cuda/12.5
module load kokkos/4.5.01-cuda
module load openmpi/5.0.5
module load adios2/2.10.2
module load phdf5/1.14.4
module load ucx/1.18.8
Compiling & running the code
The code can then be configured with the following command:
cmake -B build -D mpi=ON -D pgen=<YOUR_PGEN> -D output=ON -D Kokkos_ENABLE_CUDA=ON -D Kokkos_ARCH_HOPPER90=ON -D ADIOS2_USE_CUDA=ON -D ADIOS2_USE_MPI=ON
Although the hdf5 output format works on Vista, we advise using BPFile, as hdf5 writes are currently extremely slow with MPI for 2- and 3-dimensional problems. A sample submit script should look similar to this:
#!/bin/bash
#SBATCH -A <PROJECT NUMBER>
#SBATCH -p gh
#SBATCH -t 16:00:00 #the code will run for 16 hours
#SBATCH -N 64       # 64 nodes will be used
#SBATCH -n 64       # 64 tasks in total will be launched
#SBATCH -J your_job_name
#SBATCH --output=test.out
#SBATCH --error=test.err
export UCX_MEMTYPE_CACHE=n
export UCX_TLS=rc,cuda_copy
export UCX_IB_REG_METHODS=rcache,direct
export UCX_RNDV_MEMTYPE_CACHE=n
echo "Launching application..."
ibrun ./entity.xc -input <INPUT>.toml
Last updated: 6/19/2025
DeltaAI uses GH200 nodes. These are NVIDIA superchip nodes with 4x H100 GPUs and 4x ARM CPUs with 72 cores each.
Installing the dependencies
This makes the setup a bit more tedious, but luckily most dependencies are already installed.
You can load the installed dependencies with
module restore
module unload gcc-native
module load gcc-native/12
module load craype-accel-nvidia90
module load cray-hdf5-parallel
We would recommend installing Kokkos and ADIOS2 from source with the following settings:
# Kokkos
cmake -B build  \
    -D CMAKE_CXX_STANDARD=17 \
    -D CMAKE_CXX_EXTENSIONS=OFF \
    -D CMAKE_POSITION_INDEPENDENT_CODE=TRUE \
    -D CMAKE_C_COMPILER=cc \
    -D CMAKE_CXX_COMPILER=CC \
    -D Kokkos_ARCH_ARMV9_GRACE=ON \
    -D Kokkos_ARCH_HOPPER90=ON \
    -D Kokkos_ENABLE_CUDA=ON \
    -D Kokkos_ENABLE_DEBUG=ON \
    -D CMAKE_INSTALL_PREFIX=/path/to/install/location/for/kokkos && \
cmake --build build -j && \
cmake --install build 
# ADIOS2
cmake -B build  \
    -D CMAKE_CXX_STANDARD=17 \
    -D CMAKE_CXX_EXTENSIONS=OFF \
    -D CMAKE_POSITION_INDEPENDENT_CODE=TRUE \
    -D BUILD_SHARED_LIBS=ON \
    -D ADIOS2_USE_HDF5=ON \
    -D ADIOS2_USE_Python=OFF \
    -D ADIOS2_USE_Fortran=OFF \
    -D ADIOS2_USE_ZeroMQ=OFF \
    -D BUILD_TESTING=ON \
    -D CMAKE_C_COMPILER=cc \
    -D CMAKE_CXX_COMPILER=CC \
    -D ADIOS2_BUILD_EXAMPLES=OFF \
    -D ADIOS2_USE_MPI=ON \
    -D ADIOS2_USE_BLOSC=ON \
    -D HDF5_ROOT=/opt/cray/pe/hdf5-parallel \
    -D CMAKE_INSTALL_PREFIX=/path/to/install/location/for/adios2 && \
cmake --build build -j && \
cmake --install build
You can then add module files for both libraries (as described here) or add them to your path directly. Just be sure to export the relevant Kokkos settings:
# in the kokkos module file
setenv  Kokkos_ENABLE_CUDA              ON
setenv  Kokkos_ARCH_ARMV9_GRACE         ON
setenv  Kokkos_ARCH_HOPPER90            ON
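If you prefer not to write module files, the same settings can be exported from the shell. The sketch below is only an illustration under stated assumptions: use_entity_deps is a hypothetical function name, the two prefixes are the placeholder install locations from the cmake commands above, and it assumes the libraries are located through the Kokkos_DIR and ADIOS2_DIR environment variables (the same variables the Perlmutter module files in this section set).

```shell
#!/bin/bash
# Hypothetical helper: exports the environment that the module files would set.
# Arguments: absolute install prefixes of Kokkos and ADIOS2.
use_entity_deps() {
    local kokkos_prefix="$1" adios2_prefix="$2"
    export Kokkos_DIR="$kokkos_prefix"
    export ADIOS2_DIR="$adios2_prefix"
    export PATH="$kokkos_prefix/bin:$adios2_prefix/bin:$PATH"
    # Kokkos settings listed above for the module file
    export Kokkos_ENABLE_CUDA=ON
    export Kokkos_ARCH_ARMV9_GRACE=ON
    export Kokkos_ARCH_HOPPER90=ON
}

# Placeholder prefixes; replace with your actual install locations
use_entity_deps /path/to/install/location/for/kokkos /path/to/install/location/for/adios2
```

The function has to be called (or the file sourced) in the same shell that later runs cmake, and again in the job script before launching the executable.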
Compiling & running the code
DeltaAI's mpich does not appear to be CUDA-aware (or it is buggy), so you will always need to add the flag gpu_aware_mpi=OFF.
Your cmake setting should look something like this:
cmake -B build -D pgen=<PGEN> -D mpi=ON -D CMAKE_CXX_COMPILER=CC -D CMAKE_C_COMPILER=cc -D gpu_aware_mpi=OFF
Finally, an example SLURM script using full nodes looks like this:
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=72
#SBATCH --gpus-per-node=4
#SBATCH --partition=ghx4
#SBATCH --time=48:00:00
#SBATCH --gpu-bind=verbose,closest
#SBATCH --job-name=example
#SBATCH -o ./log/%x.%j.out
#SBATCH -e ./log/%x.%j.err
#SBATCH --account=your-account
module restore
module unload gcc-native
module load gcc-native/12
module load craype-accel-nvidia90
module use --append /path/to/your/.modfiles
module load kokkos/4.6.00
module load entity/cuda
module load adios2/2.10.2
module list
export MPICH_GPU_SUPPORT_ENABLED=1
export MPICH_OFI_VERBOSE=1
srun ./entity.xc -input <INPUT>.toml
Last updated: 4/28/2025
Perlmutter is a DOE cluster at LBNL with 4 NVIDIA A100 GPUs and an AMD EPYC 7763 CPU on each node. Note that two different GPU configurations are available, with 40 and 80 GB of VRAM respectively.
Installing the dependencies
The easiest way to use the code here is to compile and install your own modules manually. First, load the modules you will need for that:
module load PrgEnv-gnu cray-hdf5-parallel cmake/3.24.3
Download the Kokkos source code, then configure, compile, and install it (in this example, we install it in the ~/opt directory):
wget https://github.com/kokkos/kokkos/releases/download/4.6.01/kokkos-4.6.01.tar.gz
tar xvf kokkos-4.6.01.tar.gz
cd kokkos-4.6.01
cmake -B build -D CMAKE_CXX_STANDARD=17 \
    -D CMAKE_BUILD_TYPE=Release \
    -D CMAKE_CXX_EXTENSIONS=OFF \
    -D CMAKE_POSITION_INDEPENDENT_CODE=TRUE \
    -D CMAKE_CXX_COMPILER=CC \
    -D Kokkos_ENABLE_CUDA=ON \
    -D Kokkos_ENABLE_IMPL_CUDA_MALLOC_ASYNC=OFF \
    -D Kokkos_ARCH_ZEN3=ON \
    -D Kokkos_ARCH_AMPERE80=ON \
    -D CMAKE_INSTALL_PREFIX=$HOME/opt/kokkos/4.6.01/
cmake --build build -j
cmake --install build
Now do the same for ADIOS2:
wget https://github.com/ornladios/ADIOS2/archive/refs/tags/v2.10.2.tar.gz
tar xvf v2.10.2.tar.gz
cd ADIOS2-2.10.2
cmake -B build -D CMAKE_CXX_STANDARD=17 \
    -D CMAKE_CXX_EXTENSIONS=OFF \
    -D CMAKE_POSITION_INDEPENDENT_CODE=TRUE \
    -D BUILD_SHARED_LIBS=ON \
    -D ADIOS2_USE_HDF5=ON \
    -D ADIOS2_USE_Python=OFF \
    -D ADIOS2_USE_Fortran=OFF \
    -D ADIOS2_USE_ZeroMQ=OFF \
    -D BUILD_TESTING=OFF \
    -D ADIOS2_BUILD_EXAMPLES=OFF \
    -D ADIOS2_USE_MPI=ON \
    -D ADIOS2_USE_BLOSC=ON \
    -D LIBFABRIC_ROOT=/opt/cray/libfabric/1.15.2.0/ \
    -D CMAKE_INSTALL_PREFIX=$HOME/opt/adios2/v2.10.2 \
    -D MPI_ROOT=/opt/cray/pe/craype/2.7.30
cmake --build build -j
cmake --install build
For simplicity, it is recommended to also create the module files (e.g., in ~/modules directory):
For kokkos:
#%Module1.0######################################################################
##
## Kokkos @ Zen3 @ Ampere80 modulefile
##
#################################################################################
proc ModulesHelp { } {
  puts stderr "\tKokkos\n"
}
module-whatis      "Sets up Kokkos @ Zen3 @ Ampere80"    
conflict           kokkos
set                basedir      /global/homes/h/<USER>/opt/kokkos/4.6.01
prepend-path       PATH         $basedir/bin
setenv             Kokkos_DIR   $basedir
setenv             Kokkos_ARCH_ZEN3 ON
setenv             Kokkos_ARCH_AMPERE80 ON
setenv             Kokkos_ENABLE_CUDA ON
For ADIOS2:
#%Module1.0######################################################################
##
## ADIOS2 modulefile
##
#################################################################################
proc ModulesHelp { } {
  puts stderr "\tADIOS2\n"
}
module-whatis      "Sets up ADIOS2"    
conflict           adios2
set                basedir      /global/homes/h/<USER>/opt/adios2/v2.10.2
prepend-path       PATH         $basedir/bin
setenv             ADIOS2_DIR   $basedir
setenv ADIOS2_USE_HDF5      ON
setenv ADIOS2_USE_MPI       ON
setenv ADIOS2_HAVE_HDF5_VOL ON
setenv MPI_ROOT             /opt/cray/pe/mpich/8.1.30/ofi/gnu/12.3
setenv HDF5_ROOT            /opt/cray/pe/hdf5-parallel/1.14.3.1/gnu/12.3
prereq cray-mpich/8.1.30 cray-hdf5-parallel/1.14.3.1
Make sure to explicitly set the paths, instead of using ~ or $HOME.
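One way to avoid ~ or $HOME in a modulefile is to expand the path in the shell first and paste the result. A minimal sketch (expand_home is a hypothetical helper; the directory is the Kokkos install location used above):

```shell
#!/bin/bash
# Hypothetical helper: turns a path written with a leading ~ or $HOME into an
# explicit absolute path that can be pasted into a modulefile.
expand_home() {
    local p="$1"
    case "$p" in
        '~/'*)     p="$HOME/${p:2}" ;;   # strip the leading "~/"
        '$HOME/'*) p="$HOME/${p:6}" ;;   # strip the literal "$HOME/"
    esac
    printf '%s\n' "$p"
}

# Prints the line to use in the Kokkos modulefile
echo "set basedir $(expand_home '~/opt/kokkos/4.6.01')"
```

Paths that are already absolute are printed unchanged.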
Compiling & running the code
When compiling entity itself, explicitly pass the cc and CC as compilers, i.e.:
cmake -B build ... -D CMAKE_C_COMPILER=cc -D CMAKE_CXX_COMPILER=CC
Use the following Slurm submit script (example for 8 GPUs on 2 nodes; details on available resources):
#!/bin/bash
#SBATCH --account=<ALLOCATION>
#SBATCH --constraint=gpu
#SBATCH --qos=<QUEUE>
#SBATCH -t <TIME>
#SBATCH -N 2
#SBATCH -c 1
#SBATCH -n 8
#SBATCH --gpus=8
#SBATCH --gpus-per-task=1
#SBATCH --gpu-bind=none
# load all the modules here
export MPICH_NO_BUFFER_ALIAS_CHECK=1
export MPICH_GPU_SUPPORT_ENABLED=1
export MPICH_OFI_NIC_POLICY=GPU
export SLURM_CPU_BIND="cores"
srun ./entity.xc -input cfg.toml >report 2>error
Last updated: 6/19/2025
Aurora uses Intel PVC nodes with 6 GPUs/node. Each PVC has 128GB of memory and is split into 2 tiles. It is recommended to use 1 MPI rank per tile, so 2 per GPU and 12 per node.
Development of entity for Aurora is currently ongoing. Use the following docs with caution and check in with @LudwigBoess on potential changes.
Modules to load
You can load the installed dependencies with
module load adios2
module load autoconf cmake
The adios2 module automatically loads the related kokkos module. Please note that the adios2 module provided by ALCF does not support HDF5.
We recommend saving the module configuration for easy loading within the PBS job:
module save entity
You can compile entity with:
cmake -B build -D pgen=<your_pgen> -D precision=single -D mpi=ON -D output=ON -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx
Running entity
Aurora uses PBS for workload management. The Intel PVC GPUs are split into two tiles each and it is recommended to launch one MPI rank per tile.
#!/bin/bash -l
#PBS -A <project_name>
#PBS -N <job_name>
#PBS -l select=1                # number of nodes to use
#PBS -l walltime=00:05:00
#PBS -l filesystems=flare       # replace with the filesystem of your project
#PBS -k doe
#PBS -l place=scatter
#PBS -q debug
NTOTRANKS=12        # 2*6*N_nodes - update with your requested number
NRANKS_PER_NODE=12  # 2*6  - always the same
# change to directory from which job was submitted
cd $PBS_O_WORKDIR
# load all modules defined above
module restore entity
# only relevant for CPU pinning and to avoid Kokkos complaints
export OMP_PROC_BIND=spread
mpiexec --envall -n ${NTOTRANKS} --ppn ${NRANKS_PER_NODE} ./gpu_tile_compact.sh ./entity.xc -input weibel.toml
To run it you need to define a script gpu_tile_compact.sh in the same folder as your executable. It should look like this:
#!/bin/bash -l
num_gpu=6
num_tile=2
gpu_id=$(( (PALS_LOCAL_RANKID / num_tile ) % num_gpu ))
tile_id=$((PALS_LOCAL_RANKID % num_tile))
export ZE_ENABLE_PCI_ID_DEVICE_ORDER=1
export ZE_AFFINITY_MASK=$gpu_id.$tile_id
# reports the GPU tile pinning
echo "RANK= $PALS_RANKID LOCAL_RANK= $PALS_LOCAL_RANKID gpu= $gpu_id.$tile_id"
# runs the actual job
exec "$@"
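To see which tile each rank ends up on without submitting a job, the arithmetic from gpu_tile_compact.sh can be replayed for the 12 local ranks of one node. This is only an illustration: tile_for_rank is a hypothetical helper, and the rank is passed as an argument instead of being read from PALS_LOCAL_RANKID.

```shell
#!/bin/bash
# Hypothetical helper: same arithmetic as gpu_tile_compact.sh, with the local
# rank given as an argument instead of PALS_LOCAL_RANKID.
tile_for_rank() {
    local rank="$1" num_gpu=6 num_tile=2
    local gpu_id=$(( (rank / num_tile) % num_gpu ))
    local tile_id=$(( rank % num_tile ))
    echo "${gpu_id}.${tile_id}"
}

# Print the mapping for the 12 ranks of a single node
for rank in $(seq 0 11); do
    echo "local rank ${rank} -> ZE_AFFINITY_MASK=$(tile_for_rank "$rank")"
done
```

Ranks 0 and 1 share GPU 0 (tiles 0.0 and 0.1), ranks 2 and 3 share GPU 1, and so on up to rank 11 on tile 5.1.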
Last updated: 8/11/2025
LUMI cluster is located in Finland. It is equipped with 2978 nodes, each with 4 AMD MI250x GPUs and a single 64-core AMD EPYC "Trento" CPU.
Compiling & running the code
The required modules to be loaded are:
module load PrgEnv-cray
module load cray-mpich
module load craype-accel-amd-gfx90a
module load rocm
module load cray-hdf5-parallel/1.12.2.11
The configuration command is standard. Kokkos and ADIOS2 will be built in-tree, directly at compilation. It is also important to provide the C++ and C compilers explicitly, via CC and cc (both are already predefined once all the modules mentioned above are loaded). So far, GPU-aware MPI is not supported on LUMI. The configuration command is the following:
cmake -B build -D pgen=turbulence -D mpi=ON -D Kokkos_ENABLE_HIP=ON -D Kokkos_ARCH_AMD_GFX90A=ON -D CMAKE_CXX_COMPILER=CC -D CMAKE_C_COMPILER=cc -D gpu_aware_mpi=OFF
The example submit script:
#!/bin/bash -l
#SBATCH --job-name=examplejob   # Job name
#SBATCH --output=test.out # Name of stdout output file
#SBATCH --error=test.err  # Name of stderr error file
#SBATCH --partition=standard-g  # partition name
#SBATCH --nodes=8               # Total number of nodes 
#SBATCH --ntasks-per-node=8     # 8 MPI ranks per node
#SBATCH --gpus-per-node=8
##SBATCH --mem=0
#SBATCH --time=48:00:00       # Run time (d-hh:mm:ss)
#SBATCH --account=project_<NUMBER>  # Project for billing
export MPICH_GPU_SUPPORT_ENABLED=1
srun ./entity.xc -input <INPUT>.toml
Last updated: 6/19/2025
Trillium is a large parallel cluster built by Lenovo Canada and hosted by SciNet at the University of Toronto. The GPU subcluster has 61 nodes, each with 4 NVIDIA H100 SXM GPUs (80 GB memory, HOPPER90 architecture) and 1 AMD EPYC 9654 (Zen 4) CPU @ 2.4 GHz with 384 MB of L3 cache (96 cores). Entity works largely out of the box on Trillium, with two exceptions: the HDF5 format has to be disabled, and so does GPU-aware MPI.
Compiling & running the code
The following modules are confirmed to work for building, compiling, running, and restarting:
module load gcc/12.3 cmake/3.31.0 cuda/12.6 openmpi/4.1.5
To disable hdf5, modify the following file in the entity source directory
/path_to_src/entity/cmake/adios2Config.cmake
changing
# Format/compression support
set(ADIOS2_USE_HDF5
  OFF # <-- set this to OFF
  CACHE BOOL "Use HDF5 for ADIOS2")
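After editing, it can be worth confirming that the change took effect before reconfiguring. The sketch below prints the value found on the line following set(ADIOS2_USE_HDF5; hdf5_setting is a hypothetical helper, and it assumes the file keeps the multi-line layout shown above.

```shell
#!/bin/bash
# Hypothetical helper: prints the value (ON/OFF) found on the line right after
# "set(ADIOS2_USE_HDF5" in the given CMake file, ignoring trailing comments.
hdf5_setting() {
    awk '/set\(ADIOS2_USE_HDF5/ { getline; sub(/#.*/, ""); print $1; exit }' "$1"
}

# Intended usage (path as given above):
# hdf5_setting /path_to_src/entity/cmake/adios2Config.cmake
```

If the command prints anything other than OFF, the edit has not been applied to the file that cmake reads.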
When configuring, make sure to set the flag
-D gpu_aware_mpi=OFF    
as the nodes are not properly configured to perform gpu to gpu direct communication (the code will still run, but errors will arise at mesh block boundaries, and the code itself will run much slower).
A typical Slurm script for running entity on the GPU subcluster is
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gpus-per-node=4
#SBATCH --ntasks-per-node=4  # Keep all GPUs active
#SBATCH --time=23:59:59
#SBATCH --partition=compute_full_node
#SBATCH -o outjob_test.o%j
#SBATCH -e outjob_test.e%j
#SBATCH -J test
module load gcc/12.3 cmake/3.31.0 cuda/12.6 openmpi/4.1.5
mpirun --map-by ppr:4:node --bind-to core ./entity.xc -input fluxtube.toml
Here we have requested 2x4 GPUs for the full 24-hour wall time. Note that one can request 1, 4, or 8 GPUs for brief interactive debug jobs with
$debugjob
$debugjob 1
$debugjob 2
To pip install the version of nt2py that works with the adios2 output format, you will need to load the following modules:
module load python texlive gcc arrow/21.0.0  
Last updated: 9/12/2025
Mind the dates
At the bottom of each section, there is a tag indicating when those instructions were last updated. Some instructions may be outdated, since clusters are constantly being updated and changed. If so, please feel free to reach out with questions or contribute updated instructions.