Compiling/running

First, make sure you have all the necessary dependencies installed (Kokkos and ADIOS2 can be built in-tree with the code, so they require no additional configuration).

Configuring & compiling

Clone the repository with the following command:
```
git clone --recursive https://github.com/entity-toolkit/entity.git
```
Note

For developers with write access, it is highly recommended to use ssh for cloning the repository:
```
git clone --recursive git@github.com:entity-toolkit/entity.git
```
If you have not yet set up SSH access to GitHub, follow GitHub's documentation on connecting with SSH. Alternatively, you can clone the repository over HTTPS as shown above.

Configure the code from the root directory using `cmake`, e.g.:
```
# from the root of the repository
cmake -B build -D pgen=<PROBLEM_GENERATOR> -D Kokkos_ENABLE_CUDA=ON <...>
```

All build options are passed to `cmake` using the `-D` flag followed by the option name and its value (as shown above). Boolean options take the values `ON` or `OFF`. The following options are available:

| Option | Description | Values | Default |
|---|---|---|---|
| `pgen` | problem generator | see `<engine>/pgen/` directory | `dummy` |
| `precision` | floating point precision | `single`, `double` | `single` |
| `output` | enable output | `ON`, `OFF` | `OFF` |
| `mpi` | enable multi-node support | `ON`, `OFF` | `OFF` |
| `DEBUG` | enable debug mode | `ON`, `OFF` | `OFF` |
| `TESTS` | compile the unit tests | `ON`, `OFF` | `OFF` |
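To illustrate how these options combine, below is a minimal sketch of a hypothetical shell helper, `entity_configure_cmd` (not part of Entity), that checks a few values against the table above and prints the resulting configure command instead of running it. The problem generator name `my_pgen` is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Entity): validate a few build options
# against the table above and print the resulting `cmake` configure command.
entity_configure_cmd() {
  local pgen="${1:-dummy}"       # problem generator (default: dummy)
  local precision="${2:-single}" # single | double (default: single)
  local output="${3:-OFF}"       # ON | OFF (default: OFF)
  local mpi="${4:-OFF}"          # ON | OFF (default: OFF)

  case "$precision" in
    single | double) ;;
    *) echo "invalid precision '$precision' (use single or double)" >&2; return 1 ;;
  esac

  # boolean options must be spelled exactly ON or OFF
  local flag
  for flag in "$output" "$mpi"; do
    case "$flag" in
      ON | OFF) ;;
      *) echo "boolean options take ON or OFF, got '$flag'" >&2; return 1 ;;
    esac
  done

  echo "cmake -B build -D pgen=$pgen -D precision=$precision -D output=$output -D mpi=$mpi"
}

# `my_pgen` is a placeholder: use a name from the `<engine>/pgen/` directory
entity_configure_cmd my_pgen double ON OFF
```

Printing rather than executing keeps the sketch free of side effects; once the command looks right, it can be run directly.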

Additionally, some CMake and library-specific options (for Kokkos and ADIOS2) can be specified alongside the ones above. While the code picks most of these automatically, some of them can (or should) be specified manually. In particular:

| Option | Description | Values | Default |
|---|---|---|---|
| `Kokkos_ENABLE_CUDA` | enable CUDA | `ON`, `OFF` | `OFF` |
| `Kokkos_ENABLE_OPENMP` | enable OpenMP | `ON`, `OFF` | `OFF` |
| `Kokkos_ARCH_***` | use a particular CPU/GPU architecture | see Kokkos documentation | determined automatically by Kokkos |

Note

When compiling with just `-D Kokkos_ENABLE_CUDA=ON` and no additional flags, CMake will try to deduce the GPU architecture from the machine you are compiling on. Often this is not the same as the architecture of the machine you plan to run on (and sometimes the former lacks a GPU altogether). To be explicit, specify the GPU architecture manually using the `-D Kokkos_ARCH_***=ON` flags. For example, to compile for A100 GPUs, use `-D Kokkos_ARCH_AMPERE80=ON`; for V100, use `-D Kokkos_ARCH_VOLTA70=ON`.
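If you build for several machines, it can help to keep the architecture choice explicit in one place. The sketch below is a hypothetical helper (`kokkos_arch_flag`, not part of Entity or Kokkos) that only knows the two mappings mentioned in the note above; for other GPUs, look up the right `Kokkos_ARCH_***` name in the Kokkos documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the GPU model of the machine you will RUN on
# to the corresponding Kokkos architecture flag (only the two cases from
# the note above are covered; extend from the Kokkos documentation).
kokkos_arch_flag() {
  case "$1" in
    A100) printf '%s\n' "-D Kokkos_ARCH_AMPERE80=ON" ;;
    V100) printf '%s\n' "-D Kokkos_ARCH_VOLTA70=ON" ;;
    *)
      echo "unknown GPU model '$1': look up its Kokkos_ARCH_*** flag" >&2
      return 1
      ;;
  esac
}

# Print (rather than run) a configure command targeting A100 GPUs;
# the unquoted $(...) is intentionally split into two arguments (-D, flag).
echo cmake -B build -D pgen=dummy -D Kokkos_ENABLE_CUDA=ON $(kokkos_arch_flag A100)
```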

Once `cmake` finishes configuring the code, a directory named `build` will be created in the root directory. You can now compile the code by running:
```
cmake --build build -j <NCORES>
```
where `<NCORES>` is the number of cores to use for the compilation (if you omit `<NCORES>` and pass just `-j`, `cmake` will try to use as many threads as possible). Note that the `-j` flag is optional; if it is not specified, the code will compile using a single core.
After the compilation is done, you will find the executable `entity.xc` in the `build/src/` directory. That's it! You can now finally run the code.
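To avoid hard-coding `<NCORES>`, the core count can be queried when building. A minimal sketch, assuming either a Linux machine with GNU `nproc` or a macOS machine with `sysctl`; it prints the build command so you can inspect it first.

```shell
#!/usr/bin/env bash
# Query the number of available cores: `nproc` (GNU coreutils, Linux),
# `sysctl -n hw.ncpu` (macOS); fall back to a single core otherwise.
ncores="$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)"

# Print the build command; remove `echo` to actually compile.
echo cmake --build build -j "$ncores"
```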

Running

You can run the code with the following command:
```
/path/to/entity.xc -input /path/to/input_file.toml
```

`entity.xc` runs headlessly and produces several diagnostic outputs:

- the `.info` file contains general information about the simulation, including all the parameters used, the compiler version, the architecture, etc.;
- the `.log` file contains timestamps of each simulation substep and is mainly used for debugging;
- if the simulation fails or throws warnings, an `.err` file is generated containing the error message.

The simulation also prints a live report to stdout after each successful simulation step, which contains the time spent on each simulation substep, the number of active particles, and the estimated time to completion. It may look something like this:

```
................................................................................
Step: 1260     [of 1448]
Time: 1.7401   [Δt = 0.0014]

[SUBSTEP]                  [DURATION]  [% TOT]
  Communications............314.00 µs     9.55
  CurrentDeposit............400.00 µs    12.17
  CurrentFiltering..........803.00 µs    24.43
  Custom....................929.00 µs    28.26
  FieldBoundaries.............0.00 ns     0.00
  FieldSolver...............502.00 µs    15.27
  ParticleBoundaries..........0.00 ns     0.00
  ParticlePusher............339.00 µs    10.31
Total                         3.29 ms

Particle count:                [TOT (%)]
  species 1 (e-)........2.59e+04 ( 2.6%)
  species 2 (e+)........2.59e+04 ( 2.6%)

Average timestep: 9.57 ms
Remaining time: 1.80 s
Elapsed time: 11.12 s
[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■          ]  87.02%
................................................................................
```
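Since the report goes to stdout, it can be saved with, e.g., `/path/to/entity.xc -input /path/to/input_file.toml | tee run.log` (`run.log` being an arbitrary name) and inspected afterwards. The sketch below is a hypothetical helper that prints the most expensive substep of the last report in such a log. It assumes the layout of the sample above (substep lines between the `[SUBSTEP]` header and the `Total` line, with the percentage in the last column), which may change between versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most expensive substep (and its share of the
# step time) from the last report found in a saved stdout log.
top_substep() {
  awk '
    # a new report table starts: reset the running maximum
    /^\[SUBSTEP\]/ { in_table = 1; best = ""; best_pct = -1; next }
    # the table ends at the "Total" line: remember the winner of this report
    in_table && /^Total/ { in_table = 0; last = best; next }
    # substep lines look like "<name>....<duration> <unit> <percent>"
    in_table && NF >= 3 {
      pct = $NF + 0
      name = $1
      sub(/\.+.*/, "", name)   # drop the dot leaders and the duration
      if (pct > best_pct) { best_pct = pct; best = name " (" $NF "%)" }
    }
    END { print last }
  ' "$1"
}

# usage: top_substep run.log   # prints "Custom (28.26%)" for the sample above
```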

For users of Princeton's Stellar cluster

For convenience, we provide precompiled Kokkos and ADIOS2 libraries for Stellar users. To use them, run the following:

```
# this line can also be added to your ~/.bashrc or ~/.zshrc for auto loading
module use --append /home/hakobyan/.modules
# see the new available modules with ...
module avail
# load ...
module load entity/cuda/...
# ... depending on the architecture
# then configuring the code is quite straightforward
cmake -B build -D pgen=...
# Kokkos_ARCH_***, Kokkos_ENABLE_CUDA, etc. are already set
```

Testing

1.0.0

To compile the unit tests, specify the `-D TESTS=ON` flag when configuring the code with `cmake`. After the code is compiled, you can run the tests with the following command:
```
ctest --test-dir build/
```
You may also specify the `--output-on-failure` flag to see the output of the tests that failed.

To run only specific tests, use the `-R` flag followed by a regular expression matching the test name. For example, to run all the tests whose names contain the word `particle`, use:
```
ctest --test-dir build/ -R particle
```
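The flags above can be combined. A minimal sketch of a hypothetical wrapper (`run_tests`, not part of Entity) that runs only the tests matching a pattern, shows the output of the failing ones, and stops early if the `build/` directory does not exist yet:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around ctest: run the tests whose names match a
# regular expression (default: all) and print the output of failed tests.
run_tests() {
  local pattern="${1:-.}"
  if [ ! -d build ]; then
    echo "no build/ directory: configure with -D TESTS=ON and compile first" >&2
    return 1
  fi
  ctest --test-dir build/ --output-on-failure -R "$pattern"
}

# usage (from the root of the repository): run_tests particle
```

Note that `ctest --test-dir` requires CMake 3.20 or newer.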

Specific architectures

HIP/ROCm @ AMD GPUs

1.1.0

Compiling on AMD GPUs is typically straightforward:

- Make sure you have the ROCm library loaded: e.g., run `rocminfo`;
- Sometimes the environment variables are not set up properly, so make sure the following variables are defined:
  - `CMAKE_PREFIX_PATH=/opt/rocm` (or wherever ROCm is installed),
  - `CC=hipcc` and `CXX=hipcc`;
- On rare occasions, you might also have to explicitly pass `-D CMAKE_CXX_COMPILER=hipcc -D CMAKE_C_COMPILER=hipcc` to `cmake` during the configuration stage;
- Compile the code with the proper Kokkos flags; e.g., for MI250x GPUs you would use `-D Kokkos_ENABLE_HIP=ON` and `-D Kokkos_ARCH_AMD_GFX90A=ON`.
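Put together, the steps above amount to something like the following sketch for MI250x GPUs. It assumes ROCm lives in `/opt/rocm` and that `hipcc` is on your `PATH`; the problem generator is left as a placeholder, and the configure command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Environment for building with HIP (adjust the ROCm prefix if needed).
export CMAKE_PREFIX_PATH=/opt/rocm
export CC=hipcc
export CXX=hipcc

# Configure command for MI250x GPUs (GFX90A); printed rather than run.
# The explicit compiler flags mirror the optional "rare occasions" step above.
# Replace <PROBLEM_GENERATOR> with an actual problem generator name.
configure_cmd=(
  cmake -B build
  -D "pgen=<PROBLEM_GENERATOR>"
  -D CMAKE_C_COMPILER=hipcc -D CMAKE_CXX_COMPILER=hipcc
  -D Kokkos_ENABLE_HIP=ON
  -D Kokkos_ARCH_AMD_GFX90A=ON
)
echo "${configure_cmd[@]}"
# To actually configure: "${configure_cmd[@]}"
```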

Running is a bit trickier, and the exact instructions might vary from machine to machine (partly because ROCm is much less streamlined than CUDA, but also because system administrators on clusters often pay less attention to AMD GPUs).

- If you are running on a cluster, first consult the cluster's documentation. There you might find the proper `slurm` command for requesting GPU nodes and binding each GPU to the respective CPUs.
- On personal machines, figuring this out is a bit easier. First, inspect the output of `rocminfo` and `rocm-smi`; from there, you should be able to find the ID of the GPU you want to use. If you see more than one device, that means you also have an AMD CPU or an integrated GPU installed; ignore those. You will need to override two environment variables:
  - `HSA_OVERRIDE_GFX_VERSION`, set to the GFX version you compiled the code for (if you used the `GFX1100` Kokkos flag, that would be `11.0.0`);
  - `HIP_VISIBLE_DEVICES` and `ROCR_VISIBLE_DEVICES`, both set to your device ID (usually just a number from 0 to the number of HIP-capable devices minus one).
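If you only know the target name reported by `rocminfo` (e.g., `gfx1100`), the value for `HSA_OVERRIDE_GFX_VERSION` can be derived from it. The helper below is a hypothetical sketch; it assumes the usual `gfx<major><minor><stepping>` naming, where the last two characters are single hexadecimal digits. Under that assumption `gfx1100` becomes `11.0.0` (matching the example above) and `gfx90a` becomes `9.0.10`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a gfx target name into the dotted form used by
# HSA_OVERRIDE_GFX_VERSION. Assumes gfx<major><minor><stepping>, where minor
# and stepping are one hexadecimal digit each (e.g. gfx1100 -> 11.0.0).
gfx_to_hsa_version() {
  local id="${1#gfx}"
  id="${id#GFX}"                      # accept upper-case names, too
  if [ "${#id}" -lt 3 ]; then
    echo "unexpected target name '$1'" >&2
    return 1
  fi
  local major="${id%??}"              # everything but the last two characters
  local minor="${id: -2:1}"
  local stepping="${id: -1}"
  # bash printf accepts hexadecimal integer constants such as 0xa
  printf '%d.%d.%d\n' "$major" "0x$minor" "0x$stepping"
}

gfx_to_hsa_version gfx1100   # prints 11.0.0
```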

For example, the output of `rocminfo | grep -A 5 "Agent "` may look like this:

```
Agent 1
*******
  Name:                    AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
  Vendor Name:             CPU
--
Agent 2
*******
  Name:                    gfx1100
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon™ RX 7700S
  Vendor Name:             AMD
--
Agent 3
*******
  Name:                    gfx1100
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
```
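With several agents listed, it can be handy to filter out the CPU ones. The following hypothetical helper (`list_gpu_agents`, not part of ROCm or Entity) reads `rocminfo` output on stdin and prints only the agents whose `Uuid` starts with `GPU-`. It relies on the field layout shown above, which may differ between ROCm versions; for the output above, it should list Agents 2 and 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list GPU agents (gfx target + marketing name) from
# rocminfo output read on stdin. CPU agents are skipped because their Uuid
# starts with "CPU-" rather than "GPU-".
list_gpu_agents() {
  awk '
    /^Agent [0-9]+/      { agent = $2 }
    /^ *Name:/           { name = $2 }
    /^ *Uuid:/           { is_gpu = ($2 ~ /^GPU-/) }
    /^ *Marketing Name:/ {
      if (is_gpu) {
        mname = $0
        sub(/^ *Marketing Name: */, "", mname)   # keep only the value
        sub(/ *$/, "", mname)                    # strip trailing spaces
        print "Agent " agent ": " name " (" mname ")"
      }
    }
  '
}

# usage: rocminfo | list_gpu_agents
```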

In this case, the GPU we need is Agent 2 (the dedicated Radeon RX 7700S), which is a `gfx1100` device. The output of `rocm-smi` will look something like this:

```
============================================ ROCm System Management Interface ============================================
====================================================== Concise Info ======================================================
Device  Node  IDs              Temp    Power    Partitions          SCLK  MCLK     Fan    Perf  PwrCap       VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Avg)    (Mem, Compute, ID)
==========================================================================================================================
0       1     0x7480,   19047  35.0°C  0.0W     N/A, N/A, 0         0Mhz  96Mhz    29.8%  auto  100.0W       0%     0%
1       2     0x15bf,   17218  48.0°C  19.111W  N/A, N/A, 0         None  1000Mhz  0%     auto  Unsupported  82%    5%
==========================================================================================================================
================================================== End of ROCm SMI Log ===================================================
```

So the GPU we need has device ID `0` (since it is the dedicated GPU, on laptops it may power down when idle to save energy, hence `Power = 0.0W`). Now we can run the code with:

```
HSA_OVERRIDE_GFX_VERSION=11.0.0 HIP_VISIBLE_DEVICES=0 ROCR_VISIBLE_DEVICES=0 ./executable ...
```
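To avoid retyping the three variables, they can be wrapped in a small function. This is a hypothetical sketch (`run_on_amd_gpu` is not part of Entity or ROCm); because the variables are given as prefix assignments, they are set only for the wrapped command, not for the rest of the shell session.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command on a single AMD GPU.
#   $1    - value for HSA_OVERRIDE_GFX_VERSION (e.g. 11.0.0 for gfx1100)
#   $2    - device ID as reported by rocm-smi (e.g. 0)
#   $3... - the command to run
run_on_amd_gpu() {
  local gfx_version="$1" device_id="$2"
  shift 2
  # prefix assignments are passed to the command's environment only
  HSA_OVERRIDE_GFX_VERSION="$gfx_version" \
    HIP_VISIBLE_DEVICES="$device_id" \
    ROCR_VISIBLE_DEVICES="$device_id" \
    "$@"
}

# usage: run_on_amd_gpu 11.0.0 0 /path/to/entity.xc -input /path/to/input_file.toml
```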