Compiling/running
First, make sure all the necessary dependencies are installed. Kokkos and ADIOS2 can be built in-tree with the code, so they need no additional configuration.
Configuring & compiling
- Clone the repository with the following command:
git clone --recursive https://github.com/entity-toolkit/entity.git
Note: developers with write access are strongly encouraged to clone the repository over ssh:
git clone --recursive git@github.com:entity-toolkit/entity.git
If you have not set up ssh access to GitHub yet, please follow the instructions here. Alternatively, you can clone the repository over https as shown above.
- Configure the code from the root directory using cmake, e.g.:
# from the root of the repository
cmake -B build -D pgen=<PROBLEM_GENERATOR> -D Kokkos_ENABLE_CUDA=ON <...>
All build options are specified with the -D flag followed by the option name and its value (as shown above). Boolean options take the values ON or OFF. The following options are available:

| Option | Description | Values | Default |
|---|---|---|---|
| pgen | problem generator | see the <engine>/pgen/ directory | dummy |
| precision | floating-point precision | single, double | single |
| output | enable output | ON, OFF | OFF |
| mpi | enable multi-node support | ON, OFF | OFF |
| DEBUG | enable debug mode | ON, OFF | OFF |
| TESTS | compile the unit tests | ON, OFF | OFF |
You can also pass some CMake options and library-specific options (for Kokkos and ADIOS2) alongside the ones above. The code chooses most of these automatically, but some can (or should) be set manually. In particular:

| Option | Description | Values | Default |
|---|---|---|---|
| Kokkos_ENABLE_CUDA | enable CUDA | ON, OFF | OFF |
| Kokkos_ENABLE_OPENMP | enable OpenMP | ON, OFF | OFF |
| Kokkos_ARCH_*** | use a particular CPU/GPU architecture | see the Kokkos documentation | Kokkos attempts to determine it automatically |

Note: if you compile with just -D Kokkos_ENABLE_CUDA=ON and no additional flags, CMake will try to deduce the GPU architecture from the machine you are compiling on. This is often not the architecture of the machine you plan to run on, and sometimes the build machine has no GPU at all. To be explicit, specify the GPU architecture manually with a -D Kokkos_ARCH_***=ON flag. For example, to compile for A100 GPUs, use -D Kokkos_ARCH_AMPERE80=ON; for V100, use -D Kokkos_ARCH_VOLTA70=ON.
- Once cmake has finished configuring the code, a directory named build will be created in the root directory. You can now compile the code by running:
cmake --build build -j <NCORES>
where <NCORES> is the number of cores to use for the compilation. If you omit <NCORES> and pass just -j, cmake will use as many parallel jobs as it can. The -j flag is optional; without it, the default Makefile generator compiles on a single core.
- After the compilation is done, you will find the executable entity.xc in the build/src/ directory. That's it! You can now finally run the code.
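The configure and build steps above can be collected into a small script. This is a minimal sketch, not part of Entity. The helper names `entity_configure_cmd` and `kokkos_gpu_flags` are hypothetical, the defaults mirror the option table, and the GPU list covers only the architectures named on this page. The script prints the commands instead of running them, so you can review them first.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that print the configure/build commands
# described above instead of running them.

# Configure command; omitted arguments fall back to the defaults in the table.
# Usage: entity_configure_cmd [pgen] [precision] [output] [mpi]
entity_configure_cmd() {
  printf 'cmake -B build -D pgen=%s -D precision=%s -D output=%s -D mpi=%s\n' \
    "${1:-dummy}" "${2:-single}" "${3:-OFF}" "${4:-OFF}"
}

# Kokkos backend/architecture flags for the GPUs mentioned on this page;
# for anything else, look up the Kokkos_ARCH_* name in the Kokkos docs.
kokkos_gpu_flags() {
  case "$1" in
    A100)   printf '%s\n' "-D Kokkos_ENABLE_CUDA=ON -D Kokkos_ARCH_AMPERE80=ON" ;;
    V100)   printf '%s\n' "-D Kokkos_ENABLE_CUDA=ON -D Kokkos_ARCH_VOLTA70=ON" ;;
    MI250x) printf '%s\n' "-D Kokkos_ENABLE_HIP=ON -D Kokkos_ARCH_AMD_GFX90A=ON" ;;
    *)      printf 'unknown GPU: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Core count for the build: nproc (GNU coreutils) or sysctl (macOS),
# falling back to a single core if neither is available.
NCORES=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)

# Example: double precision with output enabled, targeting A100 GPUs.
echo "$(entity_configure_cmd dummy double ON OFF) $(kokkos_gpu_flags A100)"
echo "cmake --build build -j ${NCORES}"

# After a successful build, the executable should be in build/src/.
if [ -x build/src/entity.xc ]; then
  echo "found build/src/entity.xc"
else
  echo "build/src/entity.xc not found yet"
fi
```

Once the printed lines look right, copy them or remove the `echo`. Leaving the `kokkos_gpu_flags` substitution unquoted in a command line is intentional: it relies on word splitting to produce separate `-D` arguments.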
Running
You can run the code with the following command:
/path/to/entity.xc -input /path/to/input_file.toml
entity.xc runs headlessly and produces several diagnostic outputs. The .info file contains general information about the simulation, including all the parameters used, the compiler version, the architecture, etc. The .log file contains timestamps of each simulation substep and is mainly used for debugging. If the simulation fails or throws warnings, an .err file containing the error message is generated. After each successful simulation step, the simulation also prints a live report to stdout. The report shows the time spent on each substep, the number of active particles, and the estimated time to completion. It may look something like this:
................................................................................
Step: 1260 [of 1448]
Time: 1.7401 [Δt = 0.0014]
[SUBSTEP] [DURATION] [% TOT]
Communications............314.00 µs 9.55
CurrentDeposit............400.00 µs 12.17
CurrentFiltering..........803.00 µs 24.43
Custom....................929.00 µs 28.26
FieldBoundaries.............0.00 ns 0.00
FieldSolver...............502.00 µs 15.27
ParticleBoundaries..........0.00 ns 0.00
ParticlePusher............339.00 µs 10.31
Total 3.29 ms
Particle count: [TOT (%)]
species 1 (e-)........2.59e+04 ( 2.6%)
species 2 (e+)........2.59e+04 ( 2.6%)
Average timestep: 9.57 ms
Remaining time: 1.80 s
Elapsed time: 11.12 s
[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ ] 87.02%
................................................................................
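Because the run is headless, it can be handy to save the live report and check for error files afterwards. This is a minimal sketch with several assumptions. The report is taken to be plain text captured with `tee` (e.g. `/path/to/entity.xc -input /path/to/input_file.toml | tee run.out`). The `.err` file is assumed to land in the run directory. The function names `entity_progress` and `entity_check_errors` are hypothetical. The progress figure is simply the current step divided by the total, which reproduces the 87.02% in the report above (1260 of 1448).

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for a headless Entity run.
# (Variables are global: plain POSIX sh has no local variables.)

# Print the latest "Step: N [of M]" progress from a saved stdout report.
entity_progress() {
  awk '$1 == "Step:" { step = $2; total = $4; gsub(/[^0-9]/, "", total) }
       END { if (total + 0 > 0) printf "%d/%d (%.2f%%)\n", step, total, 100 * step / total }' "$1"
}

# Return non-zero (and list the files) if any *.err file exists in a directory.
# Assumes the error report is written next to the other outputs.
entity_check_errors() {
  dir=${1:-.}
  found=0
  for f in "$dir"/*.err; do
    [ -e "$f" ] || continue   # the glob matched nothing
    echo "error report: $f"
    found=1
  done
  [ "$found" -eq 0 ]
}

# Example usage after a run:
#   entity_progress run.out
#   entity_check_errors . || echo "the simulation reported errors"
```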
For Stellar (Princeton cluster) users
For convenience, we provide precompiled libraries (kokkos and adios2) for Stellar users. To use them, run the following:
# this line can also be added to your ~/.bashrc or ~/.zshrc for auto loading
module use --append /home/hakobyan/.modules
# see the new available modules with ...
module avail
# load ...
module load entity/cuda/...
# ... depending on the architecture
# then configuring the code is quite straightforward
cmake -B build -D pgen=...
# Kokkos_ARCH_***, Kokkos_ENABLE_CUDA, etc. are already set
Testing
1.0.0
To compile the unit tests, pass the -D TESTS=ON flag when configuring the code with cmake. After the code is compiled, run the tests with:
ctest --test-dir build/
Add the --output-on-failure flag to see the output of any tests that failed.
To run only specific tests, use the -R flag followed by a regular expression that matches the test names. For example, to run all tests whose names contain the word particle, use:
ctest --test-dir build/ -R particle
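The full test cycle can be scripted as below, from configuring with the tests enabled to running a subset of them. This is a sketch: the `run` helper is hypothetical and only prints each command unless `EXECUTE=1` is set. `ctest -N` is a standard CTest option not mentioned above; it lists the registered tests without running them.

```shell
#!/bin/sh
# Sketch only: print each step of the test workflow; set EXECUTE=1 to run it.
run() {
  printf '+ %s\n' "$*"
  if [ "${EXECUTE:-0}" = 1 ]; then
    "$@" || exit 1   # stop at the first failing step
  fi
}

run cmake -B build -D TESTS=ON          # configure with the unit tests enabled
run cmake --build build -j              # compile the code and the tests
run ctest --test-dir build/ -N          # list the available tests without running them
run ctest --test-dir build/ -R particle --output-on-failure   # run tests matching "particle"
```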
Specific architectures
HIP/ROCm @ AMD GPUs
Compiling on AMD GPUs is typically not an issue:
- Make sure the ROCm library is loaded: e.g., check that rocminfo runs;
- Sometimes the environment variables are not set up properly, so make sure the following are defined:
  - CMAKE_PREFIX_PATH=/opt/rocm (or wherever ROCm is installed), CC=hipcc and CXX=hipcc;
  - on rare occasions, you might also have to explicitly pass -D CMAKE_CXX_COMPILER=hipcc -D CMAKE_C_COMPILER=hipcc to cmake during the configuration stage;
- Compile the code with the proper Kokkos flags; e.g., for MI250x GPUs you would use -D Kokkos_ENABLE_HIP=ON and -D Kokkos_ARCH_AMD_GFX90A=ON.
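The environment setup above can be wrapped in a small script. This is a sketch with several assumptions. ROCm is taken to live under `/opt/rocm` unless `ROCM_PATH` says otherwise; reading `ROCM_PATH` here is a choice made for this sketch, not something Entity requires. The target is MI250x, as in the example. The configure command is printed rather than run.

```shell
#!/bin/sh
# Sketch only: set up the environment for a HIP build and print the configure flags.
ROCM_PATH=${ROCM_PATH:-/opt/rocm}   # adjust if ROCm is installed elsewhere

export CMAKE_PREFIX_PATH="$ROCM_PATH"
export CC=hipcc
export CXX=hipcc

# Warn early if the ROCm tools are not visible (e.g. the module is not loaded).
for tool in hipcc rocminfo; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "warning: $tool not found on PATH" >&2
  fi
done

# Kokkos flags for MI250x; the explicit compiler flags are only needed on the
# rare systems where CC/CXX are not picked up.
KOKKOS_HIP_FLAGS="-D Kokkos_ENABLE_HIP=ON -D Kokkos_ARCH_AMD_GFX90A=ON"
COMPILER_FLAGS="-D CMAKE_CXX_COMPILER=hipcc -D CMAKE_C_COMPILER=hipcc"
echo "cmake -B build -D pgen=<PROBLEM_GENERATOR> $KOKKOS_HIP_FLAGS $COMPILER_FLAGS"
```

Source the script (e.g. `. ./hip-env.sh`, a hypothetical file name) rather than executing it, so the exported variables stay in your current shell.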
Running is a bit trickier, and the exact instructions may vary from machine to machine. This is partly because ROCm is much less streamlined than CUDA, and partly because cluster administrators often give AMD GPUs less attention.
- If you are running on a cluster, first consult the cluster's documentation. There you may find the proper slurm command for requesting GPU nodes and binding each GPU to its respective CPUs.
- On personal machines, figuring this out is a bit easier. First, inspect the output of rocminfo and rocm-smi. From there, you should be able to find the ID of the GPU you want to use. If you see more than one device, you also have an AMD CPU or an integrated GPU; ignore those. You will need to set the following environment variables:
  - HSA_OVERRIDE_GFX_VERSION: the GFX version you compiled the code for (if you used the GFX1100 Kokkos flag, that would be 11.0.0);
  - HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES: both set to your device ID (usually just an integer, counting from 0, among the devices that support HIP).
For example, the output of rocminfo | grep -A 5 "Agent " may look like this:
Agent 1
*******
Name: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
Uuid: CPU-XX
Marketing Name: AMD Ryzen 9 7940HS w/ Radeon 780M Graphics
Vendor Name: CPU
--
Agent 2
*******
Name: gfx1100
Uuid: GPU-XX
Marketing Name: AMD Radeon™ RX 7700S
Vendor Name: AMD
--
Agent 3
*******
Name: gfx1100
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
The device we want is Agent 2 (the dedicated AMD Radeon RX 7700S), which supports GFX1100. The output of rocm-smi will look something like this:
============================================ ROCm System Management Interface ============================================
====================================================== Concise Info ======================================================
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
(DID, GUID) (Edge) (Avg) (Mem, Compute, ID)
==========================================================================================================================
0 1 0x7480, 19047 35.0°C 0.0W N/A, N/A, 0 0Mhz 96Mhz 29.8% auto 100.0W 0% 0%
1 2 0x15bf, 17218 48.0°C 19.111W N/A, N/A, 0 None 1000Mhz 0% auto Unsupported 82% 5%
==========================================================================================================================
================================================== End of ROCm SMI Log ===================================================
Here, the dedicated GPU has a Device ID of 0. It reports Power = 0.0W because, on laptops, a dedicated GPU may power down automatically when idle. Now we can run the code with:
HSA_OVERRIDE_GFX_VERSION=11.0.0 HIP_VISIBLE_DEVICES=0 ROCR_VISIBLE_DEVICES=0 ./executable ...
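The two steps above can be wrapped in a helper: derive the override version from the gfx name, then restrict visibility to one device. This is a sketch with assumptions. The helper names are hypothetical. The conversion assumes the gfx<major><minor><stepping> naming, where the last two characters are single hexadecimal digits; this gives gfx1100 → 11.0.0, matching the example above. Double-check the value for your own GPU before relying on it.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for running on a single AMD GPU.

# Convert a gfx target name (e.g. gfx1100 or GFX1100) to the
# major.minor.stepping form used by HSA_OVERRIDE_GFX_VERSION.
# Assumes the last two characters are single hex digits (minor, stepping).
gfx_to_hsa_version() {
  v=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  v=${v#gfx}              # e.g. 1100, 1030
  major=${v%??}           # everything but the last two characters
  rest=${v#"$major"}      # the last two characters
  minor=${rest%?}
  step=${rest#?}
  printf '%s.%d.%d\n' "$major" "0x$minor" "0x$step"
}

# Run a command restricted to one device.
# Usage: run_on_amd_gpu <gfx-name> <device-id> <command> [args...]
run_on_amd_gpu() {
  ver=$(gfx_to_hsa_version "$1")
  dev=$2
  shift 2
  HSA_OVERRIDE_GFX_VERSION=$ver HIP_VISIBLE_DEVICES=$dev ROCR_VISIBLE_DEVICES=$dev "$@"
}

# Equivalent to the command above for the RX 7700S (gfx1100, device 0):
#   run_on_amd_gpu gfx1100 0 ./executable ...
```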