Instructions for scalability tests
All users applying for 1 million or more allocation units in one period are required by RFK to evaluate and report the scalability of their software and datasets. Based on these results, RFK can limit the maximum number of cores allowed per job on each system. This is to ensure fair and efficient use of the systems. Additionally, users get an overall impression of how well their software scales and how to use their CPU quota efficiently.
Criteria
All criteria are subject to change.
Hexagon
Currently we do not enforce a minimum job size, but this can change in the future. The recommended minimum is 32 cores. See for more limitations. The minimum scaling factors allowed are:
- In the range up to 512 cores, when the number of cores is doubled, the wall time must decrease by a factor of at least 1.5
- When the number of cores is doubled from 512 to 1024, the wall time must decrease by a factor of at least 1.4
- When the number of cores is doubled from 1024 to 2048, the wall time must decrease by a factor of at least 1.3
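The Hexagon criteria above can be checked with a few lines of shell. This is a minimal sketch, not an official tool: the function names `hexagon_min_factor` and `scaling_ok` are hypothetical, the timings are made up, and binning intermediate core counts by the doubled core count is an assumption about how the ranges are meant to be read.

```shell
#!/bin/sh
# Hypothetical helper: minimum wall-time reduction factor on Hexagon
# when doubling from $1 cores, following the criteria listed above.
# Assumption: intermediate core counts are binned by the doubled count.
hexagon_min_factor() {
    doubled=$(( $1 * 2 ))
    if [ "$doubled" -le 512 ]; then
        echo 1.5
    elif [ "$doubled" -le 1024 ]; then
        echo 1.4
    elif [ "$doubled" -le 2048 ]; then
        echo 1.3
    else
        echo none    # no criterion is listed beyond 2048 cores
    fi
}

# Hypothetical helper: exit status 0 if (walltime on N cores) divided by
# (walltime on 2N cores) is at least the minimum factor.
#   $1 = walltime on N cores, $2 = walltime on 2N cores, $3 = minimum factor
scaling_ok() {
    awk -v t1="$1" -v t2="$2" -v min="$3" 'BEGIN { exit !(t1 / t2 >= min) }'
}

# Example with made-up timings: 1200 s on 256 cores, 760 s on 512 cores.
if scaling_ok 1200 760 "$(hexagon_min_factor 256)"; then
    echo "256 -> 512 cores: meets the criterion"
else
    echo "256 -> 512 cores: below the criterion"
fi
```

The floating-point comparison is delegated to awk because plain POSIX shell arithmetic only handles integers.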
Stallo
There is no minimum size for jobs.
The minimum recommended scaling is a reduction of the wall time by a factor of 1.4 when the number of cores is doubled.
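To see where a series of runs drops below a single recommended factor such as the 1.4 given for Stallo, the factor can be computed for each consecutive pair of runs. The sketch below assumes the results are available as one "cores walltime_seconds" pair per line with the core count doubling from line to line; the function name `report_scaling` and the timings are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: print the wall-time reduction factor for each
# consecutive pair of runs read from standard input.
# Input: one "cores walltime_seconds" pair per line, in ascending core order.
#   $1 = minimum acceptable factor (for example 1.4)
report_scaling() {
    awk -v min="$1" '
        NR > 1 {
            factor = prev_time / $2
            verdict = (factor >= min) ? "ok" : "below minimum"
            printf "%d -> %d cores: factor %.2f (%s)\n", prev_cores, $1, factor, verdict
        }
        { prev_cores = $1; prev_time = $2 }
    '
}

# Made-up timings for illustration only.
printf '%s\n' "32 4000" "64 2300" "128 1500" "256 1200" | report_scaling 1.4
```

The first step reported as "below minimum" gives a rough indication of where the scalability limit for that dataset lies.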
Titan
Vilje
NTNU is currently reluctant to enforce criteria on Vilje. It will take at least 6 months of experience with Vilje to come up with reasonable criteria.
Performing tests
- Scalability tests must be run through the batch job system.
- The data used for the test runs should reflect real production runs.
- The tests should start from a number of cores that is reasonable for the test data and continue until the scalability limit has been reached (the maximum differs from site to site).
- The jobs have to be run with the same input data for each core count (parameters affecting scalability may be changed, and doing so is encouraged).
The information required for the application form is the number of cores used and the time spent on execution in seconds (walltime). To measure the execution time, add the 'time' command before the parallel job line and use the "real" time from its output, e.g.
time aprun -n32 ./wrf.exe #Hexagon
time mprun -n32 ./wrf.exe #Stallo and Titan
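When several core counts are tested, the timed runs can be collected in one loop inside the batch script. The following is a sketch under stated assumptions: `LAUNCH`, `EXE` and `run_scaling_series` are hypothetical names, the launcher is assumed to accept the core count as a separate argument after `-n`, and `date +%s` (whole seconds) is used here in place of the `time` command so that the result is easy to capture. The batch allocation must be large enough for the highest core count in the series.

```shell
#!/bin/sh
# Placeholders: set these to match your system and program.
LAUNCH=${LAUNCH:-"aprun -n"}    # assumption: launcher takes "-n <cores>"
EXE=${EXE:-./wrf.exe}

# Hypothetical helper: run the same executable and input once per core
# count given as an argument, printing "cores walltime_seconds" per run.
# Program output goes to run_<cores>.log.
run_scaling_series() {
    for cores in "$@"; do
        start=$(date +%s)
        # $LAUNCH is intentionally unquoted so it splits into command and flag.
        $LAUNCH "$cores" "$EXE" > "run_${cores}.log" 2>&1
        end=$(date +%s)
        echo "$cores $((end - start))"
    done
}

# Example call, to be placed in the batch script:
# run_scaling_series 32 64 128 256 > walltimes.txt
```

Each line of the resulting file holds the two values requested in the application form: the number of cores and the wall time in seconds.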
Instead of the "time" command you can use IPM to gather the walltime and get some extra profiling information. Please refer to our documentation.
Please refer to the systems' user manuals for details on batch scripts: Hexagon, Stallo, Titan.
If you have questions, please contact us at support@notur.no

