Instructions for scalability tests

All users applying for 1 million or more allocation units in one period are required by RFK to evaluate and report the scalability of their software and datasets. Based on these results, RFK can limit the maximum number of cores allowed per job on each system. This is to ensure fair and efficient use of the systems. Additionally, users get an overall impression of how well their software scales and how to use their CPU quota efficiently.

Criteria

All criteria are subject to change.

Hexagon

Currently we do not enforce a minimum job size, but this may change in the future. The recommended minimum is 32 cores. See for more limitations.

The minimum scaling factor allowed:

  • In the range up to 512 cores, each time the number of cores is doubled, the wall time must decrease by a factor of at least 1.5
  • When the number of cores is doubled from 512 to 1024, the wall time must decrease by a factor of at least 1.4
  • When the number of cores is doubled from 1024 to 2048, the wall time must decrease by a factor of at least 1.3
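The thresholds above can be checked mechanically against your own measurements. The following is a minimal sketch, not an official RFK tool: the function names and the sample timings are hypothetical, and it assumes you have collected one wall time in seconds per core count.

```python
def required_factor(cores_from):
    """Minimum wall-time reduction when doubling from `cores_from` cores.

    Encodes the Hexagon thresholds listed above; they are subject to change.
    """
    if cores_from < 512:
        return 1.5
    if cores_from == 512:
        return 1.4
    if cores_from == 1024:
        return 1.3
    raise ValueError("no criterion is listed for doubling from %d cores" % cores_from)


def check_hexagon_scaling(walltimes):
    """Evaluate every doubling step present in `walltimes`.

    `walltimes` maps a core count to the measured wall time in seconds.
    Returns a list of dicts, one per doubling step, in ascending core order.
    """
    results = []
    for cores in sorted(walltimes):
        doubled = cores * 2
        if doubled not in walltimes or doubled > 2048:
            continue  # no measurement for this step, or no listed criterion
        factor = walltimes[cores] / walltimes[doubled]
        required = required_factor(cores)
        results.append({
            "from": cores,
            "to": doubled,
            "factor": factor,
            "required": required,
            "ok": factor >= required,
        })
    return results


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    sample = {32: 1000.0, 64: 640.0, 128: 420.0, 256: 290.0}
    for step in check_hexagon_scaling(sample):
        status = "ok" if step["ok"] else "below threshold"
        print("%4d -> %4d cores: factor %.2f (required %.1f) %s"
              % (step["from"], step["to"], step["factor"], step["required"], status))
```

The first step that falls below its threshold marks the largest core count at which the job still meets the criterion.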

Stallo

There is no minimum size for jobs.
The minimum recommended scaling is a reduction of the wall time by a factor of 1.4 each time the number of cores is doubled.
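To see what this recommendation means for a concrete job, you can project the longest acceptable wall time at each doubling from one baseline run. This is a rough sketch with a hypothetical function name and made-up baseline numbers; only the factor of 1.4 comes from the text above.

```python
def max_walltimes(base_cores, base_walltime, max_cores, factor=1.4):
    """Project the longest acceptable wall time for each doubling of cores.

    Starts from one measured run (`base_cores`, `base_walltime` in seconds)
    and divides the wall time by `factor` for every doubling up to `max_cores`.
    """
    budget = {base_cores: base_walltime}
    cores, limit = base_cores, base_walltime
    while cores * 2 <= max_cores:
        cores *= 2
        limit /= factor
        budget[cores] = limit
    return budget


if __name__ == "__main__":
    # Hypothetical baseline: 1400 s on 16 cores, tested up to 128 cores.
    for cores, limit in max_walltimes(16, 1400.0, 128).items():
        print("%4d cores: at most %.0f s" % (cores, limit))
```

A measured wall time above the projected value at some core count indicates that the job has dropped below the recommended scaling somewhere before that point.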

Titan

Vilje

NTNU is currently reluctant to enforce criteria on Vilje. It will take at least 6 months of experience with Vilje to come up with reasonable criteria.

Performing tests

  • Scalability tests must be run through the batch job system.
  • The data used for the test runs should reflect real production runs.
  • The tests should start from a number of cores that is reasonable for the test data and continue until the scalability limit has been reached (the maximum core count differs from site to site).
  • The jobs have to be run with the same input data for each core count (you may, and are encouraged to, change parameters that affect scalability).

The application form requires the number of cores used and the execution time in seconds (wall time). To measure the execution time, put the 'time' command before the parallel job launch line and report the "real" time from its output, e.g.

time aprun -n32 ./wrf.exe  #Hexagon

time mprun -n32 ./wrf.exe #Stallo and Titan
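Since the form asks for seconds while "time" often prints minutes and seconds, a small helper can do the conversion. The sketch below is an illustration only: the function name is hypothetical, and it assumes the "real" line has either the bash format (e.g. "real 12m34.500s") or the POSIX "time -p" format (e.g. "real 754.50"). Other shells or GNU /usr/bin/time print different formats that it does not handle.

```python
import re

# Matches "real 12m34.500s" (bash) or "real 754.50" (POSIX `time -p`).
_REAL_LINE = re.compile(r"^real\s+(?:(\d+)m)?(\d+(?:\.\d+)?)s?\s*$", re.MULTILINE)


def real_seconds(time_output):
    """Return the "real" (wall clock) time in seconds from `time` output."""
    match = _REAL_LINE.search(time_output)
    if match is None:
        raise ValueError("no 'real' line found in the given output")
    minutes = int(match.group(1) or 0)
    seconds = float(match.group(2))
    return minutes * 60 + seconds


if __name__ == "__main__":
    # Hypothetical output captured from the job's stderr.
    captured = "\nreal\t12m34.500s\nuser\t0m1.000s\nsys\t0m0.100s\n"
    print(real_seconds(captured))
```

In a batch job the output of "time" goes to standard error, so it normally ends up in the job's error file.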

Instead of the "time" command, you can use IPM to gather the wall time along with some extra profiling information. Please refer to our documentation.

Please refer to the user manuals for Hexagon, Stallo and Titan for details on batch scripts.

If you have questions, please contact us at support@notur.no

 

by Alexander Oltu last modified Jan 18, 2012 04:18 PM