Stallo User Guide
Welcome to the User Guide for Stallo.
Running jobs on the system
Logging on the compute nodes
Information on how to log in on a compute node.
Sometimes you may want to log in on a compute node (for instance to check output files on the node's local work area); this is also done using SSH. From stallo.uit.no you can log in to compute-x-y in the following way:
ssh -Y compute-x-y (for instance: ssh compute-5-8)
ssh -Y cx-y (for instance: ssh c5-8)
If you don't need display forwarding you can omit the "-Y" option above.
If you, for instance, want to run a program interactively on a compute node, with display forwarding to your desktop, you should instead do something like this:

1) Log in on Stallo with display forwarding:

ssh -Y stallo.uit.no

This assumes that you are running an X server on your local desktop, which should be available for most users running Linux, Unix and Mac OS X. If you are using Windows you must install an X server on your local PC.
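Putting the steps together, a complete session could look like the following sketch (compute-5-8 and xprogram are placeholders; substitute a node and a program of your own):

ssh -Y stallo.uit.no     # 1) log in on Stallo with display forwarding
ssh -Y compute-5-8       # 2) log on to the compute node, still forwarding X
./xprogram               # 3) run the program; its windows open on your desktop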
Transferring data to/from stallo using ftp.
The ssh protocol (or rather the OpenSSH implementation) has some limitations that become noticeable on long-haul data transfers over high-speed networks like the one between the sites in Norway. The ftp protocol does not have these limitations and gives superior performance (10x over scp/sftp) when moving data to/from Stallo from the other sites in Norway.
You need an ftp client that supports encrypted authentication.
Please note that only the authentication is encrypted; the data you copy flows unencrypted over the network, so do not copy any sensitive information using ftp.
Also note that ftp-access is only available from the university networks in Norway.
ftp clients reported to work with encrypted authentication include lftp (a CLI, Command Line Interface, client), which is used in the example below.
Clients that most probably will not work: the standard ftp client on your system (the one you get when you type the command ftp); ncftp also seems to have problems.
The hostname of the ftp server is stallo-wgw.uit.no (this will change to stallo-ftp.uit.no soon).
Example using lftp on linux:
> lftp userA@stallo-wgw.uit.no
lftp userA@stallo-wgw.uit.no:~> ls
........ file listing .........
lftp userA@stallo-wgw.uit.no:~> get a-file-on-the-system
84291584 bytes transferred in 3 seconds (28.98M/s)
We seem to have some problems with the openssl library that takes care of the encryption. Newer versions seem to work better, but we cannot change the library without recompiling a lot of other software, so we have to live with it until we upgrade Stallo this fall.
The problem gives the following error message when transferring a file using lftp:
lftp userA@stallo-wgw.uit.no:~> get filename
get: Fatal error: SSL_read: wrong version number
lftp userA@stallo-wgw.uit.no:~> get filename
84291584 bytes transferred in 3 seconds (29.00M/s)
As you can see, simply retrying the transfer works around the problem.
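If the error occurs often, a simple shell-level retry loop can work around it. This is a sketch of a workaround, not an official fix; filename is a placeholder, and lftp will prompt for your password on each attempt:

for attempt in 1 2 3 4 5; do
    # -c runs the given lftp commands and exits; break out on success
    lftp -c "open userA@stallo-wgw.uit.no; get filename" && break
    sleep 2
done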
The PBS queuing system and job submission
About the queuing system and job submission on Stallo.
Job script example (Stallo)
This is an example of how a job script could be built on Stallo.
The script is available as a text file you can download.
You have to edit it to fit your own needs.
NB! If you are using Windows: be careful, and use an editor that doesn't leave any "garbage" in the file, such as DOS/Windows line endings, which are often invisible in the editor itself!
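Since the downloadable script is not reproduced here, this is a minimal sketch of what a PBS job script for Stallo could look like; the job name, the account name myaccount and the program mpirun ./myprog are placeholders you must replace with your own:

#!/bin/bash
#PBS -N myjob                   # job name (placeholder)
#PBS -lnodes=2:ppn=8            # 2 nodes with 8 cores each
#PBS -lwalltime=01:00:00        # 1 hour maximum run time
#PBS -A myaccount               # cpu-quota account (placeholder)

cd $PBS_O_WORKDIR               # run from the directory the job was submitted in
mpirun ./myprog                 # placeholder: start your own program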
Prioritization of jobs and resource limits.
How is the priority of jobs calculated, and how many resources can a user expect to be allowed to allocate at once?
April 30th 2010, new changes to the scheduling policies. See the section on Job to node distribution.
The scheduler is set up to enforce the following resource limits:
No user will be allowed to have more than 168 000 cpu-hours allocated for running jobs at any time. This means that a user at most can allocate 1000 cpus for a week for concurrently running jobs (or 500 cpus for two weeks or 2000 cpus for half a week).
No single user will be allowed to run more than 200 jobs at any time. (You may well submit more, but you cannot have more than 200 running at the same time.)
Users can apply for exceptions to these rules by contacting email@example.com.
Due to a large increase in demand from users we have made some changes to the job to compute node mappings. Up until April 2010 we had been running in a free-for-all fashion, with very liberal policies as to which nodes a job would be mapped onto.
Before we dive into the details we need to say a few things about the Stallo architecture: the machine contains several classes of compute nodes (infiniband nodes, gigabit-ethernet-only nodes and highmem nodes), and the mapping policies below treat these classes differently.
The basic philosophy for the mapping is to run the job on the nodes best suited for the task.
Parallel job with no node type specified:

qsub -lnodes=4,walltime=48:00:00 ........
Will be allowed to run anywhere.
Infiniband parallel job:
qsub -lnodes=8:ppn=8:ib,walltime=240:00:00 .........
Will be mapped onto the infiniband nodes.
Ethernet parallel job:
qsub -lnodes=8:ppn=8:gige,walltime=240:00:00 .........
Will be run on the ethernet only nodes.
Single node jobs:
qsub -lnodes=1,walltime=240:00:00 ......... qsub -lnodes=1:ppn=8,walltime=240:00:00 .........
will be mapped onto gigabit ethernet nodes. This is new behaviour; earlier such jobs would be mapped onto any free node. Also note that trying to run single node jobs on infiniband nodes will fail:
qsub -lnodes=1:ib,walltime=240:00:00 .........
This job will never be allowed to start.
Highmem job:

qsub -q highmem -lnodes=4,pmem=14gb,walltime=240:00:00 ........
This job will run on the highmem nodes if the user is granted access by the administrators. Otherwise it will never start. Note that jobs that try to use both highmem and gigabit ethernet nodes will never start:
qsub -q highmem -lnodes=4:gige,pmem=14gb,walltime=240:00:00 ........
This job will never start.
Large file considerations.
Some special care needs to be taken if you want to create very large files on the system. By large we mean file sizes over 200 GB or so.
The /global/work file system (and /global/home too) is served by a number of storage arrays that each contain smaller pieces of the file system; the chunks are 2 TB (2000 GB) each. In the default setup each file is contained within one storage array, so the default file size limit is 2 TB. In practice the limit is considerably smaller, as each array already contains a lot of files.
Each user can change the default placement of the files they create by striping files over several storage arrays. This is done with the following command:
lfs setstripe . 0 -1 4

(The arguments are: the directory, the stripe size (0 = use the default), the start OST (-1 = let the system choose) and the stripe count, here 4.)
After this has been done, all new files created in the current directory will be spread over 4 storage arrays, each holding 1/4th of the file. The file can be accessed as normal; no special action needs to be taken. When the striping is set this way it is defined on a per-directory basis, so different directories can have different stripe setups within the same file system, and new subdirectories inherit the striping from their parent at the time of creation.
We recommend users to set the stripe count so that each chunk will be approximately 200-300 GB; for example, a file of 1 TB would then be striped over 4 storage arrays.
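As a sketch of that rule of thumb (using the same old-style lfs syntax as above):

lfs setstripe . 0 -1 4    # for ~1 TB files: 1 TB / 4 stripes = ~250 GB per array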
Once a file is created the stripe count cannot be changed, because the physical bits of the data have already been written to a certain subset of the storage arrays. However, the following trick can be used after the striping has been changed as described above:
# mv file file.bu
# cp -a file.bu file
# rm file.bu
The use of the -a flag ensures that all permissions etc. are preserved, and since cp creates a new file, the copy picks up the new striping.
Running many short tasks.
Recommendations on how to run a lot of short tasks on the system. The overhead in job start and cleanup makes it impractical to run thousands of short tasks as individual jobs on Stallo.
The queueing setup on Stallo, or rather the accounting system, generates an overhead of about 1 second at each end of a job. This overhead is insignificant when running large parallel jobs, but creates scaling issues when running a massive number of short jobs. One can consider a collection of independent tasks as one large parallel job, in which case the aforementioned overhead becomes the serial, unparallelizable part of the job, because the queuing system can only start and account for one job at a time. This scaling problem is described by Amdahl's Law.
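To make the Amdahl's Law argument concrete (a back-of-the-envelope sketch with illustrative numbers, not measured figures): if each job carries $s \approx 2$ seconds of serialized overhead (1 second at each end) and each task runs for $t$ seconds, then no matter how many jobs run concurrently the total speedup is bounded by

$$S_{\max} = \lim_{N \to \infty} \frac{N(t + s)}{N s + t} = \frac{t + s}{s}.$$

For 10-second tasks this gives $S_{\max} = 6$: submitted as individual jobs, they can never complete more than six times faster than serial execution, however many nodes are available.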
Without going into any more details, let's look at the solution.
By using some shell trickery one can spawn and load-balance multiple independent tasks running in parallel within one node: just background the tasks, and poll until some task has finished before you spawn the next:
for t in $tasks; do
    ./dowork.sh $t &                          # start the task in the background
    activetasks=$(jobs | wc -l)               # count the running tasks
    while [ $activetasks -ge $maxpartasks ]; do
        sleep 1                               # wait until a slot frees up
        activetasks=$(jobs | wc -l)
    done
done
wait                                          # wait for the remaining tasks to finish
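For this to work inside a job script, $tasks and $maxpartasks must of course be set first; a sketch under the assumption that dowork.sh takes a single task number (the task count and script name are placeholders):

#!/bin/bash
#PBS -lnodes=1:ppn=8,walltime=01:00:00    # one full 8-core node

cd $PBS_O_WORKDIR
tasks=$(seq 1 1000)   # placeholder: 1000 task identifiers
maxpartasks=8         # match the number of cores on the node

followed by the loop above.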
We charge for used resources, both cpu and memory.
To use the batch system you have to have a cpu quota, either local or national. For every job you submit we check that you have sufficient quota to run it, and you will get a warning if you do not have enough cpu-hours. The job will still be submitted to the queue, but will not start until you have enough cpu-hours to run it.
The accounting system charges for used processor equivalents (PE) times used walltime, so if you ask for more than 2 GB of memory per cpu you will be charged for more cpus than you actually use.
The best way to describe PE is perhaps by example. Assume that you have a node with 8 cpu cores and 16 GB of memory (as most nodes on Stallo have):
If you ask for less than 2 GB of memory per core, PE will equal the cpu count.
If you ask for 4 GB of memory per core, PE will be twice the cpu count.
If you ask for 16 GB of memory, PE = 8, as you can only run one cpu per compute node.
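A worked example of the resulting charge (the numbers are illustrative):

# 4 cores with 4 GB each claim all 16 GB of a node's memory,
# so the job is charged as PE = 8, not 4:
# 8 PE x 10 hours walltime = 80 cpu-hours
qsub -lnodes=1:ppn=4,pmem=4gb,walltime=10:00:00 jobscript.sh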
Express queue for testing job scripts and interactive jobs.
A high priority queue called express can be used for testing and interactive jobs.
By submitting a job to the express queue you can get higher throughput for testing and shorter start-up times for interactive jobs. Just use the -q express flag to submit to this queue:
qsub -q express jobscript.sh
or for an interactive job:
qsub -q express -I
This will give you faster access if you have special needs during development, testing of job script logic, or interactive use.
Jobs in the express queue will get higher priority than any other jobs in the system and will thus have a shorter queue delay than regular jobs. To prevent misuse, the express queue is subject to a number of limitations.
So it is more or less pointless to try to use the express queue to sneak regular production jobs past the other regular jobs. Submitting a large number of jobs to the express queue will most probably decrease the overall throughput of your jobs. Also note that large jobs get prioritized anyway, so they will most probably not benefit from using the express queue.
How can I submit many jobs in one command?
Use job arrays:
qsub -t 1-16 Myjob
will send Myjob 16 times into the queue. The instances can be distinguished by the value of the environment variable PBS_ARRAYID.
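A sketch of how Myjob might use that variable (myprog and the input/output file names are placeholders):

#!/bin/bash
#PBS -lnodes=1,walltime=01:00:00

cd $PBS_O_WORKDIR
# PBS_ARRAYID takes the values 1..16, one per array member (from qsub -t 1-16)
./myprog input-$PBS_ARRAYID > output-$PBS_ARRAYID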