Introduction to Using Njord
Contents
- How to Log In
- Using UNIX
- Copying Binary Data to Njord (Endianness)
- The Job Queue
- Introduction to Running Parallel Jobs
- Interactively Running Programs
- Executing Programs through Batch Jobs
How to Log In
The only way to connect to njord.hpc.ntnu.no is by secure shell (ssh), for example from a UNIX system:
$ ssh -l username njord.hpc.ntnu.no
Windows users: To log in from an MS Windows machine, we recommend X-Win32 or a similar client. You will find the setup file and license key on progdist on the NTNU Orakel site; see http://www.ntnu.edu/adm/it/helpdesk/software/distribution.
Mac OS X systems are shipped with OpenSSH, so on this platform all you need is to open a terminal window and log in.
Logins are restricted to machines within Norwegian universities and colleges. If you try to log in from another system, for instance from your commercial internet service provider (ISP), the login will appear to hang. To work around this, first log in to a local university or college system, and then log in to njord from there. We open IP ranges for direct login on request, for instance for users working from commercial companies or foreign universities. Please send a request to support with your IP range. We do not open access from commercial ISPs.
On first login to njord, you will get the question
RSA key fingerprint is 75:6f:51:c7:f6:51:79:8c:c4:fc:19:69:7c:9d:db:24.
Are you sure you want to continue connecting (yes/no)?
Check that the fingerprint is exactly as shown above before answering yes. The host fingerprint can also be checked after login with the command
$ ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key
If the colon-separated fingerprint does not match exactly, your connection may have been hijacked.
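The manual comparison of fingerprints can also be scripted. The following is a minimal sketch, assuming a POSIX shell; the helper name check_fingerprint is hypothetical, and the expected value is the fingerprint quoted above.

```shell
# Expected host key fingerprint, copied from the text above.
EXPECTED="75:6f:51:c7:f6:51:79:8c:c4:fc:19:69:7c:9d:db:24"

# check_fingerprint (hypothetical helper): compare a reported fingerprint
# against the expected one, ignoring the case of the hex digits.
check_fingerprint() {
    reported=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    if [ "$reported" = "$EXPECTED" ]; then
        echo "match"
    else
        echo "MISMATCH"
        return 1
    fi
}

check_fingerprint "75:6F:51:C7:F6:51:79:8C:C4:FC:19:69:7C:9D:DB:24"
```

Note that newer OpenSSH releases print SHA256 fingerprints by default; there, ssh-keygen needs the option -E md5 to show the colon-separated form, which it prefixes with MD5:.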
X11 Forwarding
X11 forwarding is necessary to display editor windows (gvim, emacs, nedit, etc.) or similar from njord on your desktop. To enable X11 forwarding, log in with the ssh -X and -Y options enabled:
$ ssh -X -Y -l username njord.hpc.ntnu.no
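Once logged in, a quick way to see whether forwarding took effect is to look at the DISPLAY environment variable, which ssh sets in the remote session when X11 forwarding is enabled. A minimal sketch, assuming a POSIX shell; the helper name check_x11 is hypothetical.

```shell
# check_x11 (hypothetical helper): report whether DISPLAY is set in this session.
check_x11() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "DISPLAY is set to $DISPLAY"
    else
        echo "DISPLAY is not set: log in again with ssh -X -Y"
        return 1
    fi
}

check_x11 || true
```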
We recommend this intro for PuTTY users. Refer to the OpenSSH FAQ for more information on SSH.
Using UNIX
After you have logged in to Njord, you are working in a UNIX environment and must therefore use standard UNIX commands. If you are unfamiliar with them, you can take one of the many online UNIX courses, such as those at Software Carpentry.
If you prefer to look at PDF files or PowerPoint presentations, then look at this page.
Copying Binary Data to Njord (Endianness)
Binary files created on a PC or a Linux cluster usually cannot be used directly on Njord. The reason is that Njord is a big-endian computer, while PCs are little-endian computers. For a full explanation of these terms, see Wikipedia's page on Endianness. However, XL Fortran can read and write little-endian binary data with the ufmt_littleendian runtime option. Little-endian I/O is assigned to Fortran unit numbers through the XLFRTEOPTS environment variable. To perform little-endian I/O on unit 2, type
$ export XLFRTEOPTS=ufmt_littleendian=2
in the shell or job script before running the program. A comma-separated list of unit numbers and dash-separated ranges of units is also accepted. To perform little-endian I/O on units 2, 5 and 10, 11, 12, ..., 20, the assignment should be written
$ export XLFRTEOPTS=ufmt_littleendian=2,5,10-20
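To see why byte order matters, the following sketch interprets the same four bytes in both orders. It assumes a POSIX shell with printf and od; it is only an illustration of the concept and does not involve XL Fortran.

```shell
# The 32-bit integer 1 as a little-endian PC stores it:
# least significant byte first. od prints the bytes as decimal values.
set -- $(printf '\001\000\000\000' | od -An -tu1)

# Interpret the same four bytes in both byte orders.
little=$(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
big=$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))

echo "read as little endian: $little"
echo "read as big endian:    $big"
```

The little-endian reading gives 1, while the big-endian reading of the same bytes gives 16777216, which is the kind of wrong value a program sees when it reads the file with the wrong byte order.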
The Job Queue
To run a program on Njord, you must usually first submit it to a queue and then wait for the resources to become available for your job to run. Type "llq" on the command line to view the current job queue. The queue system on Njord is called LoadLeveler and is described here.
Introduction to Running Parallel Jobs
To run a parallel job on Njord, the operating system must be instructed on how to start and run this job.
There are two ways to run a parallel job: interactively or as a batch job.
Programs run interactively when they run on the login nodes (f02n07l or f05n07l), and as batch jobs when they run on job nodes controlled by a queue system (class overview).

- Interactively running programs. This mode is only for testing and developing software, not for producing scientific data. These programs run on the login nodes and should use few cores, little memory and little time.
- Batch job. Batch jobs are meant only for producing scientific data (except jobs in the Express Queue). Batch jobs run on free nodes and cores and do not share nodes or memory with other jobs, except for jobs in the “small” queue. A batch job has to wait in a queue for free nodes before starting, which can take hours. Normally a batch job runs faster than an interactive program.
See the sections below for more details.
Interactively Running Programs
For programs in this mode, the following sequence is recommended:
1. Create a host file:
The host file tells the operating system how to distribute the tasks over nodes and processors.
Example of a host file:
f02n07l
f02n07l
f02n07l
f02n07l
f05n07l
f05n07l
f05n07l
f05n07l
The operating system picks nodes in the order in which they are listed in the host file.
In the example above, the MPI or OpenMP program first takes 4 cores from node f02n07l and then 4 cores from node f05n07l. The entries can be ordered as desired.
2. Compile the code.
Examples (MPI):
C: mpcc -o hellompi hellompi.c
Fortran 90: mpxlf90 -o hellompi hellompi.f90
See Compilers, Libraries and Tools.
3. Start the interactive program with this command
$ poe ./hellompi -procs 4 -hostfile hostfile
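The host file from step 1 can also be generated with a short loop instead of being typed by hand. A minimal sketch, assuming a POSIX shell and a host file with one host name per line; the variable names NODES and TASKS_PER_NODE are illustrative only, and the node names are the login nodes mentioned above.

```shell
# Login nodes named in the text above; 4 tasks are placed on each.
NODES="f02n07l f05n07l"
TASKS_PER_NODE=4

# Write one host name per line, in the order tasks should be assigned.
: > hostfile
for node in $NODES; do
    i=0
    while [ "$i" -lt "$TASKS_PER_NODE" ]; do
        echo "$node" >> hostfile
        i=$((i + 1))
    done
done

cat hostfile
```

The resulting file is then passed to poe with -hostfile hostfile, as in step 3.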
Executing Programs through Batch Jobs
1. Compile the code.
Examples (MPI):
C: mpcc -o hellompi hellompi.c
Fortran 90: mpxlf90 -o hellompi hellompi.f90
See Compilers, Libraries and Tools.
2. Create a batch job file.
A batch job file can look like the example below (hellompi.sh).
(For more details, see the page Batch Jobs; Keywords and Sample batch script.)
#!/bin/ksh
# @ job_name = hellompi
# @ account_no = support
# @ class = normal
# @ job_type = parallel
# @ node = 1
# @ tasks_per_node = 16
# @ node_usage = not_shared
# @ resources = ConsumableCpus(1) ConsumableMemory(832 mb)
# @ network.MPI = sn_all,,us
# @ error = $(job_name).$(jobid).err
# @ output = $(job_name).$(jobid).out
# @ wall_clock_limit = 01:00:00
# @ environment = COPY_ALL
# @ env_copy = all
#
# @ queue
#
# Create working dir (NA for this example)
# Copy input files (NA for this example)
# Run program
$HOME/test/hellompi
# Move results (NA for this example)
3. Run the batch job.
To submit the batch job:
$ llsubmit hellompi.sh
Print the job queue to the screen with the command “llq”.
Example of a queue:
$ llq
Id                 Owner     Submitted    ST  PRI  Class     Running on
------------------------------------------------------------------------
f02n02io.208057.0  myself    10/16 23:25  R   50   normal    f03n05
f05n02io.208051.0  christth  10/16 23:32  R   50   normal    f04n12
f02n02io.211789.0  pzinke    10/27 11:42  R   50   normal    f01n10
f05n02io.212044.0  tjiputra  10/28 07:54  R   50   normal    f03n11
f02n02io.214914.0  forecast  11/3 14:52   R   50   forecast  f05n06
f05n02io.214913.0  forecast  11/3 14:55   R   50   forecast  f03n02
.......
If your batch job has had status I (Idle) for a long time, try switching job class, for example between the normal class and the large class.
The status of each batch job is shown in column ST of the llq output above. (For more information, see Batch Job Status.)
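The status of a single job can be picked out of the llq listing with awk. A minimal sketch, assuming a POSIX shell and the column layout of the example queue above, where the submission date and time count as two fields and ST is therefore field 5; the helper name job_status is hypothetical.

```shell
# job_status (hypothetical helper): print the ST column for one job id,
# reading llq-style output on standard input.
# Assumes the column layout of the example queue, where ST is field 5.
job_status() {
    awk -v id="$1" '$1 == id { print $5 }'
}

# Sample input taken from the example queue above.
sample='f02n02io.208057.0 myself 10/16 23:25 R 50 normal f03n05
f05n02io.208051.0 christth 10/16 23:32 R 50 normal f04n12'

printf '%s\n' "$sample" | job_status f02n02io.208057.0
```

On Njord the same helper would read the live queue, for example llq | job_status f02n02io.208057.0.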

