Introduction to Using Njord

by Henrik R. Nagel last modified Aug 24, 2011 09:10 AM

How to Log In

The only way to connect to njord.hpc.ntnu.no is by secure shell (ssh). For example, from a UNIX system:

 $ ssh -l username njord.hpc.ntnu.no

Windows users: To log in from an MS Windows machine, we recommend X-Win32 or a similar program. The setup file and license key are available on progdist on the NTNU Orakel site; see http://www.ntnu.edu/adm/it/helpdesk/software/distribution.

Mac OS X systems are shipped with OpenSSH, so on this platform all you need is to open a terminal window and log in.

Logins are restricted to machines within Norwegian universities and colleges. If you try to log in from another system, for instance through a commercial internet service provider (ISP), the login will appear to hang. To work around this, first log in to a local university or college system, and then log in to njord from there. We open IP ranges for direct login on request, for instance for users working from commercial companies or foreign universities. Please send a request to support with your IP range. We do not open access from commercial ISPs.
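The two-step login described above can be combined into a single command line. The following is a minimal sketch: the gateway host login.example.edu and the function name two_step_login_cmd are hypothetical placeholders, and the function only prints the command so that you can inspect it before running it.

```shell
# Hypothetical gateway: replace with a machine at your own university or college.
GATEWAY=login.example.edu

# Print an ssh command that logs in to the gateway and, from there, to njord.
# $1 = user name on the gateway, $2 = user name on njord.
# The -t option asks ssh for a terminal so the second login is interactive.
two_step_login_cmd() {
  printf 'ssh -t -l %s %s ssh -l %s njord.hpc.ntnu.no\n' "$1" "$GATEWAY" "$2"
}

two_step_login_cmd alice alice
```

Copy the printed line into your terminal to perform both logins in one go.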

On first login to njord, you will be asked:

  RSA key fingerprint is 75:6f:51:c7:f6:51:79:8c:c4:fc:19:69:7c:9d:db:24.
  Are you sure you want to continue connecting (yes/no)?

Check that the fingerprint is exactly as shown above before answering yes. The host fingerprint can also be checked after login with the command

  $ ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key

If the colon-separated fingerprint does not match exactly, your connection may have been hijacked and you should not continue.
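Comparing 16 byte pairs by eye is error-prone, so the check can be scripted. This is a minimal sketch, assuming that ssh-keygen prints the fingerprint as the second field of its output line; the function name check_fingerprint and the sample input line (including the key size 2048) are illustrative assumptions.

```shell
# Fingerprint published in this guide.
EXPECTED="75:6f:51:c7:f6:51:79:8c:c4:fc:19:69:7c:9d:db:24"

# Read one line of `ssh-keygen -l` output on stdin and compare its
# fingerprint (second field) with EXPECTED. Newer OpenSSH releases
# prefix MD5 fingerprints with "MD5:", so that prefix is stripped.
check_fingerprint() {
  fp=$(awk '{ sub(/^MD5:/, "", $2); print $2; exit }')
  if [ "$fp" = "$EXPECTED" ]; then
    echo "match"
  else
    echo "MISMATCH: got $fp"
    return 1
  fi
}

# Illustrative input line, not captured from njord.
echo "2048 75:6f:51:c7:f6:51:79:8c:c4:fc:19:69:7c:9d:db:24 /etc/ssh/ssh_host_rsa_key.pub (RSA)" | check_fingerprint
```

On njord the function would be used as `ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key | check_fingerprint`. Newer OpenSSH versions print SHA256 fingerprints by default and need the additional option `-E md5` to produce the colon-separated form.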

X11 Forwarding

X11 forwarding is necessary to display editor windows (gvim, emacs, nedit, etc.) or similar from njord on your desktop. To enable X11 forwarding, log in with the ssh -X and -Y options:

  $ ssh -X -Y -l username njord.hpc.ntnu.no

We recommend this intro for PuTTY users. Refer to the OpenSSH FAQ for more information on SSH.
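When X11 forwarding is active, ssh sets the DISPLAY environment variable in the remote session. A quick way to check this after logging in is sketched below; the function name x11_status is hypothetical, and a set DISPLAY is only an indication, not proof, that windows will open correctly.

```shell
# Report whether DISPLAY is set in the current session.
x11_status() {
  if [ -n "${DISPLAY:-}" ]; then
    echo "X11 forwarding appears enabled (DISPLAY=$DISPLAY)"
  else
    echo "DISPLAY is not set; log in again with ssh -X -Y"
  fi
}

x11_status
```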

 

Using UNIX

After you have logged in to Njord, you are working in a UNIX environment and must therefore use standard UNIX commands. If you don't know how to do that, you can take one of the many online UNIX courses on the Internet, such as those at Software Carpentry.

If you prefer to look at PDF files or PowerPoint presentations, then look at this page.

 

Copying Binary Data to Njord (Endianness)

Binary files that have been created on a PC or a Linux cluster can usually not be used directly on Njord. The reason is that Njord is a big endian computer, while PCs are little endian computers. For a full explanation of these terms, see Wikipedia's page on Endianness. However, XL Fortran can read and write little endian binary data with the ufmt_littleendian runtime option. Little endian I/O is assigned to Fortran unit numbers through the XLFRTEOPTS environment variable. To perform little endian I/O on unit 2, type

 $ export XLFRTEOPTS=ufmt_littleendian=2
in the shell or job script before running the program. A comma-separated list of unit numbers and dash-separated ranges of units is also accepted. To perform little endian I/O on units 2, 5, and 10 through 20, the assignment should be written
 $ export XLFRTEOPTS=ufmt_littleendian=2,5,10-20
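To see why byte order matters, the sketch below interprets the same four bytes both ways using plain shell arithmetic. The byte values are an illustrative example (the 32-bit integer 1 as a little endian PC would write it) and are not taken from any Njord file.

```shell
# Four bytes, in file order, as a little endian machine writes the integer 1.
bytes="01 00 00 00"
set -- $bytes   # split into $1 .. $4

# Little endian: the first byte is the least significant.
little=$(( 0x$1 + (0x$2 << 8) + (0x$3 << 16) + (0x$4 << 24) ))

# Big endian: the first byte is the most significant.
big=$(( (0x$1 << 24) + (0x$2 << 16) + (0x$3 << 8) + 0x$4 ))

echo "read as little endian: $little"
echo "read as big endian:    $big"
```

The two results differ by a factor of 2^24, which is the kind of silent corruption that occurs when a little endian file is read on a big endian machine without byte swapping.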

 

The Job Queue

To run a program on Njord, you must usually first submit it to a queue and then wait for the resources to become available for your job to run. Type "llq" on the command line to view the current job queue. The queue system on Njord is called LoadLeveler and is described here.

 

Introduction to Running Parallel Jobs

To run a parallel job on Njord, the operating system must be instructed on how to start and run this job.

There are two ways to run a parallel job: interactively or as a batch job.

Programs run interactively when they run on the login nodes (f02n07l or f05n07l), and as batch jobs when they run on job nodes controlled by a queue system (class overview).

Figure: Njord class overview

  • Interactively running programs. Programs in this mode are only for testing and developing software, not for producing scientific data. These programs run on the login nodes and should use few cores, little memory and little time.
  • Batch jobs. Batch jobs should only be run to produce scientific data (except jobs in the Express queue). Batch jobs run on free nodes and cores, and do not share nodes or memory with other jobs, except for jobs in the “small” queue. A batch job has to wait in a queue for free nodes before it starts, which can take hours. Normally a batch job runs faster than an interactive program.

See the sections below for more details.

 

Interactively Running Programs

For programs in this mode, we recommend the following sequence:

1. Create a host file:

The host file tells the operating system how to distribute the program over nodes and processors.

Example of a host file:

 f02n07l 
 f02n07l 
 f02n07l 
 f02n07l 
 f05n07l 
 f05n07l 
 f05n07l 
 f05n07l

The operating system picks nodes in the same order as they are listed in the host file.

In the example above, the MPI or OpenMP program first gets 4 cores on node f02n07l and then 4 cores on node f05n07l. The order can be mixed as you please.

2. Compile the code.

Examples (MPI):
C: 
   mpcc -o hellompi hellompi.c   

Fortran 90:        
   mpxlf90 -o hellompi hellompi.f90   

See Compilers, Libraries and Tools.

3. Start the interactive program with this command:

 $ poe ./hellompi -procs 4 -hostfile hostfile
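The host file from step 1 can also be generated rather than typed by hand. The following is a minimal sketch that reproduces the example host file above; the function name make_hostfile is hypothetical.

```shell
# Print each node name N times, in the order given.
# Usage: make_hostfile N node [node ...]
make_hostfile() {
  n=$1
  shift
  for node in "$@"; do
    i=0
    while [ "$i" -lt "$n" ]; do
      echo "$node"
      i=$((i + 1))
    done
  done
}

# Four entries for each of the two login nodes, as in the example.
make_hostfile 4 f02n07l f05n07l > hostfile
cat hostfile
```

The resulting file is the one passed to poe with the -hostfile option in step 3.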

 

Executing Programs through Batch Jobs

1. Compile the code.

Examples (MPI):
C: 
   mpcc -o hellompi hellompi.c    

Fortran 90:         		
   mpxlf90 -o hellompi hellompi.f90 

See Compilers, Libraries and Tools.

2. Create a batch job file.

A batch job file can look like the example below (hellompi.sh).

(For more details, see Keywords and Sample batch script on the Batch Jobs page.)

#!/bin/ksh
# @ job_name         = hellompi
# @ account_no       = support
# @ class            = normal
# @ job_type         = parallel
# @ node             = 1
# @ tasks_per_node   = 16
# @ node_usage       = not_shared
# @ resources        = ConsumableCpus(1) ConsumableMemory(832 mb)
# @ network.MPI      = sn_all,,us

# @ error            = $(job_name).$(jobid).err
# @ output           = $(job_name).$(jobid).out
# @ wall_clock_limit = 01:00:00
# @ environment      = COPY_ALL
# @ env_copy         = all
#
# @ queue
#
# Create working dir (NA for this example)
# Copy input files (NA for this example)
# Run program
$HOME/test/hellompi
# Move results (NA for this example)

3. Run the batch job.

To submit the batch job:

  $ llsubmit hellompi.sh

Print the job queue to the screen with the command “llq”.

Example of a queue:

  $ llq
  Id                 Owner     Submitted    ST  PRI  Class     Running on
  ------------------------------------------------------------------------
  f02n02io.208057.0  myself    10/16 23:25  R   50   normal    f03n05
  f05n02io.208051.0  christth  10/16 23:32  R   50   normal    f04n12
  f02n02io.211789.0  pzinke    10/27 11:42  R   50   normal    f01n10
  f05n02io.212044.0  tjiputra  10/28 07:54  R   50   normal    f03n11
  f02n02io.214914.0  forecast  11/3  14:52  R   50   forecast  f05n06
  f05n02io.214913.0  forecast  11/3  14:55  R   50   forecast  f03n02
  .......

If your batch job has status I (Idle) for a long time, try switching to another job class, for example between the Normal class and the Large class.

The status of each batch job is shown in the ST column of the llq output above. (For more information, see Batch Job Status.)
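On a busy system the queue listing can be long, so it may help to summarise the ST column. The sketch below counts jobs per status with awk, assuming the column layout shown in the llq example above (status in the fifth whitespace-separated field). The function name count_by_status is hypothetical, and the idle job in the sample input is made up for illustration.

```shell
# Count jobs per status code. Only lines whose first field looks like a
# job id (host.number.number) are counted, so headers are skipped.
count_by_status() {
  awk '$1 ~ /\.[0-9]+\.[0-9]+$/ { n[$5]++ }
       END { for (s in n) print s, n[s] }' | sort
}

# Two lines from the listing above plus one hypothetical idle job.
count_by_status <<'EOF'
Id                 Owner     Submitted    ST  PRI  Class     Running on
------------------------------------------------------------------------
f02n02io.208057.0  myself    10/16 23:25  R   50   normal    f03n05
f05n02io.208051.0  christth  10/16 23:32  R   50   normal    f04n12
f02n02io.999999.0  myself    11/3  15:00  I   50   normal
EOF
```

On njord the function would be fed directly from the queue system, as in `llq | count_by_status`.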

     
