
Raijin.nci.org.au Primer

Raijin Supercomputing 

Raijin is a distributed-memory compute cluster installed and run by NCI in Canberra. Intersect and NCI are partners in supercomputing. Intersect has purchased about 34 million SUs (service units) per year and 180TB of disk space through various LIEF grants. These resources will be distributed to partner institutions through the Resource Allocation Round (RAR). Support is provided by Intersect; email help@intersect.org.au with any questions.

Hardware

NCI’s Raijin, named after the Shinto God of thunder, lightning and storms, is a hybrid Fujitsu Primergy and Lenovo NeXtScale high-performance, distributed-memory cluster, procured with funding from the Australian Government, through its Super Science Initiative (under the EIF Climate HPC Centre Funding Agreement between the Commonwealth of Australia and the Australian National University), as well as through the 2015-16 Agility Fund of the National Collaborative Research Infrastructure Strategy.

The original system was installed in 2012 and entered production use in June 2013. System updates entered production in January 2017. Raijin currently comprises:

  • 84,656 cores (Intel Xeon Sandy Bridge 2.6 GHz, Broadwell 2.6 GHz) in 4416 compute nodes;
  • 120 NVIDIA Tesla K80 GPUs in 30 nodes and 8 NVIDIA Tesla P100 GPUs in 2 nodes;
  • 32 Intel Xeon Phi (64-core Knights Landing, 1.3 GHz) in 32 compute nodes;
  • 300 Terabytes of main memory;
  • Hybrid FDR/EDR Mellanox InfiniBand full fat tree interconnect (up to 100 Gb/sec); and
  • 8 Petabytes of high-performance operational storage capacity.

Get started with some quick resource links here.

Useful Resources for New Raijin Users

Access the Raijin command line:

ssh username@raijin.nci.org.au


Help Resources

New users and projects

Raijin User Guide

Raijin and queue status

Raijin software list - Click on a software package to see a sample PBS script.

Raijin Software request form

How to access Raijin

Orange User and Project Data


In the Raijin software list, software marked by an orange dot has license restrictions. To use it, your project needs to be added to the corresponding user group on request. Please contact us.

For project and user management NCI uses Mancini, which allows you to add new users to your project, request access to license-restricted software, request new projects, and so on.

Useful commands on Raijin:

  • nci_account -P project-code: shows allocated and used resources. Replaces quotasu on Orange.

  • lquota: shows disk usage and quota. Replaces quota on Orange.

  • nf_limits: shows walltime and memory limits for a specified project.

See the man pages for further options for the commands above.

Queues on Raijin

Raijin has several batch queues:

Queue      Relative cost  Max memory (GB)  Max walltime for 16 cores (hh:mm)
normal     1.00           128.0            48:00
express    3.00           128.0            24:00
normalbw   1.25           256.0            48:00
expressbw  3.75           256.0            24:00
copyq      1.00           32.0             10:00 (1 core)
hugemem    1.25           1024.0           43:25
knl        0.25           192.0            12:00
gpu        3.00           256.0            48:00
pascal     4.00           256.0            48:00

  • The normal queue consists of nodes identical to Orange's: two 8-core Intel Sandy Bridge processors (16 cores) per node. One hour of compute time costs 1 SU per core.

  • The express queue uses the same nodes as the normal queue, but jobs receive higher priority at three times the cost of the normal queue.

  • The normalbw and expressbw queues are similar to the normal and express queues, but their nodes have two 14-core Intel Broadwell processors (28 cores). Depending on your code, these nodes can be faster, mainly if your code uses AVX2. Code must be recompiled to take advantage of the new CPU architecture; see the Raijin handbook. The relative costs for the two queues are 1.25 and 3.75 respectively.

  • Raijin also has 30 GPU nodes with 4 K80s each, 2 GPU nodes with 4 P100s each, and 32 KNL nodes (64 cores per node). See the NCI handbook for details and for how to compile code for these nodes. Since there are only a few of these nodes, also check how many jobs are already waiting in the queue.

  • The hugemem queue runs jobs that need up to 1TB of memory. It will soon be upgraded to 3 nodes with 3TB each.

  • Use the copy queue (copyq) to copy large data sets from /short to the MDSS archive using 1 core. See the handbook for details.

  • The maximum walltime is usually 48 hours. Extensions can be negotiated, but make sure you have explored checkpointing your program before asking for one.
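As a rough guide, the charge for a job can be estimated as cores x walltime hours x the queue's relative cost from the table above (for example, 1 SU per core-hour on normal). The function below is a minimal sketch of that arithmetic, assuming charging scales simply with cores and hours; the name su_cost is hypothetical, and NCI's real accounting may include other factors (for example, large memory requests), so use nci_account for actual usage.

```shell
#!/bin/bash
# Estimate SUs for a job: cores x walltime (hours) x queue relative cost.
# Assumption: charging scales linearly with cores and hours only.
# Rates are taken from the queue table ('pascal' as listed there).
su_cost() {
  local queue="$1" cores="$2" hours="$3" rate
  case "$queue" in
    normal|copyq)     rate=1.00 ;;
    express|gpu)      rate=3.00 ;;
    normalbw|hugemem) rate=1.25 ;;
    expressbw)        rate=3.75 ;;
    knl)              rate=0.25 ;;
    pascal)           rate=4.00 ;;
    *) echo "unknown queue: $queue" >&2; return 1 ;;
  esac
  # awk does the floating-point multiplication and formats to 2 decimals.
  awk -v c="$cores" -v h="$hours" -v r="$rate" 'BEGIN { printf "%.2f\n", c * h * r }'
}

# Example: 16 cores for 10 hours on the express queue.
su_cost express 16 10   # prints 480.00
```

The same 16-core, 10-hour job on the normal queue would be estimated at 160.00 SUs, which is why express is best kept for short, urgent runs.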

Storage on Raijin

Please note the following differences from Orange:

  • /home on Raijin has a quota of 2GB (60GB on Orange).

  • /short/project-code replaces /projects/project-code. Quota depends on the project.

  • /g/data{1,2,3} replaces /projects/project-code. Quota depends on the project.

Scheduler on Raijin

Raijin uses a customised version of PBS Pro. PBS scripts written for Orange must be adapted before they will work on Raijin. See the software list for sample scripts.
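The sample scripts in the software list are the best starting point. As a hedged illustration only, the sketch below writes a minimal PBS Pro job script for the normal queue. The project code z99 (the example code used elsewhere on this page), the program ./my_program and the resource values are placeholders, and the "-l wd" line (start the job in the submission directory) is an assumption about NCI's PBS customisation that you should confirm in the Raijin user guide.

```shell
#!/bin/bash
# Write a minimal PBS Pro job script for Raijin's normal queue.
# z99, ./my_program and all resource values are placeholders.
cat > job.pbs <<'EOF'
#!/bin/bash
#PBS -P z99
#PBS -q normal
#PBS -l ncpus=16
#PBS -l mem=32GB
#PBS -l walltime=02:00:00
#PBS -l wd

# Assumption: "-l wd" makes the job start in the directory it was submitted from.
./my_program > my_program.out 2>&1
EOF

# Then, on a Raijin login node, submit it with:
#   qsub job.pbs
```

Assuming simple cores x hours charging, running this job for its full 2-hour walltime on the normal queue would use about 16 x 2 = 32 SUs.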

HPC Training Courses

Our HPC training courses will be fully updated to include everything you need to know to work productively on Raijin, including revised PBS scripts to get you started. Please visit Energy.intersect.org.au/training to find a training course near you.

More information and help

Support continues for projects under the Intersect partner share via help@intersect.org.au.

Please reach out for any help, questions or concerns at any time.

Orange User and Project Data

Orange Home Directories

Background

User home directories on Intersect's Orange system were subject to a default quota of 80GB, whereas Raijin's /home filesystem has a quota of 2GB per user account. Because of this difference, it is not possible to copy the Orange home directories into place on Raijin administratively. Doing so would have the following adverse impacts:

  • many users would be immediately over quota on Raijin /home,

  • for users who already have an allocation on Raijin, pushing their /home over quota could affect any currently running or queued jobs, particularly jobs that write output files back to /home.

Where is my Orange Home data stored?

Data from the Orange home filesystem have been transferred to a distribution project area on Raijin: /g/data1/jv33/orange-home/<username>. Home directory data are read-only, and accessible only to the owner (username) of each directory. For example, the data in the directory /g/data1/jv33/orange-home/zzz111 will be accessible to user zzz111 only, with read-only access.

The data are presented as read-only to allow self-service access by users migrating from Orange to Raijin.  Users should copy relevant files they wish to preserve to their Raijin home directory. The home directory data in /g/data1/jv33/orange-home/<username> should not be used for active working data, but as a self-service location to copy out data from Orange /home.

For users who have more than 2GB of data in their Orange home directories, the following options are available:

  • copy up to 2GB of data into their Raijin /home directory;

  • copy any project related data from their Orange home directory into an appropriate project directory on Raijin; or,

  • copy any inactive data a user wishes to preserve from their Orange home directory to another storage platform or device.
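Before choosing among these options, it helps to know how much data you actually have. The function below is a minimal sketch (fits_in_home is a hypothetical name) that totals a directory with du and compares it with the 2GB Raijin /home quota. It assumes 2GB is counted as 2 x 1024 x 1024 KiB and does not account for anything already in your Raijin /home, so treat lquota as the authoritative figure.

```shell
#!/bin/bash
# Check whether a directory would fit within Raijin's 2GB /home quota.
# Assumption: 2GB is counted as 2 * 1024 * 1024 KiB; lquota shows the real quota.
HOME_QUOTA_KB=$((2 * 1024 * 1024))

fits_in_home() {
  local dir="${1:-/g/data1/jv33/orange-home/$USER}"
  local used_kb

  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 2; }
  # du -sk prints "<size in KiB> <path>"; keep only the size.
  used_kb=$(du -sk "$dir" | awk '{ print $1 }')
  echo "$dir uses ${used_kb} KiB of ${HOME_QUOTA_KB} KiB"
  # Exit status 0 if it fits, 1 if it does not.
  [ "$used_kb" -le "$HOME_QUOTA_KB" ]
}

# Example (on Raijin):
#   if fits_in_home; then echo "fits in /home"; else echo "too large for /home"; fi
```

If it does not fit, running du -sk on the individual subdirectories shows which ones are worth moving to a project directory or to other storage instead.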

Orange home directory data in /g/data1/jv33/orange-home/ will be available to users until 30 June 2017. Users should copy out any files they wish to preserve before this deadline.

How can Intersect users access their Orange home data?

Intersect users migrating to Raijin from Orange have been granted automatic membership of the jv33 project. A user can access their Intersect Orange home data through the following process:

1. Log in to a Raijin login node: ssh <username>@raijin.nci.org.au

2. cd /g/data1/jv33/orange-home/<username>

Data can then be copied to the user's Raijin home directory using standard Unix utilities such as cp and rsync. Note that the mv command cannot be used because the data are read-only. Users who wish to download their Orange home directory data to their local workstation or laptop can use scp or sftp via the Raijin data mover nodes, which are accessible at <username>@r-dm.nci.org.au.
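For a local copy, the sketch below wraps scp against the data mover host named above; run it on your own workstation, not on Raijin. The function name fetch_orange_home and the DRY_RUN variable are hypothetical conveniences for this example: with DRY_RUN set, the function only prints the scp command it would run.

```shell
#!/bin/bash
# Run on your local workstation. Copies your Orange home archive from the
# Raijin data mover node (r-dm.nci.org.au) using scp.
# Usage: fetch_orange_home <nci-username> [local-destination]
fetch_orange_home() {
  local user="$1"
  local dest="${2:-./orange-home-backup}"
  local src="${user}@r-dm.nci.org.au:/g/data1/jv33/orange-home/${user}"
  # -r copies directories recursively; -p preserves modification times and modes.
  local cmd=(scp -r -p "$src" "$dest")

  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"            # show what would run, without connecting
  else
    mkdir -p "$dest" && "${cmd[@]}"
  fi
}

# Example (prints the command only):
#   DRY_RUN=1 fetch_orange_home zzz111 ~/orange-backup
```

Interactive sftp to the same host is an alternative when you only want to pick out a few files.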

Will a user be able to access another user's home data?

No. Users can only access their own data, in the directory that corresponds to their username. Access is enforced through Access Control Lists (ACLs) and file permissions.

How do I copy my data from the Orange users home directory data project?

Use the following procedure to copy files from your Orange home directory archive.

  1. Log in to a Raijin login node: ssh <username>@raijin.nci.org.au

  2. Create a new directory in your home directory to transfer the files into. We recommend a name such as from_orange_home. Create the directory with the command mkdir ~/from_orange_home

  3. cd /g/data1/jv33/orange-home/<username>

  4. Identify the files you would like to copy. For example, you may have a directory 'example_data' in your Orange home directory: /g/data1/jv33/orange-home/<username>/example_data

  5. cp -Rp /g/data1/jv33/orange-home/<username>/example_data ~/from_orange_home

The data should now be accessible in your Raijin home directory. You can identify the exact path of your home directory with the command echo $HOME.
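When you have several directories to bring across, the same steps can be wrapped in a small script. This is a sketch, not an NCI-provided tool: the function name copy_from_orange and the ORANGE_SRC/ORANGE_DEST override variables are hypothetical, and it uses the same cp -Rp command as step 5.

```shell
#!/bin/bash
# Copy one or more items out of the read-only Orange home archive into
# ~/from_orange_home on Raijin, using cp -Rp as in step 5.
# ORANGE_SRC and ORANGE_DEST are hypothetical overrides for this sketch.
copy_from_orange() {
  local src="${ORANGE_SRC:-/g/data1/jv33/orange-home/$USER}"
  local dest="${ORANGE_DEST:-$HOME/from_orange_home}"
  local item status=0

  mkdir -p "$dest" || return 1
  for item in "$@"; do
    if [ -e "$src/$item" ]; then
      # -R copies directories recursively; -p keeps timestamps and modes.
      cp -Rp "$src/$item" "$dest/" || status=1
    else
      echo "not found: $src/$item" >&2
      status=1
    fi
  done
  return "$status"
}

# Example: copy_from_orange example_data
```

Because -p preserves permission bits, copied files may remain read-only; if so, chmod -R u+w ~/from_orange_home restores write access for you.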

Orange /project Directories

Background

Project data from Orange will be copied on request from a private read-only archive to a subdirectory in the project’s /short directory on Raijin:

/short/<projectcode>/from_orange

How can users access their Orange /project data?

The steps outlined below can be used to access project data from Orange. Please note that requests for project data must be made by the Lead CI of the project, or someone authorised to act on behalf of the Lead CI.

  1. Identify your NCI project code, e.g. z99. In most cases your project code will be the same as your project code on Orange.

  2. Review and confirm the status of users attached to your project, using the NCI registration system - https://my.nci.org.au. Users who are listed but are no longer active on your project should be removed.

  3. Log a support request with help@nci.org.au to request a copy of your Orange project data. The subject line for your request should take the form "Orange Project Data Migration Request <projectcode>".

  4. NCI system administrators will copy your data from the data holding area to /short/<projectcode>/from_orange.

  5. NCI system administrators will confirm that the group ownership (gid) and quota are correct.

  6. NCI staff will update the support ticket to confirm copy details and completion of the transfer.

The /short quotas for all active Intersect projects have been adjusted to accommodate additional data from the Orange read-only projects archive. (Please note that the project data archive is not stored under NCI project jv33, which is for user home data only.)

This migration procedure allows NCI and Intersect to:

  1. confirm that project data from Orange are needed on Raijin, per the Lead CI’s request;

  2. record a proper request and approval for the migration of project data;

  3. maintain data integrity and security for all projects;

  4. give each Lead CI familiarity with the Mancini system and NCI administrative procedures; and,

  5. minimise administrative overheads during the transition to Raijin.

Will the correct UIDs and GIDs be applied for the project?

Yes. The Orange /project data will be copied to the correct location, and have the correct group ownership applied before the data are exposed to members of the project. Where a file or directory resides inside a project directory, the group ownership will be set to the parent project. This is standard practice at NCI.

For example, all files within project z99 have their group ownership set to z99.

    [user@raijin exampledir]$ pwd
    /short/z99/exampledir
    [user@raijin exampledir]$ ls -l
    total 8
    -rw-r----- 1 aaa111 z99    0 Mar 20 11:16 1.file
    -rw-r----- 1 bbb111 z99    0 Mar 20 11:16 2.file
    drwxr-s--- 2 aaa111 z99 4096 Mar 20 11:17 A_directory
    drwxr-s--- 2 ccc111 z99 4096 Mar 20 11:17 B_directory
    [user@raijin exampledir]$
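To spot any files whose group does not match the project after the migration, you can ask find for everything not owned by the project group. This is a hedged sketch: the function name check_project_group is hypothetical, and z99 is the example project used above.

```shell
#!/bin/bash
# List files and directories under DIR whose group is not GROUP.
# Usage: check_project_group <dir> <group>
# Returns 0 (and prints nothing) if every entry has the expected group,
# 1 if mismatches were printed, 2 if find itself failed.
check_project_group() {
  local dir="$1" group="$2" mismatches
  mismatches=$(find "$dir" ! -group "$group" -print) || return 2
  if [ -n "$mismatches" ]; then
    printf '%s\n' "$mismatches"
    return 1
  fi
  return 0
}

# Example: check_project_group /short/z99/from_orange z99
```

An empty result means the group ownership matches the project throughout, as described above.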