The Pawsey Supercomputing Centre

Researcher Feedback 2016

2nd March 2017

Results from our annual Pawsey Researcher Survey for 2016 are back!

Pawsey is pleased to announce that feedback indicates that “researchers using Pawsey facilities have an overall satisfactory experience of services, staff, training, and events.” This marks the sixth consecutive year in which Pawsey “meets expectations” for almost all of its researchers, providing consistent “good service” to its user base.

The survey results suggested that while certain areas could be improved, no glaring issues were apparent. Pawsey staff have used the information from the survey to form strategies for enhancing services and to earmark key areas for upcoming change.

 

Some specific examples of the positive feedback received include:

  • “Good, efficient events and communications”
  • “Helpful and hard-working staff”
  • “Generally well integrated services throughout”
  • “Overall a good cloud service”

 

Areas for improvement highlighted:

  • More advanced training and increased online materials (especially for out-of-State researchers).

Pawsey is pleased to announce that new online and advanced training offerings will be available during 2017. This will extend the availability of training courses to support learners across Australia. The three existing main training units (introductory, intermediate and advanced) have been divided into modules of 10-20 slides, each covering a stand-alone topic, e.g. how to transfer files or check group usage. The modules will then be converted into 10-30 minute videos and made available on Pawsey websites and publicly on YouTube (to be confirmed). Each video will include both a lecture-style talk and hands-on demonstrations.

With many modules being planned for release, Pawsey will initially give priority to Pawsey-specific introductory and intermediate modules which cover basic supercomputing, data and visualisation topics. The next batch of modules will cover more advanced topics such as serial code optimisation, parallelising with MPI, and parallelising with OpenMP. Further details will be provided as soon as release dates are confirmed.

 

  • Increased machine learning capabilities and cloud storage

At the start of July 2017, Pawsey will deploy a new OpenStack cluster to best meet the needs of researchers. Specifically, the new service will support large data workflows and computational tasks. The new resource will support or provide:

  1. HPC on OpenStack – For any compute job that doesn’t fill nodes on Magnus.
  2. Data Analytics using Sahara – Sahara is an OpenStack module for easy deployment and management of data analytics (DA) clusters and jobs. This will remove the need for researchers to deploy complex data analytics clusters themselves.
  3. Machine Learning to automatically investigate complex relationships within a dataset.
  4. Data Centric Workflows – Researchers with large, inhomogeneous datasets require adaptable compute facilities for data processing workflows.

As soon as more information is available, the new cloud service will be promoted to ensure all researchers can take full advantage of this new resource.

 

  • Increased tape storage, replace tape with spinning disk; more storage generally

Pawsey presently holds a licence to store up to 100PB of data on tape in the Pawsey Centre and, independently, up to 6PB of data on high-performance, enterprise-grade spinning disk. This capacity is sufficient to meet the immediate needs and forecast long-term requirements of researcher projects. Researchers with data needs are invited to apply for storage at any time using the data portal.

 

  • Improve data transfer capability within data stores, including interactions with Mediaflux data portal and the command line tools

In December, Pawsey staff evaluated and improved some Ethernet configurations, which has decreased various data transfer times. In 2017, the team will investigate leveraging Mediaflux’s native DMF support to improve the efficiency of command line data transfers involving the HSM.

 

  • Improve integration and ability to move data around

Pawsey currently offers two systems dedicated to long-term storage, provided via POSIX filesystems that support copy tools for moving data in and out. During 2017, staff will investigate running a SLURM daemon on both systems to allow researchers to schedule jobs on them. Radio astronomy researchers are presently beta testing dcp, a new high-performance copy tool for Lustre, which Pawsey hopes to make widely available in the first quarter of 2017. Staff also plan to investigate the feasibility of using Mediaflux’s S3 interface (via Ceph) to give OpenStack virtual machines access to collections for enhanced data processing services.

 

  • Improved HPC reliability (reducing the number of interrupts to workflow)

Pawsey staff are presently reviewing the number of maintenance days held annually, as well as investigating the possibility of carrying out more maintenance while the systems are operational. Once a decision is made, an updated schedule will be published.

The Cray Sonexion storage systems serving /scratch and /scratch2 have been upgraded to the latest stable release (NEO 2.0). SLURM has been upgraded to 16.05.8, and 16.05 will be the major version that Magnus and Galaxy use until their retirement. The Cray Linux Environment has been stable at 5.2UP04 since last year and will only be upgraded to rectify known issues.

 

  • Faster scratch disk

/scratch also uses NEO 2.0, and its underlying Lustre has been upgraded from version 2.1.x to 2.5.x, which brings a number of software improvements such as better metadata performance and single-threaded performance. A new /group filesystem will be online in March 2017, which will double the amount of storage space and triple the performance of the current filesystem. It is based on a much newer version of Lustre (2.7.x) with a number of performance improvements, which will be most clearly seen on the Zeus data mover nodes.

 

  • Deletion warnings needed

Deletion warnings will be investigated during 2017.

 

  • Increased GPU capability

The Advanced Technology Cluster will be available in quarter two of 2017. Researchers will have access to 44 NVIDIA Pascal GPUs with a much more modern software stack based around SLES 12 and CUDA 8+. This will allow for the support of projects requiring Machine Learning or Deep Learning software. With the Zeus Cluster expansion coming online in Q3 2017, the current nodes will be allocated to either a dedicated GPU queue or a dedicated visualisation queue, depending on whether the GPU is Tesla or Quadro based. This will allow Pawsey to resume offering a dedicated GPU service and to allocate resources to it (rather than the current shared-service offering).

 

  • HPC resource allocations more than once a year

Increasing the frequency of resource allocations was trialled during 2015-2016. Unfortunately, it was largely unsuccessful. In addition, feedback from the Chairs of the location allocation committees indicated that they preferred to have a single calendar-year intake. However, Pawsey will contact the committees again to let them know that researchers have raised this issue once more.

 

  • New services for Supercomputing

In 2017, priority new services will be investigated, such as the ability to add users to projects and the evaluation of Shifter (a Docker-like technology) on all Supercomputing systems.

 

Pawsey extends a warm thank you to all the survey participants. We look forward to the further advancement of our services and facilities, made possible by the valuable feedback offered by researchers. We encourage all researchers to continue providing suggestions via surveys so that Pawsey may best serve the research needs of Australia.

Researchers are invited to provide further feedback at any time by contacting feedback@pawsey.org.au