Science Faculty HPC Facility: Projects
Prospective users of the Science Faculty HPC Facility are asked to provide a short overview
of the work that they propose to carry out upon the facility, so that their domain science's
techniques and methodologies become visible to the VUW research community as a whole.
Provisioning Projects
These first three projects were selected, from amongst the research groups that drove the
acquisition of the facility, to run during its provisioning phase, as they were felt
to represent the three major gaps in research computing provision at VUW that the
facility is hoped to fill.
SBS: Biodiversity: HPC helps reveal the diversity of an ant genome
School of Biological Sciences PhD candidate Monica Gruber will be using the
VUW Science Faculty's new High Performance Computing Facility
to analyse complex genomic data for a number of studies of the invasive yellow
crazy ant. The yellow crazy ant is a pest in many Indo-Pacific island
nations, Australia and South-east Asia.
The theme of Monica's studies relates to whether there is a genetic
basis for the ant's invasion success. Her genomic studies include
comparing its genome with those of other ants, bees and wasps, to identify
genes associated with population growth and behaviour.
A more practical aim of this research is to discover species-specific
pathogens or parasites that may be applied to biological control of
this ant.
The typical data set size is around 10 GB, with 7 data sets requiring multiple
analyses. In-core memory requirements during pilot analyses have already exceeded
170 GB. Searching gene databases to detect pathogens and gene homologues is
potentially parallelisable across distributed computing resources.
ARC: Multi-millennial Ice Sheet Modeling at the Continental-scale
Antarctic Research Centre researcher Nicholas Golledge will be using the
VUW Science Faculty's new High Performance Computing Facility to
produce high spatial resolution simulations of the entire Antarctic ice
sheet.
Nick's research focuses on running numerical ice sheet models
to simulate the three-dimensional structure and dynamic character
of past and present ice sheets and glaciers. To capture the physical
processes of the ice-sheet system accurately, however, these models
must be run at high spatial resolutions, which, for the Antarctic
continent, means a grid spacing of less than 10 km.
Nick uses a model code, PISM, developed by a team at the University of
Alaska Fairbanks, that is specifically written to make use of massively
parallel architectures, allowing the processing to be distributed across
several hundred cores and so achieve significant reductions in model
runtimes.
As a researcher targeted for pre-deployment testing of the facility, Nick
has already been able to produce a 7.5 km resolution simulation of the
Antarctic ice sheet at the height of the Last Glacial Maximum, some
20,000 years ago, and is already gaining new insights into the research
as a result.
SCPS: Quantum chemical calculations shed light on electronic structure
School of Chemical and Physical Science researcher Matthias Lein will
be using the VUW Science Faculty's new High Performance Computing
Facility to compute the electronic structure of transition metal
coordination compounds and nano-sized materials.
Accurate calculations of the electronic interaction and structural optimizations are
still formidable computational tasks even though the general approach has
been known for a long time. While smaller molecules can be computed
quickly even on a desktop computer, the kind of molecules that tickle
the interest of researchers need several orders of magnitude more
computational power to achieve the same level of precision.
The new facility will be mainly used for the theoretical prediction of
chemical structures and spectroscopic properties of the associated
compounds.
A typical computation will run on 8 compute cores
simultaneously and use up to 64 GB of memory along the way.
The data set that is produced at the end is dwarfed by the amount of
intermediate data that is generated while the calculation is running.
These intermediate data sets can grow as large as several TB, but are
then reduced to a much smaller set of results.
Projects
SGEES: Rainfall-runoff modelling for the Lake Taupo catchment.
School of Geography, Environment and Earth Sciences PhD student
Deborah Maxwell will be using the VUW Science Faculty's new High
Performance Computing Facility to improve model prediction of inflows
to Lake Taupo, and consequently, Mighty River Power's management of
water that can pass through the Taupo Control Gates into the Waikato
Power System.
Deborah's development, using MATLAB, of a rainfall-runoff model for Lake Taupo,
overlays the spatial distribution of effective precipitation, losses and storage
within the whole catchment, onto a routing of runoff through various sub-catchments
to Lake Taupo.
Whilst individual simulations can be run quickly, calibration of the model
parameters requires Monte Carlo methods, involving random sampling from the
distribution of inputs and successive model runs, until a statistically significant
distribution of outputs is obtained, which necessitates running a large number of
simulations.
Deborah's overall processing times benefit from the ability to compile the MATLAB
codes and then run concurrent multiple simulations in non-interactive batch-mode,
against the MATLAB Compiler Runtime.
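
The Monte Carlo calibration loop described above can be sketched in a few lines. This is a minimal illustration in Python rather than Deborah's MATLAB code: the linear-reservoir model, the parameter names k and c, their sampling ranges and the synthetic rainfall series are all assumptions made for the example, and do not come from the Lake Taupo model.

```python
import math
import random

def simulate_runoff(rain, k, c):
    """Toy linear reservoir (hypothetical): storage gains c*rain, drains at rate k."""
    storage, flows = 0.0, []
    for p in rain:
        storage += c * p       # effective precipitation entering storage
        q = k * storage        # outflow released this time step
        storage -= q
        flows.append(q)
    return flows

def rmse(simulated, observed):
    """Root-mean-square error between two equally long flow series."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))

def monte_carlo_calibrate(rain, observed, n_samples, seed=0):
    """Sample (k, c) uniformly at random, run the model each time, rank by error."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_samples):
        k = rng.uniform(0.05, 0.95)   # assumed prior range for the drainage rate
        c = rng.uniform(0.1, 1.0)     # assumed prior range for the runoff coefficient
        trials.append((rmse(simulate_runoff(rain, k, c), observed), k, c))
    trials.sort()                     # best (lowest error) parameter sets first
    return trials

# Synthetic rainfall and "observed" flows generated from known parameters.
data_rng = random.Random(42)
rain = [data_rng.expovariate(0.2) if data_rng.random() < 0.3 else 0.0 for _ in range(200)]
observed = simulate_runoff(rain, k=0.3, c=0.6)

trials = monte_carlo_calibrate(rain, observed, n_samples=2000)
best_error, best_k, best_c = trials[0]
print(f"best k={best_k:.3f}, c={best_c:.3f}, RMSE={best_error:.4f}")
```

Each model run is independent of the others, which is why the sampling loop splits naturally into concurrent batch jobs.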
SMSOR: Modelling the joint survival function parametrically using copulas in R.
School of Mathematics, Statistics, and Operations Research Masters student
Boyd Anderson is using the Science Faculty HPC to find a parametric joint
survival function for a real automotive warranty data-set. In particular,
he is using a copula (a function linking marginal variables into a multivariate
distribution) to model the underlying dependence structure of the data-set.
The copulas being considered come from the Archimedean and Elliptical
families: nine different copulas in total, each with at least 5 parameters.
The computationally expensive part of this project is finding the optimum
copula parameters to describe the behaviour of the data-set. To do this,
two optimisation heuristics were selected: Differential Evolution (DE) and
Particle Swarm Optimisation (PSO). Both DE and PSO have been implemented in R,
and are sufficiently parallelised to run on 24 cores. The average run time
per optimisation is 2-3 days, and each model will be run multiple times to
increase the confidence in the computed best fit.
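
As a rough illustration of fitting copula parameters with Differential Evolution, the sketch below estimates the single dependence parameter of a Clayton copula (a member of the Archimedean family) by maximum likelihood. It is written in Python rather than R, uses synthetic data in place of the warranty data-set, and is far simpler than the models described above, which have at least 5 parameters each. The function names, bounds and DE settings are assumptions made for the example; with only one parameter the usual DE crossover step is omitted.

```python
import math
import random

def sample_clayton(theta, n, rng):
    """Draw n (u, v) pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()   # uniform on (0, 1], avoids u == 0
        w = 1.0 - rng.random()
        v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs

def neg_log_likelihood(theta, pairs):
    """Negative log-likelihood of the Clayton copula density for theta > 0."""
    total = 0.0
    for u, v in pairs:
        s = u ** (-theta) + v ** (-theta) - 1.0
        total += (math.log(1.0 + theta)
                  - (1.0 + theta) * (math.log(u) + math.log(v))
                  - (2.0 + 1.0 / theta) * math.log(s))
    return -total

def differential_evolution(objective, lower, upper, rng,
                           pop_size=12, generations=40, f=0.7):
    """Minimal one-dimensional DE (rand/1 mutation, greedy selection)."""
    pop = [rng.uniform(lower, upper) for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            others = [x for j, x in enumerate(pop) if j != i]
            a, b, c = rng.sample(others, 3)
            trial = min(upper, max(lower, a + f * (b - c)))  # mutate, clip to bounds
            trial_score = objective(trial)
            if trial_score <= scores[i]:                     # keep the better candidate
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

rng = random.Random(1)
pairs = sample_clayton(theta=2.0, n=500, rng=rng)   # synthetic data, true theta = 2
theta_hat, best_nll = differential_evolution(
    lambda t: neg_log_likelihood(t, pairs), lower=0.05, upper=10.0, rng=rng)
print(f"estimated theta = {theta_hat:.3f}")
```

The objective evaluations within one generation do not depend on each other, so in a real run they are the natural unit to spread across cores.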
SECS: Large scale evaluation of graph layout algorithms.
School of Engineering and Computer Science PhD student Roman Klapaukh will be using
the Science Faculty HPC Facility to run simulations of graph layout algorithms.
Specifically, we are looking at many different variants of the force-directed layout and
how the different variants affect the final layout.
Unlike many other HPC projects, a single computation is very quick. The
difficulty lies in performing enough runs of the algorithm to do all the tests that
we require. Running on the Science Faculty HPC allows us to perform sufficiently
many simultaneous trials.
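
For readers unfamiliar with force-directed layout, the following Python sketch shows one common variant, in the style of Fruchterman and Reingold: every pair of nodes repels, connected nodes attract, and a linearly cooling "temperature" caps how far a node may move per iteration. It is not the code used in this project; the function name, the parameter defaults and the small example graph are assumptions made for illustration.

```python
import math
import random

def force_directed_layout(nodes, edges, iterations=200, k=1.0, start_temp=0.1, seed=0):
    """Return {node: [x, y]} after a fixed number of spring/repulsion iterations."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for it in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, magnitude k^2 / distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push
        # Attraction along each edge, magnitude distance^2 / k.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Move each node along its net force, capped by the cooling temperature.
        temp = start_temp * (1.0 - it / iterations)
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            move = min(length, temp)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return pos

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]   # a 4-cycle
layout = force_directed_layout(nodes, edges, seed=3)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

Changing the force laws, the cooling schedule or the random seed gives a different variant or trial, and each such run is independent, which is what makes the evaluation easy to run as many simultaneous jobs.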
SECS: Simulation Framework for Classifying Handwritten Image Patterns
School of Engineering and Computer Science Evolutionary Computation
researcher Toktam Ebadi will be using the Science Faculty HPC
Facility to execute a simulation framework for classifying image
patterns.
Toktam’s research focuses on developing a Feature Pattern Classification
System (FPCS) that investigates the suitability of Learning Classifier
Systems (LCSs) for the image domain.
Two implementations of FPCS have been developed: the original FPCS,
which suits online reinforcement learning scenarios, and a supervised FPCS,
which suits supervised scenarios where the ground truth data is
available. Such a system is beneficial in identifying objects in
digital images; however, as the number of classes increases, more rules
are required and therefore more memory.
In order to overcome memory limitations, larger datasets were
previously divided into separate parts, with the training
performed on each part. However, this resulted in unwanted
behaviour in the FPCS. Using the Science Faculty HPC Facility will
thus enable execution of the FPCS on problems with a larger number of
classes and examples.
SCPS: Dynamics of Bose-Einstein condensates
Department of Physics (University of Otago) PhD student Sam Rooney,
currently being hosted by SCPS, will be using the Science Faculty HPC
Facility to simulate the dynamics of Bose-Einstein condensates at finite
temperatures.
To quantitatively account for atomic interactions and thermal fluctuations, the
equation of motion takes the form of a stochastic nonlinear Schrödinger equation
which must be solved numerically.
A major computational difficulty is performing enough simulations to achieve
stochastic convergence, which typically requires hundreds of trajectories.
Individual simulations can require anywhere from 10^3 to 10^6 modes, leading to
simulation times of the order of hours to months on a single CPU. Using the HPC
facility enables us to trivially parallelize our numerics by performing many
trajectories simultaneously.
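
The pattern of averaging many independent stochastic trajectories can be sketched as follows. The example is in Python and replaces the stochastic nonlinear Schrödinger equation with a much simpler stand-in, a one-variable Ornstein-Uhlenbeck process integrated with the Euler-Maruyama method, so it illustrates only the ensemble structure and not the physics. The function names and the parameters gamma, temperature, dt and steps are assumptions made for the example.

```python
import math
import random

def simulate_trajectory(seed, gamma=1.0, temperature=0.5, dt=0.01, steps=500):
    """One Euler-Maruyama trajectory of dx = -gamma*x dt + sqrt(2*gamma*T) dW."""
    rng = random.Random(seed)          # independent noise stream per trajectory
    x = 0.0
    noise = math.sqrt(2.0 * gamma * temperature * dt)
    for _ in range(steps):
        x += -gamma * x * dt + noise * rng.gauss(0.0, 1.0)
    return x

def ensemble_mean_square(n_trajectories, map_fn=map):
    """Average x^2 over trajectories; map_fn decides how the runs are dispatched."""
    finals = list(map_fn(simulate_trajectory, range(n_trajectories)))
    return sum(x * x for x in finals) / len(finals)

# Serial run; the stationary value of <x^2> for this toy process is the temperature.
estimate = ensemble_mean_square(400)
print(f"<x^2> over 400 trajectories = {estimate:.3f}")
```

Because the trajectories share nothing but their parameters, the built-in map can be swapped for a parallel one, for example the map method of a concurrent.futures.ProcessPoolExecutor, without changing the simulation code. On a cluster, each trajectory or batch of trajectories would more likely be submitted as a separate job.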