I am accepting PhD students, so if my research is of interest to you, then please get in touch! Due to the focus of my research, you must be a strong programmer, have a willingness and ability to pick up new technologies and techniques easily, and ideally have existing HPC experience.
Current PhD students
Gabriel Rodríguez Canal is researching enhancing the ecosystem for FPGAs to better support HPC code development. He has ported the Controller task-based model to AMD Xilinx and Intel FPGAs, leveraging partial reconfiguration to dynamically swap tasks in and out during execution, and won the 2022 AMD Xilinx Open Hardware contest in the PhD category for this work. During an internship with HPE he developed Fortran-HLS, which couples the Flang frontend with the AMD Xilinx HLS backend, enabling direct HLS programming of FPGAs in Fortran; this work led to an FPL paper. Building upon this, Gabriel developed MLIR dialects and transformations to enable automatic dataflow optimisation of imperative codes, with the ultimate aim that serial codes can obtain high performance on FPGAs with no code changes required of the programmer. This was leveraged in our ASPLOS paper, with a follow-up paper demonstrating a Fortran-based OpenMP MLIR flow.
David Katz is researching compiler methodologies for providing both performance and programmer productivity on the Cerebras Wafer Scale Engine (WSE). By developing new dialects and transformations within the MLIR and xDSL ecosystems, the overarching objective is to enable automatic transformation of codes into an algorithmic form that best suits the CS-2 and CS-3 machines. To date we have focused on stencils, a very common computational pattern in HPC, with an ASPLOS paper in which we demonstrate that our approach slightly outperforms handwritten CS-2 kernels and that a CS-3 is around fourteen times faster than 128 A100 GPUs. A major part of this work has been not only the core compiler technologies but also exploration of the most appropriate WSE execution model.
Jake Davies is exploring compiler technologies for Tenstorrent AI accelerators. Designed for model inference, the architecture is built around Tensix cores, each containing a co-processor with matrix and vector units along with RISC-V CPUs that drive the co-processor and undertake data movement. The general-purpose and open-source nature of Tenstorrent's programming framework means it is possible to use the hardware for other workloads, such as HPC; however, doing so involves significant complexity and a complete rewrite of existing codes. Jake is developing compiler infrastructure within MLIR and xDSL to enable OpenMP offloading of existing codes to this technology with (ideally) no code changes required by the programmer. We have already demonstrated, in a paper exploring FFTs on Tenstorrent, that the architecture's specialisation provides significant potential for energy-efficiency gains. The current objective is to deliver these gains to HPC programmers without any code changes.
Graduated PhD students
Mark Klaisoongnoen explored the role of FPGAs in accelerating quantitative finance workloads in an energy-efficient manner. He worked with STAC Research, the finance industry standard community body that develops numerous benchmarks for evaluating technologies for financial applications. Based upon this work, Mark was able not only to deliver significant energy reductions compared to CPUs or GPUs, but also to significantly outperform these technologies on specific quantitative finance benchmarks. This is because the performance of these benchmarks is bound by aspects other than compute, and so the bespoke nature of FPGAs, where one can develop memory hierarchies and logic specialised for the problem in question, is highly beneficial. Mark also explored the role of AMD Xilinx AI Engines for quantitative finance workloads, an activity he began during an internship with HPE, which resulted in an ISFPGA paper.
Ludovic Capelli studied with me, exploring how to bring HPC methods and techniques to the vertex-centric programming model. First developed by Google in 2010, vertex-centric programming provides programmer productivity in graph-based processing. However, until this work it typically required significant memory resources and performed poorly, with numerous frameworks compromising on productivity in order to optimise for these concerns. This work significantly enhanced vertex-centric processing for both shared- and distributed-memory architectures, reducing memory usage by several orders of magnitude, and Ludovic holds an (unofficial) world record for the largest vertex-centric graph (750 billion edges) ever processed within a single node without requiring out-of-core computation. Developing a distributed version of his iPregel framework, called DiP, he was able to process graphs comprising 1.6 trillion edges using the vertex-centric model. During his PhD Ludovic undertook internships with the National Institute of Informatics (NII) in Tokyo and with Renault. He was also awarded a prestigious Huawei PhD fellowship and now works for EPCC as a teaching fellow.
Maurice Jamieson studied how to provide performance and programmer productivity on low-SWaP micro-core architectures. These CPUs are typically very simple, have tiny memory spaces (e.g. 32KB or less) with no hardware caching support, and run bare-metal without an operating system. His work resulted in the development of Vipera, a framework for dynamic languages on these architectures, enabling at- or near-native C performance for languages such as Python, which resulted in a CC paper. Furthermore, using Vipera one is able to run codes of arbitrary sizes and datasets within a memory requirement of approximately 4KB, thus decoupling the requirements of an application from those of the hardware entirely in software. During the course of his research he also developed the Eithne benchmarking framework to enable easier measurement of codes on this class of architecture. Maurice now works for EPCC as a software architect.