research

The most consistently updated list of my publications is my Google Scholar page.

My research effort is focused around the intersection of machine learning and medicine/healthcare. (I'd rather it be the union, but I only have so much time in a day.)

On the methods side, I'm interested in representation learning, probabilistic modelling, and deep learning on time series. On the clinical side, I'm interested in critical care medicine and characterising organ function in ICU patients. Both fit into a broader interest in patient state modelling and forecasting, which is a natural application area for machine learning.

A non-exhaustive list of projects in roughly reverse-chronological order:

Organ failure prediction in intensive care

This is an ongoing project funded by a grant from the Swiss National Science Foundation.

The objective is to predict imminent organ failure in intensive care unit patients. We have a large observational dataset from our collaborators at Bern University Hospital, so much of the challenge in this work is data preparation and feature extraction. So far we have focused on circulatory failure, which we define as elevated lactate together with either low mean arterial pressure or the presence of vasoactive drugs. We try to predict a deterioration in the patient's circulatory system over a clinically useful horizon, with the idea that alarms from this system will enable doctors to better allocate their attention. As part of this grant we will be building a prototype to test in practice!
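As a toy illustration of how such a label might be derived from gridded ICU measurements, here is a minimal sketch. The threshold values (lactate > 2 mmol/L, MAP < 65 mmHg) and variable names are assumptions for illustration only; the project's actual clinical definition and data pipeline may differ.

```python
import numpy as np

# Hypothetical per-timestep measurements for a few ICU time-grid points.
lactate = np.array([1.1, 2.4, 3.0, 1.8])       # mmol/L
map_mmhg = np.array([72.0, 60.0, 58.0, 70.0])  # mean arterial pressure
on_vasoactive = np.array([False, False, True, True])

# Elevated lactate AND (low MAP OR vasoactive drugs present).
circulatory_failure = (lactate > 2.0) & ((map_mmhg < 65.0) | on_vasoactive)
print(circulatory_failure.tolist())  # [False, True, True, False]
```

Note the bracketing: lactate elevation is required in all cases, while low blood pressure and vasoactive drug use are alternative pieces of evidence for circulatory compromise.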

Recurrent conditional GANs for synthetic medical data

Paper:

Code: https://github.com/ratschlab/RGAN

Explanation: Generative adversarial networks have been used to generate realistic data in domains like images and text, so we used the approach to create synthetic medical time series (specifically ICU data). We did this because sharing data in medicine is challenging (for good reason), so a sufficiently realistic synthetic dataset could be used for benchmarking and high-level model development. To achieve this we had to build a GAN for time series (a recurrent GAN), come up with a way of evaluating the synthetic data, and consider privacy implications. We empirically evaluated memorisation in the GAN, and also trained it in a differentially private manner to ensure the sensitive training data would not be compromised.
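To give a flavour of the generator side, here is a toy forward pass mapping a noise sequence to a synthetic signal with a plain tanh RNN. This is a stand-in sketch only: the actual model uses LSTM units, conditioning, and adversarial training against a recurrent discriminator, all omitted here, and every dimension below is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

def recurrent_generator(z, W_h, W_z, W_out, b_h, b_out):
    """Map a noise sequence z of shape (T, noise_dim) to a synthetic
    series of shape (T, data_dim) with a simple tanh RNN."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for z_t in z:
        h = np.tanh(W_h @ h + W_z @ z_t + b_h)   # recurrent state update
        outputs.append(W_out @ h + b_out)        # emit one timestep
    return np.stack(outputs)

T, noise_dim, hidden, data_dim = 16, 4, 8, 2
params = (0.1 * rng.standard_normal((hidden, hidden)),
          0.1 * rng.standard_normal((hidden, noise_dim)),
          0.1 * rng.standard_normal((data_dim, hidden)),
          np.zeros(hidden), np.zeros(data_dim))
x_fake = recurrent_generator(rng.standard_normal((T, noise_dim)), *params)
print(x_fake.shape)  # (16, 2)
```

The key design point is that the generator consumes a *sequence* of noise vectors and emits one sample per timestep, so the recurrence, not a single latent vector, carries the temporal structure.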

Presentations:

Learning unitary RNNs

Paper:

Code: https://github.com/ratschlab/uRNN

Explanation: Recurrent neural networks can suffer from exploding/vanishing gradients, especially on long input sequences. Using a unitary or orthogonal matrix for the transition weight matrix in the RNN can help address this problem, but then you have to enforce unitarity somehow, and the unitary group is not closed under addition. Existing work had used a restricted parametrisation of the unitary group, so in our paper we came up with a full parametrisation using the Lie algebra u(n) associated with the group of n x n unitary matrices.
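The core fact being exploited is that the matrix exponential of a skew-Hermitian matrix (an element of u(n)) is always unitary, so optimising over skew-Hermitian matrices gives unconstrained access to the whole unitary group. A minimal numerical sketch, illustrative only and not the paper's implementation:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 4

# A skew-Hermitian matrix L (satisfying L^H = -L) is a generic element of
# the Lie algebra u(n); it has n^2 free real parameters.
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
L = (A - A.conj().T) / 2   # skew-Hermitian part of an arbitrary matrix

# The exponential map sends u(n) onto the unitary group U(n).
U = expm(L)

# Verify unitarity: U U^H = I (so repeated multiplication by U preserves norms,
# which is what keeps gradients from exploding or vanishing).
print(np.allclose(U @ U.conj().T, np.eye(n)))  # True
```

Because L lives in a vector space, gradient updates to L stay in u(n) automatically, sidestepping the fact that the unitary group itself is not closed under addition.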

Presentations:

Word and relationship embeddings for medicine

Paper(s):

Code: https://github.com/corcra/bf2

Explanation: I wrote a non-expert explanation of this project/paper. The basic idea is to learn vector representations for concepts, and affine transformations for relationships between concepts, to enable context-dependent similarity. This also enables us to exploit knowledge graphs (e.g. the UMLS MetaThesaurus) to learn concept embeddings even when we have limited unstructured data. We learn all the embeddings jointly by maximising the likelihood of a Boltzmann distribution using persistent contrastive divergence.
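To make "context-dependent similarity" concrete, here is a toy sketch: each relationship gets its own affine map, and two concepts are scored under a relationship by transforming one embedding and taking an inner product with the other. All names, dimensions, and the scoring form are illustrative assumptions, not the paper's exact model, and the embeddings here are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 5

# Hypothetical learned concept embeddings.
concepts = {name: rng.standard_normal(dim)
            for name in ("aspirin", "headache", "fever")}

# Each relationship is an affine map (G, b) acting on the source embedding.
relations = {"treats": (rng.standard_normal((dim, dim)), rng.standard_normal(dim)),
             "causes": (rng.standard_normal((dim, dim)), rng.standard_normal(dim))}

def rel_similarity(rel, source, target):
    """Score (source, target) under one relationship: transform, then dot."""
    G, b = relations[rel]
    return float((G @ concepts[source] + b) @ concepts[target])

# The same concept pair scores differently under different relationships --
# this is what makes the similarity context-dependent.
for rel in relations:
    print(rel, rel_similarity(rel, "aspirin", "headache"))
```

In the actual model these parameters are fit jointly from text and knowledge-graph triples rather than sampled at random, but the structural point is the same: the relationship reshapes the space before similarity is measured.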

Presentations: