Heiko Strathmann, PhD

Coffee climbing jazz math

Academic Guest


I am interested in machine learning and computational statistics, with a focus on kernel methods, statistical testing and Bayesian inference.

I recently finished my PhD with Arthur Gretton at the Gatsby Unit at UCL, London (http://www.gatsby.ucl.ac.uk/). I worked on embedding kernel methods in Monte Carlo algorithms, e.g. as density surrogates in Hamiltonian Monte Carlo, or in goodness-of-fit tests for quantifying MCMC convergence. During my earlier studies (MSc Machine Learning at UCL, BSc Computer Science in Duisburg), I worked on large-scale kernel two-sample testing and SVM classification of amino acid sequences. I am also active in the open-source community as a core maintainer of Shogun, a toolbox for unified and efficient machine learning. During and after my PhD I delivered predictive analytics for the UK and global energy industry with swhere Ltd.

At ETHZ, I am applying probabilistic modelling techniques to challenges in health, and continue to contribute to the Shogun project.

Abstract Human professionals are often required to make decisions based on complex multivariate time series measurements in an online setting, e.g. in health care. Since human cognition is not optimized to work well in high-dimensional spaces, these decisions benefit from interpretable low-dimensional representations. However, many representation learning algorithms for time series data are difficult to interpret. This is due to non-intuitive mappings from data features to salient properties of the representation and non-smoothness over time. To address this problem, we propose to couple a variational autoencoder to a discrete latent space and introduce a topological structure through the use of self-organizing maps. This allows us to learn discrete representations of time series, which give rise to smooth and interpretable embeddings with superior clustering performance. Furthermore, to allow for a probabilistic interpretation of our method, we integrate a Markov model in the latent space. This model uncovers the temporal transition structure, improves clustering performance even further and provides additional explanatory insights as well as a natural representation of uncertainty. We evaluate our model on static (Fashion-)MNIST data, a time series of linearly interpolated (Fashion-)MNIST images, a chaotic Lorenz attractor system with two macro states, as well as on a challenging real world medical time series application. In the latter experiment, our representation uncovers meaningful structure in the acute physiological state of a patient.

Authors Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, Gunnar Rätsch

Submitted Arxiv


Abstract We propose a fast method with statistical guarantees for learning an exponential family density model where the natural parameter is in a reproducing kernel Hilbert space, and may be infinite-dimensional. The model is learned by fitting the derivative of the log density, the score, thus avoiding the need to compute a normalization constant. Our approach improves the computational efficiency of an earlier solution by using a low-rank, Nyström-like solution. The new solution retains the consistency and convergence rates of the full-rank solution (exactly in Fisher distance, and nearly in other distances), with guarantees on the degree of cost and storage reduction. We evaluate the method in experiments on density estimation and in the construction of an adaptive Hamiltonian Monte Carlo sampler. Compared to an existing score learning approach using a denoising autoencoder, our estimator is empirically more data-efficient when estimating the score, runs faster, and has fewer parameters (which can be tuned in a principled and interpretable way), in addition to providing statistical guarantees.

Authors D. J. Sutherland, H. Strathmann, M. Arbel, and A. Gretton

Submitted AISTATS 2018


Abstract We propose a method to optimize the representation and distinguishability of samples from two probability distributions, by maximizing the estimated power of a statistical test based on the maximum mean discrepancy (MMD). This optimized MMD is applied to the setting of unsupervised learning by generative adversarial networks (GANs), in which a model attempts to generate realistic samples, and a discriminator attempts to tell these apart from data samples. In this context, the MMD may be used in two roles: first, as a discriminator, either directly on the samples, or on features of the samples. Second, the MMD can be used to evaluate the performance of a generative model, by testing the model’s samples against a reference data set. In the latter role, the optimized MMD is particularly helpful, as it gives an interpretable indication of how the model and data distributions differ, even in cases where individual model samples are not easily distinguished either by eye or by classifier.

Authors D. J. Sutherland, H. Y. Tung, H. Strathmann, S. De, A. Ramdas, A. Smola, and A. Gretton

Submitted ICLR, 2017
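
The MMD statistic that the abstract above builds on can be illustrated with a short sketch. This is a minimal example of the standard unbiased estimate of the squared MMD with a Gaussian kernel, not the optimized test from the paper: it does no kernel selection or power maximization. The function names, the fixed bandwidth of 1.0 and the toy Gaussian data are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2 a.b
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples x and y."""
    m, n = len(x), len(y)
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    # Within-sample averages exclude the diagonal (i == j) terms.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


# Toy data (hypothetical): two samples from the same Gaussian,
# and one pair where the second sample has a shifted mean.
rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
shifted = mmd2_unbiased(
    rng.normal(size=(200, 2)), rng.normal(loc=1.0, size=(200, 2))
)
print(f"same distribution:    {same:.4f}")
print(f"shifted distribution: {shifted:.4f}")
```

The estimate should be close to zero when both samples come from the same distribution and clearly positive when they differ. Because the estimator is unbiased, small negative values are possible in the first case.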


Authors I. Schuster, H. Strathmann, B. Paige, and D. Sejdinovic

Submitted Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2017

Authors K. Chwialkowski, H. Strathmann, and A. Gretton

Submitted ICML, 2016


Abstract We propose Kernel Hamiltonian Monte Carlo (KMC), a gradient-free adaptive MCMC algorithm based on Hamiltonian Monte Carlo (HMC). On target densities where classical HMC is not an option due to intractable gradients, KMC adaptively learns the target's gradient structure by fitting an exponential family model in a Reproducing Kernel Hilbert Space. Computational costs are reduced by two novel efficient approximations to this gradient. While being asymptotically exact, KMC mimics HMC in terms of sampling efficiency, and offers substantial mixing improvements over state-of-the-art gradient free samplers. We support our claims with experimental studies on both toy and real-world applications, including Approximate Bayesian Computation and exact-approximate MCMC.

Authors H. Strathmann, D. Sejdinovic, S. Livingstone, Z. Szabo, and A. Gretton

Submitted NIPS, 2015
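
The idea in the abstract above of running Hamiltonian dynamics on a learned gradient while staying asymptotically exact can be sketched as follows. This is a simplified illustration under stated assumptions, not the KMC algorithm itself: the surrogate gradient here is a fixed, hand-written, deliberately mis-scaled Gaussian gradient standing in for the kernel exponential family fit, and there is no adaptation. All function names, the step size and the trajectory length are hypothetical choices.

```python
import numpy as np


def leapfrog(x, p, grad_fn, step_size, num_steps):
    """Leapfrog integration driven by grad_fn (gradient of a log density)."""
    p = p + 0.5 * step_size * grad_fn(x)
    for _ in range(num_steps - 1):
        x = x + step_size * p
        p = p + step_size * grad_fn(x)
    x = x + step_size * p
    p = p + 0.5 * step_size * grad_fn(x)
    return x, p


def surrogate_hmc_step(x, log_target, surrogate_grad, rng,
                       step_size=0.2, num_steps=10):
    """One HMC step that simulates with a surrogate gradient only."""
    p = rng.normal(size=x.shape)
    x_new, p_new = leapfrog(x, p, surrogate_grad, step_size, num_steps)
    # The accept/reject step uses the true log density (up to a constant),
    # so a poor surrogate lowers acceptance but does not bias the chain.
    log_accept = (log_target(x_new) - 0.5 * p_new @ p_new) - (
        log_target(x) - 0.5 * p @ p
    )
    if np.log(rng.uniform()) < log_accept:
        return x_new, True
    return x, False


def log_target(x):
    # Toy target (assumption): standard Gaussian, unnormalised.
    return -0.5 * x @ x


def surrogate_grad(x):
    # Stand-in surrogate (assumption): gradient of a Gaussian with variance 1.5.
    return -x / 1.5


rng = np.random.default_rng(1)
x = np.zeros(2)
samples, accepted = [], 0
for _ in range(3000):
    x, was_accepted = surrogate_hmc_step(x, log_target, surrogate_grad, rng)
    samples.append(x)
    accepted += was_accepted
samples = np.array(samples)
accept_rate = accepted / len(samples)
print("acceptance rate:", accept_rate)
print("sample mean:", samples.mean(axis=0))
print("sample variance:", samples.var(axis=0))
```

The design point is that the target's gradient is never evaluated: only the surrogate drives the trajectory, and only the target's log density enters the Metropolis correction. The leapfrog map stays reversible and volume-preserving for any fixed gradient function, which is what keeps the correction valid in this sketch.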


Authors H. Strathmann, D. Sejdinovic, and M. Girolami

Submitted arXiv preprint, 2014.


Abstract A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support. The algorithm embeds the trajectory of the Markov chain into a reproducing kernel Hilbert space (RKHS), such that the feature space covariance of the samples informs the choice of proposal. The procedure is computationally efficient and straightforward to implement, since the RKHS moves can be integrated out analytically: our proposal distribution in the original space is a normal distribution whose mean and covariance depend on where the current sample lies in the support of the target distribution, and adapts to its local covariance structure. Furthermore, the procedure requires neither gradients nor any other higher order information about the target, making it particularly attractive for contexts such as Pseudo-Marginal MCMC. Kernel Adaptive Metropolis-Hastings outperforms competing fixed and adaptive samplers on multivariate, highly nonlinear target distributions, arising in both real-world and synthetic examples.

Authors D. Sejdinovic, H. Strathmann, M. Lomeli, C. Andrieu, and A. Gretton

Submitted ICML, 2014


Authors A. Gretton, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, K. F., and B. K. Sriperumbudur

Submitted NIPS, 2012.


Abstract Antiviral CD8+ T cells are a key component of the adaptive immune system against hepatitis C virus (HCV). For the development of immune therapies, it is essential to understand how CD8+ T cells contribute to clearance of infection and why they fail so often. A mechanism for secondary failure is mutational escape of the virus. However, some substitutions in viral epitopes are associated with fitness costs and often require compensatory mutations. We hypothesized that compensatory mutations may point toward epitopes under particularly strong selection pressure that may be beneficial for vaccine design because of a higher genetic barrier to escape. We previously identified two HLA-B*15-restricted CD8+ epitopes in NS5B (LLRHHNMVY2450-2458 and SQRQKKVTF2466-2474), based on sequence analysis of a large HCV genotype 1b outbreak. Both epitopes are targeted in about 70% of HLA-B*15-positive individuals exposed to HCV. Reproducible selection of escape mutations was confirmed in an independent multicenter cohort in the present study. Interestingly, mutations were also selected in the epitope flanking region, suggesting that compensatory evolution may play a role. Covariation analysis of sequences from the database confirmed a significant association between escape mutations inside one of the epitopes (H2454R and M2456L) and substitutions in the epitope flanking region (S2439T and K2440Q). Functional analysis with the subgenomic replicon Con1 confirmed that the primary escape mutations impaired viral replication, while fitness was restored by the additional substitutions in the epitope flanking region. We concluded that selection of escape mutations inside an HLA-B*15 epitope requires secondary substitutions in the epitope flanking region that compensate for fitness costs.

Authors M. Ruhl, P. Chhatwal, H. Strathmann, T. Kuntzen, D. Bankwitz, K. Skibbe, A. Walker, F. M. Heinemann, P. A. Horn, T. M. Allen, D. Hoffmann, T. Pietschmann, and J. Timm

Submitted Journal of Virology, Vol. 86, Iss. 2, pp. 991-1000, 2012.