Imperial College London

Dr Sarah Filippi

Faculty of Natural Sciences, Department of Mathematics

Reader in Statistical Machine Learning
 
 
 

Contact

 

+44 (0)20 7594 8562
s.filippi

 
 

Location

 

523 Huxley Building, South Kensington Campus


Summary

 

Publications


49 results found

Bodinier B, Filippi S, Haugdahl Nost T, Chiquet J, Chadeau M et al., 2023, Automated calibration for stability selection in penalised regression and graphical models, Journal of the Royal Statistical Society Series C: Applied Statistics, Vol: 72, Pages: 1375-1393, ISSN: 0035-9254

Stability selection represents an attractive approach to identify sparse sets of features jointly associated with an outcome in high-dimensional contexts. We introduce an automated calibration procedure via maximisation of an in-house stability score and accommodating a priori-known block structure (e.g. multi-OMIC) data. It applies to [Least Absolute Shrinkage Selection Operator (LASSO)] penalised regression and graphical models. Simulations show our approach outperforms non-stability-based and stability selection approaches using the original calibration. Application to multi-block graphical LASSO on real (epigenetic and transcriptomic) data from the Norwegian Women and Cancer study reveals a central/credible and novel cross-OMIC role of LRRN3 in the biological response to smoking. Proposed approaches were implemented in the R package sharp.

Journal article

Bodinier B, Vuckovic D, Rodrigues S, Filippi S, Chiquet J, Chadeau M et al., 2023, Automated calibration of consensus weighted distance-based clustering approaches using sharp, Bioinformatics, Vol: 39, ISSN: 1367-4803

Motivation: In consensus clustering, a clustering algorithm is used in combination with a subsampling procedure to detect stable clusters. Previous studies on both simulated and real data suggest that consensus clustering outperforms native algorithms. Results: We extend here consensus clustering to allow for attribute weighting in the calculation of pairwise distances using existing regularised approaches. We propose a procedure for the calibration of the number of clusters (and regularisation parameter) by maximising the sharp score, a novel stability score calculated directly from consensus clustering outputs, making it extremely computationally competitive. Our simulation study shows better clustering performances of (i) approaches calibrated by maximising the sharp score compared to existing calibration scores, and (ii) weighted compared to unweighted approaches in the presence of features that do not contribute to cluster definition. Application on real gene expression data measured in lung tissue reveals clear clusters corresponding to different lung cancer subtypes. Availability and implementation: The R package sharp (version ≥ 1.4.3) is available on CRAN at https://CRAN.R-project.org/package=sharp.

Journal article
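
The subsampling idea behind consensus clustering in the entry above can be illustrated with a minimal sketch. It is not the sharp package and does not compute the sharp score: a toy one-dimensional two-means routine stands in for the clustering algorithm, and the function names `two_means_1d` and `consensus_matrix`, the 75% subsample fraction and the example data are all assumptions made for illustration. The output is only the co-membership (consensus) matrix, i.e. the proportion of subsamples in which two items fall in the same cluster, given that both were drawn.

```python
import random


def two_means_1d(values, n_iter=20):
    """Toy 1-D k-means with k=2, initialised at the smallest and largest value."""
    c0, c1 = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(n_iter):
        # Assign each value to its nearest centre, then recompute the centres.
        labels = [0 if abs(v - c0) <= abs(v - c1) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        if g0:
            c0 = sum(g0) / len(g0)
        if g1:
            c1 = sum(g1) / len(g1)
    return labels


def consensus_matrix(data, n_subsamples=100, frac=0.75, seed=0):
    """Co-membership proportions over repeated subsamples (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(data)
    size = max(2, int(frac * n))
    together = [[0] * n for _ in range(n)]  # times i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times i and j were both drawn
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), size)
        labels = two_means_1d([data[i] for i in idx])
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    # Pairs that were never drawn together get 0.0 by convention.
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Two well-separated groups: entries near 1 within a group, near 0 across groups.
points = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3]
matrix = consensus_matrix(points)
print([round(x, 2) for x in matrix[0]])
```

Stable clusters show up as blocks of entries close to 1. Choosing the number of clusters from such a matrix is the calibration step that the paper addresses and that this sketch leaves out.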

Odgers J, Kappatou C, Misener R, García Muñoz S, Filippi S et al., 2023, Probabilistic predictions for partial least squares using bootstrap, AIChE Journal, Vol: 69, Pages: 1-16, ISSN: 0001-1541

Modeling the uncertainty in partial least squares (PLS) is made difficult because of the nonlinear effect of the observed data on the latent space that the method finds. We present an approach, based on bootstrapping, that automatically accounts for these nonlinearities in the parameter uncertainty, allowing us to equally well represent confidence intervals for points lying close to or far away from the latent space. To show the opportunities of this approach, we develop applications in determining the Design Space for industrial processes and model the uncertainty of spectroscopy data. Our results show the benefits of our method for accounting for uncertainty far from the latent space for the purposes of Design Space identification, and match the performance of well established methods for spectroscopy data.

Journal article
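
The bootstrap idea in the entry above can be sketched in a few lines. This is not the authors' method: an ordinary least-squares line is swapped in for partial least squares so that the example needs only the standard library, and the helper names `fit_line` and `bootstrap_interval`, the resample count and the toy data are assumptions for illustration. The sketch refits the model on resampled rows and reads a percentile interval for the prediction at a new point.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_interval(xs, ys, x_new, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the fitted value at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    while len(preds) < n_boot:
        # Resample rows with replacement and refit the model.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:  # degenerate resample: slope undefined, redraw
            continue
        a, b = fit_line(bx, by)
        preds.append(a + b * x_new)
    preds.sort()
    lo = preds[int((1 - level) / 2 * n_boot)]
    hi = preds[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi


# Toy data: a noisy line observed for x in 0..9.
noise = random.Random(1)
xs = [float(x) for x in range(10)]
ys = [1.0 + 2.0 * x + noise.gauss(0.0, 0.5) for x in xs]
print(bootstrap_interval(xs, ys, x_new=4.5))   # inside the observed range
print(bootstrap_interval(xs, ys, x_new=20.0))  # far outside the observed range
```

Because each resample is refitted from scratch, the interval reflects how the fitted model itself changes with the data, which is the property the abstract relies on for points far from the training region.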

Howson B, Pike-Burke C, Filippi S, 2023, Optimism and delays in episodic reinforcement learning, Artificial Intelligence and Statistics (AISTATS 2023), Publisher: PMLR, Pages: 1-34

There are many algorithms for regret minimisation in episodic reinforcement learning. This problem is well-understood from a theoretical perspective, providing that the sequences of states, actions and rewards associated with each episode are available to the algorithm updating the policy immediately after every interaction with the environment. However, feedback is almost always delayed in practice. In this paper, we study the impact of delayed feedback in episodic reinforcement learning from a theoretical perspective and propose two general-purpose approaches to handling the delays. The first involves updating as soon as new information becomes available, whereas the second waits before using newly observed information to update the policy. For the class of optimistic algorithms and either approach, we show that the regret increases by an additive term involving the number of states, actions, episode length, the expected delay and an algorithm-dependent constant. We empirically investigate the impact of various delay distributions on the regret of optimistic algorithms to validate our theoretical results.

Conference paper

Howson B, Pike-Burke C, Filippi S, 2023, Delayed feedback in generalised linear bandits revisited, Artificial Intelligence and Statistics (AISTATS 2023), Publisher: PMLR, Pages: 1-25, ISSN: 2640-3498

The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for immediate rewards is unmet in many real-world applications where the reward is almost always delayed. We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner. We show that a natural adaptation of an optimistic algorithm to the delayed feedback setting can achieve regret of $\tilde{O}(d\sqrt{T} + d^{3/2}\,\mathbb{E}[\tau])$, where $\mathbb{E}[\tau]$ denotes the expected delay, $d$ is the dimension and $T$ is the time horizon. This significantly improves upon existing approaches for this setting where the best known regret bound was $\tilde{O}(\sqrt{dT}\,\sqrt{d + \mathbb{E}[\tau]})$. We verify our theoretical results through experiments on simulated data.

Conference paper
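
A short worked comparison may help show where the improvement in the entry above comes from. It reads the earlier bound as the product of $\sqrt{dT}$ and $\sqrt{d + \mathbb{E}[\tau]}$, which is an assumption about how the flattened expression in the abstract groups its square roots, and it ignores the logarithmic factors hidden by the $\tilde{O}$ notation.

```latex
\begin{align*}
\sqrt{dT}\,\sqrt{d + \mathbb{E}[\tau]}
  &= \sqrt{d^{2}T + dT\,\mathbb{E}[\tau]} \\
  &\le d\sqrt{T} + \sqrt{dT\,\mathbb{E}[\tau]}
  && \text{since } \sqrt{a+b} \le \sqrt{a} + \sqrt{b} \text{ for } a, b \ge 0, \\
d\sqrt{T} + \sqrt{dT\,\mathbb{E}[\tau]}
  &\le \sqrt{2}\,\sqrt{d^{2}T + dT\,\mathbb{E}[\tau]}
  && \text{since } \sqrt{a} + \sqrt{b} \le \sqrt{2(a+b)}.
\end{align*}
```

Up to a constant factor, the earlier bound therefore behaves like $d\sqrt{T} + \sqrt{dT\,\mathbb{E}[\tau]}$, so its delay term grows with $\sqrt{T}$. In the new bound the delay term $d^{3/2}\,\mathbb{E}[\tau]$ does not depend on $T$, and it is the smaller of the two whenever $d^{2}\,\mathbb{E}[\tau] \le T$.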

Lamprinakou S, Barahona M, Flaxman S, Filippi S, Gandy A, McCoy EJ et al., 2023, BART-based inference for Poisson processes, Computational Statistics and Data Analysis, Vol: 180, Pages: 1-25, ISSN: 0167-9473

The effectiveness of Bayesian Additive Regression Trees (BART) has been demonstrated in a variety of contexts including non-parametric regression and classification. A BART scheme for estimating the intensity of inhomogeneous Poisson processes is introduced. Poisson intensity estimation is a vital task in various applications including medical imaging, astrophysics and network traffic analysis. The new approach enables full posterior inference of the intensity in a non-parametric regression setting. The performance of the novel scheme is demonstrated through simulation studies on synthetic and real datasets up to five dimensions, and the new scheme is compared with alternative approaches.

Journal article

Howson B, Pike-Burke C, Filippi S, 2023, Optimism and Delays in Episodic Reinforcement Learning, Pages: 6061-6094

There are many algorithms for regret minimisation in episodic reinforcement learning. This problem is well-understood from a theoretical perspective, providing that the sequences of states, actions and rewards associated with each episode are available to the algorithm updating the policy immediately after every interaction with the environment. However, feedback is almost always delayed in practice. In this paper, we study the impact of delayed feedback in episodic reinforcement learning from a theoretical perspective and propose two general-purpose approaches to handling the delays. The first involves updating as soon as new information becomes available, whereas the second waits before using newly observed information to update the policy. For the class of optimistic algorithms and either approach, we show that the regret increases by an additive term involving the number of states, actions, episode length, the expected delay and an algorithm-dependent constant. We empirically investigate the impact of various delay distributions on the regret of optimistic algorithms to validate our theoretical results.

Conference paper

Howson B, Pike-Burke C, Filippi S, 2023, Delayed Feedback in Generalised Linear Bandits Revisited, Pages: 6095-6119

The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for immediate rewards is unmet in many real-world applications where the reward is almost always delayed. We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner. We show that a natural adaptation of an optimistic algorithm to the delayed feedback setting can achieve regret of $\tilde{O}(d\sqrt{T} + d^{3/2}\,\mathbb{E}[\tau])$, where $\mathbb{E}[\tau]$ denotes the expected delay, $d$ is the dimension and $T$ is the time horizon. This significantly improves upon existing approaches for this setting where the best known regret bound was $\tilde{O}(\sqrt{dT}\,\sqrt{d + \mathbb{E}[\tau]})$. We verify our theoretical results through experiments on simulated data.

Conference paper

Zhang Q, Wild V, Filippi S, Flaxman S, Sejdinovic D et al., 2022, Bayesian kernel two-sample testing, Journal of Computational and Graphical Statistics, Vol: 31, Pages: 1164-1176, ISSN: 1061-8600

In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where applications are often restricted to univariate cases. Here, we propose a Bayesian kernel two-sample testing procedure based on modeling the difference between kernel mean embeddings in the reproducing kernel Hilbert space using the framework established by Flaxman et al. The use of kernel methods enables its application to random variables in generic domains beyond the multivariate Euclidean spaces. The proposed procedure results in a posterior inference scheme that allows an automatic selection of the kernel parameters relevant to the problem at hand. In a series of synthetic experiments and two real data experiments (i.e., testing network heterogeneity from high-dimensional data and six-membered monocyclic ring conformation comparison), we illustrate the advantages of our approach. Supplementary materials for this article are available online.

Journal article

Komodromos M, Aboagye EO, Evangelou M, Filippi S, Ray K et al., 2022, Variational Bayes for high-dimensional proportional hazards models with applications within gene expression, Bioinformatics, Vol: 38, Pages: 3918-3926, ISSN: 1367-4803

Motivation: Few Bayesian methods for analyzing high-dimensional sparse survival data provide scalable variable selection, effect estimation and uncertainty quantification. Such methods often either sacrifice uncertainty quantification by computing maximum a posteriori estimates, or quantify the uncertainty at high (unscalable) computational expense. Results: We bridge this gap and develop an interpretable and scalable Bayesian proportional hazards model for prediction and variable selection, referred to as SVB. Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC whilst retaining useful features, providing a posterior distribution for the parameters and offering a natural mechanism for variable selection via posterior inclusion probabilities. The performance of our proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, and how the uncertainty quantification offered by our method can be used to provide an interpretable assessment of patient risk. Availability and implementation: Our method has been implemented as a freely available R package survival.svb (https://github.com/mkomod/survival.svb).

Journal article

Reiker T, Golumbeanu M, Shattock A, Burgert L, Smith TA, Filippi S, Cameron E, Penny MA et al., 2021, Emulator-based Bayesian optimization for efficient multi-objective calibration of an individual-based model of malaria, Nature Communications, Vol: 12, ISSN: 2041-1723

Individual-based models have become important tools in the global battle against infectious diseases, yet model complexity can make calibration to biological and epidemiological data challenging. We propose using a Bayesian optimization framework employing Gaussian process or machine learning emulator functions to calibrate a complex malaria transmission simulator. We demonstrate our approach by optimizing over a high-dimensional parameter space with respect to a portfolio of multiple fitting objectives built from datasets capturing the natural history of malaria transmission and disease progression. Our approach quickly outperforms previous calibrations, yielding an improved final goodness of fit. Per-objective parameter importance and sensitivity diagnostics provided by our approach offer epidemiological insights and enhance trust in predictions through greater interpretability.

Journal article

Frainay C, Pitarch Y, Filippi S, Evangelou M, Custovic A et al., 2021, Atopic dermatitis or eczema? Consequences of ambiguity in disease name for biomedical literature mining, Clinical and Experimental Allergy, Vol: 51, Pages: 1185-1194, ISSN: 0954-7894

Background: Biomedical research increasingly relies on computational approaches to extract relevant information from large corpora of publications. Objective: To investigate the consequence of the ambiguity between the use of terms “Eczema” and “Atopic Dermatitis” (AD) from the Information Retrieval perspective, and its impact on meta-analyses, systematic reviews and text mining. Methods: Articles were retrieved by querying the PubMed using terms ‘eczema’ (D003876) and “dermatitis, atopic” (D004485). We used machine learning to investigate the differences between the contexts in which each term is used. We used a decision tree approach and trained model to predict if an article would be indexed with eczema or AD tags. We used text-mining tools to extract biological entities associated with eczema and AD, and investigated the discrepancy regarding the retrieval of key findings according to the terminology used. Results: Atopic dermatitis query yielded more articles related to veterinary science, biochemistry, cellular and molecular biology; the eczema query linked to public health, infectious disease and respiratory system. Medical Subject Headings terms associated with “AD” or “Eczema” differed, with an agreement between the top 40 lists of 52%. The presence of terms related to cellular mechanisms, especially allergies and inflammation, characterized AD literature. The metabolites mentioned more frequently than expected in articles with AD tag differed from those indexed with eczema. Fewer enriched genes were retrieved when using eczema compared to AD query. Conclusions and Clinical Relevance: There is a considerable discrepancy when using text mining to extract bio-entities related to eczema or AD. Our results suggest that any systematic approach (particularly when looking for metabolites or genes related to the condition) should be performed using both terms jointly. We propose to use decision tree learning

Journal article

NCD Risk Factor Collaboration NCD-RisC, Iurilli N, 2021, Heterogeneous contributions of change in population distribution of body-mass index to change in obesity and underweight, eLife, Vol: 10, ISSN: 2050-084X

From 1985 to 2016, the prevalence of underweight decreased, and that of obesity and severe obesity increased, in most regions, with significant variation in the magnitude of these changes across regions. We investigated how much change in mean body mass index (BMI) explains changes in the prevalence of underweight, obesity, and severe obesity in different regions using data from 2896 population-based studies with 187 million participants. Changes in the prevalence of underweight and total obesity, and to a lesser extent severe obesity, are largely driven by shifts in the distribution of BMI, with smaller contributions from changes in the shape of the distribution. In East and Southeast Asia and sub-Saharan Africa, the underweight tail of the BMI distribution was left behind as the distribution shifted. There is a need for policies that address all forms of malnutrition by making healthy foods accessible and affordable, while restricting unhealthy foods through fiscal and regulatory restrictions.

Journal article

Unwin H, Mishra S, Bradley V, Gandy A, Mellan T, Coupland H, Ish-Horowicz J, Vollmer M, Whittaker C, Filippi S, Xi X, Monod M, Ratmann O, Hutchinson M, Valka F, Zhu H, Hawryluk I, Milton P, Ainslie K, Baguelin M, Boonyasiri A, Brazeau N, Cattarino L, Cucunuba Z, Cuomo-Dannenburg G, Dorigatti I, Eales O, Eaton J, van Elsland S, Fitzjohn R, Gaythorpe K, Green W, Hinsley W, Jeffrey B, Knock E, Laydon D, Lees J, Nedjati-Gilani G, Nouvellet P, Okell L, Parag K, Siveroni I, Thompson H, Walker P, Walters C, Watson O, Whittles L, Ghani A, Ferguson N, Riley S, Donnelly C, Bhatt S, Flaxman S et al., 2020, State-level tracking of COVID-19 in the United States, Nature Communications, Vol: 11, Pages: 1-9, ISSN: 2041-1723

As of 1st June 2020, the US Centers for Disease Control and Prevention reported 104,232 confirmed or probable COVID-19-related deaths in the US. This was more than twice the number of deaths reported in the next most severely impacted country. We jointly model the US epidemic at the state-level, using publicly available death data within a Bayesian hierarchical semi-mechanistic framework. For each state, we estimate the number of individuals that have been infected, the number of individuals that are currently infectious and the time-varying reproduction number (the average number of secondary infections caused by an infected person). We use changes in mobility to capture the impact that non-pharmaceutical interventions and other behaviour changes have on the rate of transmission of SARS-CoV-2. We estimate that R_t was only below one in 23 states on 1st June. We also estimate that 3.7% [3.4%-4.0%] of the total population of the US had been infected, with wide variation between states, and approximately 0.01% of the population was infectious. We demonstrate good 3 week model forecasts of deaths with low error and good coverage of our credible intervals.

Journal article

Kolbeinsson A, Filippi S, Panagakis I, Matthews P, Elliott P, Dehghan A, Tzoulaki I et al., 2020, Accelerated MRI-predicted brain ageing and its associations with cardiometabolic and brain disorders, Scientific Reports, Vol: 10, ISSN: 2045-2322

Brain structure in later life reflects both influences of intrinsic aging and those of lifestyle, environment and disease. We developed a deep neural network model trained on brain MRI scans of healthy people to predict “healthy” brain age. Brain regions most informative for the prediction included the cerebellum, hippocampus, amygdala and insular cortex. We then applied this model to data from an independent group of people not stratified for health. A phenome-wide association analysis of over 1,410 traits in the UK Biobank with differences between the predicted and chronological ages for the second group identified significant associations with over 40 traits including diseases (e.g., type I and type II diabetes), disease risk factors (e.g., increased diastolic blood pressure and body mass index), and poorer cognitive function. These observations highlight relationships between brain and systemic health and have implications for understanding contributions of the latter to late life dementia risk.

Journal article

Roberts G, Fontanella S, Selby A, Howard R, Filippi S, Hedlin G, Nordlund B, Howarth P, Hashimoto S, Brinkman P, Fleming LJ, Murray C, Bush A, Frey U, Singer F, Schoos A-MM, van Aalderen W, Djukanovic R, Chung KF, Sterk PJ, Adnan C, U-BIOPRED Consortium et al., 2020, Connectivity patterns between multiple allergen specific IgE antibodies and their association with severe asthma, Journal of Allergy and Clinical Immunology, Vol: 146, Pages: 821-830, ISSN: 0091-6749

BACKGROUND: Allergic sensitization is associated with severe asthma, but assessment of sensitization is not recommended by most guidelines. OBJECTIVE: We hypothesized that patterns of IgE responses to multiple allergenic proteins differ between sensitized participants with mild/moderate and severe asthma. METHODS: IgE to 112 allergenic molecules (components, c-sIgE) was measured using multiplex array among 509 adults and 140 school-age and 131 preschool children with asthma/wheeze from the Unbiased BIOmarkers for the PREDiction of respiratory diseases outcomes cohort, of whom 595 had severe disease. We applied clustering methods to identify co-occurrence patterns of components (component clusters) and patterns of sensitization among participants (sensitization clusters). Network analysis techniques explored the connectivity structure of c-sIgE, and differential network analysis looked for differences in c-sIgE interactions between severe and mild/moderate asthma. RESULTS: Four sensitization clusters were identified, but with no difference between disease severity groups. Similarly, component clusters were not associated with asthma severity. None of the c-sIgE were identified as associates of severe asthma. The key difference between school children and adults with mild/moderate compared with those with severe asthma was in the network of connections between c-sIgE. Participants with severe asthma had higher connectivity among components, but these connections were weaker. The mild/moderate network had fewer connections, but the connections were stronger. Connectivity between components with no structural homology tended to co-occur among participants with severe asthma. Results were independent from the different sample sizes of mild/moderate and severe groups. CONCLUSIONS: The patterns of interactions between IgE to multiple allergenic proteins are predictors of asthma severity among school children and adults with allergic asthma.

Journal article

Monod M, Blenkinsop A, Xi X, Herbert D, Bershan S, Tietze S, Bradley V, Chen Y, Coupland H, Filippi S, Ish-Horowicz J, McManus M, Mellan T, Gandy A, Hutchinson M, Unwin H, Vollmer M, Weber S, Zhu H, Bezancon A, Ferguson N, Mishra S, Flaxman S, Bhatt S, Ratmann O, Ainslie K, Baguelin M, Boonyasiri A, Boyd O, Cattarino L, Cooper L, Cucunuba Perez Z, Cuomo-Dannenburg G, Djaafara A, Dorigatti I, van Elsland S, Fitzjohn R, Gaythorpe K, Geidelberg L, Green W, Hamlet A, Jeffrey B, Knock E, Laydon D, Nedjati Gilani G, Nouvellet P, Parag K, Siveroni I, Thompson H, Verity R, Walters C, Donnelly C, Okell L, Bhatia S, Brazeau N, Eales O, Haw D, Imai N, Jauneikaite E, Lees J, Mousa A, Olivera Mesa D, Skarp J, Whittles L et al., 2020, Report 32: Targeting interventions to age groups that sustain COVID-19 transmission in the United States, Pages: 1-32

Following initial declines, in mid 2020, a resurgence in transmission of novel coronavirus disease (COVID-19) has occurred in the United States and parts of Europe. Despite the wide implementation of non-pharmaceutical interventions, it is still not known how they are impacted by changing contact patterns, age and other demographics. As COVID-19 disease control becomes more localised, understanding the age demographics driving transmission and how these impact the loosening of interventions such as school reopening is crucial. Considering dynamics for the United States, we analyse aggregated, age-specific mobility trends from more than 10 million individuals and link these mechanistically to age-specific COVID-19 mortality data. In contrast to previous approaches, we link mobility to mortality via age specific contact patterns and use this rich relationship to reconstruct accurate transmission dynamics. Contrary to anecdotal evidence, we find little support for age-shifts in contact and transmission dynamics over time. We estimate that, until August, 63.4% [60.9%-65.5%] of SARS-CoV-2 infections in the United States originated from adults aged 20-49, while 1.2% [0.8%-1.8%] originated from children aged 0-9. In areas with continued, community-wide transmission, our transmission model predicts that re-opening kindergartens and elementary schools could facilitate spread and lead to considerable excess COVID-19 attributable deaths over a 90-day period. These findings indicate that targeting interventions to adults aged 20-49 are an important consideration in halting resurgent epidemics, and preventing COVID-19-attributable deaths when kindergartens and elementary schools reopen.

Journal article

Teymur O, Filippi S, 2020, A Bayesian nonparametric test for conditional independence, Foundations of Data Science, Vol: 2, Pages: 155-172, ISSN: 2639-8001

This article introduces a Bayesian nonparametric method for quantifying the relative evidence in a dataset in favour of the dependence or independence of two variables conditional on a third. The approach uses Pólya tree priors on spaces of conditional probability densities, accounting for uncertainty in the form of the underlying distributions in a nonparametric way. The Bayesian perspective provides an inherently symmetric probability measure of conditional dependence or independence, a feature particularly advantageous in causal discovery and not employed in existing procedures of this type.

Journal article

Yeung E, McFann S, Marsh L, Dufresne E, Filippi S, Harrington HA, Shvartsman SY, Wühr M et al., 2020, Inference of multisite phosphorylation rate constants and their modulation by pathogenic mutations, Current Biology, ISSN: 0960-9822

Journal article

Sonntag H-J, Filippi S, Pipis S, Custovic A et al., 2019, Blood biomarkers of sensitization and asthma, Frontiers in Pediatrics, Vol: 7, ISSN: 2296-2360

Biomarkers are essential to determine different phenotypes of childhood asthma, and for the prediction of response to treatments. In young preschool children with asthma, aeroallergen sensitization, and blood eosinophil count of 300/µL or greater may identify those who can benefit from the daily use of inhaled corticosteroids (ICS). We propose that every preschool child who is considered for ICS treatment should have these two features measured as a minimum before a decision is made on the commencement of long-term preventive treatment. In practice, IgE-mediated sensitization should be considered as a quantifiable variable, i.e., we should use the titer of sIgE antibodies or the size of skin prick test response. A number of other blood biomarkers may prove useful (e.g., allergen-specific IgG/IgE antibody ratios amongst sensitized individuals, component-resolved diagnostics which measures sIgE response to a large number of allergenic molecules, assessment of immune responses to viruses, level of serum CC16, etc.), but it remains unclear whether these can be translated into clinically useful tests. Going forward, a more integrated approach which takes into account multiple domains of asthma, from the pattern of symptoms and blood biomarkers to genetic risk and lung function measures, is needed if we are to move toward a stratified approach to asthma management.

Journal article

Jetka T, Nienałtowski K, Filippi S, Stumpf M, Komorowski M et al., 2018, An information-theoretic framework for deciphering pleiotropic and noisy biochemical signaling, Nature Communications, Vol: 9, ISSN: 2041-1723

Many components of signaling pathways are functionally pleiotropic, and signaling responses are marked with substantial cell-to-cell heterogeneity. Therefore, biochemical descriptions of signaling require quantitative support to explain how complex stimuli (inputs) are encoded in distinct activities of pathways effectors (outputs). A unique perspective of information theory cannot be fully utilized due to lack of modeling tools that account for the complexity of biochemical signaling, specifically for multiple inputs and outputs. Here, we develop a modeling framework of information theory that allows for efficient analysis of models with multiple inputs and outputs; accounts for temporal dynamics of signaling; enables analysis of how signals flow through shared network components; and is not restricted by limited variability of responses. The framework allows us to explain how identity and quantity of type I and type III interferon variants could be recognized by cells despite activating the same signaling effectors.

Journal article

Filippi SL, Muraro D, Parker A, Vaux L, Almet A, Fletcher A, Watson A, Pin C, Maini P, Byrne H et al., 2018, Chronic TNFα-driven injury delays cell migration to villi in the intestinal epithelium, Journal of the Royal Society Interface, Vol: 15, ISSN: 1742-5662

The intestinal epithelium is a single layer of cells which provides the first line of defence of the intestinal mucosa to bacterial infection. Cohesion of this physical barrier is supported by renewal of epithelial stem cells, residing in invaginations called crypts, and by crypt cell migration onto protrusions called villi; dysregulation of such mechanisms may render the gut susceptible to chronic inflammation. The impact that excessive or misplaced epithelial cell death may have on villus cell migration is currently unknown. We integrated cell-tracking methods with computational models to determine how epithelial homeostasis is affected by acute and chronic TNFα-driven epithelial cell death. Parameter inference reveals that acute inflammatory cell death has a transient effect on epithelial cell dynamics, whereas cell death caused by chronic elevated TNFα causes a delay in the accumulation of labelled cells onto the villus compared to the control. Such a delay may be reproduced by using a cell-based model to simulate the dynamics of each cell in a crypt–villus geometry, showing that a prolonged increase in cell death slows the migration of cells from the crypt to the villus. This investigation highlights which injuries (acute or chronic) may be regenerated and which cause disruption of healthy epithelial homeostasis.

Journal article

Dony L, Mackerodt J, Ward S, Filippi S, Stumpf MPH, Liepe J et al., 2018, PEITH(Theta): perfecting experiments with information theory in Python with GPU support, Bioinformatics, Vol: 34, Pages: 1249-1250, ISSN: 1367-4803

Motivation: Different experiments provide differing levels of information about a biological system. This makes it difficult, a priori, to select one of them beyond mere speculation and/or belief, especially when resources are limited. With the increasing diversity of experimental approaches and general advances in quantitative systems biology, methods that inform us about the information content that a given experiment carries about the question we want to answer, become crucial. Results: PEITH(Θ) is a general purpose, Python framework for experimental design in systems biology. PEITH(Θ) uses Bayesian inference and information theory in order to derive which experiments are most informative in order to estimate all model parameters and/or perform model predictions. Availability and implementation: https://github.com/MichaelPHStumpf/Peitho

Journal article

Filippi S, Holmes C, 2017, A Bayesian nonparametric approach to testing for dependence between random variables, Bayesian Analysis, Vol: 12, Pages: 919-938, ISSN: 1931-6690

Nonparametric and nonlinear measures of statistical dependence between pairs of random variables are important tools in modern data analysis. In particular the emergence of large data sets can now support the relaxation of linearity assumptions implicit in traditional association scores such as correlation. Here we describe a Bayesian nonparametric procedure that leads to a tractable, explicit and analytic quantification of the relative evidence for dependence vs independence. Our approach uses Polya tree priors on the space of probability measures which can then be embedded within a decision theoretic test for dependence. Polya tree priors can accommodate known uncertainty in the form of the underlying sampling distribution and provide an explicit posterior probability measure of both dependence and independence. Well known advantages of having an explicit probability measure include: easy comparison of evidence across different studies; encoding prior information; quantifying changes in dependence across different experimental conditions; and the integration of results within formal decision analysis.

Journal article

Smith RCG, Stumpf PS, Ridden SJ, Sim A, Filippi S, Harrington HA, MacArthur BD et al., 2017, The problem of measurement in cell biology: a tale of two alleles, European Biophysics Journal with Biophysics Letters, Vol: 46, Pages: S371-S371, ISSN: 0175-7571

Journal article

Smith RCG, Stumpf PS, Ridden SJ, Sim A, Filippi SL, Harrington HA, MacArthur BD et al., 2017, Nanog fluctuations in embryonic stem cells highlight the problem of measurement in cell biology, Biophysical Journal, Vol: 112, Pages: 2641-2652, ISSN: 1542-0086

A number of important pluripotency regulators, including the transcription factor Nanog, are observed to fluctuate stochastically in individual embryonic stem cells. By transiently priming cells for commitment to different lineages, these fluctuations are thought to be important to the maintenance of, and exit from, pluripotency. However, because temporal changes in intracellular protein abundances cannot be measured directly in live cells, fluctuations are typically assessed using genetically engineered reporter cell lines that produce a fluorescent signal as a proxy for protein expression. Here, using a combination of mathematical modeling and experiment, we show that there are unforeseen ways in which widely used reporter strategies can systematically disturb the dynamics they are intended to monitor, sometimes giving profoundly misleading results. In the case of Nanog, we show how genetic reporters can compromise the behavior of important pluripotency-sustaining positive feedback loops, and induce a bifurcation in the underlying dynamics that gives rise to heterogeneous Nanog expression patterns in reporter cell lines that are not representative of the wild-type. These findings help explain the range of published observations of Nanog variability and highlight the problem of measurement in live cells.

Journal article

Zhang Q, Filippi SL, Flaxman S, Sejdinovic D et al., 2017, Feature-to-feature regression for a two-step conditional independence test, Uncertainty in Artificial Intelligence

The algorithms for causal discovery and more broadly for learning the structure of graphical models require well calibrated and consistent conditional independence (CI) tests. We revisit the CI tests which are based on two-step procedures and involve regression with subsequent (unconditional) independence test (RESIT) on regression residuals and investigate the assumptions under which these tests operate. In particular, we demonstrate that when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT, while giving correct Type I control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.

Conference paper

Zhang Q, Filippi S, Gretton A, Sejdinovic D et al., 2017, Large-Scale Kernel Methods for Independence Testing, Statistics and Computing, Vol: 28, Pages: 113-130, ISSN: 1573-1375

Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nystrom and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large scale methods give comparable performance with existing methods whilst using significantly less computation time and memory.

Journal article

Wills QF, Mellado-Gomez E, Nolan R, Warner D, Sharma E, Broxholme J, Wright B, Lockstone H, James W, Lynch M, Gonzales M, West J, Leyrat A, Padilla-Parra S, Filippi S, Holmes C, Moore MD, Bowden R et al., 2017, The nature and nurture of cell heterogeneity: accounting for macrophage gene-environment interactions with single-cell RNA-Seq, BMC Genomics, Vol: 18, ISSN: 1471-2164

BACKGROUND: Single-cell RNA-Seq can be a valuable and unbiased tool to dissect cellular heterogeneity, despite the transcriptome's limitations in describing higher functional phenotypes and protein events. Perhaps the most important shortfall with transcriptomic 'snapshots' of cell populations is that they risk being descriptive, only cataloging heterogeneity at one point in time, and without microenvironmental context. Studying the genetic ('nature') and environmental ('nurture') modifiers of heterogeneity, and how cell population dynamics unfold over time in response to these modifiers is key when studying highly plastic cells such as macrophages. RESULTS: We introduce the programmable Polaris™ microfluidic lab-on-chip for single-cell sequencing, which performs live-cell imaging while controlling for the culture microenvironment of each cell. Using gene-edited macrophages we demonstrate how previously unappreciated knockout effects of SAMHD1, such as an altered oxidative stress response, have a large paracrine signaling component. Furthermore, we demonstrate single-cell pathway enrichments for cell cycle arrest and APOBEC3G degradation, both associated with the oxidative stress response and altered proteostasis. Interestingly, SAMHD1 and APOBEC3G are both HIV-1 inhibitors ('restriction factors'), with no known co-regulation. CONCLUSION: As single-cell methods continue to mature, so will the ability to move beyond simple 'snapshots' of cell populations towards studying the determinants of population dynamics. By combining single-cell culture, live-cell imaging, and single-cell sequencing, we have demonstrated the ability to study cell phenotypes and microenvironmental influences. It's these microenvironmental components - ignored by standard single-cell workflows - that likely determine how macrophages, for example, react to inflammation and form treatment resistant HIV reservoirs.

Journal article

Filippi S, Holmes CC, Nieto-Barajas LE, 2016, Scalable Bayesian nonparametric measures for exploring pairwise dependence via Dirichlet Process Mixtures, Electronic Journal of Statistics, Vol: 10, Pages: 3338-3354, ISSN: 1935-7524

In this article we propose novel Bayesian nonparametric methods using Dirichlet Process Mixture (DPM) models for detecting pairwise dependence between random variables while accounting for uncertainty in the form of the underlying distributions. A key criterion is that the procedures should scale to large data sets. In this regard we find that the formal calculation of the Bayes factor for a dependent-vs.-independent DPM joint probability measure is not feasible computationally. To address this we present Bayesian diagnostic measures for characterising evidence against a “null model” of pairwise independence. In simulation studies, as well as for a real data analysis, we show that our approach provides a useful tool for the exploratory nonparametric Bayesian analysis of large multivariate data sets.

Journal article

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
