Introduction

Rustand, Denis; van Niekerk, Janet; Krainski, Elias T.; Rue, Håvard

doi:10.1201/9781003646822

Longitudinal and survival data analyses have received considerable attention in the statistical literature in recent decades. A wide variety of regression models have been proposed to evaluate the effect of covariates on these types of data. Each data type has characteristics that call for specific regression models. Longitudinal data involve repeated measurements on the same individuals, which produces a hierarchical structure in which measurements are grouped within individuals and are therefore correlated. Survival data require accounting for censoring mechanisms, among other features specific to time-to-event outcomes.

Many statistical methods can be used to estimate these models, including traditional frequentist and Bayesian frameworks, as well as modern machine learning techniques focused mainly on prediction. Frequentist methods, such as maximum likelihood estimation, often employ iterative algorithms like Newton-Raphson to optimize parameter estimates. In contrast, Bayesian methods aim to estimate the posterior distribution of the model parameters, usually with Markov chain Monte Carlo techniques for sampling-based inference. While these approaches differ philosophically, they converge in practice under certain conditions. For instance, when flat (non-informative) priors are used in Bayesian inference, the maximum a posteriori estimates coincide with the maximum likelihood estimates, because the posterior distribution is then proportional to the likelihood and is driven entirely by the data. Despite their robustness, these traditional methods can be computationally intensive, particularly for large datasets or complex models. To address this, approximate methods such as variational Bayes and expectation propagation have been developed, but their gains in computational speed often come at the cost of lower accuracy in parameter estimates due to restrictive assumptions, such as mean-field factorization.
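
The agreement between maximum a posteriori and maximum likelihood estimates mentioned above can be seen in a short derivation. The sketch below assumes a flat prior, $p(\theta) \propto 1$, over the parameter space; the symbols $\theta$ (model parameters), $y$ (observed data), and $L(\theta; y)$ (likelihood) are notation introduced here for illustration only.

```latex
\begin{align*}
p(\theta \mid y) &\propto L(\theta; y)\, p(\theta) \\
\hat{\theta}_{\mathrm{MAP}}
  &= \arg\max_{\theta} \left[ \log L(\theta; y) + \log p(\theta) \right] \\
  &= \arg\max_{\theta} \log L(\theta; y)
   = \hat{\theta}_{\mathrm{MLE}}
   \quad \text{when } p(\theta) \propto 1 .
\end{align*}
```

With a proper but weakly informative prior, the term $\log p(\theta)$ no longer drops out, but it stays fixed while the log-likelihood grows with the sample size, so the two estimates are typically close for large datasets.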

Integrated nested Laplace approximations (INLA) have emerged as a highly efficient alternative, providing accuracy comparable or even superior to that of traditional methods while significantly reducing computation time. This opens the door to applying regression models to large datasets and to developing more complex models that were out of reach with traditional methods. This book explores the application of INLA to a broad spectrum of regression models for longitudinal and survival data, with a particular emphasis on mixed-effects models for longitudinal data and proportional hazards models for survival data. A key focus is the joint modeling framework, which allows for the simultaneous analysis of multiple outcomes while accounting for their association. The book relies on two open-source R packages: R-INLA, which implements the INLA methodology, and INLAjoint, which serves as an interface that facilitates the use of INLA to fit models for survival and longitudinal data. By highlighting the versatility and computational advantages of INLA, this book demonstrates its potential to construct models that are currently beyond the reach of other software tools.

The book is organized as follows. Chapter 1 introduces the methodological framework, covering the underlying INLA methodology and the range of models that can be fitted with the R package INLAjoint. Chapter 2 presents survival regression models with various examples, followed by longitudinal regression models in Chapter 3. Joint models are introduced and illustrated through multiple examples in Chapter 4. Finally, the inclusion of a spatial component is presented in Chapter 5. Through a combination of theoretical insights and practical examples, this book aims to equip researchers and practitioners with the tools to use INLA for advanced longitudinal and survival data analysis.