In this chapter, we give a comprehensive introduction to some survival models that can be fitted with INLA, along with illustrative examples based on simulated data. Most survival models used in practice by clinicians, epidemiologists and biostatisticians can be fitted with this method and the user-friendly R package INLAjoint makes this accessible to a wide audience. The models presented include the Cox proportional hazards, accelerated failure time, stratified proportional hazards, mixture cure, competing risks, multi-state, frailty and shared frailty joint models. We show how censoring and truncation mechanisms can be handled (e.g., right censoring, interval censoring, left truncation), and various baseline hazard options (e.g., parametric with exponential, Weibull or Gompertz, as well as semi-parametric with random walks) are accommodated. Each model is described, illustrated with a fully reproducible example, interpreted, and some key quantities and figures are computed to show how to extract information efficiently.
Overview of survival models presented in this chapter.

| Model | Assumption | Purpose | Section |
|---|---|---|---|
| Proportional Hazards | | | |
| Exponential | Constant hazard over time | Simple baseline when risk is time-invariant | 2.2.1 |
| Weibull | Monotonically increasing/decreasing hazard | More flexibility compared to exponential but still strong assumption on the baseline hazard's shape | 2.2.2 |
| Semi-parametric (random walks) | Smooth but flexible hazard shape | When baseline hazard shape is unknown | 2.2.3 |
| Specialized Models | | | |
| Stratified PH | Proportional hazards within strata, different baselines between strata | Controlling for a categorical variable that violates PH | 2.6 |
| Mixture Cure | Population is a mix of "cured" and "at-risk" individuals | Analyzing data with a long-term survivor plateau | 2.7 |
| Competing Risks | Multiple event types censoring each other | Analyzing cause-specific event risks | 2.8 |
| Multi-state | Subjects transition through multiple transient/absorbing states | Modeling complex disease progression pathways | 2.9 |
| Frailty Model | Unobserved factors create heterogeneity in risk | Analyzing clustered data or recurrent events | 2.10 |
| Joint Frailty Model | Frailty is shared between two or more survival processes | Modeling the association between different event times | 2.11 |
2.1 Introduction
In survival analysis, the outcome corresponds to the time elapsed until an event of interest occurs. Some patients experience the event while others do not: when the follow-up of the study is over, the remaining patients are censored. Patients who leave the study early (i.e., before the end of follow-up) are lost to follow-up and considered censored at their last visit. The event is usually death, but other events of interest (e.g., progression of a disease) can be analyzed with survival models. The shape of the distribution of survival times justifies the need for specific models: survival times are always positive and their distribution is often skewed, so statistical methods that rely on normality are not directly applicable and may produce invalid results with survival data. A suitable transformation of the event times, such as the logarithm or the square root, can help with this issue. The main distinctive feature of the analysis of survival times, however, is censoring (i.e., the event is only observed for some patients).
Censoring can be categorized as follows:
Right censoring: The event time occurs after the last time point of observation.
Left censoring: The event time occurs before the first time point of observation.
Interval censoring: The event of interest is only known to occur between two time points.
Another way to classify censoring focuses on the relationship between the probability of a subject being censored and the risk of event:
Informative censoring: The risk of event for individuals remaining in the study is different from the risk of event for censored individuals (this is similar to the “missing not at random” missing data mechanism introduced in Section 3.1).
Examples: A patient decides to stop taking the treatment because of side effects; a patient is too sick and clinicians decide to switch to another treatment (assuming independent censoring in this case would lead to underestimation of the hazard); a patient feels cured by an effective treatment and no longer feels the need to attend follow-up visits (assuming independent censoring in this case would lead to overestimation of the hazard).
Non-informative (or random) censoring: The individual withdraws from the study for reasons not related to the study (this is similar to the “missing completely at random” missing data mechanism introduced in Section 3.1).
Examples: The event has not occurred at the end of the follow-up; a patient moves to another country and cannot continue their participation in the trial.
Censoring due to loss of follow-up should be non-informative to reduce the bias in the estimates of survival curves (i.e., the censoring time is statistically independent from the event time). For example, in a clinical trial where disease progression is the event of interest, drop-out can occur when the disease progression leads a patient to die or to withdraw from the trial to seek other treatment options; the drop-out is then informative for the estimation of the disease progression. If the disease progression correlates with a biomarker that is being monitored, modeling the disease progression separately from the drop-out process may be inefficient and lead to biased estimates. Joint models can account for such informative drop-out but have limitations when the missing data mechanism is related to other reasons than the drop-out mechanism (see Chapter 4).
Let \(T^*\) denote the positive continuous response variable that represents the elapsed time between the beginning of the follow-up and the event of interest, usually referred to as survival time or event time. There are several ways to describe the distribution of survival times:
Survival function
The survival function is the probability to survive at least until time \(t\): \(S(t)=P(T^*>t)\). It is a decreasing function starting at 1 at time 0 that converges towards 0 as \(t\) tends towards \(+\infty\). In some cases, the survival function can converge towards a positive value, indicating that some individuals are not at risk of the event anymore. For example if we study the risk of death from a specific disease, if the disease is cured then the individual is not at risk of dying from that disease anymore. See Section 2.7 for details on models for a population with a cure fraction.
Cumulative distribution function
The cumulative distribution function (cdf) represents the probability that the event occurs before or at time \(t\): \(F(t)=P(T^*<t)=1-S(t)\).
Probability density function
The probability density function (pdf) corresponds to the probability of event in a very short time interval after \(t\): \(f(t)=\lim\limits_{h \rightarrow 0^+} \frac{P(t\leq T^* < t+h)}{h}\).
We can therefore express the relationship between the pdf and the cdf: \(F(t)=\int_{0}^{t} f(s)ds\).
Hazard function
The hazard function corresponds to the probability that the event occurs in a small interval of time after \(t\), conditionally on surviving until \(t\), i.e., the instantaneous risk of event for individuals free from the event: \(\lambda(t)=\lim\limits_{h \rightarrow 0^+} \frac{P(t\leq T^* < t+h | T^*\geq t)}{h}=\frac{f(t)}{S(t)}.\)
Cumulative hazard function
Finally, the cumulative hazard function corresponds to the cumulative hazard up to time \(t\) (i.e., the total amount of cumulated risk): \(\Lambda(t)=\int_{0}^{t} \lambda(s)ds\).
We can deduce the relationship between the survival and hazard function: \(S(t)=\exp(-\Lambda(t))=\exp(-\int_{0}^{t} \lambda(s)ds)\).
These quantities of interest can be estimated by non-parametric estimators as well as by regression models that describe them as functions of explanatory variables. A well-known non-parametric estimator of the survival function is the Kaplan-Meier estimator (with the Greenwood formula to estimate its variance). It is possible to compare the survival functions of several groups of subjects with the log-rank test. Another common estimator is the Nelson-Aalen estimator for the cumulative hazard function and its variance.
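As a brief illustration outside of the INLA workflow, these non-parametric estimators could be computed with the survival package; the lung example data and the grouping variable sex below are simply illustrative choices from that package:

library(survival)  # provides the lung example data and non-parametric estimators
# Kaplan-Meier estimator of the survival function (Greenwood variance by default)
km <- survfit(Surv(time, status) ~ sex, data = lung)
summary(km, times = c(250, 500))
# Log-rank test comparing the survival functions of the two groups
survdiff(Surv(time, status) ~ sex, data = lung)
# Nelson-Aalen estimator of the cumulative hazard function
na <- survfit(Surv(time, status) ~ 1, data = lung, ctype = 1)
plot(na, cumhaz = TRUE, xlab = "Time", ylab = "Cumulative hazard")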
The survival curves and the log-rank test are limited to categorical variables. Moreover, the heterogeneity of the population within groups is ignored. Regression models that describe survival as a function of explanatory variables address these limitations: multiple independent prognostic factors can be analyzed simultaneously and covariate effects can be assessed while adjusting for heterogeneity and imbalances in the population.
For a population of \(n\) patients, we can define the observations as pairs \((T_i, \delta_i)\), with \(i=1,...,n\) and the indicator variable \(\delta_i=I_{T_i^*\leq C_i}\), equal to 1 if the survival time is observed and 0 in case of an incomplete observation (i.e., censoring). The observed time is \(T_i=\min(T_i^*, C_i)\), where \(C_i\) denotes the censoring time of individual \(i\).
2.2 The proportional hazards model
The proportional hazards (PH) model (Cox (1972)) is the most commonly used statistical model to study the relationship between the survival time of patients and predictor variables. While the Kaplan-Meier curves and the log-rank test are limited to categorical variables, the PH model can include both categorical and continuous variables and study their effects on the risk of event simultaneously. The PH model is usually described by its hazard function: \[\lambda_i(t)=\lambda_0(t)\exp(\eta_i),\] where \(\lambda_0(t)\) is the baseline hazard function, corresponding to the risk of event at time \(t\) when \(\eta_i\) is equal to zero. This baseline hazard acts as a time-dependent intercept in the model, and the rest of the equation is a multiple linear regression of the logarithm of the hazard on the variables contained in \(\eta_i\). Suppose \(\eta_i = X_i^\top\gamma\); then we are usually interested in the hazard ratio \(\exp(\gamma_k)\) of a covariate \(X_k\), which, in the case of a binary covariate, compares the risk of event for patients with \(X_k=1\) to the risk for patients with \(X_k=0\) (with a continuous covariate, \(\exp(\gamma_k)\) gives the hazard ratio for a 1-unit increase in \(X_k\)). A covariate associated with a hazard ratio \(>1\) increases the risk of event, meaning that it is a bad prognosis factor, while a covariate associated with a hazard ratio \(<1\) is a good prognosis factor.
The key assumption of the PH model, as its name suggests, is the proportional hazards, meaning that the hazard of the event in any subgroup defined by the covariates is a constant multiple of the hazard in any other subgroup. However, when time-dependent covariates are included in the PH model, the hazard ratio between two individuals can vary over time (Fisher & Lin (1999)).
The observed data is used to estimate model parameters through the likelihood function. Each data point contributes to the likelihood of the model. There are two types of contribution to the likelihood for survival times with a standard PH model:
If individual \(i\) is censored alive at time \(t\), their contribution to the likelihood is defined by the survival function \(S(t)\), i.e., all we know is that this individual was observed for a given amount of time without experiencing the event.
If individual \(i\) experiences the event of interest at time \(t\), their contribution to the likelihood corresponds to the probability density function \(f(t)=\lambda_i(t)S(t)\), i.e., we know that this individual was observed for a given amount of time and we know the event occurred at a given time.
The likelihood contribution of individual \(i\) is therefore defined as \[\begin{align*}
L_i(\cdot) &= \lambda_i(t)^{\delta_i} S(t),\\
&=\lambda_i(t)^{\delta_i} \exp\left( - \int_{0}^{t} \lambda_i(s) \mathrm{d}s \right),
\end{align*}\] where \(\lambda_i(t)=\lambda_{0}(t)\ \exp \{\eta_i\}\).
When there is only interest in hazard ratios (e.g., in clinical trials the objective is usually to quantify the effect of a treatment on the risk of death or progression of a disease), the Cox semi-parametric proportional hazards model (Cox (1972)) avoids the specification of the baseline hazard. It is left unspecified and the Cox PH model can be estimated using the partial likelihood approach (Cox (1975)). We are however often interested in the estimation of the baseline hazard functions (e.g., to quantify the probability of an event or to draw survival curves). Moreover, for some complex survival models, the baseline hazard function needs to be approximated as the partial likelihood technique used for the Cox model cannot be adapted. In this book, we use the INLA methodology which considers the full likelihood approach so that the proportional hazards model is an LGM, and the baseline hazard needs to be approximated as detailed in the next example.
Let’s consider the following regression model:
\[\lambda_{i}(t)=\lambda_{0}(t)\ \textrm{exp}\left(\gamma \cdot trt_i \right)\] where \(trt_i\) is a binary covariate and \(\gamma\) the corresponding log-hazard ratio.
We can simulate data from this model, assuming a constant baseline hazard, i.e. \(\lambda_0(t) = \lambda_0\). In order to do so, we use the inverse transform sampling method which allows us to simulate a random variable from any probability distribution for which we can define an invertible cumulative distribution function. For survival times generation, it is more convenient to work with the survival function \(S(t)\), which gives the probability of surviving beyond time \(t\). The core principle of the inverse transform sampling method in this context is to generate a random number \(u\) from a standard uniform distribution, \(u \sim U(0,1)\), and set it equal to the survival probability \(S(t)\). We can then solve this equation for the event time t. Based on definitions from Section 2.1, we get:
\[\begin{align*}
u &= S(t)\\
&= \exp(-\lambda_0\cdot t \cdot \exp(\gamma \cdot trt_i ))
\end{align*}\] We can rearrange the terms to isolate \(t\): \[t = \frac{-\log(u)}{\lambda_0\exp(\gamma \cdot trt_i)}\] We can now simulate data based on the formula we just derived:
n <- 500 # number of individuals
gamma <- 0.2 # treatment effect
baseRate <- 0.2 # baseline hazard scale
followup <- 4 # study duration
trt <- rbinom(n, 1, 0.5) # trt covariate
u <- runif(n) # uniform distribution for survival times generation
eventTimes <- pmin(-(log(u)/(baseRate*exp(trt*gamma))), followup)
eventIndic <- as.numeric(eventTimes < followup) # event indicator
datExp <- data.frame(eventTimes, eventIndic, trt) # dataset
summary(datExp)
eventTimes eventIndic trt
Min. :0.003652 Min. :0.000 Min. :0.00
1st Qu.:1.243587 1st Qu.:0.000 1st Qu.:0.00
Median :3.102436 Median :1.000 Median :0.00
Mean :2.646161 Mean :0.596 Mean :0.49
3rd Qu.:4.000000 3rd Qu.:1.000 3rd Qu.:1.00
Max. :4.000000 Max. :1.000 Max. :1.00
The column eventTimes gives the observed event or censoring times (\(t\)), eventIndic is the corresponding censoring indicator (\(\delta\)) and trt is a binary covariate of interest (i.e., treatment).
2.2.1 Exponential parametric baseline hazard
The model is defined as follows: \[\lambda_0(t) = \mu \] where \(\mu>0\) is the rate parameter of the exponential distribution.
We can estimate this model with R-INLA, first we start by loading the library in R:
library(INLA)
Then we can fit the model using the inla function:
M_exp_inla <- inla(formula = inla.surv(eventTimes, eventIndic) ~ trt,
                   data = datExp, family = "exponentialsurv")
Here in the argument formula, the first part is the outcome (i.e., a combination of event time and censoring indicator combined in an inla.surv object, similar to the Surv object in the R package survival (Therneau (2023)), see ?inla.surv in R after loading the R-INLA library for more details), which is described as a function of covariate trt. The dataset is provided in the data argument and the family is set to exponentialsurv, which is a parametric function assuming an exponential baseline hazard. We can display summary statistics over the posterior distribution of the parameters with the summary function:
summary(M_exp_inla)
Time used:
Pre = 0.409, Running = 0.205, Post = 0.00859, Total = 0.622
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) -1.565 0.083 -1.727 -1.565 -1.403 -1.565 0
trt 0.144 0.116 -0.083 0.144 0.371 0.144 0
Marginal log-Likelihood: -748.98
is computed
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
The results show the computation time followed by the parameter estimates; the first one (i.e., Intercept) corresponds to the logarithm of the rate parameter for the exponential baseline hazard while the second one is our trt effect \(\gamma\). As we can see, this treatment effect matches well with the true value (i.e., 0.2). For the rate of the exponential, we can exponentiate the intercept to compare with the true value (0.2). The summary of fixed effects is available in the summary.fixed element of the fitted object:
exp(M_exp_inla$summary.fixed$mean[1])
[1] 0.2091438
We may however be interested in the uncertainty around this value, which we only have on the log scale in R-INLA’s output. The function inla.tmarginal can transform posterior marginals with any function, while the function inla.zmarginal computes summary statistics over the transformed marginals. The posterior marginal distributions of an R-INLA fitted model are available directly in the fitted object as the marginals.fixed element:
str(M_exp_inla$marginals.fixed)
List of 2
$ (Intercept): num [1:43, 1:2] -1.92 -1.87 -1.82 -1.76 -1.73 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "x" "y"
$ trt : num [1:43, 1:2] -0.3504 -0.2872 -0.2143 -0.1258 -0.0834 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "x" "y"
par(mfrow = c(1, 2), mar = c(4, 4, 2, 0.3), cex = 0.6)
plot(M_exp_inla$marginals.fixed[[1]], type = "l", main = "log exponential rate")
plot(M_exp_inla$marginals.fixed[[2]], type = "l", main = "trt effect")
Figure 2.1: Posterior marginal distributions from an R-INLA model fit. The left plot shows the posterior for the log exponential rate, and the right plot shows the posterior for the log-hazard ratio of the treatment effect.
And we can look at the transformed exponential rate as well as the hazard ratio of trt (instead of the log-hazard ratio as provided in R-INLA’s output) as shown in Figure 2.2:
par(mfrow = c(1, 2), mar = c(4, 4, 2, 0.3), cex = 0.6)
Exp_rate_marg <- inla.tmarginal(exp, M_exp_inla$marginals.fixed[[1]])
inla.zmarginal(Exp_rate_marg)
Figure 2.2: Posterior marginal distributions of the exponential rate and treatment hazard ratio. These plots show the posteriors from the R-INLA fit, transformed to their natural scales for easier interpretation.
The hazard ratio is easier to interpret and thus often favored, as it directly multiplies the reference risk. In this example we could interpret it as follows: the risk of event for individuals with trt=1 is increased by 16% on average compared to individuals with trt=0, with a 95% credible interval ranging from -8% to 45%.
We can fit the same model with INLAjoint; although the syntax is similar, the output is simplified:
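A sketch of the corresponding call is shown below, based on the arguments described next; M_exp is the object name used later when plotting the fitted model:

library(INLAjoint) # if not already loaded
M_exp <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
               basRisk = "exponentialsurv", dataSurv = datExp)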
Here the argument formSurv is the formula for the time-to-event outcome, with the response given as an inla.surv() object, while dataSurv is the dataset that must contain the variables given in the formula. These two arguments differ from the usual formula and data to avoid confusion, since INLAjoint can also deal with longitudinal models simultaneously. The argument basRisk gives the baseline hazard of the event. Here we define the parametric exponentialsurv but other options are discussed in the following sections.
The summary of the model directly gives the rate of the exponential baseline hazard and the log hazard ratio of treatment effect. To get the hazard ratio of treatment effect, the argument hr can be switched to TRUE in the call of the summary function:
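Assuming the fitted object is stored as M_exp, the two calls would look as follows (hr = FALSE is the default):

summary(M_exp)            # exponential rate and log hazard ratio of trt
summary(M_exp, hr = TRUE) # treatment effect reported as a hazard ratio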
Another difference with R-INLA’s raw output is the automatic computation of two goodness of fit criteria that can be used for model selection: the Deviance Information Criterion (DIC, Spiegelhalter et al. (2002)) and the Widely Applicable Information Criterion (WAIC, Watanabe (2013)), available as:
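Assuming these criteria are stored in the dic and waic elements of the fitted object (as in R-INLA fits), they could be extracted as:

c(DIC = M_exp$dic$dic, WAIC = M_exp$waic$waic) # element names assumed from R-INLA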
We can plot the fitted model directly with plot(M_exp) (see Section 1.5.4) to get the posterior marginals of all model parameters and the baseline hazard (Figure 2.3, Figure 2.4, Figure 2.5). INLAjoint uses the ggplot2 library for these plots (Wickham (2016)):
library(ggplot2)
plot(M_exp)
Figure 2.3: Posterior marginal distribution of the treatment effect (\(\gamma\)) from an exponential baseline hazard model. The dashed red line indicates the true value (0.2) used for data generation.
Figure 2.4: Posterior marginal distribution of the rate parameter (\(\mu\)) from an exponential baseline hazard model. The dashed red line indicates the true value (0.2) used for data generation.
Figure 2.5: Posterior estimate of the exponential baseline hazard function. The solid black line shows the posterior mean, the shaded region represents the 95% credible interval, and the dashed red line indicates the true constant value (0.2) used for data generation.
Here we can see the model is able to recover the true values of all the parameters; the last plot shows the baseline hazard function over time (which is constant by definition here).
2.2.2 Weibull parametric baseline hazard & accelerated failure time model
The exponential baseline hazard is based on the often too simplistic assumption that the risk remains constant over time. We can add flexibility by allowing the risk to increase or decrease over time with the Weibull parametric distribution. This distribution has an additional shape parameter that controls the evolution of the baseline hazard over time: when shape>1, the hazard increases over time, while it decreases when shape<1. When shape=1 the model reduces to the exponential and the baseline hazard is constant over time; the scale parameter then corresponds to the exponential rate.
With R-INLA, the Weibull likelihood has 2 variants which fit the same model but with different parametrizations (see inla.doc("weibull") for more details). The default variant in INLAjoint is variant 0, i.e., assuming the following definition of the hazard function:
\[\lambda_0(t) = \alpha \cdot t^{\alpha-1} \cdot \mu, \ \ \ \ \alpha>0, \ \ \ \ \ \mu>0 \] while variant 1 is defined as:
\[\lambda_0(t) = \alpha \cdot t^{\alpha-1} \cdot \mu^{\alpha}, \ \ \ \ \alpha>0, \ \ \ \ \ \mu>0 \]
These two parametrizations are mathematically equivalent formulations of the same underlying Weibull model, related through a transformation of the \(\mu\) parameter: their survival functions become identical under the transformation \(\mu_0=\mu_1^\alpha\).
Each variant’s structure aligns naturally with one of the two primary modeling frameworks in survival analysis, i.e., variant=0 is structured as a natural Proportional Hazards model, where covariates act multiplicatively on the baseline hazard. In contrast, variant=1 is structured as a natural Accelerated Failure Time (AFT) model, where covariates act to accelerate or decelerate the event timescale.
The Weibull distribution is unique in being the only commonly used distribution that belongs to both the PH and AFT model families. This means a single fitted model can be interpreted in two ways, and the parameters from one framework can be transformed into the other. We demonstrate this equivalence in the following.
We can start by fitting this model with INLAjoint, using the default variant 0, by just switching the basRisk argument to weibullsurv:
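A sketch of this call, reusing the simulated exponential data from above and storing the fit as M_wei (the object name referenced below), might be:

M_wei <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
               basRisk = "weibullsurv", dataSurv = datExp)
summary(M_wei)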
The key advantage of the Weibull model over the exponential model is the inclusion of the shape parameter \(\alpha\), which provides insight into the evolution of the hazard over time.
Again, the plot function can be used to display all posterior marginals and the baseline hazard evolution over time. Specific plots can be directly called by their name, for example for the baseline hazard:
plot(M_wei)$Baseline
Figure 2.6: Posterior estimate of the baseline hazard from a Weibull model fitted to data with a true constant hazard. The solid black line is the posterior mean, the shaded region is the 95% credible interval, and the dashed red line indicates the true constant rate (0.2) used to generate data observations.
The posterior distribution for the shape parameter includes 1 in its 95% credible interval, meaning that the exponential baseline hazard is a reasonable assumption (it is indeed the true model we simulated from!). We can simulate data where the baseline hazard evolves over time (i.e., Weibull baseline), using the same method as for the exponential baseline hazard. The survival function for the current example is \(S(t) = \exp(-\mu\cdot t^\alpha \cdot\exp(\gamma \cdot trt_i))\). To simulate an event time \(t\), we again set a uniform random variable \(u \sim U(0,1)\) equal to the survival probability and solve for \(t\): \[\log(u) = -\mu\cdot t^\alpha\exp(\gamma \cdot trt_i)\] Now we can rearrange the terms to isolate \(t\): \[t = \left(\frac{-\log(u)}{\mu\cdot \exp(\gamma \cdot trt_i)}\right)^{1/\alpha}\] This allows us to simulate event times \(t\) based on uniform random numbers, the scale (\(\mu\)), the shape (\(\alpha\)) and regression parameters such as the treatment effect (\(\gamma\)):
n = 1000 # number of individuals
gamma = -1 # treatment effect
baseScale = 0.2 # baseline hazard scale
baseShape = 1.3 # baseline hazard shape
followup = 4 # study duration
trt <- rbinom(n, 1, 0.5) # trt covariate
u <- runif(n) # uniform distribution for survival times generation
eventTimes <- pmin((-(log(u))/(baseScale*exp(trt*gamma)))^(1/baseShape), followup)
eventIndic <- as.numeric(eventTimes < followup) # event indicator
datWei <- data.frame(eventTimes, eventIndic, trt) # dataset
summary(datWei)
eventTimes eventIndic trt
Min. :0.04007 Min. :0.000 Min. :0.00
1st Qu.:1.68252 1st Qu.:0.000 1st Qu.:0.00
Median :3.70431 Median :1.000 Median :0.00
Mean :2.90042 Mean :0.535 Mean :0.48
3rd Qu.:4.00000 3rd Qu.:1.000 3rd Qu.:1.00
Max. :4.00000 Max. :1.000 Max. :1.00
Now the shape parameter is set to 1.3, so that the baseline hazard increases over time. We can fit the model and plot it as in Figure 2.7:
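A sketch of this fit on the new dataset datWei is given below; the object name M_wei2 is chosen here for illustration:

M_wei2 <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
                basRisk = "weibullsurv", dataSurv = datWei)
plot(M_wei2)$Baseline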
Figure 2.7: Posterior estimate of the baseline hazard from a Weibull model fitted to data with a true increasing hazard. The solid black line is the posterior mean, the shaded region is the 95% credible interval, and the dashed red line represents the true Weibull hazard function used for data generation.
It is possible to switch variant in the control options of INLAjoint:
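A sketch of such a call is shown below, assuming the control option is simply named variant (the exact option name can be checked in ?joint):

M_wei_v1 <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
                  basRisk = "weibullsurv", dataSurv = datWei,
                  control = list(variant = 1)) # variant 1 parametrization (assumed syntax)
summary(M_wei_v1)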
The results are slightly different but the underlying model is the same; one can match these results with basic transformations as shown below.
Another important point is the equivalence between the proportional hazards and the accelerated failure time model with the Weibull baseline hazard. Regardless of the variant, we can picture the model as PH or AFT as illustrated below (and compared with standard software for PH (R package eha, Broström (2022)) and AFT (R package survival, Terry M. Therneau & Patricia M. Grambsch (2000))):
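As a point of comparison under the same Weibull assumption, the frequentist PH and AFT fits could be obtained as sketched below; eha::phreg and survival::survreg are the standard functions for these two formulations, and exact defaults may differ across package versions:

library(eha)       # Weibull proportional hazards parametrization
library(survival)  # Weibull accelerated failure time parametrization
# PH formulation: coefficients are log-hazard ratios
phreg(Surv(eventTimes, eventIndic) ~ trt, data = datWei, dist = "weibull")
# AFT formulation: coefficients act on the log time scale
survreg(Surv(eventTimes, eventIndic) ~ trt, data = datWei, dist = "weibull")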
We can see that both Weibull variants can be pictured as proportional hazards or accelerated failure time models, after some transformations of the posterior marginals. Of note, the Gompertz distribution is also an option in INLAjoint, although not presented here to avoid redundancies. However, an illustration of the defective Gompertz approach is presented in Section 2.7.2, which is an extension of the standard Gompertz distribution for baseline hazard to model data with a cure fraction.
2.2.3 Semi-parametric baseline hazard with random walks
There are several ways to define the baseline hazard function \(\lambda_0(t)\). In a parametric proportional hazards model, the baseline is defined as a specific function of time and additional parameters; the most common choices are the exponential, Weibull and Gompertz distributions. Piecewise constant functions do not make specific shape assumptions for the baseline hazard function, but they tend to lack flexibility because of the assumption of piecewise constant hazards (and therefore “brutal” jumps in the hazard), with a risk of overfitting if too many intervals are defined. Additional flexible parametric methods (e.g., splines) have been proposed for the approximation of the baseline distribution function (Royston & Parmar (2002)).
Spline functions should be used with caution as their flexibility relies on an appropriate choice of the number of knots. They might overfit the data and provide a rough (i.e., not smooth) hazard function, which is usually not suitable as the risk function in the population of interest is usually expected to be smooth. For example, Rondeau et al. (2007) use penalization of the second-order derivatives of the splines in order to prevent abrupt changes in the baseline hazard. Choosing the smoothing parameter is the difficult part of this method and can be done by approximate cross-validation techniques. Another approach uses random walks of order 1 or 2 (sometimes referred to as “Bayesian smooth splines”); this approach is flexible and requires only a single parameter (the precision of the random walk) since the baseline hazard then relies on the data with a smoothness constraint (Alvares et al. (2024)). It has been shown to outperform traditional spline approximations in terms of accuracy (Rustand et al. (2023)) and converges to a stable shape when the number of knots is increased.
Alternatively, leaving the baseline hazard completely unspecified is possible when the partial likelihood technique is applied but the estimation of the standard errors of the regression parameters is computationally intensive (especially for joint models) and prediction of survival probabilities is cumbersome (Xu et al. (2020)).
The baseline hazard is not observed directly and we may not know its expected shape. If it is not monotonic, the parametric approaches described above will fail to capture it properly, which is why the Cox model is often preferred as it makes no assumption on the shape of the baseline.
Holford (1976), Whitehead (1980) and Johansen (1983) showed that the full likelihood of a Cox model can be approximated with that of a Poisson regression. The key idea is to convert the continuous-time hazard function into a discrete-time event count. If the time intervals are small enough, the hazard can be considered roughly constant within each interval. Then, the number of events in each interval follows a Poisson distribution, where the mean is proportional to the hazard in that interval.
The Poisson approximation in discrete time is based on the assumption that the number of events \(N_i\) in a small time interval of length \(\Delta t\) follows a Poisson distribution: \[N_i|\mu_i \sim \textrm{Poisson}(\mu_i)\] where the mean \(\mu_i\) is proportional to the hazard within the interval such that: \[\mu_i = \lambda_i\left(t\right) \Delta t.\] This approximation becomes more accurate as the time intervals become shorter. However, it is important to note that this approximation may introduce error if the hazard changes rapidly within the intervals; but if each unique event time is made into its own interval, the partial likelihood approach for the Cox model can be replicated exactly. This is the method used to fit the Cox proportional hazards model with INLA.
In practice, we use an offset term to match exactly the event time and censoring time when they occur within an interval. Moreover, since more intervals involve a more flexible approximation of the baseline, INLAjoint uses random walk priors instead of piecewise constant approximation to avoid overfitting and ensure a smooth baseline hazard function. The Poisson regression model then fits with the LGM framework described in Chapter 1, and can be fitted with INLA.
In order to see how this “Poisson trick” works with R-INLA, we first need to decompose the follow-up into intervals. This is done with the inla.coxph function. Note that with INLAjoint, this step is done internally as illustrated later in this section (we show in details how this is done with R-INLA for transparency and a better understanding of how INLAjoint works).
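A sketch of this data expansion step on the simulated Weibull data is shown below; the control.hazard options are the assumed defaults discussed after the output (15 intervals, random walk of order one):

Expanded_data <- inla.coxph(inla.surv(eventTimes, eventIndic) ~ trt,
                            data = datWei,
                            control.hazard = list(n.intervals = 15, model = "rw1"))
# expanded dataset: one row per follow-up interval and per individual
head(Expanded_data$data, 10)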
There are now 10 lines in the expanded dataset for individual 1 where each line corresponds to an interval of the follow-up. For example the third line corresponds to an interval of size 0.27 (column E..coxph) that starts at time 0.53 (column baseline.hazard) without observing the event (column y..coxph). The last line is a smaller interval as it stops at the event time.
The cutpoints are defined automatically as equidistant between 0 and the maximum observed follow-up time. We can see the cutpoints in the object created:
Expanded_data$data.list # cutpoints
seq(0, max(datWei$eventTimes), len = 16) # default is equidistant
The default number of intervals is 15 but it can be changed in control.hazard when creating the expanded dataset, by setting the argument n.intervals (see ?control.hazard for more details on available arguments). It is also possible to manually set these cutpoints with the argument cutpoints (which then overrides the argument n.intervals). Finally, the argument model allows choosing between a random walk of order one (model = "rw1") and order two (model = "rw2").
We can then fit the model with R-INLA:
datRW <- append(Expanded_data$data.list, Expanded_data$data)
M_INLA_rw1 <- inla(Expanded_data$formula, data = datRW,
                   E = Expanded_data$E, family = Expanded_data$family)
summary(M_INLA_rw1)
Time used:
Pre = 0.217, Running = 0.279, Post = 0.0323, Total = 0.528
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) -1.236 0.055 -1.343 -1.236 -1.129 -1.236 0
trt -0.982 0.094 -1.165 -0.982 -0.798 -0.982 0
Random effects:
Name Model
baseline.hazard RW1 model
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for baseline.hazard 30.97 41.33 4.48 20.09 119.12 10.49
Marginal log-Likelihood: -2641.08
is computed
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
The results include two fixed effects, where the intercept corresponds to the average log-baseline hazard and deviations from this average are captured by the random walk (i.e., the random walk has a sum-to-zero constraint). It is also possible to remove this constraint in the control.hazard arguments; the average of the random walk then replaces this intercept (this is the strategy used in INLAjoint). The precision of the random walk (i.e., the inverse of the variance) is given as a hyperparameter: the lower the value, the larger the deviations from the average baseline hazard. Note that this model can also be replicated in R-INLA with the coxph likelihood, although this approach is limited to the standard Cox PH. In the following, we show how to replicate this model with INLAjoint, which facilitates the use of INLA for such models.
We start by defining the random walk of order one proportional hazards model:
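A sketch of this INLAjoint call, assuming the random walk of order one is requested through basRisk = "rw1" and the fit is stored as M_rw1 (the object name used below), might be:

M_rw1 <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
               basRisk = "rw1", dataSurv = datWei)
summary(M_rw1)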
The summary shows the variance of the random walk and the log hazard ratio of treatment effect. As illustrated for parametric proportional hazards model, we can directly plot the baseline hazards with the plot function, as shown in Figure 2.8:
plot(M_rw1)$Baseline
Figure 2.8: Posterior estimate of the baseline hazard function using a random walk of order 1. The solid black line is the posterior mean, the shaded region is the 95% credible interval, and the dashed red line is the true underlying Weibull hazard function.
We can also fit the random walk of order two, which is smoother due to the smoothness constraint relying on the second order closest neighbors:
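A sketch of this fit, stored as M_rw2 (the object name used for predictions in Section 2.3), might be:

M_rw2 <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
               basRisk = "rw2", dataSurv = datWei)
plot(M_rw2)$Baseline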
Figure 2.9: Posterior estimate of the baseline hazard function using a random walk of order 2. The solid black line is the posterior mean, the shaded region is the 95% credible interval, and the dashed red line is the true underlying Weibull hazard function.
An important point is the number of intervals used to decompose the follow-up. The default for both random walk methods is 15 equidistant bins, but increasing the number of bins will get us closer to the stochastic model and does not lead to overfitting as observed with commonly used splines (Van Niekerk et al. (2023)). We can compare the impact of the number and position of bins. First we fit a model with 5 intervals; this is set with the argument NbasRisk:
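A sketch of this fit with 5 intervals (the object name M_rw2_5 is chosen for illustration):

M_rw2_5 <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
                 basRisk = "rw2", NbasRisk = 5, dataSurv = datWei)
plot(M_rw2_5)$Baseline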
The fit is still nearly identical, meaning that the baseline hazard is well approximated even with a few intervals. This may not be true for more complex examples with important changes in the trend of the baseline hazard. We can manually set the position of the bins with the argument cutpoints; here for example we include many points in the early/late follow-up:
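A sketch of the two alternative specifications compared in Figure 2.10 is given below; the cutpoint values and object names are illustrative choices:

# irregularly spaced bins: dense in the early and late follow-up
cutP <- c(seq(0, 1, by = 0.1), 2, 3, seq(3.5, 4, by = 0.1))
M_rw2_irr <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
                   basRisk = "rw2", cutpoints = cutP, dataSurv = datWei)
# 50 equidistant bins
M_rw2_50 <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
                  basRisk = "rw2", NbasRisk = 50, dataSurv = datWei)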
Figure 2.10: Comparison of RW2 baseline hazard estimates under different interval specifications. Each plot shows the posterior mean (points and line) and 95% credible interval for the baseline hazard estimated with: 5 equidistant bins (left), irregularly spaced bins (middle), and 50 equidistant bins (right). The true Weibull hazard is the dashed red line.
We can see how the number and position of the bins control the flexibility of the approximation, without any cost to having “too many” bins apart from the increased computation time. The choice of the number of bins therefore involves a trade-off between flexibility and computational cost. For smoothly varying hazards, the default value of NbasRisk (15 intervals) is often sufficient. For studies with long follow-up or where the hazard is expected to be highly dynamic, increasing this value can provide greater flexibility. As demonstrated, the random walk prior provides strong regularization against overfitting, making the model robust to a large number of intervals.
A sensitivity analysis is always a recommended practice to ensure the results are stable. Beyond the random walks, it is technically possible to define any function for the baseline hazards and while this book focuses on the currently available options, new options could be developed if needed.
2.2.4 Illustrative example
To make the theoretical distinctions between parametric and semi-parametric baseline hazard models more concrete, we now apply these approaches to a real-world dataset. This practical illustration relies on the clinical trial data for primary biliary cholangitis (pbc2 dataset), which was previously introduced in Chapter 1. The objective is to visually compare the posterior estimates of the baseline hazard function under four distinct sets of assumptions: the constant hazard of the Exponential model, the monotonically constrained hazard of the Weibull model, and the flexible, smooth hazards estimated using first-order (RW1) and second-order (RW2) random walks.
This comparison will serve to highlight the strengths and limitations of each approach, demonstrating how model flexibility interacts with data and how restrictive assumptions can influence the resulting inference about the underlying event risk.
We first load and prepare the pbc2 dataset, extracting the relevant time-to-event data for the analysis (we consider the hazard of transplantation as a function of drug). Subsequently, we fit the four distinct survival models, each defined by a different assumption for the baseline hazard. Then, we can extract the posterior mean and 95% credible intervals for the baseline hazard from each model and plot these estimates for a direct visual comparison.
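A sketch of this analysis is given below. It assumes the pbc2 dataset is available as in Chapter 1 (one row per visit), that the survival information is taken from the first row of each patient, and that status == "transplanted" identifies the event of interest; the column and object names are therefore assumptions based on that description.

# keep one row per patient with the time-to-event information
PBC <- pbc2[!duplicated(pbc2$id), c("id", "years", "status", "drug")]
PBC$transplant <- as.numeric(PBC$status == "transplanted") # event indicator
# four models differing only by their baseline hazard assumption
M_pbc_exp <- joint(formSurv = inla.surv(years, transplant) ~ drug,
                   basRisk = "exponentialsurv", dataSurv = PBC)
M_pbc_wei <- joint(formSurv = inla.surv(years, transplant) ~ drug,
                   basRisk = "weibullsurv", dataSurv = PBC)
M_pbc_rw1 <- joint(formSurv = inla.surv(years, transplant) ~ drug,
                   basRisk = "rw1", dataSurv = PBC)
M_pbc_rw2 <- joint(formSurv = inla.surv(years, transplant) ~ drug,
                   basRisk = "rw2", dataSurv = PBC)
# the baseline hazard of each fit can then be extracted (e.g., via plot())
# and overlaid to produce a comparison such as Figure 2.11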
Figure 2.11: Comparison of estimated baseline hazard functions for the PBC dataset. The plot shows the posterior mean and 95% credible interval of the baseline hazard from four models: Exponential (black solid line), Weibull (red dashed line), RW1 (green dotted-dashed line), and RW2 (blue long-dashed line).
The exponential model provides a constant hazard that represents a reasonable average risk over the follow-up period but fails to capture the temporal dynamics of the event risk. The Weibull model, with its added flexibility, captures an initial increase in the baseline hazard, aligning well with the random walk estimates during the first two years where the density of observed events is highest. However, its rigid parametric assumption of monotonicity forces it to overlook the subsequent change in trend, where a plateau and potential decrease is suggested by the more flexible random walk models. This difference highlights how parametric assumptions can introduce bias, particularly when the true hazard shape is non-monotonic.
The random walk models, by contrast, are sufficiently flexible to adapt to the data across the entire follow-up. They not only capture the initial increase in risk but also the subsequent stabilization, a pattern that may reflect important underlying changes in the disease’s natural history. The models largely agree where the data is most informative (early follow-up), and diverge where it is sparse (late follow-up).
This comparison has profound implications for predictions, particularly for the extrapolation beyond the observed range of the data. The random walk models, being inherently data-driven and non-parametric in nature, are not naturally defined beyond the maximum observation time. Their predictive mechanism defaults to a simple linear projection of the last observed trend. Conversely, parametric models like the Weibull are mathematically defined for any time point \(t>0\), making them an appealing choice for extrapolation.
However, this should be used with caution. The ability of a parametric function to produce a value does not confer validity upon it. Extrapolating a functional form that already shows clear signs of misspecification within the observed data range (as the Weibull model does here) is tricky and can lead to biased results and misleading conclusions. The apparent certainty of the parametric extrapolation is an illusion if the underlying assumption is incorrect. Therefore, while parametric assumptions can be useful for interpolation and short-term prediction, we warn against extrapolating model predictions far beyond the temporal support of the data. Such predictions are not data-driven inferences but are instead pure assumptions about future behavior that cannot be validated by the data at hand.
2.3 Inference and prediction of hazard and survival functions
We may be interested in the computation of survival probabilities and survival curves for some scenarios from a fitted survival model (see Section 1.5.5). For example, let’s compute survival probabilities based on the random walk of order 2 model fitted in Section 2.2.3, conditional on the covariate trt. In order to do so, we first create a new dataset that describes our scenarios of interest. This dataset must match the structure of the dataset used to fit the model we want to predict from, meaning that all covariates included in the model should be in this new dataset (2 scenarios here means 2 lines). Since we are only interested in trt = 0 vs. trt = 1, we can use the column trt to identify the scenario through the argument id. For more complex scenarios (multiple specific values of a non-binary covariate or multiple covariates), an additional column is needed to identify each scenario (i.e., with a unique value for each scenario).
NewData <- data.frame("trt" = 0:1); NewData
trt
1 0
2 1
Then we call the predict function:
P <-predict(M_rw2, NewData, horizon =4, id="trt")
The first argument is the model from which we derive predictions, the second argument is the new dataset that gives the scenarios of interest (i.e., here we want to compute the hazard function for trt=0 and trt=1). The argument horizon gives the time until which we want to compute the predictions and the argument id gives the column name for scenarios. See Section 1.5.5 for more details about how predictions are computed with INLAjoint.
The object P contains the results of the predictions; it is a list of 2 items, “predL” for predictions of longitudinal outcomes (see Chapter 3) and “predS” for predictions of survival outcomes. A data frame contains the result of the predictions, with the first column identifying the scenario, eventTimes for the time at which the hazard function is computed, Outcome to identify the outcome when a multi-outcome model is used, and the remaining columns giving summary statistics of the corresponding hazard function. The time points are defined automatically as 50 equidistant points between 0 and horizon. It is possible to modify the number of time points with the argument NtimePoints. Moreover, one can directly give a vector of time points to predict the risk at these points specifically with the argument timePoints.
If we are instead interested in the survival function, we can compute it by setting the argument survival=TRUE in the call of the predict function:
P <- predict(M_rw2, NewData, horizon = 4, id = "trt", survival = TRUE)
str(P)
Figure 2.12: Predicted survival probabilities conditional on treatment group. The plot shows the estimated survival curves for the treatment group (trt=1, orange dashed line) and the control group (trt=0, green solid line). Dotted lines form the 95% credible intervals.
Note that for survival functions, the integration of the hazard function is based on the decomposition of time with timePoints, it is therefore important to have enough time points between the entry in the at-risk period and the horizon to get accurate estimates of the survival function.
Here the entry in the at-risk period is time 0 but this can be modified with the argument Csurv, for example we can assume the at-risk period starts at time 2:
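A sketch of this conditional prediction, using the same call as above with the additional Csurv argument (the object name P2 is illustrative):

P2 <- predict(M_rw2, NewData, horizon = 4, id = "trt",
              survival = TRUE, Csurv = 2)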
Figure 2.13: Predicted survival probabilities conditional on being event-free at time 2. The survival probabilities are calculated starting from time 2, as specified by the Csurv=2 argument.
2.4 Time-dependent covariates
It is possible to include time-dependent covariates in a survival model, they can be classified into two categories:
Exogenous (or external) covariates remain measurable and their distribution is unchanged after the occurrence of the event.
Endogenous (or internal) covariates have a distribution that is affected by the event.
The proportional hazards model can handle exogenous time-dependent covariates, but the likelihood requires knowing the value of these covariates for all subjects at risk over the follow-up, while covariates are often only measured on some occasions, therefore requiring values to be imputed. Most biomarkers of interest in clinical research are endogenous variables: their values are affected by a change in the risk of occurrence of the event. For example, in a cancer clinical trial, if a treatment reduces both the risk of death and the size of tumors, adjusting a survival model on the size of tumors may severely bias the effect of treatment on the risk of death (because some of the treatment effect will be captured by tumor size).
It is often of interest to estimate the effect of the treatment on the biomarker. The biomarker may be censored by the event and its value is only known at the specific time points at which it is measured. Separate models for the longitudinal and survival outcomes are prone to bias when the two outcomes are associated (Rubin (1976), Wang & Taylor (2001)). Alternatively, two-stage models can fit a longitudinal model and subsequently use the estimated longitudinal trajectory of the biomarker as a covariate in the survival model (Tsiatis et al. (1995)). They account for the measurement error of the biomarker in the survival model but fail to account for informative drop-out in the longitudinal model. Ignoring informative censoring in a longitudinal analysis can lead to biased covariate effect estimates (Garcı́a-Hernandez et al. (2020)).
In contrast, joint models offer the advantage of dealing with informative drop-out, measurement error and missing biomarker measurements not at random in the longitudinal and survival regression models (Ibrahim et al. (2010)). Being able to handle time-dependent covariates in survival models is necessary to fit most joint models (see Chapter 4).
With INLA, one can take advantage of the decomposition of the follow-up into intervals in order to account for time-varying covariates. These intervals can be arbitrarily small without changing the likelihood of the model, as we can reconstruct the follow-up with left truncation and right censoring. Let’s illustrate this with an example based on simulated data from Section 2.2.2. We consider the Weibull parametric baseline hazard but the same applies to other baseline hazard options. The standard model assumes each data point has a unique contribution to the likelihood (as described in Section 2.2.2):
# 1 observation per individual
M_weib_1 <- inla(inla.surv(eventTimes, eventIndic) ~ trt,
                 data = datWei, family = "weibullsurv")
round(rbind(M_weib_1$summary.fixed[, 1:2], M_weib_1$summary.hyperpar[, 1:2]), 4)
mean sd
(Intercept) -1.6067 0.0779
trt -0.9884 0.0933
alpha parameter for weibullsurv 1.2811 0.0505
length(M_weib_1$.args$data[[1]]) # original data size
[1] 1000
We can decompose the follow-up with 15 intervals of the same length between time 0 and the maximum follow-up time:
This decomposition is similar to the decomposition used for the baseline hazard with random walk to replicate the Cox model introduced in Section 2.2.3. We can therefore use the same function to set it up:
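A sketch of this decomposition with inla.coxph, using 15 equidistant cutpoints between 0 and the maximum follow-up time (the object name Itv is the one referenced below; the subsequent refit uses the truncation argument of inla.surv, as explained next):

cutP <- seq(0, max(datWei$eventTimes), len = 16) # 15 intervals
Itv <- inla.coxph(inla.surv(eventTimes, eventIndic) ~ trt,
                  data = datWei,
                  control.hazard = list(cutpoints = cutP))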
Both data formats contain the same amount of information: the first basically means “Individual 1 was observed for 2.43 units of time until the event was observed”, while the second decomposes it into “Individual 1 was observed for 0.27 units of time without observing the event”, followed by “Individual 1 was observed between time 0.27 and 0.53 without observing the event”, …, “Individual 1 was observed between time 2.40 and 2.43, when the event was observed”. Since the amount of data information that contributes to the likelihood is the same between these two data formats, we can compare the results (note the use of truncation in inla.surv to be able to start the follow-up from where it stopped in the expanded format):
mean sd
(Intercept) -1.6067 0.0779
trt -0.9884 0.0933
alpha parameter for weibullsurv 1.2811 0.0505
# data size after decomposition into 15 intervals
print(paste0("N = ", length(Itv$data[[1]])))
[1] "N = 11140"
The results are identical, but in the second format we can now include a covariate with values specific to each interval. The number of intervals can be arbitrarily large to capture a smooth evolution of covariates over time. We can do the same decomposition with 100 cutpoints:
mean sd
(Intercept) -1.6067 0.0779
trt -0.9884 0.0933
alpha parameter for weibullsurv 1.2811 0.0505
# data size after decomposition into 100 intervals
print(paste0("N = ", length(Itv2$data[[1]])))
[1] "N = 72783"
One can now clearly see how this allows including time-dependent covariates, by adding the covariate of interest in this data format where the value can differ for each interval. Moreover, it is possible to match the decomposition of the follow-up into intervals with the measurement occasions of the time-dependent covariate, to assume an (often unrealistic) constant value between measurement occasions and avoid imputation. This is done with the argument cutpoints.
2.5 Interval censoring
Interval censoring occurs when the exact time of an event is unknown but is known to fall within a specific time window (e.g., between two clinic visits or lab tests). Unlike right-censoring (where an event is only known not to have occurred by the last observation), interval censoring arises when events are detected at intermittent measurements.
Interval censoring is available with parametric baseline hazard functions but requires additional steps with random walks, as the integral over the interval is not straightforward and needs sampling to be taken into account properly (see for example Pan (2000)).
We can simulate interval-censored data with the R package icenReg (Anderson-Bergman (2017)), and then estimate the model with INLAjoint:
library(icenReg) # for data simulation
# Simulate data from a ph model where covariates
# x1 and x2 have effect b1 and b2, respectively.
sim_data <- simIC_weib(n = 5000, b1 = 0.5, b2 = -0.5)
sim_data$ev <- ifelse(sim_data$u == Inf, 0, 3)
M_intcen <- joint(inla.surv(time = l, time2 = u, event = ev) ~ x1 + x2,
                  dataSurv = sim_data, basRisk = "weibullsurv")
summary(M_intcen)
The true values of the effects of the two covariates x1 and x2 (0.5 and -0.5, respectively) are well recovered by the model. In this example, we can see how interval censoring is handled through the specification of the arguments time and time2, which correspond to the left and right bounds of the interval, respectively. The event indicator is set to 3, which indicates an interval-censored observation. It is possible to have a combination of interval-censored observations and observed events (event=1) as well as right-censored observations (event=0). More examples are available in Van Niekerk & Rue (2022).
2.6 Stratified proportional hazards model
Sometimes the baseline hazard may differ across subgroups while the relative effect of covariates remains identical within each stratum; in this case one can use the stratified proportional hazards model, which allows for greater flexibility by accounting for heterogeneity across distinct populations. We can simulate data where a binary covariate strata defines two subgroups, each with its own baseline hazard. For strata=1, the baseline hazard is defined as Weibull with a shape parameter equal to 1.5, meaning that the baseline hazard increases over time, while for strata=2 the shape parameter is set to 0.7 so that the baseline hazard decreases over time:
n = 1000 # number of individuals
gamma = -1 # treatment effect
baseScale = 0.5 # baseline hazard scale strata 1
baseShape = 1.5 # baseline hazard shape strata 1
baseScale2 = 0.5 # baseline hazard scale strata 2
baseShape2 = 0.7 # baseline hazard shape strata 2
followup = 4 # study duration
trt <- rbinom(n, 1, 0.5) # trt covariate
strata <- as.integer(rbinom(n, 1, 0.5)) + 1
u <- runif(n) # uniform distribution for survival times generation
eventTimes <- ifelse(strata == 1,
                     (-(log(u))/(baseScale*exp(gamma*trt)))^(1/baseShape),
                     (-(log(u))/(baseScale2*exp(gamma*trt)))^(1/baseShape2))
eventIndic <- as.numeric(eventTimes < followup) # event indicator
eventTimes <- pmin(eventTimes, followup)
datStr <- data.frame(eventTimes, eventIndic, strata, trt) # dataset
summary(datStr)
eventTimes eventIndic strata trt
Min. :0.0005999 Min. :0.000 Min. :1.000 Min. :0.00
1st Qu.:0.8365166 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:0.00
Median :1.9359808 Median :1.000 Median :2.000 Median :1.00
Mean :2.1874111 Mean :0.718 Mean :1.509 Mean :0.52
3rd Qu.:4.0000000 3rd Qu.:1.000 3rd Qu.:2.000 3rd Qu.:1.00
Max. :4.0000000 Max. :1.000 Max. :2.000 Max. :1.00
We can fit a stratified proportional hazards model by setting the argument strata in the control options. We can directly plot all stratum-specific baseline hazard curves with the plot function:
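A sketch of such a fit is given below; the object name M_strata is illustrative and the exact way of passing the stratification variable in the control options is an assumption (see ?joint for the available control arguments):

M_strata <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
                  basRisk = "rw2", dataSurv = datStr,
                  control = list(strata = "strata")) # stratification variable (assumed syntax)
plot(M_strata)$Baseline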
Figure 2.14: Estimated baseline hazard functions from a stratified proportional hazards model. The left plot (S1.1) and right plot (S1.2) show the distinct baseline hazards estimated for the two strata. In each plot, the solid line represents the posterior mean of the baseline hazard, and the shaded region is the 95% credible interval. The dashed red lines indicate true curves used to generate data observations. The plot demonstrates the model’s ability to capture different, flexible baseline hazard profiles for each stratum.
We can see that the strata-specific baseline hazard functions, modeled with random walks sharing a common variance hyperparameter, are properly captured. We can verify the fixed effect of treatment in the summary:
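Assuming the fitted object from above:

summary(M_strata)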
The effect of treatment here is assumed to be the same for both strata and matches the true value used for the simulation (-1).
2.7 Cure fraction models
2.7.1 Standard mixture cure model
The mixture cure model is used to analyze survival data with a cure fraction, meaning a proportion of the population is not at risk of experiencing the event of interest. In this framework, the population is conceptualized as a mixture of two distinct subgroups: “cured” individuals, who will never experience the event, and “uncured” individuals, who remain at risk. The survival function for the entire population is modeled as a weighted average of the survival functions of these two groups.
Let \(p(C=1)\) denote the probability of being cured and \(\lambda_{i}(t|C=0)\) the hazard for the part of the population that is not cured; the mixture cure model is defined by:
\[\left\{
\begin{array}{ c @{{}{}} l @{{}{}} r }
\textrm{Logit}(p(C=1)) &= \beta_{0} + \beta_{1}\cdot X_{1i} + \beta_{2}\cdot X_{2i} & \text{(probability of being cured)} \\
\lambda_{i}(t|C=0)&=\lambda_{0}(t)\ \textrm{exp}\left(\gamma_1\cdot Z_{1i} + \gamma_2\cdot Z_{2i}\right) & \text{(hazard for non-cured)}
\end{array}
\right.\]
The model includes a logistic regression model for the probability of being cured and a proportional hazards model for the time to event for the uncured individuals.
A cure fraction can be included in any survival model, and both right and interval censoring can be handled. However, only parametric baseline hazards can currently be defined for mixture cure models, and the covariates that affect the probability of being cured can only be associated with fixed effects (i.e., no random effects in the logistic part). We can simulate data from this model with the R package mixcurelps (Gressani et al. (2022)):
# library(mixcurelps) # used to simulate data
# since it causes install error on some OS, the following code replicates
# the simulation code from mixcurelps without requiring the library
n <- 5000 ; W1=0.25 ; W2=1.45
beta0=-0.70 ; beta1=1.15 ; beta2=-0.95 ; gamma1=-0.10 ; gamma2=0.25
X <- cbind(1, rnorm(n), rbinom(n, 1, 0.5))
B <- rbinom(n, 1, (1+exp(X %*% c(beta0, beta1, beta2)))^-1)
Z <- cbind(rnorm(n), rbinom(n, 1, 0.4))
Tlat <- (-log(runif(n)) / (W1 * exp(Z %*% c(gamma1, gamma2))))^(1/W2)
Tlat[Tlat > 8] <- 8; Tlat[B == 0] <- 20000; C <- pmin(rexp(n, 0.16), 11)
tobs <- pmin(Tlat, C); delta <- as.numeric(Tlat <= C)
simdat <- setNames(data.frame(tobs, delta, X, Z),
                   c("tobs","event","Intercept","x1","x2","z1","z2"))
M_mc <- joint(formSurv = inla.surv(tobs, event,
                                   cure = cbind("Int"=1, "x1"=x1, "x2"=x2)) ~ z1 + z2,
              basRisk = "weibullsurv", dataSurv = simdat)
summary(M_mc)
The argument cure in the definition of the outcome with inla.surv allows us to define covariates that affect the probability of being cured. Here we set an intercept and the effects of the two covariates X1 and X2. Figure 2.15 shows the posterior density of all the model parameters:
Figure 2.15: Posterior marginal distributions for the parameters of a mixture cure model. The plots show posteriors for the Weibull parameters, cure probability fixed effects (\(\beta_0, \beta_1, \beta_2\)), and survival component fixed effects (\(\gamma_1, \gamma_2\)). Dashed red lines indicate true values.
With mixture cure models, we are often interested in the probability of being cured. Since we use a logistic regression model with a logit link, we need to apply the inverse logit function to the linear predictor to get probabilities. For example, for the probability of being cured for the reference individual (i.e., X1 = X2 = 0):
inla.zmarginal(inla.tmarginal(function(x) 1/(1+exp(-x)), M_mc$marginals.hyperpar$`beta1 for Weibull-Cure`))
For the situations where X1 \(\neq\) 0 and/or X2 \(\neq\) 0, we can sample from the posterior distributions of the model and compute the corresponding linear combinations for each sample (it is also possible to set up linear combinations directly in R-INLA but this is not presented in this book).
# First sample values of intercept, X1 and X2's fixed effects
# Note how for mixture cure, these fixed effects are considered as
# hyperparameters for internal computations purpose
S <- inla.hyperpar.sample(10^4, M_mc)[, 2:4]
# Then we can compute the proba of being cured for different scenarios
list("X1 = 0, X2 = 0" = summary(1/(1+exp(-S[,1]))),
     "X1 = 0, X2 = 1" = summary(1/(1+exp(-rowSums(S[,c(1,3)])))),
     "X1 = -2, X2 = 1" = summary(1/(1+exp(-(S[,1]-2*S[,2]+S[,3])))))
$`X1 = 0, X2 = 0`
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.2869 0.3306 0.3401 0.3400 0.3492 0.3910
$`X1 = 0, X2 = 1`
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1357 0.1627 0.1699 0.1703 0.1776 0.2160
$`X1 = -2, X2 = 1`
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.01274 0.02081 0.02311 0.02337 0.02564 0.04149
Here we can see the different probabilities of being cured conditional on different combinations of covariates values.
2.7.2 Defective Gompertz distribution
While the mixture model offers an intuitive framework, it conceptualizes the population as two separate groups, requiring an additional logistic model for the cure probability. The defective Gompertz distribution provides a more parsimonious and integrated alternative. Instead of modeling a separate cure parameter, this approach includes the cure fraction directly into the survival function itself.
The defective Gompertz distribution reframes the cure fraction as an asymptotic property of the survival distribution. This unified framework can lead to more stable estimation, as traditional mixture models may suffer from convergence issues, particularly when the data suggest a very large or very small cure fraction. Moreover, the defective Gompertz distribution reduces to the standard Gompertz when there is no evidence for a cure fraction in the observed data (assuming the Gompertz distribution is flexible enough to capture the true hazard in the population). This makes it possible to investigate the presence (or absence) of a cure fraction in a population, whereas the traditional cure fraction model fails to converge if there is no evidence for a cure fraction in the data (because the probability of being cured is 0, which corresponds to an infinite value on the scale of the logistic model). The Gompertz distribution is characterized by a hazard function that changes exponentially over time. Its probability density function is given by:
\[f(t) = \mu \cdot \exp(\alpha\cdot t) \exp\left(-\frac{\mu}{\alpha}(\exp(\alpha\cdot t)-1)\right)\] for \(t>0\), where \(\alpha\) is the shape parameter and \(\mu>0\) is the scale parameter. The corresponding survival function is: \[S(t) = \exp\left(-\frac{\mu}{\alpha}(\exp(\alpha\cdot t)-1)\right)\] The key to this approach lies in the shape parameter, \(\alpha\), which is not constrained to be positive in the defective formulation of the Gompertz distribution. When \(\alpha\) is negative, the hazard function decreases over time and approaches zero, causing the survival function to become defective (i.e., it does not converge to zero). Instead, it does reach a plateau at a value greater than zero, which represents the cure fraction.
The cure proportion, \(p\), is calculated as the limit of the survival function as \(t \rightarrow \infty\) when \(\alpha < 0\): \[p = \lim_{t\to\infty} S(t) = \exp\left(\frac{\mu}{\alpha}\right)\] This formula shows that the cure fraction is an intrinsic property derived directly from the distribution’s parameters, not an external parameter estimated separately. The parameter \(\alpha\) controls both the shape of the hazard curve and the existence of a cure fraction. If the credible interval over \(\alpha\) only includes negative values, the data support the presence of a cure fraction in the population.
We can simulate from this model, assuming a simple model including a cure fraction and a single fixed effect (trt):
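A minimal simulation sketch, assuming illustrative parameter values and a uniform censoring mechanism (the exact code that produced the summary below may differ). With a negative shape \(\alpha\), individuals whose uniform draw falls below the cure probability \(\exp(\mu_i/\alpha)\) never experience the event:
n <- 1000
gamma <- 0.5                  # assumed treatment effect
alpha <- -0.4                 # negative shape: defective distribution (cure fraction)
mu <- 0.3                     # assumed scale parameter
followup <- 8
trt <- rbinom(n, 1, 0.5)
mu_i <- mu * exp(gamma * trt)             # proportional hazards on the scale parameter
u <- runif(n)
inside <- 1 - (alpha / mu_i) * log(u)     # non-positive when the individual is cured
eventTimes <- ifelse(inside > 0, log(inside) / alpha, Inf)
cens <- runif(n, 0, followup)             # assumed censoring mechanism
eventIndic <- as.numeric(eventTimes < cens)
eventTimes <- pmin(eventTimes, cens)
datDG <- data.frame(eventTimes, eventIndic, trt)
summary(datDG)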
eventTimes eventIndic trt
Min. :0.002092 Min. :0.000 Min. :0.00
1st Qu.:0.684837 1st Qu.:0.000 1st Qu.:0.00
Median :2.170316 Median :0.000 Median :1.00
Mean :2.860677 Mean :0.264 Mean :0.52
3rd Qu.:4.725078 3rd Qu.:1.000 3rd Qu.:1.00
Max. :7.805085 Max. :1.000 Max. :1.00
Fitting the model with INLAjoint is straightforward. We just need to specify the argument basRisk = "dgompertzsurv" to use the defective Gompertz likelihood for the survival submodel:
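Assuming the simulated data are stored in datDG as in the sketch above:
M_dg <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
              basRisk = "dgompertzsurv", dataSurv = datDG)
summary(M_dg)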
The model successfully recovers the true parameters used to generate the data; we can plot the posterior marginals of all the model parameters (Figure 2.16):
Figure 2.16: Posterior marginal distributions for the parameters of a defective Gompertz cure model. The plots display the posterior densities for the treatment effect (trt_S1), the defective parameter (Defective Gompertz (alpha)_S1), and the scale parameter (Defective Gompertz (scale)_S1). Dashed red lines indicate the true values used for data generation.
More details on this model are available in Neto et al. (2025).
2.8 Competing risks
Unlike the survival models presented before, which focus on a single event type, competing risks models account for the risk of occurrence of one of multiple competing events (i.e., each event prevents the others from happening). These models are particularly valuable in clinical trials and observational studies, where patients may experience various outcomes such as disease recurrence, death from different causes, or other significant events.
By modeling these competing risks, one can gain insights into the factors influencing each event type and make more informed decisions. The primary goal of competing risks models is to estimate the probability of each type of event occurring over time, while considering the influence of other competing events. This is achieved by modeling the cause-specific hazard functions, which describe the instantaneous risk of each event type.
We can start by simulating data where each individual can experience two competing events. To simulate data for a competing risks model, we assume that for each individual there exists a latent time to event for each of the competing events. In the observed data, we only consider the first event for each individual, assuming one of the events occurred before the maximum follow-up considered. When one of the competing events occurs, individuals are censored for the remaining events.
n=300 # number of individuals
gamma_11=0.5 # treatment effect
gamma_12=-0.5
baseScale1=0.5 # baseline hazard scale
baseScale2=0.3
followup=4 # study duration
trt <- as.integer(1:n %in% sort(sample(1:n, n/2)))
u1 <- runif(n) # uniform distribution for survival times generation
u2 <- runif(n)
Times1 <- -(log(u1)/(baseScale1*exp(trt*gamma_11))) # event times
Times2 <- -(log(u2)/(baseScale2*exp(trt*gamma_12)))
eventIndic1 <- as.numeric(Times1<followup & Times1<Times2)
eventIndic2 <- as.numeric(Times2<followup & Times2<Times1)
eventTimes <- pmin(Times1, Times2, followup)
datCR <- data.frame(id=1:n, eventTimes, eventIndic1, eventIndic2, trt)
head(datCR) # dataset
Here, we have two eventIndic columns that indicate whether each competing event occurred. A covariate trt is included with an effect on both cause-specific risks.
Fitting a model that includes multiple submodels in INLA is done through data augmentation; for example, for the first 3 individuals:
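A sketch of this augmentation (the object name is illustrative; INLAjoint performs this step internally):
aug <- rbind(
  data.frame(eventTimes1 = datCR$eventTimes[1:3], eventIndic1 = datCR$eventIndic1[1:3],
             eventTimes2 = NA, eventIndic2 = NA),
  data.frame(eventTimes1 = NA, eventIndic1 = NA,
             eventTimes2 = datCR$eventTimes[1:3], eventIndic2 = datCR$eventIndic2[1:3]))
aug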
eventTimes1 eventIndic1 eventTimes2 eventIndic2
1 0.4096096 0 NA NA
2 0.7255670 0 NA NA
3 0.9205102 0 NA NA
4 NA NA 0.4096096 1
5 NA NA 0.7255670 1
6 NA NA 0.9205102 1
We can see how the data is augmented: each line corresponds to a unique contribution to the likelihood of one of the submodels. The outcome is then set as a list of two outcomes (or a data frame with two columns) while the effect of covariates is specific to each outcome as described above. The default intercept has to be removed and replaced by an outcome-specific intercept. Beyond those key points, there are many details that make the use of R-INLA cumbersome for these models. We therefore focus on INLAjoint for the rest of the chapter since it facilitates the use of INLA and handles these steps internally to fit multiple-likelihood models.
In order to fit this model with INLAjoint, we have to set a list of formulas, one for each cause-specific risk, instead of a single formula. We directly use the generated data datCR as all the data formatting for R-INLA is automatically done internally:
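A sketch of the call (the baseline specification is an assumption; the rest follows the syntax used above):
M_cr <- joint(formSurv = list(inla.surv(eventTimes, eventIndic1) ~ trt,
                              inla.surv(eventTimes, eventIndic2) ~ trt),
              basRisk = c("rw1", "rw1"),   # assumed baseline choice for both risks
              dataSurv = datCR)
summary(M_cr)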
When including multiple survival submodels, it is possible to supply only one dataset assuming all the variables from all formulas are present in this dataset and it is also possible to provide a list of datasets specific for each outcome (in the same order as the list of formulas).
The summary shows results for the two survival outcomes, where we can see that both submodels capture the true values of treatment effect (i.e., 0.5 and -0.5, respectively). We can have a look at the baseline hazards captured in the following plot:
plot(M_cr)$Baseline +ylim(0,2)
Figure 2.17: Posterior marginal distributions for the cause-specific baseline hazards for Event 1 (trt_S1) and Event 2 (trt_S2) in a competing risks model. The solid line and shaded region indicate the posterior mean and 95% credible interval, while the red dashed line indicates the true baseline hazard used for data generation (assumed constant for simplicity).
Now suppose that we want to compute the probability of experiencing a specific event over time, while accounting for the presence of other competing events. This can be obtained with the cumulative incidence function and it can be computed directly with the predict function in INLAjoint. We first create a new dataset for the scenarios of interest, here we ask for the curves for trt = 0 and trt = 1:
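For example (object names are illustrative; the CIF argument is discussed next):
NDcr <- data.frame(id = 1:2, trt = 0:1)   # trt = 0 and trt = 1 scenarios
Pcif <- predict(M_cr, NDcr, horizon = 4, id = "id", CIF = TRUE)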
The new argument CIF is set to TRUE in order to compute the cumulative incidence function. The CIF should be preferred over cause-specific survival functions for competing risks models, as the latter correspond to the survival probability for an event without taking into account the risk of observing the other events, which is not realistic.
Figure 2.18: Cumulative Incidence Functions (CIFs) for competing events. The plot shows the estimated CIFs for each treatment group where Event 1 is represented by the solid lines while Event 2 is represented by the dashed lines.
Here, outcome S_1 corresponds to the first event type while S_2 is the second. For example, we can see that for individuals with trt = 0, the probability of the first event converges towards ~50% over time, while for individuals with trt = 1, this probability converges towards ~75%. Note that time-dependent covariates can be included in competing risks models following the data format described in Section 2.4 and for joint models with shared time-dependent components as described in Chapter 4.
2.9 Multi-state models
Multi-state models are used to analyze complex event histories where individuals can transition between multiple states over time. These models extend traditional survival analysis by allowing for more than two states, capturing the dynamics of transitions. In multi-state models, each state represents a distinct condition or stage, and transitions between states are governed by hazard functions that describe the instantaneous risk of moving from one state to another. These models can incorporate various types of transitions, including forward, reverse and recurrent events. Note that competing risks can be pictured as a specific case of multi-state models. Another specific and commonly used type of multi-state model is the illness-death model. This model typically consists of three states:
Healthy (or initial state): Individuals start in this state, representing the absence of disease or a baseline condition.
Illness (or intermediate state): Individuals transition to this state upon experiencing a disease or health event of interest.
Death (or absorbing state): This final state represents the end of follow-up due to death, from which no further transitions are possible.
The illness-death model allows for the estimation of transition intensities between these states, including the risk of developing the illness from the healthy state and the risk of death from either the healthy or illness state. This model is valuable for understanding the natural history of diseases and evaluating the impact of interventions. We can represent the illness-death model with a directed diagram using the graph library (Gentleman et al. (2024)):
library(graph) # just for the visual representation of multi-state
nod <- c("Healthy (1)", "Illness (2)", "Death (3)")
plot(graphNEL(nodes=nod,
              edgeL=list("Healthy (1)"=list(edges=2:3),
                         "Illness (2)"=list(edges=3),
                         "Death (3)"=list()),
              edgemode='directed'),
     nodeAttrs=list(shape=sapply(nod, function(x) x="box"),
                    fontsize=sapply(nod, function(x) x=12),
                    height=sapply(nod, function(x) x=1),
                    width=sapply(nod, function(x) x=2)))
Figure 2.19: Illness-death multi-state model structure. This diagram illustrates the three states (Healthy, Illness, Death) and the possible transitions between them.
Each transition is represented by an arrow, and all individuals start in the Healthy (1) state. We need to define a survival model for each of these transitions. The model structure is given by the following: \[\left\{
\begin{array}{ c @{{}{}} r @{{}{}} r }
\lambda_{i,1\rightarrow 2}(t)&=\lambda_{0,1\rightarrow 2}(t)\ \textrm{exp}\left(\gamma_{1\rightarrow 2}\cdot trt_i\right) \qquad & \textbf{S1}\\
\lambda_{i,1\rightarrow 3}(t)&=\lambda_{0,1\rightarrow 3}(t)\ \textrm{exp}\left(\gamma_{1\rightarrow 3}\cdot trt_i\right) \qquad & \textbf{S2}\\
\lambda_{i,2\rightarrow 3}(t)&=\lambda_{0,2\rightarrow 3}(t)\ \textrm{exp}\left(\gamma_{2\rightarrow 3}\cdot trt_i\right) \qquad & \textbf{S3}
\end{array}
\right.\] where \(trt_i\) is a binary covariate with a different fixed effect on each transition intensity, captured by the \(\gamma\) effects. Note that the baseline hazard function is transition-specific (one baseline risk for each survival model). Although it is possible with INLA to use the same baseline for all transitions, this is usually done only for computational convenience and is thus not considered here.
The simulation of data from a multi-state model is similar to competing risks (since it is a special case of multi-state), although it requires simulating data sequentially, where moving from one state to the next is based on which transition occurs first. For an illness-death model starting in the Healthy state (State 1), the simulation proceeds as follows:
For each individual, we simulate latent transition times for all possible exits from State 1: \(t_{1\rightarrow 2}\) (time from Healthy to Illness) and \(t_{1\rightarrow 3}\) (time from Healthy to Death), each drawn from its transition-specific hazard model. The time of the first transition is the minimum of these latent times.
If the individual transitioned to the Illness state (State 2), the simulation continues from this new state: a new latent time for the only possible exit, \(t_{2\rightarrow 3}\) (time from Illness to Death), is simulated.
If the first transition was to the Death state (State 3), the process stops as it is an absorbing state.
We assume that the hazard for a transition out of a state depends on the time elapsed since entering the current state (semi-Markov assumption).
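A minimal code sketch of this scheme (sample size, baseline hazards and censoring scheme are illustrative assumptions; the exact code that produced the datasets described next may differ):
n <- 500                                              # assumed sample size
gamma_12 <- 1.2; gamma_13 <- -0.2; gamma_23 <- -1.5   # trt effects (true values quoted below)
base12 <- 0.3; base13 <- 0.15; base23 <- 0.2          # assumed constant baseline hazards
followup1 <- 4; followup <- 8                         # assumed censoring of exits from Healthy / end of study
trt <- rbinom(n, 1, 0.5)
# latent exponential times for the two possible exits from the Healthy state
t12 <- rexp(n, base12 * exp(gamma_12 * trt))
t13 <- rexp(n, base13 * exp(gamma_13 * trt))
firstTime <- pmin(t12, t13, followup1)
ill  <- as.numeric(t12 <= t13 & t12 < followup1)      # transition 1 -> 2 observed
die1 <- as.numeric(t13 <  t12 & t13 < followup1)      # transition 1 -> 3 observed
dat12 <- data.frame(id = 1:n, trt, eventTimes1 = firstTime, eventIndic1 = ill)
dat13 <- data.frame(id = 1:n, trt, eventTimes2 = firstTime, eventIndic2 = die1)
# transition 2 -> 3: only those who reached Illness are at risk; the sojourn time in
# Illness is drawn on its own clock (semi-Markov) and added to the entry time, which
# is kept as eventStart3 for the left truncation on the study time scale
idIll <- which(ill == 1)
t23 <- rexp(length(idIll), base23 * exp(gamma_23 * trt[idIll]))
dat23 <- data.frame(id = idIll, trt = trt[idIll],
                    eventStart3 = firstTime[idIll],
                    eventTimes3 = pmin(firstTime[idIll] + t23, followup),
                    eventIndic3 = as.numeric(firstTime[idIll] + t23 < followup))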
This first dataset contains the information for the transition from the Healthy (1) to the Illness (2) state. For example, the second individual observes this transition at time ~3.14.
A second dataset gives the information for the transition from the Healthy (1) to the Death (3) state. We can see how the second individual is censored at time ~3.14: this individual observes the transition to the Illness (2) state and is therefore no longer at risk of the transition from Healthy (1) to Death (3).
The last dataset contains the information for the transition from the Illness (2) to the Death (3) state. This one is special because all individuals start in the Healthy (1) state and must first transition to the Illness (2) state to be at risk of a transition from Illness (2) to Death (3). We therefore have an additional column here, eventStart3, which gives the time of the beginning of the at-risk period. This requires a left truncation mechanism to indicate, for example, that individual 2 was not at risk of this transition until time ~3.14. This individual then observed the transition from Illness (2) to Death (3) at time ~7.03.
We can now fit the illness-death model with INLAjoint, where we include a list of 3 survival formulas, one for each transition intensity. Since we simulated transitions between times 0 and 8, where the first transition occurs within the first 4 units of time while the transition from illness to death occurs between times 4 and 8, we need to make sure we cover the entire follow-up efficiently. The default 15 equidistant intervals may not be sufficient to capture a sequence of two event risks, so we manually set 100 cutpoints to obtain a finer resolution in the baseline hazards. In this specific example we simulated constant baseline hazards, so this has a negligible impact, but in real data applications it is relevant, as the true shape of the baseline is unknown and the only cost of having many intervals is extra computation time.
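A sketch of the call (the baseline specification and the way the cutpoints are supplied are assumptions; the truncation argument of inla.surv handles the delayed entry for the third transition):
M_ms <- joint(formSurv = list(
                inla.surv(eventTimes1, eventIndic1) ~ trt,
                inla.surv(eventTimes2, eventIndic2) ~ trt,
                inla.surv(eventTimes3, eventIndic3, truncation = eventStart3) ~ trt),
              basRisk = c("rw1", "rw1", "rw1"),
              dataSurv = list(dat12, dat13, dat23),
              cutpoints = seq(0, 8, length.out = 100))  # assumed argument for manual cutpoints
summary(M_ms)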
We can see that the results for the effect of treatment match the true values (i.e., 1.2, -0.2 and -1.5 for the three transitions). We can also check how the baseline hazard is captured:
plot(M_ms)$Baseline +ylim(0,1)
Figure 2.20: Posterior estimates of baseline transition hazards. The plots show the posterior mean (solid line) and 95% credible interval (shaded region) for the baseline hazards of the three transitions in the illness-death model. The estimates closely match the true constant hazards (dashed red lines) used for data simulation.
The probability of transition between states can be derived using the fitted hazard functions, which describe the instantaneous risk of moving from one state to another at any given time. We can use the predict function to compute hazard functions conditional on scenarios (e.g., trt = 0 vs. trt = 1). First we create a small dataset to describe the scenarios of interest:
ND <-data.frame("id"=1:2, "trt"=0:1);ND
id trt
1 1 0
2 2 1
Then we call the predict function with the argument return.samples set to TRUE, which keeps the samples for uncertainty quantification instead of the default behavior that computes summary statistics over these samples:
P <- predict(M_ms, ND, horizon=8, id="id", return.samples=TRUE)
dim(P$PredS)
[1] 300 403
Here we have 300 samples of the hazard function for each transition intensity and both scenarios, measured at multiple time points. We can derive and plot cumulative transition probabilities from these hazard functions:
Figure 2.21: Predicted transition probabilities for the illness-death model. These probabilities are derived from the multi-state model and show how the risk of moving between states evolves.
We can see how the probability of each transition evolves over time, conditional on covariate trt. Note that time-dependent covariates can be included in multi-state models following the data format described in Section 2.4 and for joint models with shared time-dependent components as described in Chapter 4. More details on multi-state models and the relationship between transition intensities and transition probabilities are available in Putter et al. (2007).
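For the illness-death structure used here, the transition probabilities follow from the fitted intensities; a standard expression under the semi-Markov assumption (the \(2\rightarrow 3\) clock starts at entry into the Illness state) is
\[
P_{1\rightarrow 1}(0,t) = \exp\left(-\int_0^t \left\{\lambda_{1\rightarrow 2}(u)+\lambda_{1\rightarrow 3}(u)\right\}\, du\right), \qquad
P_{1\rightarrow 2}(0,t) = \int_0^t P_{1\rightarrow 1}(0,u)\,\lambda_{1\rightarrow 2}(u)\,\exp\left(-\int_0^{t-u}\lambda_{2\rightarrow 3}(v)\, dv\right) du,
\]
with the remaining probabilities obtained by complement. In practice, these integrals are approximated numerically for each posterior sample of the hazards returned by predict, which propagates the posterior uncertainty to the transition probabilities.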
2.10 Frailty model
The frailty model is an extension of the proportional hazards model that accounts for some unobserved heterogeneity in the population. This heterogeneity, often referred to as “frailty”, represents group-level variability in the risk of experiencing an event that is not explained by observed covariates. The frailty is modeled as a random effect that modifies the hazard function for each group. This random effect can follow various distributions, such as gamma or lognormal, depending on the model specification. The gamma distribution has been widely used because of the analytical solution it provides in the likelihood of the model (it avoids having to compute the integral over the risk in the survival function). However, INLA approximates the posterior distribution of each random effect anyway and is limited to Gaussian random effects, as described in Chapter 1. We therefore focus on the lognormal distribution for the frailty term in this book (i.e., a lognormal frailty corresponds to a Gaussian random effect in the linear predictor, as shown below).
When each individual has their own frailty term, it captures the unique risk profile that is not explained by the included covariates (i.e., the frailty random effect captures residual noise). When individuals are grouped or clustered, such as in family studies or multi-center clinical trials, the frailty term can be shared to account for the correlation within groups, allowing for more precise inference about the effect of covariates on survival. Frailty models are particularly useful for analyzing recurrent events, where individuals can experience the same non-terminal event multiple times. In the context of recurrent events, the frailty term captures the individual-level heterogeneity in the risk of experiencing repeated events that is not explained by observed covariates. This model, referred to as the shared frailty model, is defined as follows: \[\lambda_{i}(t)=\lambda_{0}(t)\ \textrm{exp}\left( \gamma \cdot trt_i + v_i\right)\] where \(v_i\) represents the frailty term, with a Gaussian distribution in the linear predictor (i.e., lognormal when taken out of the exponential).
The log-normal frailty model is equivalent to a standard proportional hazards model with an additional normally-distributed random intercept added to the linear predictor. We can simulate data from this model, where the frailty term is shared among groups of individuals:
n=500 # number of individuals
gamma_1=1.5 # treatment effect
baseScale=0.3 # baseline hazard scale
baseShape=0.7 # baseline hazard shape
followup=4 # study duration
frailSD <- 0.5
trt <- as.integer(1:n %in% sort(sample(1:n, n/2)))
u <- runif(n) # uniform distribution for survival times generation
w1 <- rep(rnorm(n/10, mean = 0, sd = frailSD), each=10) # frailty term
eventTimes <- (-(log(u))/(baseScale*exp(trt*gamma_1 + w1)))^(1/baseShape)
eventTimes <- pmin(eventTimes, followup)
eventIndic <- as.numeric(eventTimes<followup) # event indicator
datFr <- data.frame(id=1:n, idF=rep(1:(n/10), each=10),
                    eventTimes, eventIndic, trt) # dataset
str(datFr)
'data.frame': 500 obs. of 5 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ idF : int 1 1 1 1 1 1 1 1 1 1 ...
$ eventTimes: num 0.12347 0.00451 1.78749 0.09765 4 ...
$ eventIndic: num 1 1 1 1 0 1 1 1 1 1 ...
$ trt : int 1 0 0 0 0 0 0 0 0 0 ...
Here the column idF corresponds to the groups that share the same frailty term (here we assume groups of 10 individuals). The syntax to fit this model is similar to the other survival models presented before, but we need to add the frailty term. It can be included in the formula with the form (1|idF), where 1 indicates the weight (here 1 corresponds to an intercept) and idF is the grouping variable.
M_fr <- joint(formSurv = list(inla.surv(eventTimes, eventIndic) ~ trt + (1|idF)),
              dataSurv = datFr, id = "idF")
summary(M_fr)
Survival outcome
mean sd 0.025quant 0.5quant 0.975quant
Baseline risk (variance) 0.1284 0.0712 0.040 0.1120 0.3139
trt 1.5610 0.1248 1.317 1.5608 1.8063
Frailty term variance
mean sd 0.025quant 0.5quant 0.975quant
IDIntercept 0.3362 0.0908 0.1922 0.3247 0.5475
log marginal-likelihood (integration) log marginal-likelihood (Gaussian)
-1479.262 -1478.824
Deviance Information Criterion: 2896.159
Widely applicable Bayesian information criterion: 2898.626
Computation time: 0.62 seconds
We can see in the results that the effect of treatment matches the true value (1.5), while the frailty term is described by its variance. This variance quantifies the degree of unobserved heterogeneity within the population. It tells us how much the underlying risk varies between groups (or individuals) after accounting for all the measured covariates in the model. It is however easier to interpret standard deviations rather than variances. We can convert the variance of the frailty random effect to a standard deviation (\(\sigma\)) by adding the argument sdcor=TRUE to the call of the summary function:
summary(M_fr, sdcor=TRUE)
Survival outcome
mean sd 0.025quant 0.5quant 0.975quant
Baseline risk (variance) 0.3463 0.0921 0.1999 0.3347 0.5603
trt 1.5610 0.1248 1.3170 1.5608 1.8063
Frailty term standard deviation
mean sd 0.025quant 0.5quant 0.975quant
IDIntercept 0.5747 0.0768 0.4384 0.5698 0.7399
log marginal-likelihood (integration) log marginal-likelihood (Gaussian)
-1479.262 -1478.824
Deviance Information Criterion: 2896.159
Widely applicable Bayesian information criterion: 2898.626
Computation time: 0.62 seconds
Since the frailty term is assumed to be normally distributed on the log-hazard scale, approximately 95% of the groups will have a frailty term between \(-1.96\cdot\sigma\) and \(+1.96\cdot\sigma\). This can be translated into a range of hazard ratios relative to the median group (which has a frailty of 0 and a hazard ratio of exp(0)=1).
For example, a group at the 2.5th percentile of frailty has a hazard that is exp(-1.96*0.57) = 0.327 times the hazard of the median group. On the other hand, a group at the 97.5th percentile of frailty has a hazard that is exp(1.96*0.57) = 3.055 times the hazard of the median group. Therefore, we can conclude that there is substantial unobserved heterogeneity, with the risk of the event varying by a factor of roughly 9 between the least and most frail groups in the population, due to factors not captured by the measured covariates.
We can plot the posterior marginal distributions for the standard deviation of the frailty as well as the baseline hazard:
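For example (a sketch; the plot function returns a list of ggplot objects that can be indexed as elsewhere in this chapter):
plot(M_fr, sdcor = TRUE)$Covariances$L1  # posterior of the frailty standard deviation
plot(M_fr)$Baseline                      # posterior of the baseline hazard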
Here we can see how the first group has a higher risk of event compared to the population average, while the second group has a lower risk. Finally, we can use the predict function to compare the survival probabilities conditional on covariates while accounting for the frailty. Let’s consider two scenarios, one for each value of trt:
Here we assume the new individuals we predict for are associated with an independent draw from the posterior marginal distribution of \(v_i\) (i.e., idF=1,2 is treated as two individuals that each belong to a new frailty group). It is also possible to predict conditional on the new individual belonging to an already existing group of the frailty random effect (e.g., a frailty term is assumed for different cities and the new individual belongs to one of the cities we learned from). Let’s consider that our two individuals belong to frailty groups 1 and 2, respectively. We first need to draw samples from the conditional posterior of the corresponding frailty group, which is done with inla.posterior.sample, where we can select draws from the posterior distribution of the frailty group of interest (here the first two ids of the frailty random effect, named IDIntercept_S1 internally). Then, we call the predict function and provide the pre-sampled conditional frailty values directly in its set.samples argument. Internally, INLAjoint uses these samples instead of assuming a new draw from the posterior distribution of the random effect:
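A sketch of this workflow (the way the frailty samples are extracted from the latent field and the exact format expected by set.samples are assumptions):
SMP <- inla.posterior.sample(500, M_fr)   # draws from the joint posterior
# keep the samples of the first two groups of the frailty random effect
frail12 <- t(sapply(SMP, function(s)
  s$latent[grep("IDIntercept_S1", rownames(s$latent))[1:2], 1]))
ND <- data.frame(id = 1:2, trt = 0:1)
P_cond <- predict(M_fr, ND, horizon = 4, id = "id",
                  set.samples = frail12)  # pre-sampled conditional frailty values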
Figure 2.23: Survival probability predictions from the frailty model comparing marginal predictions (left) with group-specific predictions for frailty groups 1 (center) and 2 (right).
We can see that using the information from a frailty group we have already learned from makes predictions more accurate compared to assuming the individuals belong to a new frailty group about which we know nothing.
2.11 Joint frailty model
The frailty term that is captured in a shared frailty model corresponds to a deviation from the average risk. This frailty term could be shared with another submodel in order to capture the effect of being at higher risk for one event on the risk of another event. This is referred to as the joint frailty model (Rondeau et al. (2007)). While in the previous section the frailty was shared within groups of individuals with a unique event time, here we show how the frailty term can also be used to model recurrent events.
For example, consider the risk of hospitalization for individuals who may visit the hospital multiple times: a frailty term can capture how frail each individual is, where individuals with multiple hospital visits are associated with a higher frailty than individuals who never visit the hospital. This frailty can then be shared with a second survival model that evaluates the effect of covariates on the risk of death, in order to assess whether an individual with a higher risk of hospitalization also has a higher risk of death. Let’s consider the following model \[\left\{\begin{array}{ c @{{}{}} r @{{}{}} r }\lambda_{i1}(t)=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_{11}\cdot trt_i + v_i \right) \qquad & \textbf{S1}\\
\lambda_{i2}(t)=\lambda_{02}(t)\ \textrm{exp}\left(\gamma_{21}\cdot trt_i + \gamma_{22}\cdot v_i \right) \qquad & \textbf{S2}\\
\end{array}\right.\] where the first model corresponds to the risk of recurrent events for individual \(i\) while the second model corresponds to the risk of another event of interest for individual \(i\). The covariate trt is a binary covariate that affects the risk of both events through risk-specific fixed effects \(\gamma_{11}\) and \(\gamma_{21}\). The frailty term \(v_i\) captures the individual deviation from the average risk of the recurrent event and also affects the risk of the second event, scaled by the fixed effect \(\gamma_{22}\).
We can simulate data from this model; first, we simulate the recurrent events:
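A sketch of the recurrent event simulation (sample size, baseline scale and frailty standard deviation are assumptions; the treatment effect 1.2 is the true value referred to below). Each individual contributes their observed recurrent event times plus a final censored record at the end of follow-up:
n <- 500                          # assumed number of individuals
gamma_11 <- 1.2                   # trt effect on the recurrent event
baseScale1 <- 0.5                 # assumed baseline hazard scale
followup <- 4                     # study duration
frailSD <- 0.5                    # assumed frailty standard deviation
trt <- rbinom(n, 1, 0.5)
w1 <- rnorm(n, 0, frailSD)        # individual frailty term
rate <- baseScale1 * exp(gamma_11 * trt + w1)
# up to 3 recurrent events per individual, on the calendar time scale
t1 <- -log(runif(n)) / rate
t2 <- t1 - log(runif(n)) / rate
t3 <- t2 - log(runif(n)) / rate
recTimes <- cbind(t1, t2, t3)
datFrJ <- do.call(rbind, lapply(1:n, function(i) {
  obs <- recTimes[i, recTimes[i, ] < followup]   # events observed before end of follow-up
  data.frame(id = i, trt = trt[i],
             eventTimes = c(obs, followup),
             eventIndic = c(rep(1, length(obs)), 0))
}))
head(datFrJ)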
For simplicity we limit the number of recurrent events for each individual to 3 and we assume calendar times (time to event), although switching to gap times (i.e., time between events) is straightforward.
Now, we can simulate another event time that depends on the frailty scaled by a fixed effect:
gamma21=-0.5 # trt effect second event
gamma22=1 # frailty effect second event
baseScale2=0.3 # baseline hazard scale second event
eventTimes2 <- -(log(runif(n))/(baseScale2*exp(trt*gamma21 + w1*gamma22)))
eventIndic2 <- as.numeric(eventTimes2<followup)
eventTimes2 <- pmin(eventTimes2, followup)
datFrJ2 <- data.frame("id"=1:n, "trt"=trt,
                      "eventTimes2"=eventTimes2,
                      "eventIndic2"=eventIndic2)
head(datFrJ2)
We have two datasets because datFrJ contains multiple lines per individual (recurrent events) while datFrJ2 contains a unique event time per individual. Of note, here the second event is assumed to be non-terminal.
The formSurv argument for this model is similar to the competing risks and multi-state models as it involves a list of formulas for each event outcome. Here we have the first formula for recurrent events that includes the frailty term, where inla.surv includes the argument subject for the data expansion to account for individual recurrent events. The second formula corresponds to the second event where the frailty term is shared and scaled.
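A sketch of the call (the model object name is illustrative; the assocSurv argument is explained next):
M_frJ <- joint(formSurv = list(
                 inla.surv(eventTimes, eventIndic, subject = id) ~ trt + (1|id),
                 inla.surv(eventTimes2, eventIndic2) ~ trt),
               assocSurv = TRUE, id = "id",
               dataSurv = list(datFrJ, datFrJ2))
summary(M_frJ)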
The parameter assocSurv indicates that the frailty term from the recurrent event model is shared in the second event model, if it was set to FALSE, then the frailty term would only be included in the first submodel. The order is important, the first model in the list of survival formulas (formSurv) should include a random effect and it can be shared in the next formulas. Multiple survival models with random effects can be accommodated and a random effect can be shared in multiple survival models (i.e., this argument should be a vector of booleans if one random effect is shared in multiple survival submodels and a list of vectors if multiple survival models with random effects share their random effects in multiple survival models).
We can see that the model is able to capture the true values of the parameters, such as the fixed effect of treatment on the first (1.2) and second (-0.5) outcomes. The association parameter links the two survival processes: when positive, individuals or groups with a higher hazard rate for the recurrent event (i.e., a higher frailty term) also have a higher hazard for the second event. A negative value means that a higher risk for the recurrent event is associated with a lower risk for the second event. If the credible interval contains zero, there is no statistical evidence of an association between the two processes, conditional on the covariates.
The estimated posterior mean for the association is 0.975, which is a log-hazard ratio. We can directly get the hazard ratio by adding the argument hr=TRUE to the call of the summary function:
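Assuming the fitted object is called M_frJ as in the sketch above:
summary(M_frJ, hr = TRUE)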
A one-unit increase in the log-frailty for the recurrent event is associated with a hazard ratio of 2.726 for the second event.
After accounting for the effect of treatment, there is strong evidence for a positive association between the two event processes. Specifically, a one-unit increase in the subject-specific log-frailty for the recurrent event is associated with a 173% increase on average (95% CI: [70%, 327%]) in the hazard for the second event. This suggests that common unobserved factors drive the risk for both types of events.
We can also display the baseline hazard for each outcome as well as the posterior marginal of the standard deviation of the frailty:
Figure 2.25: Survival probability predictions from the joint frailty model for both recurrent events (S1) and terminal events (S2), showing different treatment effects on each type of event.
We can also compute the probability of observing the second event type conditional on recurrent observations of the first event. To do so, let’s assume one individual observes 3 recurrent events during the follow-up, at times 0.3, 0.6 and 1, while another individual is censored at time 1 without observing any recurrent events; we predict at horizon 4:
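A sketch of the prediction setup (the exact columns expected by predict for a recurrent event history, and the covariate values, are assumptions):
NDj <- data.frame(id = c(1, 1, 1, 2),
                  trt = 0,                      # assumed covariate value
                  eventTimes = c(0.3, 0.6, 1, 1),
                  eventIndic = c(1, 1, 1, 0))   # three observed events vs. censored without event
P_j <- predict(M_frJ, NDj, horizon = 4, id = "id")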
Figure 2.26: Conditional survival probability predictions from the joint frailty model comparing individuals with different recurrent event histories. The individual with 3 recorded recurrent events (left) shows lower survival probabilities than the individual with no recorded events (right), reflecting the positive association between event types.
We can see how the individual that observed some recurrent events is associated with a higher frailty compared to the other individual, leading to lower survival probabilities for both events (as a higher risk of the first event is associated with a higher risk of the second event since the association \(\gamma_{22}\) is positive). The two plots presented in Figure 2.26 show respectively lower and higher survival probabilities compared to the population average shown in Figure 2.25, since they represent scenarios associated with opposite deviations from the population average.
2.12 Prior distributions for survival models
Prior distributions represent the initial beliefs or knowledge about the parameters of the model before observing the data. As discussed in Chapter 1, they are updated once data is observed, according to Bayes’ theorem, to obtain posterior distributions. Informative priors reflect strong beliefs or existing knowledge about the parameters’ range of plausible values, based on previous studies or expert opinion. Weakly informative priors, on the other hand, are vague and allow the data to dominate the inference process. However, excessively vague priors can have detrimental effects, so it is preferable to use a prior that puts more mass on a plausible range of values. In this section we present the default priors used for the models described in this chapter, and we show how to modify these default priors.
The default priors in INLAjoint are weakly informative, although for some parameters the notion of prior informativeness can be tricky. Depending on the scale of the observed data, and the desired behavior of the prior that may be based on the research question of interest, the default priors of INLAjoint could be too informative or not informative enough.
2.12.1 Fixed effects
Fixed effects are assigned a Gaussian prior with mean 0 and variance 100 (which is a vague prior for scaled covariates). It is possible to display the priors of a fitted model with the function inla.priors.used for any R-INLA or INLAjoint object, or directly by looking at the argument priors_used for INLAjoint objects. For example we can check the priors for the fixed effects of the first model fitted with INLAjoint in Section 2.2.1:
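A sketch (the element name under priors_used and the name of the model object from Section 2.2.1 are assumptions):
M_exp$priors_used$priorFixed  # or, equivalently, inla.priors.used(M_exp)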
There are 2 elements corresponding to the mean and precision (i.e., inverse of variance) for the fixed effects. These can be changed in the control options of INLAjoint when fitting the model:
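For example, a call along these lines (the control option name and the dataset and baseline names are assumptions; the prior is the \(\mathcal{N}(-1, 1)\) discussed below):
M_exp2 <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
                basRisk = "exponentialsurv",    # assumed name of the exponential baseline
                dataSurv = datExp,              # assumed name of the data from Section 2.2.1
                control = list(priorFixed = list(mean = -1, prec = 1)))  # assumed option name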
Here, although we set a more informative prior (\(\mathcal{N}(\mu=-1, \sigma^2=1)\)) on the fixed effects compared to default \(\mathcal{N}(\mu=0, \sigma^2=100)\), it still includes the true values (log(0.2) for the exponential rate and 0.2 for the treatment effect) in the plausible area and thus when updating priors with the observed data, we recover the true value of the exponential rate and treatment effect in the posterior. The posterior here is mostly data-driven.
Figure 2.27: Comparison of prior (black solid) and posterior (red dashed) distributions for the exponential baseline rate (left) and treatment effect (right). The blue dotted lines show the true values. The posteriors are largely data-driven.
INLAjoint’s plot function allows displaying the prior and posterior distributions of covariate effects by adding the argument priors=TRUE to the call of the plot function:
plot(M_exp2, priors=TRUE)$Outcomes[[1]]
Figure 2.28: Prior and posterior distributions displayed using the INLAjoint plot function with priors=TRUE. This provides a convenient way to visualize the impact of prior specification on parameter estimation.
We can set an informative prior \(\mathcal{N}(\mu=-10, \sigma^2=0.01)\) which is incompatible with the true value that generated the data:
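For example (same assumed names as above; a variance of 0.01 corresponds to a precision of 100):
M_exp3 <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
                basRisk = "exponentialsurv", dataSurv = datExp,
                control = list(priorFixed = list(mean = -10, prec = 100)))
plot(M_exp3, priors = TRUE)$Outcomes[[1]]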
Figure 2.29: Impact of a misspecified informative prior. This figure illustrates the effect of using a strong but incorrect prior for the treatment effect and the exponential baseline rate. The posterior distributions (dashed lines) are heavily influenced by the misspecified priors (solid lines) and fail to capture the true parameter values (vertical dotted lines), illustrating the risks of overly confident prior specifications.
We can see that with such an informative prior, the posterior is not matching the true value anymore. There is not enough data to overcome the amount of information given by the prior.
It is also possible to set a different prior for each fixed effect. In order to do so, we need to use the “build only” technique introduced in Section 1.5.8: we first create the model in INLAjoint, then assign different priors to each fixed effect, and finally run the model in R-INLA, as illustrated next.
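A sketch of this workflow (the internal names of the fixed effects and the chosen prior values are assumptions; R-INLA’s control.fixed accepts per-covariate means and precisions):
M_exp40 <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
                 basRisk = "exponentialsurv", dataSurv = datExp, run = FALSE)
# assign a specific prior to each fixed effect through R-INLA's control.fixed
M_exp40$.args$control.fixed <- list(mean = list("Intercept_S1" = -1, "trt_S1" = 0),
                                    prec = list("Intercept_S1" = 1, "trt_S1" = 0.1))
M_exp4 <- joint.run(M_exp40)
plot(M_exp4, priors = TRUE)$Outcomes[[1]]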
Figure 2.30: Posterior distributions with user-defined priors in INLAjoint. The plot shows the posterior distributions (dashed lines) for the treatment effect and the exponential baseline rate where the priors are displayed as solid lines. The posteriors are centered around the true parameter values (vertical dotted lines).
2.12.2 Baseline hazard parameters
We showed how the log-rate of the exponential baseline is considered as a fixed effect (i.e., the value of the linear predictor when all covariates are zero). In Section 2.2.2, we have introduced the parametric Weibull baseline hazard which involves two parameters, where the scale is defined as the rate of the exponential and the shape of the Weibull is considered as an hyperparameter. The prior for this hyperparameter can be tricky as it is not easy to figure out what a non-informative distribution would mean for such a parameter. INLA defines a PC prior for this parameter, see Van Niekerk et al. (2021) for a detailed discussion of the choice of INLA’s prior for this shape parameter.
The base model for the Weibull distribution is the simpler exponential distribution, which corresponds to a shape parameter \(\alpha=1\). This represents a constant hazard over time. The PC prior penalizes the model’s complexity as it deviates from this exponential base model. In practice, this means the prior will shrink the shape parameter toward 1, favoring a constant hazard unless there is evidence in the data for an increasing (\(\alpha > 1\)) or decreasing (\(\alpha < 1\)) hazard. This principled shrinkage helps to prevent overfitting and ensures that the more flexible Weibull model can revert to the simpler exponential model when the data supports it, a property that is not guaranteed with other common prior choices like the Gamma distribution. See inla.doc("pc.alphaw") for more details on this prior and default options with INLA.
For the random walk baseline, the model involves an additional parameter for the precision of the random walk, which is assigned a PC prior, i.e., \(P(\sigma>\sigma_0) = p\), where \(\sigma_0\) and \(p\) are user-defined. This is an intuitive prior that can be interpreted as “the probability of the parameter having a higher value than \(\sigma_0\) is \(p\)”. Default values are \(\sigma_0=0.5\) and \(p=0.01\). For example, we can have a look at the prior used in the random walk of order 2 model fitted in Section 2.2.3:
M_rw2$priors_used$priorRW
[1] 0.50 0.01
And we can modify this prior, although the default option is usually a good choice for these parameters where there is no intuition on the desired behavior of the prior.
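If needed, it could be modified along these lines (the control option name and the dataset name are assumptions, the former chosen to mirror the priors_used element above):
M_rw2b <- joint(formSurv = inla.surv(eventTimes, eventIndic) ~ trt,
                basRisk = "rw2", dataSurv = datRW,     # assumed dataset from Section 2.2.3
                control = list(priorRW = c(1, 0.01)))  # P(sigma > 1) = 0.01, assumed option name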
More generally, priors can be freely modified to any available distribution in INLA (including the possibility for the user to implement new priors, see Chapter 5 of Gómez-Rubio (2020)).
2.12.3 Frailty random effect
The variance of the frailty random effects are assigned by default an inverse-Gamma prior \(IG(\alpha=4, \beta=1)\). We can check the priors from the model fitted in Section 2.10:
M_fr$priors_used$priorFrailty
$alpha
[1] 4
$beta
[1] 1
The parameters \(\alpha\) and \(\beta\) of this prior can be changed in control options:
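For example (the exact name and format of the control option are assumptions, mirroring the priors_used element above):
M_fr2 <- joint(formSurv = list(inla.surv(eventTimes, eventIndic) ~ trt + (1|idF)),
               dataSurv = datFr, id = "idF",
               control = list(priorFrailty = list(alpha = 10, beta = 2)))  # assumed option name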
It is also possible to display the prior and posterior distributions of the variance with the boolean argument priors in the call of the plot function:
plot(M_fr2, priors=TRUE)$Covariances$L1
Figure 2.31: Prior (dashed line) and posterior (solid line) for the variance parameter of a frailty term in a proportional hazards model.
It can also be displayed as standard deviation using the argument sdcor=TRUE:
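For example, reusing the plot call above with sdcor=TRUE:
plot(M_fr2, priors = TRUE, sdcor = TRUE)$Covariances$L1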
Figure 2.32: Prior (dashed line) and posterior (solid line) for the standard deviation of a frailty term in a proportional hazards model.
Finally, it is possible to switch to a PC prior for the variance of the frailty. Although not directly available in INLAjoint, it can be done using the “build only” trick introduced in Section 1.5.8. The code below demonstrates this process by first setting up the model without running it (run=FALSE), and then manually modifying the underlying INLA formula. The key change is to replace the default random effect with an f() term that explicitly specifies the PC prior for the precision (prior = "pc.prec"). This prior is defined by the user-specified parameters param = c(1, 0.01), which set the prior through an intuitive statement about the standard deviation of the frailty (i.e., the probability that the standard deviation of the frailty random effect is higher than 1 is 1%):
M_fr30 <- joint(formSurv = list(inla.surv(eventTimes, eventIndic) ~ trt + (1|idF)),
                dataSurv = datFr, id = "idF", run = F)
M_fr30$.args$formula <- as.formula(paste(c("Yjoint ~ -1 ",
  attr(terms(M_fr30$.args$formula), which = "term.labels")[1:3],
  "+ f(IDIntercept_S1, WIntercept_S1, model = \"iid\", n = 50, hyper = list(prec = list(prior = \"pc.prec\", param = c(1, 0.01))))"),
  collapse = "+"))
M_fr3 <- joint.run(M_fr30)
summary(M_fr3)
Survival outcome
mean sd 0.025quant 0.5quant 0.975quant
Baseline risk (variance) 0.1282 0.0711 0.0398 0.1118 0.3137
trt 1.5609 0.1256 1.3153 1.5607 1.8077
Frailty term variance
mean sd 0.025quant 0.5quant 0.975quant
IDIntercept 0.3385 0.1005 0.1814 0.3253 0.574
log marginal-likelihood (integration) log marginal-likelihood (Gaussian)
-1481.139 -1480.708
Deviance Information Criterion: 2896.449
Widely applicable Bayesian information criterion: 2898.933
Computation time: 0.67 seconds
The PC prior here may be useful to overcome some issues related to the inverse-Gamma distribution for variance parameters, see Sterrantino et al. (2025).
Alvares, D., Van Niekerk, J., Krainski, E. T., Rue, H., & Rustand, D. (2024). Bayesian survival analysis with INLA. Statistics in Medicine, 43(20), 3975–4010. https://doi.org/10.1002/sim.10160
Anderson-Bergman, C. (2017). icenReg: Regression models for interval censored data in R. Journal of Statistical Software, 81(12), 1–23. https://doi.org/10.18637/jss.v081.i12
Fisher, L. D., & Lin, D. Y. (1999). Time-dependent covariates in the Cox proportional-hazards regression model. Annual Review of Public Health, 20(1), 145–157. https://doi.org/10.1146/annurev.publhealth.20.1.145
García-Hernandez, A., Pérez, T., Pardo, M. del C., & Rizopoulos, D. (2020). MMRM vs joint modeling of longitudinal responses and time to study drug discontinuation in clinical trials using a “de jure” estimand. Pharmaceutical Statistics. https://doi.org/10.1002/pst.2045
Gentleman, R., Whalen, E., Huber, W., Falcon, S., Gentry, J., & Shannon, P. (2024). Graph: A package to handle graph data structures. https://doi.org/10.18129/B9.bioc.graph
Gressani, O., Faes, C., & Hens, N. (2022). Laplacian-P-splines for Bayesian inference in the mixture cure model. Statistics in Medicine, 41(14), 2602–2626. https://doi.org/10.1002/sim.9373
Holford, T. R. (1976). Life tables with concomitant information. Biometrics, 32(3), 587–597. https://doi.org/10.2307/2529747
Ibrahim, J. G., Chu, H., & Chen, L. M. (2010). Basic concepts and methods for joint models of longitudinal and survival data. Journal of Clinical Oncology: Official Journal of the American Society of Clinical Oncology, 28(16), 2796–2801. https://doi.org/10.1200/JCO.2009.25.0654
Johansen, S. (1983). An extension of Cox’s regression model. International Statistical Review / Revue Internationale de Statistique, 51(2), 165–174. https://doi.org/10.2307/1402746
Neto, D. S., Rustand, D., Rue, H., Alvares, D., & Tomazella, V. L. (2025). A new approach for Bayesian joint modeling of longitudinal and cure-survival outcomes using the defective Gompertz distribution. arXiv Preprint arXiv:2507.23196. https://doi.org/10.48550/arXiv.2507.23196
Putter, H., Fiocco, M., & Geskus, R. B. (2007). Tutorial in biostatistics: Competing risks and multi-state models. Statistics in Medicine, 26(11), 2389–2430. https://doi.org/10.1002/sim.2712
Rondeau, V., Mathoulin-Pelissier, S., Jacqmin-Gadda, H., Brouste, V., & Soubeyran, P. (2007). Joint frailty models for recurring events and death using maximum penalized likelihood estimation: Application on cancer events. Biostatistics, 8(4), 708–721. https://doi.org/10.1093/biostatistics/kxl043
Royston, P., & Parmar, M. K. (2002). Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine, 21(15), 2175–2197. https://doi.org/10.1002/sim.1203
Rustand, D., Van Niekerk, J., Rue, H., Tournigand, C., Rondeau, V., & Briollais, L. (2023). Bayesian estimation of two-part joint models for a longitudinal semicontinuous biomarker and a terminal event with INLA: Interests for cancer clinical trial evaluation. Biometrical Journal, 65(4), 2100322. https://doi.org/10.1002/bimj.202100322
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society B: Statistical Methodology, 64(4), 583–639. https://doi.org/10.1111/1467-9868.00353
Sterrantino, A. F., Rustand, D., Niekerk, J. V., Krainski, E. T., & Rue, H. (2025). A graphical framework for interpretable correlation matrix models. arXiv Preprint arXiv:2312.06289. https://doi.org/10.48550/arXiv.2312.06289
Tsiatis, A. A., Degruttola, V., & Wulfsohn, M. S. (1995). Modeling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association, 90(429), 27–37. https://doi.org/10.2307/2291126
Van Niekerk, J., Bakka, H., & Rue, H. (2021). A principled distance-based prior for the shape of the Weibull model. Statistics & Probability Letters, 174, 109098. https://doi.org/10.1016/j.spl.2021.109098
Van Niekerk, J., Bakka, H., & Rue, H. (2023). Stable non-linear generalized Bayesian joint models for survival-longitudinal data. Sankhya A.
Van Niekerk, J., & Rue, H. (2022). Use of the INLA approach for the analysis of interval-censored data. In Emerging topics in modeling interval-censored survival data (pp. 123–140). Springer.
Wang, Y., & Taylor, J. M. G. (2001). Jointly modeling longitudinal and event time data with application to acquired immunodeficiency syndrome. Journal of the American Statistical Association, 96(455), 895–905. https://doi.org/10.1198/016214501753208591
Whitehead, J. (1980). Fitting Cox’s regression model to survival data using GLIM. Journal of the Royal Statistical Society C (Applied Statistics), 29(3), 268–275. https://doi.org/10.2307/2346901
Xu, C., Hadjipantelis, P. Z., & Wang, J.-L. (2020). Semi-parametric joint modeling of survival and longitudinal data: The R package JSM. Journal of Statistical Software, 93(2). https://doi.org/10.18637/jss.v093.i02