
This vignette defines the models and historical borrowing metrics supported in the historicalborrow package.

Models

Common notation

  • $y$: vector of patient-specific clinical responses to a continuous outcome variable. Ideally, the outcome variable should be some form of change from baseline, not the response itself. If the outcome is the raw response, then the treatment effect will not be meaningful.
  • $y_{ij}$: the element of $y$ corresponding to study $i$ patient $j$.
  • $(X)_{ij}$: the row of matrix $X$ corresponding to study $i$ patient $j$.
  • $\alpha$: vector of control group mean parameters, one for each study. The first elements are for the historical studies, and the last one is for the current study.
  • $\delta$: vector of study-specific treatment mean parameters. There is one for each combination of study and non-control treatment group.
  • $d$: integer index for the elements of $\delta$.
  • $\beta$: vector of study-specific baseline covariate parameters.
  • $b$: integer index for the elements of $\beta$.
  • $X_\alpha$: matrix for the control group mean parameters $\alpha$. It has indicator columns to select the appropriate element of $\alpha$ for each element of $y$.
  • $X_\delta$: matrix for the treatment mean parameters $\delta$. It has indicator columns to select the appropriate element of $\delta$ for each element of $y$.
  • $X_\beta$: matrix for the baseline covariate fixed effect parameters $\beta$. It has indicator columns to select the appropriate element of $\beta$ for each element of $y$.
  • $\sigma$: vector of study-specific residual standard deviations.
  • $I(\cdot)$: indicator function.

Model matrices

Each primary model is parameterized thus:

$$\begin{aligned} E(y) = X_\alpha \alpha + X_\delta \delta + X_\beta \beta \end{aligned}$$

Above, $X_\alpha$, $X_\delta$, and $X_\beta$ are fixed matrices. $X_\beta$ is a conventional model matrix for the baseline covariate parameters $\beta$, and the details are explained in the “Baseline covariates” section below. $X_\alpha$ is a matrix of zeroes and ones. It is constructed such that each scalar component of $\alpha$ is the mean response of the control group in a particular study. Likewise, $X_\delta$ is a matrix of zeroes and ones such that each scalar component of $\delta$ is the mean response of a non-control treatment group in a particular study.

To illustrate, let $y_{ijk}$ be the response of patient $i$ in treatment group $j$ (where $j = 1$ is the control group) of study $k$, and let $\left ( X_\beta \beta \right )_{ijk}$ be the corresponding scalar element of the vector $X_\beta \beta$. Then,

$$\begin{aligned} E(y_{ijk}) = I (j = 1) \alpha_{k} + I (j > 1) \delta_{jk} + \left ( X_\beta \beta \right )_{ijk} \end{aligned}$$

This parameterization is represented in the more compact expression $X_\alpha \alpha + X_\delta \delta + X_\beta \beta$ in the model definitions in this vignette.
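As a rough illustration of the indicator structure described above, the following Python sketch builds $X_\alpha$ and $X_\delta$ for a tiny made-up dataset. The `patients` list and all variable names are hypothetical and are not part of the package; the package constructs these matrices internally.

```python
# Hypothetical toy data: one (study, group) pair per patient.
# Group 1 is the control group, as in the notation above.
patients = [(1, 1), (1, 1), (2, 1), (2, 2), (2, 3), (2, 1)]

# One alpha component per study.
studies = sorted({k for k, _ in patients})
# One delta component per (study, non-control group) combination.
arms = sorted({(k, j) for k, j in patients if j > 1})

# X_alpha: indicator columns, nonzero only for control patients of each study.
x_alpha = [[int(j == 1 and k == s) for s in studies] for k, j in patients]
# X_delta: indicator columns, nonzero only for the patient's own treatment arm.
x_delta = [[int((k, j) == arm) for arm in arms] for k, j in patients]
```

Every row has exactly one nonzero entry across the two matrices, so each patient's mean picks up either one $\alpha_k$ or one $\delta_{jk}$, never both.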

Baseline covariates

The baseline covariates model matrix $X_\beta$ adjusts for baseline covariates. It may contain a continuous column for baseline and binary indicator columns for the levels of user-defined covariates. All these columns are included if possible, but the method automatically drops baseline covariate columns to ensure that the combined model matrix $X_i^* = \left [ {X_\alpha}^* \quad {X_\delta}^* \quad {X_\beta}^* \right ]_i$ is full rank. (Here, $X_i^*$ denotes the rows of matrix $X$ corresponding to study $i$, with additional rows dropped if the corresponding elements of $y$ are missing. The additional row-dropping based on the missingness of $y$ ensures identifiability even when the user supplies complicated many-leveled factors as covariates.) The choice of columns to drop from ${X_\beta}_i$ is determined by the rank and pivoting strategy of the QR decomposition of $X_i$ using the Householder algorithm with pivoting (base::qr(), LINPACK routine DQRDC).

Separately within each study, each column of $X_\beta$ is centered to have mean 0, and if possible, scaled to have variance 1. Scaling ensures that the priors on the parameters $\beta$ remain diffuse relative to the scale of the input data. Study-level centering ensures that the $\alpha$ parameters truly act as unconditional study-specific control group means (as opposed to conditional on the subset of patients at the reference level of $X_\beta$), and it ensures that borrowing across $\alpha$ components presents fully as control group borrowing.
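The within-study centering and scaling can be sketched in a few lines of Python. This is an illustration only: the function name `standardize_within_study` is hypothetical, and the choice of the sample standard deviation (as in R's `sd()`) is an assumption rather than a statement about the package internals.

```python
from collections import defaultdict
from statistics import mean, stdev


def standardize_within_study(values, studies):
    """Center one covariate column to mean 0 within each study and, when the
    within-study standard deviation is positive, scale it to variance 1."""
    by_study = defaultdict(list)
    for value, study in zip(values, studies):
        by_study[study].append(value)
    stats = {}
    for study, vals in by_study.items():
        # Assumption: sample standard deviation; 0 if it cannot be computed.
        sd = stdev(vals) if len(vals) > 1 else 0.0
        stats[study] = (mean(vals), sd)
    out = []
    for value, study in zip(values, studies):
        center, sd = stats[study]
        centered = value - center
        # "If possible": skip scaling when the column is constant in a study.
        out.append(centered / sd if sd > 0 else centered)
    return out
```

For example, `standardize_within_study([1.0, 2.0, 3.0, 10.0, 10.0], [1, 1, 1, 2, 2])` centers and scales the three values of study 1, while the constant column in study 2 is only centered.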

Post-processing

The hb_summary() function post-processes the results from the model. It accepts MCMC samples of parameters and returns estimated marginal means of the response and treatment effect. To estimate marginal means of the response, hb_summary() takes group-level averages of posterior samples of fitted values while dropping covariate adjustment terms from the model (i.e. it uses only $X_\alpha \alpha + X_\delta \delta$). Because the columns of $X_\beta$ are centered at their means, this choice is mathematically equivalent to emmeans::emmeans() with weights = "proportional" (Lenth (2016)).

Mixture model

Functions:

The mixture model analyzes only the data from the current study, so we use $X_\alpha^{\text{mixture}}$ instead of $X_\alpha$. $X_\alpha^{\text{mixture}}$ is a one-column matrix to indicate which elements of $y$ are part of the control group of the current study.

The historical studies contribute to the model through hyperparameters $({m_\omega})_i$ and $(s_\omega)_i$. If study $i$ is a historical study, $({m_\omega})_i$ and $(s_\omega)_i$ are the posterior mean and posterior standard deviation, respectively, of the mean control group response estimated from the simple model described later. If study $i$ is the current study, $({m_\omega})_i$ and $(s_\omega)_i$ are chosen so the mixture component $\text{Normal}(({m_\omega})_i, (s_\omega)_i^2)$ of study $i$ is diffuse and non-informative. Variable $\omega_i$ of study $i$ is the latent variable of mixture component $i$, and the index variable $\pi$ chooses which $\omega_i$ to use for the current study control group mean $\alpha$. Hyperparameter $p_\omega$ is a constant vector of prior mixture proportions of each study. The posterior histogram of $\pi$ gives the posterior mixture proportions.

$$\begin{aligned} & y_{ij} \stackrel{\text{ind}}{\sim} \text{N} \left ( \left (X_\alpha^{\text{mixture}} \alpha + X_\delta \delta + X_\beta \beta \right )_{ij}, \ \sigma^2 \right) \\ & \qquad \alpha = \omega_{\pi} \\ & \qquad \qquad \omega_i \stackrel{\text{ind}}{\sim} \text{Normal}(({m_\omega})_i, (s_\omega)_i^2) \\ & \qquad \qquad \pi \sim \text{Categorical}(p_\omega) \\ & \qquad \delta_d \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\delta^2) \\ & \qquad \beta_b \sim \text{Normal} (0, s_\beta^2) \\ & \qquad \sigma \sim \text{Uniform}(0, s_\sigma) \end{aligned}$$
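To make the role of the index variable $\pi$ concrete, here is a minimal Python sketch of a single prior draw of the current-study control mean $\alpha = \omega_\pi$. The function name `draw_alpha_prior` is hypothetical, and the sketch illustrates the prior structure only; it is not the sampler used by the package.

```python
import random


def draw_alpha_prior(m_omega, s_omega, p_omega, rng):
    """Draw one prior sample of (pi, alpha) with alpha = omega_pi.

    m_omega, s_omega: per-study means and standard deviations of the
    mixture components. p_omega: prior mixture proportions.
    The returned index pi is zero-based.
    """
    # omega_i ~ Normal(m_i, s_i^2), independently for each study i.
    omega = [rng.gauss(m, s) for m, s in zip(m_omega, s_omega)]
    # pi ~ Categorical(p_omega).
    pi = rng.choices(range(len(omega)), weights=p_omega, k=1)[0]
    # alpha is the latent variable of the selected mixture component.
    return pi, omega[pi]
```

A call such as `draw_alpha_prior([1.2, 0.8, 0.0], [0.3, 0.4, 30.0], [1/3, 1/3, 1/3], random.Random(0))` uses made-up hyperparameters in which the last component plays the part of the diffuse current-study component.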

Hierarchical model

Functions:

The hierarchical model is equivalent to the meta-analytic combined (MAC) approach. It analyzes the data from all studies and shrinks the control group means $\alpha_i$ towards a common normal distribution with mean $\mu$ and variance $\tau^2$.

$$\begin{aligned} & y_{ij} \sim \text{N} \left( \left (X_\alpha \alpha + X_\delta \delta + X_\beta \beta \right )_{ij} , \ \sigma_i^2 \right )\\ & \qquad \alpha_i \stackrel{\text{ind}}{\sim} \text{Normal}(\mu, \tau^2) \\ & \qquad \qquad \mu \sim \text{Normal}(0, s_\mu^2) \\ & \qquad \qquad \tau \sim f_\tau \\ & \qquad \delta_d \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\delta^2) \\ & \qquad \beta_b \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\beta^2) \\ & \qquad \sigma_i \stackrel{\text{ind}}{\sim} \text{Uniform}(0, s_\sigma) \end{aligned}$$

The prior $f_\tau$ on $\tau$ is critically important because:

  1. It controls the prior amount of borrowing, and
  2. The prior has a large influence if there are few historical studies in the data.

$f_\tau$ can either be a flexible half-Student-t distribution with $d_\tau$ degrees of freedom and scale parameter $s_\tau$:

$$f_\tau = \text{Student-t}(0, s_\tau, d_\tau)^+$$

or a uniform distribution with lower bound 0 and upper bound $s_\tau$:

$$f_\tau = \text{Uniform}(0, s_\tau)$$

Following the recommendation of Gelman (2006), please use half-Student-t if the number of historical studies is small and consider uniform for large numbers of historical studies.

For the half-Student-t distribution, the role of the $s_\tau$ parameter is equivalent to the $\sigma$ parameter from the Student-t parameterization in the Stan user manual.

Independent model

Functions:

The independent model is the same as the hierarchical model, but with independent control group parameters $\alpha$. We use it as a no-borrowing benchmark to quantify the borrowing strength of the hierarchical model and the mixture model.

$$\begin{aligned} & y_{ij} \sim \text{N} \left( \left (X_\alpha \alpha + X_\delta \delta + X_\beta \beta \right )_{ij} , \ \sigma_i^2 \right )\\ & \qquad \alpha_i \stackrel{\text{ind}}{\sim} \text{Normal}(0, s_\alpha^2) \\ & \qquad \delta_d \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\delta^2) \\ & \qquad \beta_b \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\beta^2) \\ & \qquad \sigma_i \stackrel{\text{ind}}{\sim} \text{Uniform}(0, s_\sigma) \end{aligned}$$

Pooled model

Functions:

Like the independent model, the pooled model is a benchmark to quantify the borrowing strength of the hierarchical model and the mixture model. But instead of the no-borrowing independent model, the pooled model represents maximum borrowing. Instead of $X_\alpha$, below, we use $X_\alpha^{\text{pool}}$, which has only one column to indicate which observations belong to any control group. In other words, the $\alpha$ parameters are pooled, and $\alpha$ itself is a scalar.

$$\begin{aligned} & y_{ij} \sim \text{N} \left( \left ( X_\alpha^{\text{pool}} \alpha + X_\delta \delta + X_\beta \beta \right )_{ij} , \ \sigma_i^2 \right )\\ & \qquad \alpha \sim \text{Normal}(0, s_\alpha^2) \\ & \qquad \delta_d \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\delta^2) \\ & \qquad \beta_b \stackrel{\text{ind}}{\sim} \text{Normal} (0, s_\beta^2) \\ & \qquad \sigma_i \stackrel{\text{ind}}{\sim} \text{Uniform}(0, s_\sigma) \end{aligned}$$

Simple model

Functions:

The mixture model hyperparameters $(m_\omega)_i$ and $(s_\omega)_i$ of study $i$ are obtained by analyzing the control group data of study $i$ with the simple model below. $(m_\omega)_i$ and $(s_\omega)_i$ are taken to be the estimated posterior mean and posterior standard deviation, respectively, of $\mu$ from this model.

$$\begin{aligned} &y \sim \text{Normal}(\mu, \sigma^2) \\ & \qquad \mu \sim \text{Normal}(0, s_\mu^2) \\ & \qquad \sigma \sim \text{Uniform}(0, s_\sigma) \end{aligned}$$

Borrowing metrics

The package supports the following metrics to quantify borrowing.

Effective sample size (ESS)

See the hb_ess() function for an implementation.

Neuenschwander et al. (2010) posit a prior effective sample size metric for meta-analytic predictive (MAP) priors. In the original paper, the underlying hierarchical model only uses historical controls, and the hypothetical new study is the current study of interest. In historicalborrow, we adapt this metric to a hierarchical model which also includes both control and treatment data from the current study. We still define $N$ below to be the number of (non-missing) historical control patients so we can still interpret ESS on the same scale as in the paper.

For the pooled model, define $V_0$ to be the posterior predictive variance of the control mean $\alpha^*$ of a hypothetical new unobserved study. According to Neuenschwander et al. (2010), it can be derived as an average of study-specific variances. In practice, we estimate $V_0$ using the average of MCMC samples of $\frac{1}{\sum \sigma_i^{-2}}$.

$$V_0 := \text{Var}(\alpha^* | y, \tau = 0) = \frac{1}{\sum \sigma_i^{-2}}$$

For the hierarchical model, we define the analogous posterior predictive variance $V_\tau$ using the prior distribution.

$$V_\tau := \text{Var}(\alpha^* | y) = \int E[(\alpha^* - E(\alpha^*|y))^2 | y] \cdot p(\alpha^* | \mu, \tau) \cdot p(\mu, \tau | y) d\mu d\tau$$

The above integral implies a straightforward method of estimating $V_\tau$ using MCMC samples:

  1. For each MCMC sample $m = 1, \ldots, M$ from the hierarchical model, identify samples $\mu^{(m)}$ and $\tau^{(m)}$ of $\mu$ and $\tau$, respectively.
  2. Draw $(\alpha^*)^{m}$ from a Normal($\mu^{(m)}$, $(\tau^{(m)})^2$) distribution.
  3. Estimate $V_\tau$ as the variance of the collection $(\alpha^*)^{1}, (\alpha^*)^{2}, \ldots, (\alpha^*)^{M}$ from (2).
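The three steps above can be sketched in Python as follows. The function name `estimate_v_tau` and its arguments are hypothetical; the sketch assumes the posterior samples of $\mu$ and $\tau$ have already been extracted from the fitted hierarchical model as two equal-length lists.

```python
import random
from statistics import variance


def estimate_v_tau(mu_samples, tau_samples, rng):
    """Estimate V_tau from paired MCMC samples of mu and tau."""
    # Steps 1 and 2: for each sample m, draw alpha* ~ Normal(mu_m, tau_m^2).
    # random.Random.gauss takes the standard deviation, not the variance.
    alpha_star = [rng.gauss(mu, tau) for mu, tau in zip(mu_samples, tau_samples)]
    # Step 3: sample variance of the simulated alpha* values.
    return variance(alpha_star)
```

Passing a seeded generator such as `random.Random(0)` makes the estimate reproducible. Because step 2 adds Monte Carlo noise, the estimate varies slightly from seed to seed when the number of MCMC samples is small.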

Next, define $N$ as the number of non-missing control patients from the historical studies only. Given $N$, $V_0$, and $V_\tau$, define the effective sample size as:

$$\text{ESS} := N \frac{V_0}{V_\tau}$$

$\frac{V_0}{V_\tau}$ is a weight which quantifies the fraction of historical information that the hierarchical model leverages for borrowing. Notably, the weight should be 1 if the hierarchical and pooled models exhibit the same strength of borrowing. Multiplied by $N$, the quantity becomes a heuristic for the strength of borrowing of the hierarchical model, measured in terms of the number of historical patients.
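A minimal Python sketch of the ESS calculation follows, assuming $V_\tau$ has already been estimated and that the pooled-model MCMC output is available as a list with one entry per draw, each entry holding the study-specific $\sigma_i$ values. The function name `estimate_ess` and this data layout are hypothetical; see hb_ess() for the package implementation.

```python
from statistics import mean


def estimate_ess(n_historical, sigma_samples, v_tau):
    """Compute ESS = N * V_0 / V_tau.

    n_historical: number of non-missing historical control patients (N).
    sigma_samples: one list of study-specific sigma_i values per MCMC draw.
    v_tau: estimate of V_tau from the hierarchical model.
    """
    # V_0: average over MCMC draws of 1 / sum_i sigma_i^(-2).
    v_0 = mean([1.0 / sum(s ** -2 for s in sigmas) for sigmas in sigma_samples])
    return n_historical * v_0 / v_tau
```

With made-up inputs `estimate_ess(100, [[1.0, 1.0], [1.0, 1.0]], 1.0)`, each draw gives $1 / (1 + 1) = 0.5$, so the weight $V_0 / V_\tau$ is 0.5 and the ESS is 50 patients.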

Precision ratio (hierarchical model only)

The precision ratio is an experimental ad hoc metric and should be used with caution. It is implemented in the hb_summary() function for the hierarchical model.

The precision ratio compares the prior precision of a control mean response (an $\alpha$ component, numerator) to the analogous precision of the full conditional distribution (denominator). The former is $\frac{1}{\tau^2}$, and the latter is $\frac{1}{\tau^2} + \frac{n}{\sigma^2}$. Here, $n$ is the number of non-missing patients in the current study, $\sigma^2$ is the residual variance, and $\tau^2$ is the variance of study-specific control means (components of $\alpha$). The full precision ratio is:

$$\begin{aligned} \frac{\frac{1}{\tau^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}} \end{aligned}$$

The precision ratio comes from the conditional distribution of $\alpha_k$ in the hierarchical model given the other parameters and the data. More precisely, in this conditional distribution, the mean is a weighted average between the prior mean and data mean, and the precision ratio is the weight on the prior mean. This can be seen in a simpler case with a Bayesian model with a normal data model, a normal prior on the mean, and known constant variance. For details, see Chapter 2 of Gelman et al. (2020).
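As a sketch of that simpler case, suppose (as a simplifying assumption, not the full model) that there are no covariate adjustments and that $\bar{y}$ denotes the mean of the $n$ responses informing $\alpha_k$. The symbols $\bar{y}$ and $w$ are introduced here for illustration only. The standard normal-normal conjugate result then gives:

$$\begin{aligned} & \alpha_k \mid \mu, \tau, \sigma, y \sim \text{Normal} \left ( w \mu + (1 - w) \bar{y}, \ \left ( \frac{1}{\tau^2} + \frac{n}{\sigma^2} \right )^{-1} \right ) \\ & \qquad w = \frac{\frac{1}{\tau^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}} \end{aligned}$$

The weight $w$ on the prior mean $\mu$ is the precision ratio. It approaches 1 as $\tau \to 0$, where the prior dominates, and approaches 0 as $n$ grows, where the data dominate.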

Variance shift ratio

The variance shift ratio is an experimental ad hoc metric and should be used with caution. It is implemented in the legacy hb_metrics() function.

Let $V_m$ be the estimated posterior variance of $\alpha_I$ (current study control group response mean) estimated by model $m$. The variance shift ratio is:

$$\begin{aligned} \frac{V_{m*} - V_{\text{independent}}}{V_{\text{pool}} - V_{\text{independent}}} \end{aligned}$$

where $m*$ is a historical borrowing model like the mixture model or hierarchical model.
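For illustration, the ratio can be computed from three sets of posterior samples of the current-study control mean, one per model. The function name `variance_shift_ratio` and its arguments are hypothetical, and using the sample variance of the draws as the posterior variance estimate is an assumption of this sketch.

```python
from statistics import variance


def variance_shift_ratio(alpha_borrow, alpha_independent, alpha_pool):
    """Variance shift ratio from posterior samples of the current-study
    control mean under the borrowing, independent, and pooled models."""
    v_m = variance(alpha_borrow)
    v_independent = variance(alpha_independent)
    v_pool = variance(alpha_pool)
    # Undefined when the pooled and independent variances coincide.
    return (v_m - v_independent) / (v_pool - v_independent)
```

By construction, the ratio is 0 when the borrowing model matches the independent model and 1 when it matches the pooled model.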

Mean shift ratio (legacy)

The mean shift ratio is not recommended to measure the strength of borrowing. Rather, it is an informal ad hoc measure of the lack of commensurability between the current and historical data sources. It is implemented in the legacy hb_metrics() function.

To define the mean shift ratio, let $\theta_m$ be the posterior mean control group response estimated by model $m$. The mean shift ratio is:

$$\begin{aligned} \frac{\theta_{m*} - \theta_{\text{independent}}}{\theta_{\text{pool}} - \theta_{\text{independent}}} \end{aligned}$$

where $m*$ is a historical borrowing model like the mixture model or hierarchical model.

Posterior mixture proportions (mixture model only)

The posterior mixture proportion of study $i$ is $P(\pi = i)$, and it is obtained by averaging the indicator $I(\pi = i)$ over posterior samples of $\pi$.
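In code, this amounts to counting how often each study index appears among the posterior samples. The sketch below assumes the samples of $\pi$ are available as a list of one-based study indices; the function name `posterior_mixture_proportions` is hypothetical.

```python
from collections import Counter


def posterior_mixture_proportions(pi_samples, n_studies):
    """Estimate P(pi = i) for i = 1, ..., n_studies as the fraction of
    posterior samples of pi that equal i."""
    counts = Counter(pi_samples)
    total = len(pi_samples)
    # Studies never selected get proportion 0.
    return [counts[i] / total for i in range(1, n_studies + 1)]
```

For example, the made-up samples `[3, 3, 1, 3]` with three studies give proportions 0.25, 0, and 0.75.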

References

Gelman, A. 2006. “Prior Distributions for Variance Parameters in Hierarchical Models.” Bayesian Analysis 1 (3): 515–43. https://doi.org/10.1214/06-BA117A.
Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2020. Bayesian Data Analysis. 3rd ed. CRC Press.
Lenth, Russell V. 2016. “Least-Squares Means: The R Package lsmeans.” Journal of Statistical Software 69 (1): 1–33. https://doi.org/10.18637/jss.v069.i01.
Neuenschwander, B., G. Capkun-Niggli, M. Branson, and D. J. Spiegelhalter. 2010. “Summarizing Historical Information on Controls in Clinical Trials.” Clinical Trials 7 (1): 5–18. https://doi.org/10.1177/1740774509356002.