Home > Statistics > Comparing transmissibility of Omicron lineages

Comparing transmissibility of Omicron lineages

Monitoring lineages of the Omicron variant of the SARS-CoV-2 virus continues to be an important health consideration. The World Health Organization identifies BA.1, BA.1.1, and the most recent BA.2 as the most common sublineages. A recent study from Japan, Yamasoba et al. (2022), compares, among other characteristics, the transmissibility of these three Omicron lineages with the latest Delta variant. It identifies BA.2 to have the highest transmissibility of the four. Preprint of the study is available at bioarxiv.org. One interesting aspect of the study is the application of Bayesian multilevel models for representing lineage growth dynamics. In this post, I demonstrate how to use Stata’s bayesmh and bayesstats summary commands to perform similar analysis.

The datasets are available at github.com. The data were compiled from the GISAID database, https://www.gisaid.org/hcov19-variants/, on February 1, 2022.

Observations on the number of cases for each lineage are collected for 11 countries in a period of 117 days, from October 1, 2021, to January 25, 2022.

. use omicron_ba2

. summarize

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         day |      1,185    58.18312     33.1129          1        117
        yba1 |      1,185    377.2878     1301.07          0      10251
       yba11 |      1,185    168.5561    685.6369          0       7378
        yba2 |      1,185    23.60759    128.7147          0       1481
      ydelta |      1,185     1224.78    2513.555          0      12549
-------------+---------------------------------------------------------
     country |      1,185     6.03038    3.245177          1         11

The variables yba1, yba11, yba2, and ydelta correspond, respectively, to the number of cases of BA.1, BA.1.1, and BA.2 lineages and the Delta variant. The variables day and country identify the day and country, respectively. The following table shows the country codes and the number of observed days for each country.

Id Country Days
1 Austria 105
2 Denmark 116
3 Germany 116
4 India 117
5 Israel 117
6 Philippines 43
7 Singapore 115
8 South Africa 110
9 Swiden 112
10 United Kingdom 117
11 USA 117

For some countries, the data are incomplete. In particular, for the Philippines, data are available for only 43 days, which, as we will see later, affects the analysis.

Bayesian multilevel multinomial model

In each of the 11 countries, the viral lineages are recorded as the number of cases on the observed days. To formally specify the model, let us denote variables yba1, yba11, yba2, and ydelta, respectively, as \(y_{1ct}\), \(y_{2ct}\), \(y_{3ct}\), and \(y_{4ct}\), where \(c\) is the country code and \(t\) is the observation time. \(y_{lct}\) is assumed to follow a multinomial distribution with total number of cases \(n_{ct}\). The probability of an occurrence of a lineage, \(\theta_{lct}\), for lineage \(l\), country \(c\), and time \(t\) is given by the 4-dimensional logistic function (softmax) applied to linear functions \(a_{lc} + b_{lc}\times t\), where \(a_{lc}\) and, \(b_{lc}\) are random-effect parameters.

More formally, for lineages \(l=1,\dots,4\) and countries \(c=1,\dots,11\), we have
$$
y_{lct} \sim {\rm Multinomial}(n_{ct}, \theta_{lct})\\
\theta_{lct} = \frac{{\rm exp}(\mu_{lct})}{\sum_{k=1}^4 {\rm exp}(\mu_{kct})}\hspace{.5cm}\\
\mu_{lct} = a_{lc} + b_{lc} \times t \hspace{1.5cm}
$$

The probability of observing counts \(n_1\) through \(n_4\), where \(n_1+n_2+n_3+n_4=n_{ct}\), is
$$
P(y_{1ct}=n_1, y_{2ct}=n_2, y_{3ct}=n_3, y_{4ct}=n_4) =
\frac{n_{ct}!}{n_1!n_2!n_3!n_4!}
\theta_{1ct}^{n_1}\theta_{2ct}^{n_2}\theta_{3ct}^{n_3}\theta_{4ct}^{n_4}
$$
For identifiability, I set the random parameters for the BA.1 lineage, \(a_{1c}\) and \(b_{1c}\), to 0. In this case, we have
$$
\theta_{1ct} = \frac{1}{1+\sum_{k=2}^4 {\rm exp}(\mu_{kct})}\\
\theta_{2ct} = \frac{{\rm exp}(\mu_{2ct})}{1+\sum_{k=2}^4 {\rm exp}(\mu_{kct})}\\
\theta_{3ct} = \frac{{\rm exp}(\mu_{3ct})}{1+\sum_{k=2}^4 {\rm exp}(\mu_{kct})}\\
\theta_{4ct} = \frac{{\rm exp}(\mu_{4ct})}{1+\sum_{k=2}^4 {\rm exp}(\mu_{kct})}
$$
and the \(lct\)-observation of the log likelihood is
$$
{\rm const} + y_{2ct}\times\mu_{2ct} + y_{3ct}\times\mu_{3ct} + y_{4ct}\times\mu_{4ct} \\
\hspace{1cm}- n_{ct}\times\{1+{\rm exp}(\mu_{2ct})+ {\rm exp}(\mu_{3ct})+ {\rm exp}(\mu_{4ct})\}
$$
For the purpose of sampling the posterior model, we can ignore the constant terms.

The authors propose Student’s \(t\)-distributions with 6 degrees of freedom for the random slope parameters \(b_{lc}\) to reduce the effects of outliers,
$$
b_{lc} \sim t_6(\beta_l, \sigma_l^2)
$$
and flat priors for the random intercepts \(a_{lc}\) and hyperparameters \(\beta_l\) and \(\sigma_l^2\). Instead, to have a proper prior model, I apply weakly informative priors: \(N(0, 100)\) for random intercepts \(a_{lc}\)s and random slopes \(\beta_{l}\)s and \({\rm InvGamma}(0.1,0.1)\) for the variance parameters \(\sigma_l^2\)s.

To compute the log likelihood, I define a new variable, ytotal, for the total number cases \(n_{ct}\).

. gen double ytotal = yba1+yba11+yba2+ydelta

Because we don’t have a built-in distribution in bayesmh for this likelihood model, I use the llf() option to specify it according to the above formula. The llf() option requires a univariate model specification, so I use ytotal as a dependent variable representing the vector of counts \((y_{1ct},y_{2ct},y_{3ct},y_{4ct})\).

The linear combinations \(\mu_{2ct}\), \(\mu_{3ct}\), and \(\mu_{4ct}\) are specified using the define() option. The latter include random intercepts A2[country], A3[country], and A4[country] and random slopes B2[country], B3[country], and B4[country]. The random slopes describe the transmissibility of the lineages in comparison with BA.1.

To improve the sampling, I block some of the parameters and provide initial values within the domain of the posterior.

. bayesmh ytotal, noconstant likelihood(llf(                      
>         yba11*{mu2:}+yba2*{mu3:}+ydelta*{mu4:} -                
>         ytotal*ln(1+exp({mu2:})+exp({mu3:})+exp({mu4:}))))      
>         define(mu2: A2[country] B2[country]#c.day, noconstant)  
>         define(mu3: A3[country] B3[country]#c.day, noconstant)  
>         define(mu4: A4[country] B4[country]#c.day, noconstant)  
>         prior({A2}, normal(0, 100))                             
>         prior({A3}, normal(0, 100))                             
>         prior({A4}, normal(0, 100))                             
>         prior({B2}, t({beta2}, {sigma2}, 6))                    
>         prior({B3}, t({beta3}, {sigma3}, 6))                    
>         prior({B4}, t({beta4}, {sigma4}, 6))                    
>         prior({sigma2 sigma3 sigma4}, igamma(0.1, 0.1) split)   
>         prior({beta2 beta3 beta4}, normal(0, 100) split)        
>         block({beta2 sigma2})                                   
>         block({beta3 sigma3}) block({beta4 sigma4})             
>         init({beta2 beta3 beta4} rnormal(1, 1)                  
>                 {sigma2 sigma3 sigma4} 1) rseed(17)


Burn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done
Simulation 10000 .........1000.........2000.........3000.........4000.........
> 5000.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  ytotal ~ logdensity()

Hyperpriors:
  {A2[country] A3[country] A4[country]} ~ normal(0,100)
                          {B2[country]} ~ t({beta2},{sigma2},6)
                          {B3[country]} ~ t({beta3},{sigma3},6)
                          {B4[country]} ~ t({beta4},{sigma4},6)
                 {sigma2 sigma3 sigma4} ~ igamma(0.1,0.1)
                    {beta2 beta3 beta4} ~ normal(0,100)

Expression:
  expr4 : yba11*xb_mu2+yba2*xb_mu3+ydelta*xb_mu4-ytotal*ln(1+exp(xb_mu2)
          +exp(xb_mu3)+exp(xb_mu4))
------------------------------------------------------------------------------

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      1,185
                                                 Acceptance rate  =      .2528
                                                 Efficiency:  min =     .02015
                                                              avg =     .06866
Log marginal-likelihood                                       max =     .09676

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
       beta2 |  .0271004     .04504   .001448    .026472  -.0587521   .1217648
      sigma2 |  .0245979   .0132208   .000554   .0207922    .010104   .0613864
       beta3 |  .1527336   .0530795   .001722   .1533625   .0424005   .2561617
      sigma3 |  .0304257   .0174173   .000794   .0260487   .0112527   .0827476
       beta4 | -.1673996   .0474555   .001539  -.1678558  -.2638208  -.0749281
      sigma4 |  .0264834   .0164447   .001159   .0227708    .010781   .0628222
------------------------------------------------------------------------------

There are no obvious convergence issues with the model, and the 7% average sampling efficiency for the hyperparameters is good.

The positive estimate for beta2 and beta3 suggests that BA.1.1 and BA.2 lineages are both more transmittable than BA.1. On the other hand, the posterior mean estimate for beta4 is negative, \(-0.17\), implying less transmittability of Delta in comparison with the original Omicron BA.1.

Relative effective reproduction by country

The authors of the study calculate the relative effective reproduction number according to the formula
$$
r_{lc} = {\rm exp}(\gamma\beta_{lc})
$$
and the average relative effective reproduction number across countries according to
$$
r_{l} = {\rm exp}(\gamma\beta_{l})
$$
where \(\gamma\) is the viral generation time, 2.1 days; see Estimating Generation Time of Omicron.

We estimate the reproduction numbers \(r_l\) using the bayesstats summary command. Because we assumed BA.1 lineage to be the baseline, the reproduction numbers are relative to BA.1.

. bayesstats summary (rba11:exp(2.1*{beta2})) 
>         (rba2:exp(2.1*{beta3})) (rdelta:exp(2.1*{beta4}))

Posterior summary statistics                      MCMC sample size =    10,000

       rba11 : exp(2.1* { beta2 } )
        rba2 : exp(2.1* { beta3 } )
      rdelta : exp(2.1* { beta4 } )

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
       rba11 |  1.063311   .1011051   .003258   1.057165   .8839283   1.291373
        rba2 |   1.38671   .1545093    .00502   1.379969   1.093126   1.712475
      rdelta |  .7070996   .0705203   .002292   .7029306    .574633   .8544061
------------------------------------------------------------------------------

Based on posterior mean estimates, the reproduction of BA.1.1 is about 6% higher, BA.2 is about 39% higher, and Delta is about 30% lower than that of BA.1. These results agree with the findings in the paper that BA.2 has 1.4-fold higher effective reproduction than that of BA.1.

Below, I show caterpillar plots for the relative effective reproduction numbers by country. Plotted are the posterior mean values along with 95% credible intervals.

The first plot compares BA.1.1 with BA.1.

. gen id = mod(_n-1, 11)+1

. quietly bayesstats summary (exp(2.1*{B2[country]}))

. mat bsum = r(summary)

. mat bsum = (bsum[1..11,1], bsum[1..11,5], bsum[1..11,6])

. svmat bsum

. twoway scatter bsum1 id || rspike bsum3 bsum2 id, legend(off)   
>         xla(1 2 3 4 5 6 7 8 9 10 11, valuelabel) xsc(r(1 11))   
>         xtitle("") title("BA.1.1 vs BA.1")

. drop bsum*

graph1

All the values are greater than 1 except for country 1, Austria. Also apparent is the wide credible interval for country 6, Philippines, due, probably, to the much fewer observation days available, only 43, in comparison with the other countries, 110 or more.

The second plot compares BA.2 with BA.1.

. quietly bayesstats summary (exp(2.1*{B3[country]}))

. mat bsum = r(summary)

. mat bsum = (bsum[1..11,1], bsum[1..11,5], bsum[1..11,6])

. svmat bsum

. twoway scatter bsum1 id || rspike bsum3 bsum2 id, legend(off)   
>         xla(1 2 3 4 5 6 7 8 9 10 11, valuelabel) xsc(r(1 11))   
>         xtitle("") title("BA.2 vs BA.1")

. drop bsum*

graph1

Again, the reproduction number for the Philippines has higher variability and tends to shift the average across-country reproduction upward.

The third plot compares Delta with BA.1.

. quietly bayesstats summary (exp(2.1*{B4[country]}))

. mat bsum = r(summary)

. mat bsum = (bsum[1..11,1], bsum[1..11,5], bsum[1..11,6])

. svmat bsum

. twoway scatter bsum1 id || rspike bsum3 bsum2 id, legend(off)   
>         xla(1 2 3 4 5 6 7 8 9 10 11, valuelabel) xsc(r(1 11))   
>         xtitle("") title("Delta vs BA.1")

. drop bsum*

graph1

As we might anticipate, all reproduction numbers are less than 1, and the reproduction number for the Philippines has higher variability than the others.

Sensitivity analysis

The authors of the study use Student’s \(t\)-prior with 6 degrees of freedom for the random slopes \(b_{lc}\). It is interesting to check whether changing the prior affects the results in any substantive way.

Below, I fit models using \(t\)-priors with 2 and 12 degrees of freedom for \(b_{lc}\).

. quietly bayesmh ytotal, noconstant likelihood(llf(              
>         yba11*{mu2:}+yba2*{mu3:}+ydelta*{mu4:} -                
>         ytotal*ln(1+exp({mu2:})+exp({mu3:})+exp({mu4:}))))      
>         define(mu2: A2[country] B2[country]#c.day, noconstant)  
>         define(mu3: A3[country] B3[country]#c.day, noconstant)  
>         define(mu4: A4[country] B4[country]#c.day, noconstant)  
>         prior({A2}, normal(0, 100))                             
>         prior({A3}, normal(0, 100))                             
>         prior({A4}, normal(0, 100))                             
>         prior({B2}, t({beta2}, {sigma2}, 2))                    
>         prior({B3}, t({beta3}, {sigma3}, 2))                    
>         prior({B4}, t({beta4}, {sigma4}, 2))                    
>         prior({sigma2}{sigma3}{sigma4}, igamma(0.1, 0.1) split) 
>         prior({beta2}{beta3}{beta4}, normal(0, 100) split)      
>         block({beta2}{sigma2})                                  
>         block({beta3}{sigma3}) block({beta4}{sigma4})           
>         init({beta2}{beta3}{beta4} rnormal(1, 1)                
>                 {sigma2}{sigma3}{sigma4} 1)                     
>         burnin(5000) rseed(17)

. bayesstats summary (r2:exp(2.1*{beta2})) 
>         (r3:exp(2.1*{beta3})) (r4:exp(2.1*{beta4}))

Posterior summary statistics                      MCMC sample size =    10,000

          r2 : exp(2.1* { beta2 } )
          r3 : exp(2.1* { beta3 } )
          r4 : exp(2.1* { beta4 } )

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
          r2 |  1.067439   .0938958   .003041   1.060892   .8874384   1.262244
          r3 |  1.394932   .1392574   .006867   1.381163   1.163633   1.702113
          r4 |   .700393   .0654377   .001962   .6977034   .5754148   .8349086
------------------------------------------------------------------------------

. quietly bayesmh ytotal, noconstant likelihood(llf(              
>         yba11*{mu2:}+yba2*{mu3:}+ydelta*{mu4:} -                
>         ytotal*ln(1+exp({mu2:})+exp({mu3:})+exp({mu4:}))))      
>         define(mu2: A2[country] B2[country]#c.day, noconstant)  
>         define(mu3: A3[country] B3[country]#c.day, noconstant)  
>         define(mu4: A4[country] B4[country]#c.day, noconstant)  
>         prior({A2}, normal(0, 100))                             
>         prior({A3}, normal(0, 100))                             
>         prior({A4}, normal(0, 100))                             
>         prior({B2}, t({beta2}, {sigma2}, 12))                   
>         prior({B3}, t({beta3}, {sigma3}, 12))                   
>         prior({B4}, t({beta4}, {sigma4}, 12))                   
>         prior({sigma2}{sigma3}{sigma4}, igamma(0.1, 0.1) split) 
>         prior({beta2}{beta3}{beta4}, normal(0, 100) split)      
>         block({beta2}{sigma2})                                  
>         block({beta3}{sigma3}) block({beta4}{sigma4})           
>         init({beta2}{beta3}{beta4} rnormal(1, 1)                
>                 {sigma2}{sigma3}{sigma4} 1)                     
>         burnin(5000) rseed(17)

. bayesstats summary (r2:exp(2.1*{beta2})) 
>         (r3:exp(2.1*{beta3})) (r4:exp(2.1*{beta4}))

Posterior summary statistics                      MCMC sample size =    10,000

          r2 : exp(2.1* { beta2 } )
          r3 : exp(2.1* { beta3 } )
          r4 : exp(2.1* { beta4 } )

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
          r2 |  1.068262   .1032594    .00325   1.061218   .8759179   1.302069
          r3 |  1.414598     .15648   .005184    1.40231    1.14427   1.765012
          r4 |  .7024785   .0725748   .001915   .6985938    .574165   .8566458
------------------------------------------------------------------------------

I also fit a model with normal priors for the random slopes.

. quietly bayesmh ytotal, noconstant likelihood(llf(              
>         yba11*{mu2:}+yba2*{mu3:}+ydelta*{mu4:} -                
>         ytotal*ln(1+exp({mu2:})+exp({mu3:})+exp({mu4:}))))      
>         define(mu2: A2[country] B2[country]#c.day, noconstant)  
>         define(mu3: A3[country] B3[country]#c.day, noconstant)  
>         define(mu4: A4[country] B4[country]#c.day, noconstant)  
>         prior({A2}, normal(0, 100))                             
>         prior({A3}, normal(0, 100))                             
>         prior({A4}, normal(0, 100))                             
>         prior({B2}, normal({beta2}, {sigma2}))                  
>         prior({B3}, normal({beta3}, {sigma3}))                  
>         prior({B4}, normal({beta4}, {sigma4}))                  
>         prior({sigma2}{sigma3}{sigma4}, igamma(0.1, 0.1) split) 
>         prior({beta2}{beta3}{beta4}, normal(0, 100) split)      
>         block({beta2}{sigma2}{beta3}{sigma3}{beta4}{sigma4},    
>                 split gibbs)                                    
>         init({beta2}{beta3}{beta4} rnormal(1, 1)                
>                 {sigma2}{sigma3}{sigma4} 1) rseed(17)

. bayesstats summary (r2:exp(2.1*{beta2})) 
>         (r3:exp(2.1*{beta3})) (r4:exp(2.1*{beta4}))

Posterior summary statistics                      MCMC sample size =    10,000

          r2 : exp(2.1* { beta2 } )
          r3 : exp(2.1* { beta3 } )
          r4 : exp(2.1* { beta4 } )

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
          r2 |  1.066153   .1080605   .001081   1.061309   .8664183   1.292496
          r3 |  1.405522    .159427   .010062    1.39786   1.120683   1.746765
          r4 |  .7089474    .075455   .000755   .7048609   .5734278   .8714144
------------------------------------------------------------------------------

As evident from the summaries, there are no substantive changes in the reproduction number estimates, which shows that our initial model is robust to the prior specification of random slopes.

As we saw above, the reproduction number estimates for the Philippines have high variability because of the smaller sample. It is interesting to check how these estimates change after excluding the Philippines from the estimation sample.

. quietly bayesmh ytotal if country != 6, noconstant likelihood(llf(      
>         yba11*{mu2:}+yba2*{mu3:}+ydelta*{mu4:} -                        
>         ytotal*ln(1+exp({mu2:})+exp({mu3:})+exp({mu4:}))))              
>         define(mu2: A2[country] B2[country]#c.day, noconstant)          
>         define(mu3: A3[country] B3[country]#c.day, noconstant)          
>         define(mu4: A4[country] B4[country]#c.day, noconstant)          
>         prior({A2}, normal(0, 100))                                     
>         prior({A3}, normal(0, 100))                                     
>         prior({A4}, normal(0, 100))                                     
>         prior({B2}, t({beta2}, {sigma2}, 6))                            
>         prior({B3}, t({beta3}, {sigma3}, 6))                            
>         prior({B4}, t({beta4}, {sigma4}, 6))                            
>         prior({sigma2}{sigma3}{sigma4}, igamma(0.1, 0.1) split)         
>         prior({beta2}{beta3}{beta4}, normal(0, 100) split)              
>         block({beta2}{sigma2})                                          
>         block({beta3}{sigma3}) block({beta4}{sigma4})                   
>         init({sigma2}{sigma3}{sigma4} 1)                                
>         burnin(5000) rseed(17)

. bayesstats summary (rba11:exp(2.1*{beta2})) 
>         (rba2:exp(2.1*{beta3})) (rdelta:exp(2.1*{beta4}))

Posterior summary statistics                      MCMC sample size =    10,000

       rba11 : exp(2.1* { beta2 } )
        rba2 : exp(2.1* { beta3 } )
      rdelta : exp(2.1* { beta4 } )

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
       rba11 |  1.101936   .1422373   .006638   1.089014   .8338979   1.415828
        rba2 |  1.313135   .1978823   .006081   1.302977   .9514345   1.731837
      rdelta |  .7242147   .1055362   .010449   .7109377   .5368485   .9596856
------------------------------------------------------------------------------

Now the posterior estimates for the 3 relative reproduction numbers all get closer to 1. In particular, the reproduction of BA.2 in comparison with BA.1.1, in terms of posterior mean, drops from 1.4 to 1.3, thus suggesting more moderate increase in transmissibility. In addition, the 95% credible intervals for the relative reproduction of both BA.1.1 and BA.2 now include 1.

Reference

Yamasoba, D., I. Kimura, H. Nasser, Y. Morioka, N. Nao, J. Ito, K. Uriu, et al. 2022. Virological characteristics of SARS-CoV-2 BA.2 variant. https://doi.org/10.1101/2022.02.14.480335.