
Fitting ordered probit models with endogenous covariates with Stata’s gsem command


The new gsem command allows us to fit a wide variety of models; among the many possibilities, we can account for endogeneity in different kinds of models. As an example, I will fit an ordinal probit model with endogenous covariates.

 

Parameterizations for an ordinal probit model

 
The ordinal probit model is used to model ordinal dependent variables. In the usual parameterization, we assume that there is an underlying linear regression, which relates an unobserved continuous variable \(y^*\) to the covariates \(x\).

\[y^*_{i} = x_{i}\gamma + u_i\]

The observed dependent variable \(y\) relates to \(y^*\) through a series of cut-points \(-\infty = \kappa_0 < \kappa_1 < \dots < \kappa_m = +\infty\), as follows:

\[y_{i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{i} \leq \kappa_j\]

Because the variance of \(u_i\) cannot be identified from the observed data, it is assumed to be equal to one. However, we can consider a re-scaled parameterization of the same model; a straightforward way of seeing this is to note that, for any positive number \(M\):

\[\kappa_{j-1} < y^*_{i} \leq \kappa_j \iff
M\kappa_{j-1} < M y^*_{i} \leq M\kappa_j
\]

that is,

\[\kappa_{j-1} < x_i\gamma + u_i \leq \kappa_j \iff
M\kappa_{j-1}< x_i(M\gamma) + Mu_i \leq M\kappa_j
\]

In other words, the same model can be represented by multiplying the unobserved variable \(y^*\) by a positive number \(M\); the standard deviation of the residual term, the coefficients, and the cut-points are then all multiplied by that number.

Let me show you an example. I will first fit a standard ordinal probit model, both with oprobit and with gsem. Then I will use gsem to fit an ordinal probit model where the residual term of the underlying linear regression has standard deviation equal to 2. I will do this by introducing a latent variable \(L\) with variance 1 and coefficient \(\sqrt 3\), which is added to the underlying residual, itself with variance 1; the ‘new’ residual term then has variance \(1+(\sqrt 3)^2\times Var(L) = 4\), so its standard deviation is 2. We will see that, as a result, the coefficients, as well as the cut-points, are multiplied by 2.

. sysuse auto, clear
(1978 Automobile Data)

. oprobit rep mpg disp , nolog

Ordered probit regression                         Number of obs   =         69
                                                  LR chi2(2)      =      14.68
                                                  Prob > chi2     =     0.0006
Log likelihood = -86.352646                       Pseudo R2       =     0.0783

------------------------------------------------------------------------------
       rep78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .0497185   .0355452     1.40   0.162    -.0199487    .1193858
displacement |  -.0029884   .0021498    -1.39   0.165     -.007202    .0012252
-------------+----------------------------------------------------------------
       /cut1 |  -1.570496   1.146391                      -3.81738    .6763888
       /cut2 |  -.7295982   1.122361                     -2.929386     1.47019
       /cut3 |   .6580529   1.107838                     -1.513269    2.829375
       /cut4 |    1.60884   1.117905                     -.5822132    3.799892
------------------------------------------------------------------------------

. gsem (rep <- mpg disp, oprobit), nolog

Generalized structural equation model             Number of obs   =         69
Log likelihood = -86.352646

--------------------------------------------------------------------------------
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <-       |
           mpg |   .0497185   .0355452     1.40   0.162    -.0199487    .1193858
  displacement |  -.0029884   .0021498    -1.39   0.165     -.007202    .0012252
---------------+----------------------------------------------------------------
rep78          |
         /cut1 |  -1.570496   1.146391    -1.37   0.171     -3.81738    .6763888
         /cut2 |  -.7295982   1.122361    -0.65   0.516    -2.929386     1.47019
         /cut3 |   .6580529   1.107838     0.59   0.553    -1.513269    2.829375
         /cut4 |    1.60884   1.117905     1.44   0.150    -.5822132    3.799892
--------------------------------------------------------------------------------

. local a = sqrt(3)

. gsem (rep <- mpg disp L@`a', oprobit), var(L@1) nolog

Generalized structural equation model             Number of obs   =         69
Log likelihood = -86.353008

 ( 1)  [rep78]L = 1.732051
 ( 2)  [var(L)]_cons = 1
--------------------------------------------------------------------------------
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <-       |
           mpg |    .099532     .07113     1.40   0.162    -.0398802    .2389442
  displacement |  -.0059739   .0043002    -1.39   0.165    -.0144022    .0024544
             L |   1.732051  (constrained)
---------------+----------------------------------------------------------------
rep78          |
         /cut1 |  -3.138491   2.293613    -1.37   0.171     -7.63389    1.356907
         /cut2 |  -1.456712   2.245565    -0.65   0.517    -5.857938    2.944513
         /cut3 |   1.318568    2.21653     0.59   0.552     -3.02575    5.662887
         /cut4 |   3.220004   2.236599     1.44   0.150     -1.16365    7.603657
---------------+----------------------------------------------------------------
         var(L)|          1  (constrained)
--------------------------------------------------------------------------------
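As a quick check, we can divide the re-scaled estimates by two to recover the oprobit results, up to optimization error. A minimal sketch, run right after the constrained gsem fit above; the coefficient names follow gsem’s stored-results conventions, which we also use later in this post:

di _b[rep78:mpg]/2              // compare with  .0497185
di _b[rep78:displacement]/2     // compare with -.0029884
di _b[rep78_cut1:_cons]/2       // compare with -1.570496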

 

Ordinal probit model with endogenous covariates

 
This model is defined analogously to the model fitted by -ivprobit- for probit models with endogenous covariates; we assume an underlying model with two equations,

\[
\begin{eqnarray}
y^*_{1i} =& y_{2i} \beta + x_{1i} \gamma + u_i & \\
y_{2i} =& x_{1i} \pi_1 + x_{2i} \pi_2 + v_i & \,\,\,\,\,\, (1)
\end{eqnarray}
\]

where \(u_i \sim N(0, 1)\), \(v_i \sim N(0, s^2)\), and \(corr(u_i, v_i) = \rho\).

We don’t observe \(y^*_{1i}\); instead, we observe a discrete variable \(y_{1i}\), such that, for a set of cut-points (to be estimated) \(-\infty = \kappa_0 < \kappa_1 < \kappa_2 < \dots < \kappa_m = +\infty \),

\[y_{1i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{1i} \leq \kappa_j \]

 

The parameterization we will use

 
I will re-scale the first equation, preserving the correlation. That is, I will consider the following system:

\[
\begin{eqnarray}
z^*_{1i} =&
y_{2i}b +x_{1i}c + t_i + \alpha L_i &\\
y_{2i} = &x_{1i}\pi_1 + x_{2i}\pi_2 + w_i + \alpha L_i & \,\,\,\,\,\, (2)
\end{eqnarray}
\]

where \(t_i\), \(w_i\), and \(L_i\) are independent, with \(t_i \sim N(0, 1)\), \(w_i \sim N(0,\sigma^2)\), \(L_i \sim N(0, 1)\), and

\[y_{1i} = j {\mbox{ if }} \lambda_{j-1} < z^*_{1i} \leq \lambda_j \]

By introducing the same latent variable in both equations, I am modeling a correlation between the error terms. The first equation is a re-scaled version of the original equation, that is, \(z^*_{1i} = My^*_{1i}\):

\[ y_{2i}b + x_{1i}c + t_i + \alpha L_i
= M y_{2i}\beta + M x_{1i}\gamma + M u_i \]

This implies that
\[M u_i = t_i + \alpha L_i, \]
where \(Var(u_i) = 1\) and \(Var(t_i + \alpha L_i) = 1 + \alpha^2\), so the scale is \(M = \sqrt{1+\alpha^2} \).

The second equation remains the same; we just express \(v_i\) as \(w_i + \alpha L_i\). Now, after estimating system (2), we can recover the parameters in (1) as follows:

\[\beta = \frac{1}{\sqrt{1+ \alpha^2}} b\]
\[\gamma = \frac{1}{\sqrt{1+ \alpha^2}} c\]
\[\kappa_j = \frac{1}{\sqrt{1+ \alpha^2}} \lambda_j \]

\[V(v_i) = V(w_i + \alpha L_i) = V(w_i) + \alpha^2\]

Also, because \(Cov(t_i + \alpha L_i, w_i + \alpha L_i) = \alpha^2 Var(L_i) = \alpha^2\),

\[\rho = Corr(t_i + \alpha L_i, w_i + \alpha L_i) =
\frac{\alpha^2}{\sqrt{1+\alpha^2}\sqrt{V(w_i)+\alpha^2}}\]

Note: This parameterization assumes that the correlation is positive; for negative values of the correlation, \(L\) should be included in the second equation with a negative sign (that is, L@(-a) instead of L@a). If you try to perform the estimation with the wrong sign, the model most likely won’t converge; if it does, you will see a coefficient for L that is virtually zero. In Stata 13.1 we have included features that allow you to fit the model without this restriction. However, this time we will use the older parameterization, which allows you to visualize the different components more easily.
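For reference, this is how the negative-correlation variant would look; a sketch only, using the variable names from the simulation in the next section:

gsem (y1 <- y2 x1 x2 L@a, oprobit) (y2 <- x1 x2 x3 x4 x5 L@(-a)), var(L@1)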

 

Simulating data, and performing the estimation

 

clear
set seed 1357
set obs 10000

* covariates
forvalues i = 1(1)5 {
    gen x`i' = 2*rnormal() + _n/1000
}

* correlated error terms (z1, z2) with correlation .5
mat C = [1, .5 \ .5, 1]
drawnorm z1 z2, cov(C)

* endogenous covariate y2
gen y2 = 0
forvalues i = 1(1)5 {
    replace y2 = y2 + x`i'
}
replace y2 = y2 + z2

* latent dependent variable, and its observed, discretized version
gen y1star = y2 + x1 + x2 + z1
gen xb1 = y2 + x1 + x2

gen y1 = 4
replace y1 = 3 if xb1 + z1 <= .8
replace y1 = 2 if xb1 + z1 <= .3
replace y1 = 1 if xb1 + z1 <= -.3
replace y1 = 0 if xb1 + z1 <= -.8

gsem (y1 <- y2 x1 x2 L@a, oprobit) (y2 <- x1 x2 x3 x4 x5 L@a), var(L@1)

local y1 y1
local y2 y2

local xaux x1 x2 x3 x4 x5
local xmain y2 x1 x2

* scale of the first equation: M = sqrt(1 + alpha^2)
local s2 sqrt(1 + _b[`y1':L]^2)

* coefficients in the first equation, divided by the scale
foreach v in `xmain' {
    local trans `trans' (`y1'_`v': _b[`y1':`v']/`s2')
}

* coefficients in the second equation, unchanged
foreach v in `xaux' _cons {
    local trans `trans' (`y2'_`v': _b[`y2':`v'])
}

* cut-points, divided by the scale
qui tab `y1' if e(sample)
local ncuts = r(r) - 1
forvalues i = 1(1)`ncuts' {
    local trans `trans' (cut_`i': _b[`y1'_cut`i':_cons]/`s2')
}

* sd of v = sqrt(Var(w) + alpha^2); rho = alpha^2/(s1*s2)
local s1 sqrt(_b[var(e.`y2'):_cons] + _b[`y1':L]^2)

local trans `trans' (sig_2: `s1')
local trans `trans' (rho_12: _b[`y1':L]^2/(`s1'*`s2'))
nlcom `trans'

 

Results

 
This is the output from gsem:

Generalized structural equation model             Number of obs   =      10000
Log likelihood = -14451.117

 ( 1)  [y1]L - [y2]L = 0
 ( 2)  [var(L)]_cons = 1
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 <-        |
          y2 |   1.379511   .0775028    17.80   0.000     1.227608    1.531414
          x1 |   1.355687   .0851558    15.92   0.000     1.188785    1.522589
          x2 |   1.346323   .0833242    16.16   0.000      1.18301    1.509635
           L |   .7786594   .0479403    16.24   0.000     .6846982    .8726206
-------------+----------------------------------------------------------------
y2 <-        |
          x1 |   .9901353   .0044941   220.32   0.000      .981327    .9989435
          x2 |   1.006836   .0044795   224.76   0.000      .998056    1.015615
          x3 |   1.004249   .0044657   224.88   0.000     .9954963    1.013002
          x4 |   .9976541   .0044783   222.77   0.000     .9888767    1.006431
          x5 |   .9987587   .0044736   223.26   0.000     .9899907    1.007527
           L |   .7786594   .0479403    16.24   0.000     .6846982    .8726206
       _cons |   .0002758   .0192417     0.01   0.989    -.0374372    .0379887
-------------+----------------------------------------------------------------
y1           |
       /cut1 |  -1.131155   .1157771    -9.77   0.000    -1.358074   -.9042358
       /cut2 |  -.5330973   .1079414    -4.94   0.000    -.7446585    -.321536
       /cut3 |   .2722794   .1061315     2.57   0.010     .0642654    .4802933
       /cut4 |     .89394   .1123013     7.96   0.000     .6738334    1.114047
-------------+----------------------------------------------------------------
       var(L)|          1  (constrained)
-------------+----------------------------------------------------------------
    var(e.y2)|   .3823751    .074215                      .2613848    .5593696
------------------------------------------------------------------------------

These are the results we obtain when we transform the values reported by gsem to the original parameterization:

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       y1_y2 |   1.088455   .0608501    17.89   0.000     .9691909    1.207719
       y1_x1 |   1.069657   .0642069    16.66   0.000      .943814    1.195501
       y1_x2 |   1.062269   .0619939    17.14   0.000      .940763    1.183774
       y2_x1 |   .9901353   .0044941   220.32   0.000      .981327    .9989435
       y2_x2 |   1.006836   .0044795   224.76   0.000      .998056    1.015615
       y2_x3 |   1.004249   .0044657   224.88   0.000     .9954963    1.013002
       y2_x4 |   .9976541   .0044783   222.77   0.000     .9888767    1.006431
       y2_x5 |   .9987587   .0044736   223.26   0.000     .9899907    1.007527
    y2__cons |   .0002758   .0192417     0.01   0.989    -.0374372    .0379887
       cut_1 |   -.892498   .0895971    -9.96   0.000    -1.068105   -.7168909
       cut_2 |  -.4206217   .0841852    -5.00   0.000    -.5856218   -.2556217
       cut_3 |   .2148325   .0843737     2.55   0.011     .0494632    .3802018
       cut_4 |    .705332   .0905974     7.79   0.000     .5277644    .8828997
       sig_2 |   .9943267    .007031   141.42   0.000     .9805462    1.008107
      rho_12 |   .4811176   .0477552    10.07   0.000     .3875191     .574716
------------------------------------------------------------------------------

The estimates are quite close to the values used for the simulation. If you try to perform the estimation with the wrong sign for the coefficient on L, you will get a number that is virtually zero (if you get convergence at all). In that case, the optimizer is telling us that the best value it can find, given the restrictions we have imposed, is zero. If you see such results, try the opposite sign. If both signs give a zero coefficient, then zero is the solution, and there is no endogeneity at all. If one of them is nonzero, the nonzero value is the solution. As stated before, in Stata 13.1 the model can be fitted without this restriction.

A tip to debug your nl/nlsur function evaluator program

If you have a bug in your evaluator program, nl will most likely produce the following error:

your program returned 198
verify that your program is a function evaluator program
r(198);

The error indicates that your program cannot be evaluated.

The best way to spot any issues in your evaluator program is to run it interactively. You just need to define your sample (usually the observations where none of the variables are missing) and a matrix with values for your parameters. Let me show you an example with nlces2. This is the code to fit the CES production function, from the documentation for the nl command:

cscript
program nlces2
    version 12
    syntax varlist(min=3 max=3) if, at(name)
    local logout : word 1 of `varlist'
    local capital : word 2 of `varlist'
    local labor : word 3 of `varlist'
    // Retrieve parameters out of at matrix
    tempname b0 rho delta
    scalar `b0' = `at'[1, 1]
    scalar `rho' = `at'[1, 2]
    scalar `delta' = `at'[1, 3]
    tempvar kterm lterm
    generate double `kterm' = `delta'*`capital'^(-1*`rho') `if'
    generate double `lterm' = (1-`delta')*`labor'^(-1*`rho') `if'
    // Fill in dependent variable
    replace `logout' = `b0' - 1/`rho'*ln(`kterm' + `lterm') `if'
end

webuse production, clear
nl ces2 @ lnoutput capital labor, parameters(b0 rho delta) ///
               initial(b0 0 rho 1 delta 0.5)

Now, let me show you how to run it interactively:

webuse production, clear
*generate a variable to restrict my sample to observations
*with non-missing values in my variables
egen u = rowmiss(lnoutput capital labor)

*generate a matrix with parameters where I will evaluate my function
mat M = (0,1,.5)
gen nloutput_new = 1 
nlces2 nloutput_new capital labor if u==0, at(M)  

This will evaluate the program only once, using the parameters in matrix M. Notice that I generated a new variable to use as my dependent variable. This is because the program nlces2, when run by itself, will modify the dependent variable.
When you run this program by itself, you will obtain a more specific error message. You can add debugging code to this program, and you can also use the trace setting to see how each step is executed. Type help trace to learn about this setting.
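For instance, a minimal sketch of tracing the interactive call above:

set trace on
nlces2 nloutput_new capital labor if u==0, at(M)
set trace off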

Another possible source of error (which will generate error r(480) when run from nl) is an evaluator function that produces missing values for observations in the sample. If this is the case, you will see those missing values in the variable nloutput_new, i.e., in the variable you used as the dependent variable when running your evaluator by itself. You can then add debugging code, for example, using codebook or summarize to examine the different parts that contribute to the value substituted into the dependent variable.

For example, after the line that generates `kterm’, I could write

summarize `kterm' if u == 0

to see if this variable contains any missing values in my sample.

This method can also be used to debug your function evaluator programs for nlsur. In order to preserve your dataset, you need to use copies for all the dependent variables in your model.
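A sketch of what that looks like for a two-equation nlsur evaluator; mynlsurfcn and all variable names here are placeholders, not a real program:

* dep1, dep2, x1, x2: variables in the hypothetical model
egen u = rowmiss(dep1 dep2 x1 x2)
mat M = (0.5, 1, 0.5)        // trial values for the parameters
gen double dep1_new = 1      // copies used as dependent variables,
gen double dep2_new = 1      // so the original data are preserved
mynlsurfcn dep1_new dep2_new x1 x2 if u==0, at(M)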


Positive log-likelihood values happen

From time to time, we get a question from a user puzzled about getting a positive log likelihood for a certain estimation. We get so used to seeing negative log-likelihood values all the time that we may wonder what caused them to be positive.

First, let me point out that there is nothing wrong with a positive log likelihood.

The likelihood is the product of the density evaluated at the observations. Usually, the density takes values that are smaller than one, so its logarithm will be negative. However, this is not true for every distribution.

For example, consider the density of a normal distribution with a small standard deviation, say 0.1.

. di normalden(0,0,.1)
3.9894228

This density concentrates most of its mass around zero, and therefore takes large values near this point. Naturally, the logarithm of such a value is positive.

. di log(3.9894228)
1.3836466
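By contrast, with standard deviation 1, the density at zero is smaller than one, so its logarithm is negative:

. di normalden(0,0,1)
.39894228

. di log(.39894228)
-.91893853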

In model estimation, the situation is a bit more complex. When you fit a model to a dataset, the log likelihood will be evaluated at every observation. Some of these evaluations may turn out to be positive, and some may turn out to be negative. The sum of all of them is reported. Let me show you an example.

I will start by simulating a dataset appropriate for a linear model.

clear
program drop _all
set seed 1357
set obs 100
gen x1 = rnormal()
gen x2 = rnormal()
* small error standard deviation (.06), so the density takes large values
gen y = 2*x1 + 3*x2 + 1 + .06*rnormal()

I will borrow the code for mynormal_lf from the book Maximum Likelihood Estimation with Stata (W. Gould, J. Pitblado, and B. Poi, 2010, Stata Press) in order to fit my model via maximum likelihood.

program mynormal_lf
        version 11.1
        args lnf mu lnsigma
        quietly replace `lnf' = ln(normalden($ML_y1,`mu',exp(`lnsigma')))
end

ml model lf  mynormal_lf  (y = x1 x2) (lnsigma:)
ml max, nolog

The following table will be displayed:

.   ml max, nolog

                                                  Number of obs   =        100
                                                  Wald chi2(2)    =  456919.97
Log likelihood =  152.37127                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |   
          x1 |   1.995834    .005117   390.04   0.000     1.985805    2.005863
          x2 |   3.014579   .0059332   508.08   0.000      3.00295    3.026208
       _cons |   .9990202   .0052961   188.63   0.000       .98864      1.0094
-------------+----------------------------------------------------------------
lnsigma      |  
       _cons |  -2.942651   .0707107   -41.62   0.000    -3.081242   -2.804061
------------------------------------------------------------------------------

We can see that the estimates are close enough to our original parameters, and also that the log likelihood is positive.
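In particular, the implied residual standard deviation is small; we can display it from the stored estimate of lnsigma (it comes out to about .053, near the .06 used in the simulation):

. di exp([lnsigma]_b[_cons])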

We can obtain the log likelihood for each observation by substituting the estimates in the log-likelihood formula:

. predict double xb

. gen double lnf = ln(normalden(y, xb, exp([lnsigma]_b[_cons])))

. summ lnf, detail

                             lnf
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -1.360689      -1.574499
 5%    -.0729971       -1.14688
10%     .4198644      -.3653152       Obs                 100
25%     1.327405      -.2917259       Sum of Wgt.         100

50%     1.868804                      Mean           1.523713
                        Largest       Std. Dev.      .7287953
75%     1.995713       2.023528
90%     2.016385       2.023544       Variance       .5311426
95%     2.021751       2.023676       Skewness      -2.035996
99%     2.023691       2.023706       Kurtosis       7.114586

. di r(sum)
152.37127

. gen f = exp(lnf)

. summ f, detail

                              f
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .2623688       .2071112
 5%     .9296673       .3176263
10%      1.52623       .6939778       Obs                 100
25%     3.771652       .7469733       Sum of Wgt.         100

50%     6.480548                      Mean           5.448205
                        Largest       Std. Dev.      2.266741
75%     7.357449       7.564968
90%      7.51112        7.56509       Variance       5.138117
95%     7.551539       7.566087       Skewness      -.8968159
99%     7.566199        7.56631       Kurtosis       2.431257

We can see that some values for the log likelihood are negative, but most are positive, and that the sum is the value we already know. In the same way, most of the values of the likelihood are greater than one.

As an exercise, try the commands above with a bigger variance, say, 1. Now the density will be flatter, and there will be no values greater than one.
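A minimal sketch of that exercise, reusing mynormal_lf and the covariates above with error standard deviation 1 (y_wide is just a name chosen for this sketch):

gen y_wide = 2*x1 + 3*x2 + 1 + rnormal()
ml model lf mynormal_lf (y_wide = x1 x2) (lnsigma:)
ml max, nolog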

In short, there is nothing wrong with a positive log likelihood; if you get one, check your dispersion parameters, and you will most likely find that they are small.


Including covariates in crossed-effects models

The manual entry for xtmixed documents all the official features of the command, along with several applications. However, it would be impossible for a manual entry to address all the models that can be fitted with this command. I want to show you how to include covariates in a crossed-effects model.

Let me start by reviewing the crossed-effects notation for xtmixed. I will use the homework dataset from Kreft and de Leeuw (1998) (a subsample from the National Education Longitudinal Study of 1988). You can download the dataset from the webpage for Rabe-Hesketh & Skrondal (2008) (http://www.stata-press.com/data/mlmus2.html), and run all the examples in this entry.

If we want to fit a model with variable math (math grade) as outcome, and two crossed effects: variable region and variable urban, the standard syntax would be:

(1)   xtmixed math || _all: R.region || _all: R.urban

The underlying model for this syntax is

math_ijk = b + u_i + v_j + eps_ijk

where i represents the region and j represents the level of variable urban; the u_i are i.i.d., the v_j are i.i.d., the eps_ijk are i.i.d., and all of them are independent of each other.

The standard notation for xtmixed assumes that levels are always nested. In order to fit non-nested models, we create an artificial level with only one category consisting of all the observations; in addition, we use the notation R.var, which indicates that we are including dummies for each category of variable var, while constraining the variances to be the same.

That is, if we write

xtmixed math || _all: R.region

we are just fitting the model:

xtmixed math || region:

but we are doing it in a very inefficient way. What we are doing is exactly the following:

generate one = 1
tab region, gen(id_reg)
xtmixed math || one: id_reg*, cov(identity) nocons

That is, instead of estimating one variance parameter, we are estimating four, and constraining them to be equal. Therefore, a more efficient way to fit our mixed model (1), would be:

xtmixed math || _all: R.region || urban:

This will work because urban is nested within _all, the artificial level (equivalent to the variable one above) that contains the whole sample. Therefore, if we want to include a covariate (also known as a random slope) in one of the levels, we just need to place that level at the end and use the usual random-slope syntax, for example:

xtmixed math public || _all:R.region || urban: public

Now let’s assume that we want to include random coefficients in both levels; how would we do that? The trick is to use the _all notation to include a random coefficient in the model. For example, if we want to fit

(2) xtmixed math meanses || region: meanses

we are assuming that variable meanses (mean SES per school) has a different effect (random slope) for each region. This model can be expressed as

math_ik = x_ik*b + sigma_i + alpha_i*meanses_ik

where the sigma_i are i.i.d., the alpha_i are i.i.d., and the sigmas and alphas are independent of each other. This model can be fitted by generating all the interactions of meanses with the region dummies, including a random coefficient for each interaction, and restricting their variances to be equal. In other words, we can also fit model (2) as follows:

* id_reg* are the region dummies created earlier with -tab region, gen(id_reg)-
unab idvar : id_reg*
foreach v of local idvar {
    gen inter`v' = meanses*`v'
}

xtmixed math meanses ///
  || _all: inter*, cov(identity) nocons ///
  || _all: R.region
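As a sanity check, both parameterizations should report the same log likelihood, since they describe the same model; a sketch:

xtmixed math meanses || region: meanses
estimates store direct
xtmixed math meanses || _all: inter*, cov(identity) nocons || _all: R.region
estimates store viainter
estimates table direct viainter, stats(ll)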

Finally, we can use all these tools to include random coefficients in both levels, for example:

xtmixed math parented meanses public || _all: R.region || ///
   _all:inter*, cov(identity) nocons || urban: public

References:
Kreft, I. G. G., and J. de Leeuw. 1998. Introducing Multilevel Modeling. Sage.
Rabe-Hesketh, S., and A. Skrondal. 2008. Multilevel and Longitudinal Modeling Using Stata. 2nd ed. Stata Press.
