
How to successfully ask a question on Statalist

As everyone knows, I am a big proponent of Statalist, and not just for selfish reasons, although those reasons play a role. Nearly every member of the technical staff at StataCorp, me included, is a member of Statalist. Even when we don’t participate in a particular thread, we do pay attention. The discussions on Statalist play an important role in Stata’s development.

Statalist is a discussion group, not just a question-and-answer forum. Nonetheless, new members often use it to obtain answers to questions, and that works because those questions sometimes become grist for subsequent discussions. In those cases, the questioners not only get answers, they get much more.

One of the best features of Statalist is that, no matter how poorly you ask a question, you are unlikely to be flamed. Not only are the members of Statalist nice — just as are the members of most lists — they act just as nice on the list as they really are. You are unlikely to be flamed if you ask a question poorly, but you are also unlikely to get an answer.

Here is my recipe for increasing your chances of getting a helpful response. You should also read the Statalist FAQ before writing your question.

Subject line

Make the subject line of your email meaningful. Some good subject lines are:

Survival analysis

Confusion about -stcox-

Unexpected error from -stcox-

-stcox- output

The first two sentences

The first two sentences are the most important, and they are the easiest to write.

In the first sentence, state your problem in Stata terms, but do not go into details. Here are some good first sentences:

I’m having a problem with -stcox-.

I’m getting an unexpected error message from -stcox-.

I’m using -stcox- and don’t know how to interpret the result.

I’m using -stcox- and getting a result I know is wrong, so I know I’m misunderstanding something.

I want to use -stcox- but don’t know how to start.

I think I want to use -stcox-, but I’m unsure.

I want to use -stcox- but my data is complicated and I’m unsure how to proceed.

I have a complicated dataset that I somehow need to transform into a form suitable for use with -stcox-.

Stata crashed!

I’m having a problem that may be more of a statistics issue than a Stata issue.

The purpose of the first sentence is to catch the attention of members who have an interest in your topic and let the others, who were never going to answer you anyway, move on.

The second sentence is even easier to write:

I am using Stata 11.1 for Windows.

I am using Stata 10 for Mac.

Even if you are certain that it’s unimportant which version of Stata you are using, state it anyway.
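If you are unsure what to report, Stata itself can tell you. The following is a minimal sketch, assuming a reasonably recent release; -about- and the c() system values used here are standard, but check -help creturn- if your version is old:

```stata
* Show the version, flavor, and license details of this copy of Stata
about

* Or display just the two facts worth quoting in a post
display "Stata version:    " c(stata_version)
display "Operating system: " c(os)
```

Copy what Stata reports into your second sentence rather than typing the version from memory.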

Write two sentences and you are done with the first paragraph.

The second paragraph

Now write more about your problem. Try not to be overly wordy, but it’s better to be wordy than curt to the point of unclearness. However you write this paragraph, be explicit. If you’re having a problem making Stata work, tell your readers exactly what you typed and exactly how Stata responded. For example,

I typed -stcox weight- and Stata responded “data not st”, r(119).

I typed -stcox weight sex- and Stata showed the usual output, except the standard error on weight was reported as dot.
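One way to capture exactly what you typed and exactly how Stata responded is to record the session in a text log and paste from it. This is only a sketch: the log file name is hypothetical, and the auto dataset is used simply because it ships with Stata, so readers can rerun the same commands.

```stata
* Hypothetical log file name; any name will do
log using statalist_question.log, text replace

* A dataset that ships with Stata, so others can reproduce the problem
sysuse auto, clear

* Run the failing command; -capture noisily- shows the error message
* but lets the session continue so the return code can be displayed
capture noisily stcox weight
display "return code: " _rc

log close
```

Paste the relevant part of the log into your post unedited. Retyping output from memory is where details get lost.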

The form of the second paragraph — which may extend into the third, fourth, … — depends on what you are asking. Describe the problem concisely but completely. Sacrifice conciseness for completeness if you must or you think it will help. To the extent possible, simplify your problem by getting rid of extraneous details. For instance,

I have 100,000 observations and 1,000 variables on firms, but 4
observations and 3 variables will be enough to show the problem.
My data looks like this

        firm_id     date      x
          10043       17     12
          10043       18      5
          13944       17     10
          27394       16      1

I need data that looks like this:

        date    no_of_firms   avg_x
          16              1       1
          17              2      11
          18              1       5

That is, for each date, I want the number of firms and the
average value of x.
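A question stripped down this far is also easy for a responder to try out. As a minimal sketch (not part of the example post, and assuming each firm appears at most once per date), the four observations can be typed in and summarized with -collapse-. The names no_of_firms and avg_x simply mirror the column headings in the example.

```stata
clear

* Enter the four example observations
input long firm_id int date double x
10043 17 12
10043 18  5
13944 17 10
27394 16  1
end

* One row per date: count of nonmissing firm_id and mean of x.
* The count equals the number of firms only if no firm repeats within a date.
collapse (count) no_of_firms=firm_id (mean) avg_x=x, by(date)

list, noobs clean
```

Including a small, typed-in dataset like this in your post lets readers run your problem rather than imagine it.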

Here’s another example for the second and subsequent paragraphs:

The substantive problem is this:  Patients enter and leave the 
hospital, sometimes more than once over the period.  I think  
in this case it would be appropriate to combine the 
separate stays so that a patient who was in for 2 days and 
later for 4 days could be treated as being simply in for 6 days,  
except I also record how many separate stays there were, too.

I'm evaluating cost, so for my purposes, I think treating 
cost as proportional to days in hospital, whatever their
distribution, will be adequate.  I'm looking at total days as a
function of number of stays.  The idea is that letting patients out
too early results in an increase in total days, and I want to
measure this.

I realize that more stays and days might also arise simply because
the patient was sicker.  Some patients die, and that obviously 
truncates stay, so I've omitted them from data.  I have disease
codes, but nothing about health status within code.  

Is there a way to incorporate this added information to improve  
the estimates?  I've got lots of data, so I was thinking of 
using death rate within disease code to somehow rank the codes 
as to seriousness of illness, and then using "seriousness" 
as an explanatory variable.  I guess my question is whether  
anyone knows a way I might do this. 

Or is there some way I could estimate the model separately within
disease code, somehow constraining the coefficient on number of 
stays to be the same?  I saw something in the manual about 
stratified estimates, but I'm unsure if this is the same thing.

You’re asking someone to invest their time, so invest yours

Before you hit the send key, read what you have written, and improve it. You are asking for someone to invest their time helping you. Help them by making your problem easy to understand.

The easier your problem is to understand, the more likely you are to get a response. Said differently, if you write in a disorganized way so that potential responders must work just to understand you, much less provide you with an answer, you are unlikely to get a response.

Sparkling prose is not required. Proper grammar is not even required, so nonnative English speakers can relax. My advice is that, unless you are often praised for how clearly and entertainingly you write, write short sentences. Organization is more important than the style of the individual sentences.

Avoid or explain jargon. Do not assume that the person who responds to your question will be in the same field as you. When dealing with a substantive problem, avoid jargon except for statistical jargon that is common across fields, or explain it. Potential responders like it when you teach them something new, and that makes them more likely to respond.


Write as if you are writing to a colleague whom you know well. Assume interest in your problem. The same thing said negatively: Do not write to list members as you might write to your research assistant, employee, servant, slave, or family member. Nothing is more likely to get you ignored than to write, “I’m busy and I really don’t have time to filter through all the Statalist postings, so respond to me directly, and soon. I need an answer by tomorrow.”

The positive approach, however, works. Just as when writing to a colleague, in general you do not need to apologize, beg, or play on sympathies. Sometimes when I write to colleagues, I do feel the need to explain that I know what I’m asking is silly. “I should know this,” I’ll write, or, “I can’t remember, but …”, or, “I know I should understand, but I don’t”. You can do that on Statalist, but it’s not required. Usually when I write to colleagues I know well, I just jump right in. The same rule works with Statalist.

What’s appropriate

Questions appropriate for Stata’s Technical Services are not appropriate for Statalist, and vice versa. Some questions aren’t appropriate for either one, but those are rare. If you ask an inappropriate question, and ask it well, someone will usually direct you to a better source.

Who can ask, and how

You must join Statalist to send questions. Yes, you can join, ask a question, get your answer, and quit, but if you do, don’t mention this at the outset. List members know this happens, but if you mention it when you ask the question, you’ll sound superior and condescending. Also, stick around for a few days after you get your response, because sometimes your question will generate discussion. If it does, you should participate. You should want to stick around and participate because if there is subsequent discussion, the final result is usually better than the initial reply.

I’ve previously written on how to join (and quit) Statalist. See http://blog.stata.com/2010/11/08/statalist/.

  • Aramesh_sch

    How can I include a fixed variable in clogit? In the output, the variable is omitted.

  • Anonymous

    I am trying to update my Stata Intercooled 9.0. After typing ‘update query’ and ‘update all’, I receive an r(603) error message once all the files have downloaded, during the installation stage. I have not updated it since 2005 because I have not used it extensively.

    Below is the output. How can I ensure proper installation of the updates?

    . help whatsnew

    . update query
    (contacting http://www.stata.com)

    Stata executable
        folder:               C:\Program Files\Stata9
        name of file:         wstata.exe
        currently installed:  05 Jul 2005
        latest available:     20 Jul 2007

    Ado-file updates
        folder:               C:\Program Files\Stata9\ado\updates
        names of files:       (various)
        currently installed:  05 Jul 2005
        latest available:     20 Jul 2007

        Stata 10, a new release, is available.
        For details, point your browser at

        Type -update all-


        4.  examining files

        5.  installing files


  • Majed Ayadi

    I want to estimate a stochastic metafrontier production function, but I do not know how to read the SHAZAM program written for this purpose. I would like some help on how to read a SHAZAM program.

  • Mario Marques

    I am working with count data and need to fit a zero-truncated negative binomial model; however, the estimation does not converge. How can I solve this?

  • HayY

    Nice one. I want to run Meta-frontier analysis. Can someone help me please?

  • aloff

    Hi everyone! I was wondering if you could help me I got this output and it seems strange….

    What is wrong with this output? chi2 values???
    Thank you a ton!

    Structural |

    t4_YARC_Acc_ASc_mean chi2 = .

    . estat gof, stats (all)


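    Fit statistic        |      Value   Description
    ---------------------+------------------------------------------------------
    Likelihood ratio     |
              chi2_ms(0) |      0.000   model vs. saturated
                p > chi2 |          .
              chi2_bs(3) |     57.452   baseline vs. saturated
                p > chi2 |      0.000
    ---------------------+------------------------------------------------------
    Population error     |
                   RMSEA |      0.000   Root mean squared error of approximation
     90% CI, lower bound |      0.000
             upper bound |      0.000
                  pclose |      1.000   Probability RMSEA <= 0.05
    ---------------------+------------------------------------------------------
    Information criteria |
                     AIC |   2082.463   Akaike's information criterion
                     BIC |   2093.705   Bayesian information criterion
    ---------------------+------------------------------------------------------
    Baseline comparison  |
                     CFI |      1.000   Comparative fit index
                     TLI |      1.000   Tucker-Lewis index
    ---------------------+------------------------------------------------------
    Size of residuals    |
                    SRMR |      0.000   Standardized root mean squared residual
                      CD |      0.560   Coefficient of determination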

  • Alesandra

    Hi. I’d like to run a Markov chain. I don’t even know how to organise data.

    Thank you!

  • Anand

    Hi, I am doing censored quantile regression using the cqiv command. How can I retrieve the standard errors of estimates? Any kind of help is appreciated

  • Luke

    Hello everyone, I am having troubles with reshaping my data. The data currently looks as follows:

    Firm Variable 1990 1991 1992 etc.

    x 1

    x 2

    y 1

    y 2

    z 1

    z 2

    I would like the data to look as follows:

    Firm Year Variable1 Variable2

    x 1990

    x 1991

    x 1992

    y 1990

    y 1991

    I started off with the command: reshape long y, i(Name Variable) j(Year)

    Now I have all my data long.
    Next I want to make my variables wide, I already encoded the variable names because these are string variables, with the command: reshape wide v, i(Name Year y) j(Variable).
    But when I do this it screws up my database, it does not correctly assign my data across variables, firms and year.
    If anybody could help, it would really be appreciated.

  • Kat

    Hey everyone, I have a bit of trouble with plotting my continuous interactions using the marginsplot command after xtlogit. The coefficient for the interaction comes up positive, but when I use margins and then marginsplot, the actual graph indicates a negative effect, which is quite confusing. Variables are coded correctly, I have checked that. I presume it must be a flaw in my margins command. I use:

    margins, predict(pu0) at(var1=(x (x) x) var2=(x (x) x))

    I have also tried:

    margins c.var1#c.var2

    But Stata gives me an error message ‘variable c1 not found r(111)’

    Any suggestions are very much appreciated. Thanks very much in advance for your help with this.

  • PMC

    Q1. Is there a limit of countries to be used for cross-sectional analysis?
    Q2. Can I use 7 countries across 20 years for panel regression?

  • MCM

    I want to know if it’s possible to insert both short-run and long-run restrictions in SVAR on stata (12)

  • Michael Sikivie

    As simple as it sounds, I’m having trouble merging. 0 observations are merging. I’m merging on the variable iso_o and as far as I can see every observation in the master dataset has a value of iso_o that’s spot on identical with the value for some observation in the using set. For example, after merging there are observations with iso_o==AFG in the master only category as well as an observation of iso_o==AFG in the using only category but none matched. I checked that they’re both ASCII in Excel by using the function code and they both give the same ASCII code. Although it’s a string, using the encode command in Stata and using the new variable generated doesn’t work, as well.

    sort iso_o year
    merge m:1 iso_o using numeric_letter_code

    * The using data set really is called numeric_letter_code. Long story.

    Result                           # of obs.
    not matched                         11,588
        from master                     11,400  (_merge==1)
        from using                         188  (_merge==2)

    matched                                  0  (_merge==3)

  • Anne-Laure

    Confusion between the options “robust” and “cluster”.

    If there is heteroskedasticity, there is the option “robust”. And to have an estimator that is robust to the correlation of disturbances within groups, I can use the option “cluster”. So the option “cluster” allows the correlation of disturbances within cluster. In my case, I would say that my residuals are probably correlated between individuals in the same country. But I already use a country fixed effect, so my residuals should not be anymore correlated between individuals in the same country. Can I say that the option “cluster” is not useful anymore, so I should use only the “robust” option?

    PS: My dependent variable is qualitative. I am using 3 models: a Linear Probability Model, a Logit and a Probit.

  • Noam

    I think I want to use “dirifit” to test differences between four dependent proportions (e.g., the proportion of presses on each one of four relevant keys during a task) but without another explanatory variable (e.g., gender). Can anyone advise me on the correct Stata command?

    In my data each participant has four dependent proportions, e.g., .30, .25, .10, .35 (sum=1). What significance test and Stata command should I use to test the difference between those proportions?

    thank you!


  • Hello everyone,

    We see that quite a few people have been posting questions after this blog entry.

    The comment fields after the blog entries here are for questions regarding each particular blog entry — not for general Stata questions. General Stata questions should typically be submitted to Stata Technical Services or to Statalist.

    It would be better for the questions below to be directed either to Stata Technical Services (see http://www.stata.com/support/tech-support/ for details on how to contact Technical Services) or to Statalist (read this blog entry to determine whether your question is appropriate for Statalist, and if so, how to post it, or whether it should be directed to Stata Technical Services).

    Thank you!

  • Oonagh Jones

    Hello. I am trying to produce Stata results for my project. I have chosen to base it on this dataset: http://fmwww.bc.edu/ec-p/data/wooldridge/phillips.des. I need to fit a robust regression to the time series, but everything I type is wrong. Can someone help me? Thank you.

  • Sunny Singh

    Hello, I’m trying to do rolling regression for the nonlinear equation (exponential). My functional form (stata form) for nonlinear equation is:

    nl (weeklygrowth = ({alpha1}+{alpha2}*day+{alpha3}*day2+{alpha4}*day3+{alpha5}*day4+{a1}*m1+{a2}*m2+{a3}*m3+{a4}*m4+{a5}*m5+{a6}*m6+{a7}*m7+{a8}*m8+{a9}*m9+{a10}*m10+{a11}*m11)*exp(-1*{beta1=0.0005}*t))

    Please tell me how to do rolling regression for this equation (window=522).

    Best Regards, Sunny

  • vincent

    I am having a problem: my Stata 11.1 does not have mdraws or the egen function mvp. I need your help.

  • Kwame Adobaw


    I would be grateful if you could assist me to undertake the “test for evolving efficiency (TEE)” in Stata, as used in the paper “THE CHANGING EFFICIENCY OF AFRICAN STOCK MARKETS” by Smith and Jefferis (2005) South African Journal of Economics Vol. 73:1 by kindly providing me with the syntax. I have tried using sspace estimation but could not draw the graphs showing evolving efficiency.

  • Hana

    Latent Class Regression
    What are the Stata commands for latent class regression (LCM) if my dependent variable is discrete, taking the value 0 or 1?

  • bill

    I have been trying for two days now to post a thread. However, when I send an email I get a message saying my message was bounced. What am I doing wrong? This is what I want to send:

    This is my first time using this service, so apologies if I do something wrong. I am trying to replicate the results of the paper by Mody and Taylor (2003). In this paper they have data on real industrial production and want to see whether a certain spread can predict the growth in industrial production when it is stripped alternately of its demand-side and supply-side components. I understand this is some kind of decomposition. Unfortunately, I have no idea how to do it. Any help would be very much appreciated. Is there a command I should use? Thank you in advance.

    To tell you the truth, so far this community has been a big disappointment. Why should it be so difficult to send a message? A simple message of 8 sentences? Anyway, if you could let me know what I am doing wrong in sending an email, please do.

  • Omnia Mansour

    Imputing missing values


    I have a problem in imputing missing values in my panel dataset, all my variables except for one have missing values!

    The problem occurs when I use the usual impute: some of the imputed values are negative for variables that can’t take negative values (unemployment, taxes). Another problem occurs with the imputed values for a DISCRETE variable with a specified range, taking values 0, 1, 2, …, 6.
    I am confused about how to impute missing values in a rational pattern (positive and within range).

    Thanks for your time

  • yass

    How do I calculate a rolling-window percentile over a range of data? That is, how can I do in Stata what the Excel function Percentile(array, k) does, to get a list of percentiles? I have a list of 1,000 observations and would like to calculate the 99th percentile for each rolling window of 250 observations. Thanks.


    I am working on national household survey data. I constructed the consumption aggregate and estimated the per capita consumption per annum. The entire dataset is household-level data. The data is also weighted (household weights). The data has health insurance coverage per household (i.e., the number of persons covered by health insurance per household). I have divided the population into quintiles (using the command: xtile quintile=cons_pc[aw=weights], n(5)). The problem is that I want to use Stata to calculate health insurance coverage in the entire population across quintiles, applying the household weights. I have tried the command: tab quintile nhis[aw=weights], but it doesn’t give me exactly what I am looking for. It gives me values at the household level.

    Please, I need your help.


  • Andrea

    Hello! I urgently need your help pleeeeaseeee…i am trying to make a ttest of 2 variables and have coded it like this:
    use "I_final.dta"

    append using "JP_final.dta"

    drop if year!= 2004 & year!=2005

    gen Group = (nation == "ITALY")
    tabulate Group, gen(g)

    gen Time = (year == 2005)
    tabulate Time, gen(t)

    ////////////////////////ttest Difference1/////////////////////////////////
    gen Variable1=.
    replace Variable1=ZERORET[_n] if Group==1 & Time == 1
    replace Variable1=0 if Variable1==.

    mvdecode Variable1, mv(0=.a)

    gen Variable2=.
    replace Variable2=ZERORET[_n] if Group==1 & Time == 0
    replace Variable2=0 if Variable2==.

    mvdecode Variable2, mv(0=.b)

    ttest Variable1=Variable2

    Variable1 contains just the values of ZERORET where Group==1 and Time==1. Variable2 contains just the values of ZERORET where Group==1 and Time==0. The other values are set to zero. For this reason, I have recoded the numeric zero to the missing values “.a” and “.b”.
    The problem appears at the last command, which reports “no observations” although Variable1 and Variable2 both have numeric values (along with the zeros that I recoded as missing).

    I do not understand the problem.

    Please, I need your help.

    Thank you very much for now!


  • Maria

    Hi, does anyone have the ado-file for pscore2?


  • Jair Araujo

    i want too.

  • Belal Fallah

    what is the command to estimate a propensity score match when the outcome is binary?

  • chanarcisse

    Good afternoon to all the followers.
    Please, I am stuck in my econometric analysis.
    How can I conduct an ARDL time-series analysis using Stata 13?
    In particular, how can I conduct a cointegration test using the Pesaran et al. (2001) approach, that is, the bounds testing approach?
    Please, I need your help.

  • chanarcisse

    I would like if possible to have required stata commands

  • Asad Ali

    How do I interpret the results of the Wald test after ivprobit?

    Wald test of exogeneity: chi2(1) = 0.32 Prob > chi2 = 0.5716

    What does this mean?

  • Gaye del LO

    Hello dear all,
    I got these results from the Hausman and Small-Hsiao tests of IIA:

    **** Hausman tests of IIA assumption (N=38555)

    Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.

    Omitted | chi2 df P>chi2 evidence
    Bois | -1.824 14 — —
    Electric | 72.157 14 0.000 against Ho
    Gaz | -117.428 14 — —
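    Note: If chi2<0, the estimated model does not meet asymptotic assumptions of the test.
    Small-Hsiao tests of IIA assumption:
    Omitted | lnL(full) lnL(omit) chi2 df P>chi2 evidence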
    Bois | -1.86e+04 -1.85e+04 10.477 16 0.841 for Ho
    Electric | -6825.505 -6816.923 17.165 16 0.375 for Ho
    Gaz | -7606.779 -7599.185 15.189 16 0.511 for Ho
    Can I conclude that the IIA assumption holds?
    Thanks for your help,

  • robert prince

    Hi there,
    I need to find the number of days between dates, calculating from the first date
    ID Date
    1 11nov2006
    1 26may2007
    1 26may2007
    1 30may2007

    gen newvar= date-date[_n-1] does NOT work since it will go back to 0 for 26may2007.

    Does anyone know how to do this? Thanks

  • Lia Rodriguez

    Hello. I am not sure whether the data I am working on are pooled cross sections or panel data. The database includes 60 national elections in 14 countries in which non-residents were allowed to vote. The elections were held between 1993 and 2016. Variables include aggregate data, such as electoral turnout of citizens living abroad, for each election in each country. My ID variable is country and my time variable is year. In some countries emigrants could vote once during that period, while in other countries they could vote more than ten times. For most of the countries my database includes every election held where emigrants could vote, while for others I have missing data. Understanding whether I have a panel or pooled cross sections is important for knowing whether I have to deal with serially correlated residuals. Can anyone help me?

  • Sreenivas Vishnubhatla

    As an applied statistician working with medical researchers, I have to analyse large datasets. When I run a do-file of a few thousand lines and the program stops, it is cumbersome to locate the command where execution stopped. It would be easy if Stata reported the line number where the program stopped. Can anyone tell me how to make a Stata do-file report the line number where it stopped?
    Best wishes

  • Kwame Ohene Djan

    Hi, Please does seemingly unrelated regression address endogeneity?