## How to generate random numbers in Stata

**Overview**

I describe how to generate random numbers and discuss some features added in Stata 14. In particular, Stata 14 includes a new default random-number generator (RNG) called the Mersenne Twister (Matsumoto and Nishimura 1998), a new function that generates random integers, the ability to generate random numbers from an interval, and several new functions that generate random variates from nonuniform distributions.

**Random numbers from the uniform distribution**

In the example below, we use **runiform()** to create a simulated dataset with 10,000 observations on a (0,1)-uniform variable. Prior to using **runiform()**, we set the seed so that the results are reproducible.

    . set obs 10000
    number of observations (_N) was 0, now 10,000

    . set seed 98034

    . generate u1 = runiform()

The mean of a (0,1)-uniform is .5, and the standard deviation is \(\sqrt{1/12}\approx .289\). The estimates from the simulated data reported in the output below are close to the true values.

    . summarize u1

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              u1 |     10,000    .5004244    .2865088   .0000502    .999969

To draw uniform variates over (a, b) instead of over (0, 1), we specify **runiform(a, b)**. In the example below, we draw uniform variates over (1, 2) and then estimate the mean and the standard deviation, which we could compare with their theoretical values of 1.5 and \(\sqrt{(1/12)} \approx .289\).

    . generate u2 = runiform(1, 2)

    . summarize u2

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              u2 |     10,000    1.495698    .2887136   1.000088   1.999899
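The theoretical values quoted in these examples follow from the standard moments of a continuous uniform distribution. As a brief worked check, for \(U\) uniform over \((a, b)\):

\[
E[U] = \frac{a+b}{2}, \qquad \mathrm{sd}(U) = \sqrt{\frac{(b-a)^2}{12}} = \frac{b-a}{\sqrt{12}}
\]

With \(a=1\) and \(b=2\), this gives a mean of 1.5 and a standard deviation of \(\sqrt{1/12}\approx .289\). Shifting the interval changes the mean but not the spread, which is why the standard deviation matches the (0, 1) case.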

To draw integers uniformly over {a, a+1, …, b}, we specify **runiformint(a, b)**. In the example below, we draw integers uniformly over {0, 1, …, 100} and then estimate the mean and the standard deviation, which we could compare with their theoretical values of 50 and \(\sqrt{(101^2-1)/12}\approx 29.155\).

    . generate u3 = runiformint(0, 100)

    . summarize u3

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              u3 |     10,000     49.9804    29.19094          0        100
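For intuition about what a uniform integer draw involves, here is a minimal sketch of the transformation commonly used before **runiformint()** was available: scale a (0,1)-uniform draw and take the floor. It is written as do-file lines rather than console output. The variable name **u3alt** and the local macros **a** and **b** are illustrative choices, and this is not meant to describe how **runiformint()** is implemented internally.

```stata
* Sketch: integers on {a, a+1, ..., b} built from runiform()
clear
set seed 98034
set obs 10000

local a = 0      // lower bound (illustrative)
local b = 100    // upper bound (illustrative)

* (b - a + 1)*runiform() + a lies strictly between a and b + 1,
* so floor() maps it onto the integers a, a+1, ..., b
generate u3alt = floor((`b' - `a' + 1)*runiform() + `a')

summarize u3alt
```

The draws will not match those from **runiformint(0, 100)** value for value, but the summary statistics should be similarly close to 50 and 29.155.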

**Set the seed and make results reproducible**

We use **set seed** *#* to obtain the same random numbers, which makes the subsequent results reproducible. An RNG produces its numbers from a recursive formula, so the “random” numbers are actually deterministic, even though they appear to be random. Setting the seed specifies a starting place for the recursion, which causes the random numbers to be the same, as in the example below.

    . drop _all

    . set obs 6
    number of observations (_N) was 0, now 6

    . set seed 12345

    . generate x = runiform()

    . set seed 12345

    . generate y = runiform()

    . list x y

         +---------------------+
         |        x          y |
         |---------------------|
      1. | .3576297   .3576297 |
      2. | .4004426   .4004426 |
      3. | .6893833   .6893833 |
      4. | .5597356   .5597356 |
      5. | .5744513   .5744513 |
         |---------------------|
      6. | .2076905   .2076905 |
         +---------------------+

Every time Stata is launched, the seed is set to 123456789.

After generating \(N\) random numbers, the RNG wraps around and starts generating the same sequence all over again. \(N\) is called the *period* of the RNG. Larger periods are better because we get more random numbers before the sequence wraps. The period of Mersenne Twister is \(2^{19937}-1\), which is huge. Large periods are important when performing complicated simulation studies.

In Stata, the seed is a nonnegative integer (between 0 and \(2^{31}-1\)) that Stata maps onto the state of the RNG. The state of an RNG corresponds to a spot in the sequence. The mapping is not one to one because there are more states than seeds. If you want to pick up where you left off in the sequence, you need to restore the state, as in the example below.

    . drop _all

    . set obs 3
    number of observations (_N) was 0, now 3

    . set seed 12345

    . generate x = runiform()

    . local state `c(rngstate)'

    . generate y = runiform()

    . set rngstate `state'

    . generate z = runiform()

    . list

         +--------------------------------+
         |        x          y          z |
         |--------------------------------|
      1. | .3576297   .5597356   .5597356 |
      2. | .4004426   .5744513   .5744513 |
      3. | .6893833   .2076905   .2076905 |
         +--------------------------------+

After dropping the data and setting the number of observations to 3, we use **generate** to put random variates in **x**, store the state of the RNG in the local macro **state**, and then put random numbers in **y**. Next, we use **set rngstate** to restore the state to what it was before we generated **y**, and then we generate **z**. The random numbers in **z** are the same as those in **y** because restoring the state caused Stata to start at the same place in the sequence as before we generated **y**. See *Programming an estimation command in Stata: Where to store your stuff* for an introduction to local macros.
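To make the difference between resetting the seed and restoring the state concrete, here is a small do-file sketch that checks both with **assert** instead of by eye. The variable names **x2** and **y2** are illustrative and do not appear in the example above.

```stata
* Sketch: resetting the seed restarts the sequence;
* restoring the state resumes it from a saved spot
clear
set obs 3
set seed 12345

generate x = runiform()
local state `c(rngstate)'    // state after x, before y
generate y = runiform()

set seed 12345               // back to the start of the sequence
generate x2 = runiform()     // should reproduce x

set rngstate `state'         // back to the spot saved after x
generate y2 = runiform()     // should reproduce y

assert x2 == x
assert y2 == y
```

If either **assert** failed, Stata would stop with an error, so a silent run indicates that both sets of draws were reproduced.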

**Random variates from various distributions**

So far, we have talked about generating uniformly distributed random numbers. Stata also provides functions that generate random numbers from other distributions. The function names are easy to remember: the letter *r* followed by the name of the distribution. Some common examples are **rnormal()**, **rbeta()**, and **rweibull()**. In the example below, we draw 5,000 observations from a standard normal distribution and summarize the results.

    . drop _all

    . set seed 12345

    . set obs 5000
    number of observations (_N) was 0, now 5,000

    . generate w = rnormal()

    . summarize w

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
               w |      5,000    .0008946    .9903156  -3.478898   3.653764

The estimated mean and standard deviation are close to their true values of 0 and 1.
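The same pattern carries over to the other functions named above. The do-file sketch below draws from a normal distribution with a nonzero mean, a beta distribution, and a Weibull distribution. The parameter values and the variable names **w2**, **b**, and **t** are arbitrary choices for illustration.

```stata
* Sketch: draws from three nonuniform distributions
clear
set seed 12345
set obs 5000

generate w2 = rnormal(10, 2)      // normal with mean 10 and std. dev. 2
generate b  = rbeta(2, 5)         // beta with shape parameters 2 and 5
generate t  = rweibull(1.5, 2)    // Weibull with shape 1.5 and scale 2

summarize w2 b t
```

As a rough check, the estimated mean of **w2** should be near 10 and that of **b** near \(2/7\approx .286\), the mean of a beta(2, 5) distribution.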

**A note on precision**

So far, we have stored the random numbers in the default data type, *float*. Storing them as type *double* makes ties occur less frequently. Ties can still occur with type *double* because the period of the Mersenne Twister far exceeds the number of distinct values available at a precision of \(2^{-53}\), so a long enough sequence of random numbers will contain repeated values.
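One way to see the effect of the storage type is to store the same draws once as *double* and once as *float* and then count repeated values. The sketch below does this with **duplicates report**. The sample size and the variable names **ud** and **uf** are illustrative, and the number of ties found will depend on the seed and the sample size.

```stata
* Sketch: compare ties in the same draws stored as double and as float
clear
set seed 12345
set obs 100000

generate double ud = runiform()   // keep full precision
generate float  uf = ud           // same draws, rounded to float

duplicates report ud              // repeated values at double precision
duplicates report uf              // repeated values after rounding to float
```

Because **uf** holds rounded copies of **ud**, any additional ties it shows come from the loss of precision rather than from the RNG itself.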

**Conclusion**

In this post, I showed how to generate random numbers using random-number functions in Stata. I also discussed how to make results reproducible by setting the seed. In subsequent posts, I will delve into other aspects of RNGs, including methods for generating random variates from other distributions and for generating random numbers in Mata.

**Reference**

Matsumoto, M., and T. Nishimura. 1998. Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. *ACM Transactions on Modeling and Computer Simulation* 8: 3–30.