Putting the Stata Manuals on your iPad

You can install the Stata manuals on your iPad. Here’s how: install GoodReader and copy the manuals from your computer to your iPad. It takes a few minutes and will cost you about $7 to purchase the app.

Once the manuals are installed, launch GoodReader and press the bookmark icon at the bottom of the screen. GoodReader shows you the list of the manuals.

[Screenshot: GoodReader's list of the Stata manuals]

Well, that’s only a partial list. We’d have to scroll to see them all.

If you tap on a manual, it opens,

[Screenshot: a manual opened in GoodReader]

You can swipe to go forward.

All the links are live. If you tap on graph intro, the reader jumps to the manual entry,

[Screenshot: the graph intro manual entry]

Here are some formulas:

[Screenshot: formulas in a manual entry]

To illustrate formulas, I jumped to mi estimate in the [MI] manual. I can jump anywhere because I have all 21 manuals—all 11,000-plus pages—installed on my iPad.

You can have them installed on your iPad, too.

Here’s how.

Step 1. Install GoodReader on your iPad

You must purchase GoodReader 4 from the App Store. No other PDF reader will do. What makes GoodReader a good reader for the Stata manuals is that it can handle links across manuals. As of this date, only GoodReader will do this.

Step 2. Copy the manuals

We are going to copy the manuals from your computer to your iPad. You need a computer containing Stata. This does not have to be the same computer to which you sync your iPad.

Before we do that, however, let’s verify your Stata is up to date. We want to copy the latest version of the manuals, and StataCorp sometimes updates them. Launch Stata and type update query. Follow update’s instructions if there’s an update. Updates are free.

We are ready to copy. There are two ways you can copy the manuals. You can physically copy them using iTunes by plugging your iPad into your computer, or you can use GoodReader to wirelessly copy them. We recommend using iTunes because the other method requires file sharing be enabled on the computer and setting that up can be difficult.

 

Option 1 (recommended): Copy the manuals using iTunes

 
Copying the manuals with iTunes is simple and is the method we recommend.

  1. Plug your iPad into the computer that has Stata on it and unlock the iPad if necessary. If you don’t use this computer to sync your iPad, the iPad will ask whether you “Trust this Computer?”. Tap Trust.
  2. Launch iTunes if it hasn’t already launched itself.
  3. The remaining steps you perform on your computer, in iTunes:
    1. Select your iPad from the upper-right corner of the iTunes window.

      [Screenshot: the iPad button in iTunes]

    2. Click the Apps tab at the top of the iTunes window. Do not back up or restore your iPad.
    3. Use the mouse to put the cursor on white space in the iTunes window, so that the entire window scrolls. Scroll to the bottom until you see the heading File Sharing and the lists below it.
    4. Select GoodReader—it’s in the Apps list on the left. The list on the right will become GoodReader Documents.

      [Screenshot: the Apps tab in iTunes]

    5. Click the Add… button under GoodReader Documents. A file dialog will appear.
    6. In the file dialog, navigate to the Stata folder and open it so that you see its contents. The Stata folder is located in C:\Program Files\Stata13 in 32-bit Windows, C:\Program Files (x86)\Stata13 in 64-bit Windows, and /Applications/Stata on the Mac.
    7. Select the docs folder then click the Add button.

Once the docs folder appears in the GoodReader Documents list, you’re done! Unplug your iPad from the computer. If it makes you feel better, you can eject your iPad first.
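If you are unsure where the Stata folder is on your computer, Stata itself can tell you. The following is a minimal sketch, assuming the manuals sit in a docs folder directly under the installation directory, as described in the steps above. c(sysdir_stata) is Stata's stored path to that directory.

```stata
* Show the directory where Stata is installed
display c(sysdir_stata)

* List the PDF manuals in the docs folder
* (assumes docs sits directly under the installation directory)
dir "`c(sysdir_stata)'docs/*.pdf"
```

Type these in Stata's Command window, then navigate to the displayed folder in the file dialog.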

Skip to Viewing the manuals below.

 

Option 2: Copy the manuals wirelessly

 

GoodReader can copy the manuals wirelessly from your computer and, even better, keep the copied manuals in sync with the manuals on the computer. However, file sharing must be enabled on your computer, and setting that up can be difficult. So we’ll just assume that file sharing is working. If you have trouble with it, copy the manuals using iTunes as covered above.

  1. Launch GoodReader.
  2. Tap the Connect icon in the upper right corner of the screen.

    [Screenshot: the Connect icon]

  3. Tap the Add button next to the Connect to Servers label.

    connect_button

  4. A list of possible connection types will appear. If you’re using Windows, select SMB Server. If you’re using a Mac, select AFP Server.
  5. Enter the login information for the server then tap the Add button.
  6. Tap the server you just added from the Known Servers list. A file dialog will appear.
  7. Navigate to the Stata folder using the file navigator. The Stata folder is located in C:\Program Files\Stata13 in 32-bit Windows, C:\Program Files (x86)\Stata13 in 64-bit Windows, and /Applications/Stata on the Mac. To open a folder, tap the arrow next to the folder name.
  8. In the Stata folder, select the docs folder.
  9. If you wish for GoodReader to keep the manuals on your iPad in sync with manuals on your computer, tap the Sync button. If you wish to just copy the manuals to your iPad, tap the Download button.
  10. Another file navigator appears to allow you to choose where the manuals will be saved. If you want to have the manuals on your iPad automatically synced with your computer, tap the Download here & synchronize button. If you just want the manuals copied over to your iPad, tap the Download folder here button.
  11. If you tapped Download here & synchronize, another popup window will appear allowing you to modify the sync options. Just tap the Sync button to start the sync.

Once the download has completed, the progress window will close and you’re done!

Viewing the manuals

Launch GoodReader.

The first time, it will show you a list of folders. Select docs to open the folder and view a list of Stata manuals.

Select one. GoodReader will show you the first page of the manual, surrounded by GoodReader icons. Tap once in the middle of the screen to hide the icons. Tap there again to bring them back.

The most useful icon is the bookmarks icon, located at the bottom center of the screen,

[Screenshot: the bookmarks icon]

Tap the icon and the list of manuals reappears in the Outlines tab so that you can choose another manual.

Updating the manuals

StataCorp sometimes updates the manuals. Refreshing the manuals on your iPad is easy enough.

If you copied the manuals using iTunes — or even if you didn’t — you can repeat the steps to copy the manuals using iTunes.

If you copied the manuals wirelessly, just tap the Sync button from the main screen of GoodReader while your iPad is on the same network as the computer.

When manuals were real, I used to write in them. I’d highlight something I thought important, or put an arrow here or there. With virtual manuals, that’s called annotating.

There’s an issue with annotating, however. If you update your manuals, you lose your annotations. So either don’t annotate or don’t update.

Using Stata’s random-number generators, part 4, details

For those interested in how pseudorandom-number generators work, I just wrote something on Statalist, which you can read in the Statalist archives by following the link below even if you do not subscribe:

http://www.stata.com/statalist/archive/2012-10/msg01129.html

To remind you, I’ve been writing about how to use random-number generators in parts 1, 2, and 3, and I still have one more posting I want to write on the subject. What I just wrote on Statalist, however, is about how random-number generators work, and I think you will find it interesting.

To find out more about Statalist, see

Statalist

How to successfully ask a question on Statalist

Using Stata’s random-number generators, part 3, drawing with replacement

The topic for today is drawing random samples with replacement. If you haven’t read part 1 and part 2 of this series on random numbers, do so. In the series we’ve discussed that

  1. Stata’s runiform() function produces random numbers over the range [0,1). To produce such random numbers, type
    . generate double u = runiform()
    

     

  2. To produce continuous random numbers over [a,b), type
    . generate double u = (b-a)*runiform() + a
    

     

  3. To produce integer random numbers over [a,b], type
    . generate ui = floor((b-a+1)*runiform() + a)
    

    If b > 16,777,216, type

    . generate long ui = floor((b-a+1)*runiform() + a)
    

     

  4. To place observations in random order — to shuffle observations — type
    . set seed #
    . generate double u = runiform()
    . sort u
    

     

  5. To draw without replacement a random sample of n observations from a dataset of N observations, type
    . set seed #
    . sort variables_that_put_dta_in_unique_order
    . generate double u = runiform()
    . sort u
    . keep in 1/n
    

    If N>1,000, generate two random variables u1 and u2 in place of u, and substitute sort u1 u2 for sort u.

     

  6. To draw without replacement a P-percent random sample, type
    . set seed #
    . keep if runiform() <= P/100
    

I’ve glossed over details, but the above is the gist of it.
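As a concrete instance of item 3, here is a minimal sketch that simulates 1,000 rolls of a six-sided die, so a=1 and b=6. The seed value, the number of rolls, and the variable name die are arbitrary choices made for illustration.

```stata
clear
set seed 12345                      // arbitrary seed, only for reproducibility
set obs 1000                        // 1,000 rolls

* integers over [a,b] with a=1 and b=6
generate byte die = floor((6-1+1)*runiform() + 1)

assert inrange(die, 1, 6)           // every draw is a valid face
tabulate die                        // each face should appear roughly 1/6 of the time
```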

Today I’m going to tell you

  1. To draw a random sample of size n with replacement from a dataset of size N, type
    . set seed #
    . drop _all
    . set obs n
    . generate long obsno = floor(N*runiform()+1)
    . sort obsno
    . save obsnos_to_draw
    
    . use your_dataset, clear
    . generate long obsno = _n
    . merge 1:m obsno using obsnos_to_draw, keep(match) nogen
    

     

  2. You need to set the random-number seed only if you care about reproducibility. I’ll also mention that if N ≤ 16,777,216, it is not necessary to specify that new variable obsno be stored as long; the default float will be sufficient.

     
    The above solution works whether n<N, n=N, or n>N.

 

Drawing samples with replacement

To sample n observations with replacement from a dataset of size N:

  1. Draw n observation numbers 1, …, N with replacement. For instance, if N=4 and n=3, we might draw observation numbers 1, 3, and 3.

  2. Select those observations from the dataset of interest. For instance, select observations 1, 3, and 3.

As previously discussed in part 1, to generate random integers drawn with replacement over the range [a, b], use the formula

generate varname = floor((b-a+1)*runiform() + a)

In this case, we want a=1 and b=N, and the formula reduces to,

generate varname = floor(N*runiform() + 1)

So the first half of our solution could read

. drop _all
. set obs n
. generate obsno = floor(N*runiform() + 1)

Now we are merely left with the problem of selecting those observations from our dataset, which we can do using merge by typing

. sort obsno
. save obsnos_to_draw
. use dataset_of_interest, clear
. generate obsno = _n
. merge 1:m obsno using obsnos_to_draw, keep(match) nogen

Let’s do an example. In part 2 of this series, I had a dataset with observations corresponding to playing cards:

. use cards

. list in 1/5

     +-------------+
     | rank   suit |
     |-------------|
  1. |  Ace   Club |
  2. |    2   Club |
  3. |    3   Club |
  4. |    4   Club |
  5. |    5   Club |
     +-------------+

There are 52 observations in the dataset; I’m showing you just the first five. Let’s draw 10 cards from the deck, but with replacement.

The first step is to draw the observation numbers. We have N=52 cards in the deck, and we want to draw n=10, so we generate 10 random integers from the integers [1, 52]:

. drop _all

. set obs 10                            // we want n=10
obs was 0, now 10

. gen obsno = floor(52*runiform()+1)    // we draw from N=52


. list obsno                            // let's see what we have

     +-------+
     | obsno |
     |-------|
  1. |    42 |
  2. |    52 |
  3. |    16 |
  4. |     9 |
  5. |    40 |
     |-------|
  6. |    11 |
  7. |    34 |
  8. |    20 |
  9. |    49 |
 10. |    42 |
     +-------+

If you look carefully at the list, you will see that observation number 42 repeats. It will be easier to see the duplicate if we sort the list,

. sort obsno
. list

     +-------+
     | obsno |
     |-------|
  1. |     9 |
  2. |    11 |
  3. |    16 |
  4. |    20 |
  5. |    34 |
     |-------|
  6. |    40 |
  7. |    42 |     <- Obs. 42 repeats
  8. |    42 |     <- See?
  9. |    49 |
 10. |    52 |
     +-------+

An observation didn’t have to repeat, but it’s not surprising that one did because in drawing n=10 from N=52, we would expect one or more repeated cards about 60% of the time.
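That 60% figure can be checked directly. The chance of no repeats in 10 draws from 52 cards is 52!/(42! × 52^10), so the chance of at least one repeat is one minus that. A quick check in Stata, using log factorials to keep the arithmetic well behaved:

```stata
* P(at least one repeat) = 1 - 52!/(42! * 52^10)
display 1 - exp(lnfactorial(52) - lnfactorial(42) - 10*ln(52))
```

The displayed value is approximately 0.60.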

Anyway, we now know which cards we want, namely cards 9, 11, 16, 20, 34, 40, 42, 42 (again), 49, and 52.

The final step is to select those observations from cards.dta. The way to do that is to perform a one-to-many merge of cards.dta with the list above and keep the matches. Before we can do that, however, we must (1) save the list of observation numbers as a dataset, (2) load cards.dta, and (3) add a variable called obsno to it. Then we will be able to perform the merge. So let’s get that out of the way,

. save obsnos_to_draw                // 1. save the list above
file obsnos_to_draw.dta saved

. use cards                          // 2. load cards.dta

. gen obsno = _n                     // 3.  Add variable obsno to it

Now we can perform the merge:

. merge 1:m obsno using obsnos_to_draw, keep(matched) nogen

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                                10
    -----------------------------------------

I’ll list the result, but let me first briefly explain the command

merge 1:m obsno using obsnos_to_draw, keep(matched) nogen

merge …, we are performing the merge command,

1:m …, the merge is one-to-many,

using obsnos_to_draw …, we merge data in memory with obsnos_to_draw.dta,

, keep(matched) …, we keep observations that appear in both datasets,

nogen, do not add variable _merge to the resulting dataset; _merge reports the source of the resulting observations; we said keep(matched) so we know each came from both sources.

And here is the result:

. list

     +-------------------------+
     |  rank      suit   obsno |
     |-------------------------|
  1. |     8      Club       9 |
  2. |  Jack      Club      11 |
  3. |   Ace     Spade      16 |
  4. |     2   Diamond      20 |
  5. |     6     Spade      34 |
     |-------------------------|
  6. |     8     Spade      40 |
  7. |     9     Heart      42 |   <- Obs. 42 is here ...
  8. | Queen     Spade      49 |
  9. |  King     Spade      52 |
 10. |     9     Heart      42 |   <- and here
     +-------------------------+

We drew 10 cards — those are the observation numbers on the left. Variable obsno in our dataset records the original observation (card) number and really, we no longer need the variable. Anyway, obsno==42 appears twice, in real observations 7 and 10, and thus we drew the 9 of Hearts twice.

 

What could go wrong?

Not much can go wrong, it turns out. At this point, our generic solution is

. drop _all
. set obs n
. generate obsno = floor(N*runiform()+1)
. sort obsno
. save obsnos_to_draw

. use our_dataset
. gen obsno = _n
. merge 1:m obsno using obsnos_to_draw, keep(matched) nogen

If you study this code, there are two lines that might cause problems,

. generate obsno = floor(N*runiform()+1)

and

. generate obsno = _n

When you are looking for problems and see a generate or replace, think about rounding.

Let’s look at the right-hand side first. Both calculations produce integers over the range [1, N]. generate performs all calculations in double and the largest integer that can be stored without rounding is 9,007,199,254,740,992 (see previous blog post on precision). Stata allows datasets up to 2,147,483,646 observations, so we can be sure that N is less than the maximum precise-integer double. There are no rounding issues on the right-hand side.

Next let’s look at the left-hand side. Variable obsno is being stored as a float because we did not instruct otherwise. The largest integer value that can be stored without rounding as a float (also covered in previous blog post on precision) is 16,777,216, and that is less than Stata’s 2,147,483,646 maximum observations. When N exceeds 16,777,216, the solution is to store obsno as a long. We could remember to use long on the rare occasion when dealing with such large datasets, but I’m going to change the generic solution to use longs in all cases, even when it’s unnecessary.
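Both limits can be seen in a short sketch using Stata's float() function, which rounds its argument to float precision. The test values are chosen just for illustration.

```stata
* float holds integers exactly only up to 2^24 = 16,777,216
display %12.0f float(16777216)      // representable exactly
display %12.0f float(16777217)      // not representable, so it is rounded
assert float(16777216) == 16777216
assert float(16777217) != 16777217

* double holds integers exactly up to 2^53 = 9,007,199,254,740,992
assert 2^53 == 9007199254740992
assert 2^53 + 1 == 2^53             // one past the limit is lost to rounding
```

Storing obsno as a long avoids the float problem because a long holds every integer in the range of possible observation numbers exactly.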

What else could go wrong? Well, we tried an example with n<N and that seemed to work. We should now try examples with n=N and n>N to verify there’s no hidden bug or assumption in our code. I’ve tried examples of both and the code works fine.
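One way to run such checks yourself is sketched below. It is not the test I ran, just an illustration under some assumptions: it uses auto.dta, which ships with Stata, as the dataset of interest, a temporary file in place of obsnos_to_draw.dta, and an arbitrary seed and sample sizes.

```stata
* Try the recipe with n = N and then n > N
set seed 12345                      // arbitrary; only needed for reproducibility
sysuse auto, clear
local N = _N                        // number of observations in auto.dta
tempfile obsnos_to_draw

foreach n of numlist `N' 100 {      // first n = N, then n > N
    drop _all
    set obs `n'
    generate long obsno = floor(`N'*runiform() + 1)
    assert inrange(obsno, 1, `N')   // draws stay within [1, N]
    sort obsno
    save `obsnos_to_draw', replace

    sysuse auto, clear
    generate long obsno = _n
    merge 1:m obsno using `obsnos_to_draw', keep(match) nogen
    assert _N == `n'                // exactly n observations were drawn
}
```

If either assert fails, Stata stops with an error, so the loop finishing quietly means both cases passed.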

 

We’re done for today

That’s it. Drawing samples with replacement turns out to be easy, and that shouldn’t surprise us because we have a random-number generator that draws with replacement.

We could complicate the discussion and consider solutions that would run a bit more efficiently when n=N, which is of special interest in statistics because it is a key ingredient in bootstrapping, but we will not. The above solution works fine in the n=N case, and I always advise researchers to favor simple-even-if-slower solutions because they will probably save you time. Writing complicated code takes longer than writing simple code, and testing complicated code takes even longer. I know because that’s what we do at StataCorp.
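For readers who want the n=N case without writing the code themselves, Stata also has a built-in command, bsample, that replaces the data in memory with a random sample drawn with replacement. This is an alternative to the approach in this post, not what the post's code does. A minimal sketch using auto.dta, with an arbitrary seed:

```stata
sysuse auto, clear
set seed 12345        // arbitrary; only needed for reproducibility
local N = _N
bsample               // with no argument, draws _N observations with replacement
assert _N == `N'      // the sample has the same size as the original data
```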