Showing posts with label loops. Show all posts
Showing posts with label loops. Show all posts

Monday, February 20, 2012

STATA - loops for big-data

A user requested the following:
"I've never read a Stata dta in a loop before. Can you give an example of how this would work? Maybe a use-case as well? Thanks."

There are a number of ways to read in sets of data files into STATA using a loop.  Following on Ari's example (see previous post), let's say you have a file with a million lines which is too large for stata and you want to read in a thousand lines at a time, do some stuff to it to make it smaller, then append the smaller data sets together to create your final single analytic file.  Here is one way

*** Loop will start at 1000, then increment by 1000
***     until it gets to one million
forvalues high = 1000(1000)1000000 {
            local low = `high' - 999               //simple counter
            use datafile.dta in `low'/`high', clear

            <insert code to cut down size of file>

          *** Now create temporary file
           if `high' == 1000 {
                save temp, replace         //only first time through the loop
           }
           else {
               append using temp       
               save temp, replace
          }
}

save finalfile.dta, replace
erase temp
 *** You can also use a tempfile
***  and avoid the extra erase statement


Another way is to use the 'if' statement.  Lets say you have a large database but only want to look at females in that dataset:

use datafile.dta if gender=="female"

You could also put this into a loop to get certain cuts of data, again the gender example

local sex male female
foreach s of local sex {
     use datafile.dta if gender == "`s'", clear
    ** create two data files
    ** male_newfile.dta and then female_newfile.dta
    save `s'_newfile.dta, replace 
}

 Back to my bananas...

Sincerely,
primary data primate