A user requested the following:
"I've never read a Stata dta in a loop before. Can you give an example of how this would work? Maybe a use-case as well? Thanks."
There are several ways to read sets of data files into Stata using a loop. Following on Ari's example (see previous post), let's say you have a file with a million lines, which is too large for Stata, and you want to read in a thousand lines at a time, do some stuff to each piece to make it smaller, then append the smaller data sets together to create your final single analytic file. Here is one way:
*** Loop will start at 1000, then increment by 1000
*** until it gets to one million
forvalues high = 1000(1000)1000000 {
    local low = `high' - 999 // first row of this chunk
    use datafile.dta in `low'/`high', clear
    *** <insert code to cut down size of file>
    *** Now create temporary file
    if `high' == 1000 {
        save temp, replace // only first time through the loop
    }
    else {
        append using temp
        save temp, replace
    }
}
save finalfile.dta, replace
*** erase needs the full file name, including the extension
erase temp.dta
*** You can also use a tempfile
*** and avoid the extra erase statement
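The tempfile variant mentioned in the comments above might look something like this. It is only a sketch: the demo data, the macro names `building`, `chunk` and `first`, and the `keep if` reduction step are all made up for illustration, and the file is kept small so the loop runs quickly.

```stata
* Sketch only: build a small demo file so the loop can run as-is
clear
set obs 10000
gen id = _n
save datafile.dta, replace

tempfile building                 // Stata removes this when the do-file ends
local chunk = 1000
local first = 1
forvalues high = `chunk'(`chunk')10000 {
    local low = `high' - `chunk' + 1
    use datafile.dta in `low'/`high', clear
    keep if mod(id, 2) == 0       // hypothetical reduction step
    if !`first' {
        append using `building'   // skip on the first pass, nothing saved yet
    }
    save `building', replace
    local first = 0
}
save finalfile.dta, replace
```

Because the tempfile is cleaned up automatically, no erase statement is needed. Note that each pass puts the newest chunk on top, so sort the final file if the original row order matters.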
Another way is to use the 'if' qualifier. Let's say you have a large dataset but only want to look at females in that dataset:
use datafile.dta if gender=="female"
You could also put this into a loop to get certain cuts of the data. Again, the gender example:
local sex male female
foreach s of local sex {
    use datafile.dta if gender == "`s'", clear
    ** create two data files
    ** male_newfile.dta and then female_newfile.dta
    save `s'_newfile.dta, replace
}
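If you would rather not type the categories by hand, one option is to load only the grouping variable and let `levelsof` build the list. This is a sketch under assumptions: the five-row demo file and the `score` variable are invented so the example runs on its own.

```stata
* Sketch only: tiny demo file standing in for the large dataset
clear
input str6 gender byte score
"male"   1
"female" 2
"female" 3
"male"   4
"female" 5
end
save datafile.dta, replace

* Load just the one variable, then collect its distinct values
use gender using datafile.dta, clear
levelsof gender, local(sex) clean

foreach s of local sex {
    use datafile.dta if gender == "`s'", clear
    save `s'_newfile.dta, replace
}
```

Reading a single variable with `use varlist using` keeps memory use low even when the full file would not fit.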
Back to my bananas...
Sincerely,
primary data primate
Note that Stata has a user-written `touch` command that you can use with `capture` to avoid the if statements the first time you run a loop. Basically, `touch` creates a blank file (optionally with the variables you need in it) so that the subsequent `append` works even if the file didn't previously exist. The `capture` makes sure that when `touch` fails after the first iteration (because the file already exists), the error just gets ignored. This is also a great example of how solving your own problems can solve other people's, as a certain simian wrote `touch` about 5 years ago and still gets e-mail from people who find it useful.
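The same idea can be sketched with built-in commands only. This does not call `touch` itself (its exact syntax is not shown here); instead it saves an empty placeholder with `save, emptyok` before the loop, so `append` works on every pass. The demo data, the `building` tempfile and the `keep if` step are assumptions for illustration.

```stata
* Sketch only: demo data so the example runs as-is
clear
set obs 3000
gen id = _n
save datafile.dta, replace

* Empty placeholder file, playing the role of the touched file
clear
tempfile building
save `building', emptyok

forvalues high = 1000(1000)3000 {
    local low = `high' - 999
    use datafile.dta in `low'/`high', clear
    keep if mod(id, 10) == 0      // hypothetical reduction step
    append using `building'       // works on every pass, including the first
    save `building', replace
}
```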
Great stuff Data Monkey. Thanks for the follow-up post.
I still like your way better though. A blog should be appreciated for its overall beauty
ReplyDeleteBuy Pre Written Essays
Online Writing Services
Accounts Software For Small Business
Users of this technique should be aware that the use statement reads the entire .dta file, even if instructed to store only a subset of observations. So to keep run times reasonable, it is best to keep the number of iterations in the loop as small as possible while still staying within the available memory.
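One way to act on this is to make the chunk size a single setting and work out the number of passes from the file's observation count. A rough sketch, assuming a demo file of 2,500 rows and a made-up chunk size of 1,000; `describe using` reports the count without loading the data.

```stata
* Sketch only: demo data so the example runs as-is
clear
set obs 2500
gen id = _n
save datafile.dta, replace

describe using datafile.dta, short    // leaves the row count in r(N)
local total = r(N)
local chunk = 1000                    // assumption: largest chunk that fits in memory
local passes = ceil(`total'/`chunk')
display "passes needed: `passes'"

forvalues p = 1/`passes' {
    local low  = (`p' - 1)*`chunk' + 1
    local high = min(`p'*`chunk', `total')   // last chunk may be short
    use datafile.dta in `low'/`high', clear
    * <reduce the chunk and append as in the post>
}
```

Raising `chunk` as far as memory allows cuts the number of passes, and with it the number of times the file is read.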