Monday, April 5, 2010

STATA: Working with large databases

STATA is limited in that it has to hold all observations in active memory. This can cause problems, especially for really large files such as Medicare or HCUP data, which will typically exceed the memory the computer can allocate with set mem. However, many times you don't need much of the information that is in these large files.
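Before deciding what you need, it can help to look at a file without reading all of it into memory. This is a minimal sketch, assuming a file named neds_2006_core.dta (the file name used as the example in this post) sits in the working directory:

```stata
* List the variables, storage types, and labels in the file without loading any data
describe using neds_2006_core.dta

* Load only the first 100 observations to get a feel for the contents
use in 1/100 using neds_2006_core.dta, clear
```

Neither command needs enough memory to hold the whole file, so both should be safe to run on files that are too large to open normally.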

In order to read in only the information that you need, you can give your 'use' statement a variable list and an 'if' clause, which directs STATA to pull in only the variables you name and only the observations that meet certain conditions.

use charges dx1 age admit if dx1=="714" using neds_2006_core.dta, clear

This example will pull in only people with a primary diagnosis of 714 (rheumatoid arthritis) and only the variables 'charges', 'dx1', 'age', and 'admit'. Note that string values must be enclosed in double quotes. Many times this will obviate the need to go to SAS to cut the data first. Using a * in the variable list will pull in all variables.

You can get even more complicated by using logical operators as well.
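As a rough sketch of a compound condition, the command below combines "or" (|) and "and" (&). It assumes that dx1 is stored as a string and age as a number, and the second diagnosis code ("7140") is only an illustration of how a code might be stored without its decimal point; check how your own file codes diagnoses before relying on it.

```stata
* Keep people aged 65 and older whose primary diagnosis is 714 or 7140.
* The age<. term drops missing ages, which Stata treats as larger than any number.
use charges dx1 age admit ///
    if (dx1=="714" | dx1=="7140") & age>=65 & age<. ///
    using neds_2006_core.dta, clear
```

The /// line continuation works in a do-file; when typing interactively, put the whole command on one line.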

STATA: Getting variable information into a local macro


Many times you may want to get information from a variable and use it dynamically in your code. There is an easy and powerful way to do this.

Let's say that you wish to have a loop based on the max of a variable. You can do

sum var1
local numbr : display r(max)
forvalues i = 1(1)`numbr' {
    * placeholder body: replace with the commands to run for each value of i
    display `i'
}

Anything that you can "display" you can get into a macro. To see a list of what you can recover type

return list

after you run sum (or sum, detail), and

ereturn list

after a regression.
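Here is a short sketch that puts both kinds of results into macros. It uses the auto dataset that ships with STATA so it can be run as is; the macro names (med, n, r2) are just made up for illustration.

```stata
sysuse auto, clear

* Results from summarize are stored in r()
sum price, detail
return list
local med : display r(p50)
display "Median price is `med'"

* Results from an estimation command are stored in e()
regress price mpg
ereturn list
local n : display e(N)
local r2 : display %5.3f e(r2)
display "N = `n', R-squared = `r2'"
```

The %5.3f in the last local is a display format, so the macro holds the rounded value rather than the full precision number.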