Monday, April 5, 2010

Stata: Working with large databases

Stata is limited in that it has to hold all observations in active memory. This can cause problems, especially for very large files such as Medicare or HCUP, which will typically exceed the amount of memory the computer can allocate with 'set mem'. However, you often don't need much of the information in these large files.

To read in only the information that you need, you can add a variable list and an 'if' clause to your 'use' statement. This directs Stata to pull in only the variables you name and only the observations that meet certain conditions.

use charges dx1 age admit if dx1=="714" using neds_2006_core.dta, clear

This example will pull in only the variables 'charges', 'dx1', 'age', and 'admit', and only the observations with a primary diagnosis of 714 (rheumatoid arthritis). Note that string values in Stata go in double quotes. This will often remove the need to go to SAS to cut the data first. Using a * in the variable list will pull in all variables.
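Before choosing a variable list, it can help to check what the file contains without loading it. A minimal sketch, assuming the same file name as in the example above:

```stata
* List variable names, storage types, and labels without reading the data into memory
describe using neds_2006_core.dta
```

The output shows whether a variable such as 'dx1' is stored as a string or a number, which determines whether the value in the 'if' clause needs quotes.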

You can build more complicated conditions by using logical operators as well.
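As a rough sketch of combined conditions, the commands below use & (and), | (or), and ! (not) in the 'if' clause. They assume that 'dx1' is a string variable and 'age' is numeric in this file. The second diagnosis code ("715") and the age cutoffs are placeholders for illustration only.

```stata
* Primary diagnosis 714 AND age 65 or older.
* Stata treats numeric missing values as larger than any number,
* so !missing(age) keeps records with unknown age out of the result.
use charges dx1 age admit if dx1=="714" & age>=65 & !missing(age) using neds_2006_core.dta, clear

* Either of two diagnosis codes (placeholder second code), restricted to ages under 18
use charges dx1 age admit if (dx1=="714" | dx1=="715") & age<18 using neds_2006_core.dta, clear

* inlist() is a shorter way to write a chain of | comparisons
use charges dx1 age admit if inlist(dx1, "714", "715") using neds_2006_core.dta, clear
```

Parentheses around the | conditions matter, because & is evaluated before |.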
