Saturday, January 29, 2011

STATA: Matrices

Suppose you have a loop that runs through many datasets (say, one per year), but at the end you only want to keep a few summary values for each year. Keeping track of them is a pain when the data are too large to load every year into memory at once and use -by-. On top of that, some results can only be retrieved one value at a time. There's no collapse statistic for a correlation, for instance.

What to do....

Well, other languages (especially R) handle this much more elegantly. But there are some work-arounds in Stata. Basically, we'll create a matrix of missing values with one row per loop iteration and one column per quantity we want to store (in this case, four correlations for each year), and fill it in as the loop runs.
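As a small illustration of that idea, here is a minimal sketch using a toy 3-by-2 matrix. The matrix name demo, the column names, and the stored value are made up for the example and are not part of the real analysis.

```stata
* J(r, c, z) builds an r x c matrix with every element equal to z
matrix demo = J(3, 2, .)
matrix colnames demo = first second

* Individual cells can be assigned with matrix name[row, col] = expression
matrix demo[2,1] = 0.25    // placeholder value, for illustration only

matrix list demo
```

Cells that are never assigned stay missing, which makes it easy to spot an iteration that did not run.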

Alternatively, you could use systematically named locals (e.g. `cor`yr''), but those get ugly once you go past a few variables, because you still have to write them out to a dataset.
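For comparison, the locals approach might look roughly like this. It is only a sketch: to keep it runnable without the yearly files, it assumes Stata's bundled auto dataset and loops over values of rep78 instead of years. The variable names group and cor_price_mpg are made up for the example.

```stata
* Stand-in for the yearly files: loop over rep78 groups in the auto data
forvalues g = 3/5 {
    sysuse auto, clear
    quietly corr price mpg if rep78 == `g'
    local cor`g' = r(rho)    // one systematically named local per group
}

* Getting the locals back into a dataset takes a second pass
clear
set obs 3
gen group = _n + 2
gen cor_price_mpg = .
forvalues g = 3/5 {
    replace cor_price_mpg = `cor`g'' if group == `g'
}
list
```

Every additional statistic needs its own family of locals, another gen, and another replace line, which is the bookkeeping the matrix version avoids.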

Here goes:

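* Range of years to loop over
local startyr = 1988
local endyr = 2001

* One row per year, one column per correlation, all initialized to missing
local numyears = `endyr' - `startyr' + 1
matrix correlations = J(`numyears',4,.)
matrix colnames correlations = CVD_LC CVD_NONLC CVD_INJINT INJINT_INJACC

forvalues year = `startyr'/`endyr' {
    * Row index for this year (the first year goes in row 1)
    local yearnum = `year' - `startyr' + 1
    * clear lets the next year's file replace whatever is in memory
    use "`year'data", clear

    * After corr with two variables, r(rho) holds their correlation
    corr drateCVD drateLC
    matrix correlations[`yearnum',1] = r(rho)
    corr drateCVD drateNONLC
    matrix correlations[`yearnum',2] = r(rho)
    corr drateCVD drateINJINT
    matrix correlations[`yearnum',3] = r(rho)
    corr drateINJINT drateINJACC
    matrix correlations[`yearnum',4] = r(rho)
}

* Replace the data in memory with the matrix, one variable per column
drop _all
svmat correlations, names(col)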

exit
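Note that the dataset svmat creates has one row per year but no variable saying which year each row is. Since the rows are in loop order, one way to add it is sketched below. This uses a toy 3-by-2 version of the matrix with placeholder values rather than real correlations, and the year variable is an addition that is not in the code above.

```stata
local startyr = 1988

* Toy stand-in for the correlations matrix (values are placeholders)
matrix correlations = J(3, 2, .)
matrix colnames correlations = CVD_LC CVD_NONLC
matrix correlations[1,1] = 0.5
matrix correlations[2,1] = 0.25

clear
svmat correlations, names(col)

* Row 1 is the first year, row 2 the next, and so on
gen year = `startyr' + _n - 1
order year
list
```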
