Wednesday, November 2, 2011

Health economics datasets

"This Compendium of Health Datasets for Economists (ICoHDE) provides the largest collection of English specific datasets available for researchers interested in the field of economics of health and health care."

Monday, September 19, 2011

STATA: Capturing information in labels

Been quiet for a while, so here is a tip on getting and using variable and value labels.

Labeling variables and values is useful for understanding what your underlying data represents. This is great while you are in the STATA environment. But many times you may want to use this information in more dynamic ways.

For example, let's say that I am using the auto dataset and I want to output a simple table of frequencies that looks like this:

Car type    Freq. 
Domestic    52 
Foreign     22 

 In order to do this I have to get the information stored in my variable labels and value label, so follow along:

set more off 
sysuse auto 
label list 

tempname my_table
file open `my_table' using ///
"c:\my_table.xls", write replace

** This is where I get the variable label  ** 
** in long hand                            **  
** local var_name : variable label foreign ** 
local var_name : var l foreign 

** Now that this is in a local   **
** I can use it anywhere         **
** so let's write it to our file ** 
file write `my_table' ("`var_name'") _tab ("Freq.") _n 

** Now lets get our frequencies   **
** and value labels First get the ** 
** name of the label value        ** 
local nm_label : val l foreign 

 forvalues x=0(1)1 { 
        quietly sum foreign if foreign == `x' 
       ** Now to get the label values      **
       ** for 0 "Domestic" and 1 "Foreign" ** 
       ** in the value label origin        ** 
       local val_name : label `nm_label' `x' 
       file write `my_table' ///
         ("`val_name'") _tab (r(N)) _n
 }

 file close `my_table'

 This is a powerful way to export data in a meaningful fashion and can save you a lot of time. Recall that after the sum, there are a number of values that we can recover: type return list to see them. If you need other descriptive statistics, use the detail option for the sum command. You can also get post-regression estimates through ereturn list after you run a regression. If you are familiar with using matrices in STATA, you can get all of your coefficients, etc.
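
The loop above hard-codes the values 0 and 1. As a minimal sketch of a more general version, assuming the variable has a value label attached and that printing to the Results window is enough, levelsof can supply the values to loop over. The local names lblname, levels, and txt are my own choices, not anything required by Stata:

```stata
sysuse auto, clear

* Name of the value label attached to foreign (origin in auto.dta)
local lblname : value label foreign

* Collect the distinct values of foreign into a local
quietly levelsof foreign, local(levels)

foreach x of local levels {
    * Text attached to this value in the value label
    local txt : label `lblname' `x'
    quietly count if foreign == `x'
    display "`txt'" _col(15) r(N)
}
```

With the auto data this should print Domestic 52 and Foreign 22, matching the table at the top of the post. Swapping display for file write gives the exported version.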

 More on that later

 Happy coding monkeys...

Friday, June 24, 2011

STATA: Geographic data - cool new commands

I emailed this to most Wharton PhD health care students, but thought it was worth posting here for others. There are two new commands in Stata that let you link with Google Maps to turn addresses into latitudes and longitudes, and to calculate distances and travel times.

First type:

findit geocode

And install the two commands, geocode and traveltime.

What do these two commands do?

First geocode can take addresses, in many sorts of formats, and then return the latitude and longitude based on Google Geocoding. Because it is using Google the matches can be pretty good, there is flexibility on the addresses, and geocode can also return the geoscore which gives you an estimate of accuracy of the match.

Once you have the latitudes and longitudes you can use the traveltime command to find the distance between points AND the travel time. What is really cool is that it can be driving, walking, or public transport time.

These are probably really useful for a lot of hospital based studies, and other things. Either way check out the help documentation to learn more.

All the best!!! - Hat tip to Mike Harhay who put me onto this.

Thursday, June 23, 2011

Two cool new packages for R

Let's say you have some data stored in a primate-tive format like paper. But you'd like to get it into something a little more evolved. A new R package called digitize lets you do just that. Click a few points to calibrate the axis, and all your new shiny scatterplot points will be stored as real digital data. Not a tool you'll use often, but invaluable when you need it.

If you've ever monkeyed around with ArcGIS, you'll know that it produces pretty maps. Unfortunately its interface is terrible, it crashes frequently, and it's not very easy to automate. R, on the other hand, does not crash and is easy to automate, but its maps are pretty ugly. Enter rworldmap, a package which produces pretty world maps like this:

It's a marked improvement. Read more in the R Journal.

Thursday, June 9, 2011

We are now a part of R-bloggers

R-bloggers is a site that aggregates many of the best R blogs on the internet. We're glad they've allowed our R-related posts to be aggregated there. If you mainly write in R, it's worth checking them out.

R: Speeding things up

R is many things, but it's not exactly speedy like a Patas Monkey. In fact, while it is much faster than many other solutions, R is notably slower than Stata (even inspiring talks that it should be rewritten from scratch!).

Fortunately, Radford Neal has been hard at work speeding R up, and has released some new patches to play with if you find it too slow. You can also try writing key sections in C++, or using Revolution Analytics' offerings (free for academics).

For extreme speed needs, however, R can't be beat, as it has long offered graphics-card based extreme parallelism that commercial solutions are only beginning to match.

Of course, for more prosaic needs, focusing on vectorizing key operations can solve speed troubles. And it's worth noting that the $1,000+ per copy that Stata costs can buy an awful lot of extra processing power to throw at the problem.

Monday, April 18, 2011

SAS: Design of experiments - Marketing research


There have been some requests for SAS tips so I'll post a couple of useful things over the next couple of weeks. SAS has a lot of functions that STATA doesn't or are hard in STATA. For example, doing maps with data is quite easy, like displaying immunization rates by country on a world map (more on this later).

For this post, I just wanted to point people to an excellent resource if you ever have to design an experiment.

This was put together by Warren Kuhfeld and is an excellent guide on how to design discrete choice and conjoint studies using SAS, along with a number of other marketing-based analyses. These obviously come out of the marketing area, but the techniques are increasingly being adapted to the health care field to elicit patient or provider preferences. I found it quite useful in a discrete choice experiment I will be testing on physicians dealing with smoking cessation.

Monkey out...

Wednesday, April 13, 2011

STATA: file write, or a way to export almost anything

This is a bit of a repost, but it is so useful that I thought it would be worth sharing again.

Ever want to get a formatted table of summary statistics exported directly from Stata? Outreg2 does a great job with exporting regression results, but what about variable means, variances, or other summary statistics? A great way to do this is with file write, which gives you a lot of control. It's simple:

sysuse auto
file open myfile using "C:/mytable.txt", write replace
file write myfile "Table of descriptive stats" _n _n
file write myfile _tab "Mean" _tab "5th pct" _tab "95th pct" _n
quietly sum price, detail
file write myfile "Price" _tab %7.2f (r(mean)) _tab %7.2f (r(p5)) ///
_tab %7.2f (r(p95)) _n
file close myfile

Here is what just happened. We first open a file with the handle "myfile" that is associated with a text file "mytable.txt". Then I write a header on the first line. The _n sends a hard return, so I sent two hard returns after the header. Then I write my column headers, separated by tabs (_tab). Then I write my formatted summary statistics (%7.2f), again separated by tabs. Note: you can send anything that is shown in return list or ereturn list, so it is pretty flexible. Finally, I close the file. I have created a tab-delimited text file that we can open in Excel or elsewhere.

When you combine this with loops and lists of variables that you can store in a local macro, it makes exporting standard tables very easy and automated. See my February 2010 post for a more complicated example.
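
As a rough sketch of that loop idea, assuming you are happy to write mytable.txt into the current working directory: the tempname handle fh and the choice of price, mpg, and weight are just for illustration.

```stata
sysuse auto, clear

* A temporary handle name avoids clashing with any other open file
tempname fh
file open `fh' using "mytable.txt", write replace

* Header row
file write `fh' "Variable" _tab "Mean" _tab "SD" _n

* One row per variable, pulling results from r() after summarize
foreach v of varlist price mpg weight {
    quietly summarize `v'
    file write `fh' "`v'" _tab %9.2f (r(mean)) _tab %9.2f (r(sd)) _n
}

file close `fh'

* Show the finished file in the Results window
type "mytable.txt"
```

Adding another statistic is just another _tab and another r() result on the file write line.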

Happy coding...

Thursday, March 10, 2011

R: Drop factor levels in a dataset

R has factors, which are very cool (and somewhat analogous to labeled levels in Stata). Unfortunately, the list of levels sticks around even if you remove some data such that no examples of a particular level still exist:

# Create some fake data
x <- as.factor(sample(head(colors()),100,replace=TRUE))
x <- x[x!="aliceblue"]
levels(x) # still the same levels
table(x) # even though one level has 0 entries!

The solution is simple: run factor() again:
x <- factor(x)

If you need to do this on many factors at once (as is the case with a data.frame containing several columns of factors), use drop.levels() from the gdata package:
library(gdata) # provides drop.levels()
x <- x[x!="antiquewhite1"]
df <- data.frame(a=x,b=x,c=x)
df <- drop.levels(df)

Now I'm going to quit monkeying around and get to sleep.

Wednesday, March 9, 2011

STATA: Useful tidbits and are you there?

Hey all,

A couple of things: if you find this useful, please comment or "follow us" on the blog. Questions? Leave them in the comments, or just email me or Ari and we can post them.

Useful tidbit?
Two super important user-written commands for STATA that you may not be aware of but will make your life A LOT EASIER:




outreg2: exports your regressions to journal ready tables in text, excel, latex, or other formats. It has a lot of options such as controlling formatting, adding in stars, number of decimal places, etc. It can also append multiple models to the same output file.

logout: This nice little utility also allows you to output almost anything that appears on the STATA window to a file like tables of summary statistics, cross-tabs, etc.

How do you add them to your local copy of STATA? Just type

findit outreg2 and findit logout

Then just download the .ado and .hlp files and you are all set. I give them my highest rating, five bananas, so download them now!

Monday, March 7, 2011

STATA: To the Power of _n and _N, filling in missing data

I'm posting this based on a question I got from one of the other students. It is a common enough issue that I thought it would be worthwhile posting a solution.

STATA has a number of built in variables that you can use in pretty powerful ways. Two key ones are _n and _N where _n is the observation number and _N is the total number of observations in your data. One way to use these is to have stata look "up" or "down" your data.

For example, many times you will have data in the following format

id group name
1 1 "Mickey"
2 1 ""
3 1 ""
4 2 "Davy"
5 2 ""
6 3 "Peter"
7 4 "Michael"
8 4 ""
9 4 ""

But you want your data to look like this

id group name
1 1 "Mickey"
2 1 "Mickey"
3 1 "Mickey"
4 2 "Davy"
5 2 "Davy"
6 3 "Peter"
7 4 "Michael"
8 4 "Michael"
9 4 "Michael"

A very simple solution is:

gsort group -name
replace name = name[_n-1] if name=="" & _n !=1

STATA will then go through the data, in the order it is sorted*, pull the string value from the previous observation [_n-1], and put it in the current observation if it meets the conditions noted (i.e. it isn't the first observation and the current observation has a missing value in the name variable).

* Important note: For string variables you need to specify gsort group -name. The "-" makes sure that the missing values are below the non-missing. For numeric variables, the opposite is required, namely gsort group num_var because STATA handles missing numeric values as very large numbers.
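
A variant that avoids re-sorting on name, sketched here under the assumption that the first row of each group (in id order) carries the name, is to let by: do the work. Within a by-group, _n restarts at 1, so name[_n-1] never reaches back into the previous group:

```stata
clear
input id group str10 name
1 1 "Mickey"
2 1 ""
3 1 ""
4 2 "Davy"
5 2 ""
6 3 "Peter"
7 4 "Michael"
8 4 ""
9 4 ""
end

* Sort by group, keep rows in id order within group, then carry the
* previous name forward into blank rows
bysort group (id): replace name = name[_n-1] if missing(name)

list, sepby(group)
```

Because replace works through the observations in order, each filled-in row is available to the row after it, so one pass is enough.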

Also, if your data has been tsset (declared as time-series data) you can use tsfill. Ah but that is for a later post. I need a banana...good monkey...

Happy Coding!!!

Thursday, March 3, 2011

R: Spatial statistics tutorials

I've done more than just monkey around with spatial statistics and map-making, and for that R is one of (if not the single) best platform out there. Now there's a promising new tutorial to make some of the analysis a little easier to work out. Looks like a big help for people just getting started exploring spatial data.

Tuesday, March 1, 2011

R: Excel spreadsheet manipulation

Sure, statistical packages are much cooler than Excel for data work, but sometimes other monkeys just like doing things in Excel. And primates are social creatures, so you have to collaborate with them. What to do?

There's a nifty new R package called XLConnect that looks like it will manipulate Excel files nicely.

Friday, February 25, 2011

Recommended text editors

This list is by no means intended to be exhaustive, and merely represents editors that I have used and which work well with at least one statistical language.

• jEdit - Free/Open Source: Powerful and easy to use

• UltraEdit - Commercial/$60: Powerful and easy to use

• Eclipse with StatET plugin - Free/Open Source: Capable of being used as a text editor or a full IDE

• EMACS with ESS plugin - Free/Open Source: Steep initial learning curve but extremely powerful

• AutoIt is not a text editor but rather a Windows tool which helps link text editors to your statistical package

Monday, February 7, 2011

R: Functions and environments and a debugging utility, oh my!

The Mark Fredrickson blog has a superb post on R functions and environments that's well worth checking out.

He also includes a handy function for debugging:

> fnpeek <- function(f, name = NULL) {
+ env <- environment(f)
+ if (is.null(name)) {
+ return(ls(envir = env))
+ }
+ if (name %in% ls(envir = env)) {
+ return(get(name, env))
+ }
+ return(NULL)
+ }
> fnpeek(f1)
[1] "n"
> fnpeek(f1, "n")
[1] 7

Friday, February 4, 2011

R: Colors

"It ain't easy being green. ~ Kermit T. F.

Matt Blackwell at the SSSB has made it easy to access all the Crayola(tm) colors in R.

And in case you're not familiar with the way R handles color, here are a few resources:

* The best color chart for R.
* Color palettes in R (allows plotting a spectrum or coordinated palette of colors easily).

AutoIt: The basics

The setup:

Your boss cracks the whip and says, "Get me this data by tomorrow or I'll turn your pet rat into a human and send him to Azkaban!" The data you need though, is stored in some evil organization's crazy computer program, and the only way you can figure out to get it out is to copy and paste a number, click the "Next" button, scroll down three lines, rinse, and repeat. How dreary, and you'd much rather be hanging out with your cool non-data-monkey friends on a Friday night anyway, right?

The solution: AutoIt.

What's that do? Well, it automates all the clicky/typey things that bore humans but make computers leap with excitement. Basically, if you can type it or click it (or both), you can figure out a way to make AutoIt automate it. It's almost like having your very own wand.

Find it here:

Best way to get started is to poke through some of the included scripts and play around. Online docs are here:

Thursday, February 3, 2011

Regular expressions, an example

Why regular expressions are your friend. This is written in Stata but applies to any language where regular expressions exist.

Original version
if length("`qx'")==3 { /*ex: 5q0*/
    local a = substr("`qx'", 1, 1)
    local b = substr("`qx'", 3, 1)
}
else if substr("`qx'", 2, 1) == "q" & length("`qx'")==4 { /*ex: 5q20*/
    local a = substr("`qx'", 1, 1)
    local b = substr("`qx'", 3, 2)
}
else if substr("`qx'", 3, 1) == "q" & length("`qx'")==4 { /*ex: 10q5*/
    local a = substr("`qx'", 1, 2)
    local b = substr("`qx'", 4, 1)
}
else if length("`qx'")==5 { /*ex: 10q20*/
    local a = substr("`qx'", 1, 2)
    local b = substr("`qx'", 4, 2)
}

Regular expression version
foreach q in `qx' {
    local a = regexr("`q'","q.+","")
    local b = regexr("`q'",".+q","")
}
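
For something you can paste and run, here is a sketch of the same idea using regexm() with subexpressions instead of two regexr() calls. The list of test strings is made up for illustration:

```stata
* Hypothetical inputs covering the four cases in the original version
local qx "5q0 5q20 10q5 10q20"

foreach q of local qx {
    * One pattern: digits, a literal q, digits
    if regexm("`q'", "^([0-9]+)q([0-9]+)$") {
        local a = regexs(1)   // digits before the q
        local b = regexs(2)   // digits after the q
        display "`q': a = `a', b = `b'"
    }
    else {
        display "`q' does not look like an nqx string"
    }
}
```

A side benefit of matching the whole string is that malformed input gets flagged instead of silently producing odd values of a and b.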

Wednesday, February 2, 2011

STATA: Input

Quick tip: you can use -input- to send someone a test data set by e-mail without having to deal with attachments.

Try copying and pasting this into your Stata window and look at the dataset that results.

input x y
1 4
2 6
3 7
1 2
1 5
end

Tuesday, February 1, 2011

STATA: Tempfile

-tempfile- creates a local containing a file address. This file will automatically be removed by Stata once your do file ends, so you don't have to worry about cleaning up your mess. It will always (except for rare occasions in Stata 9 at least when it would mess up) be unique, even if you run two copies of the same do file simultaneously. In short, they're cool.

All -tempfile- does is create the local. It's exactly like if you'd created a local containing a file that you want to exist but that doesn't yet exist. If you want to have a file already existing (e.g. for using with -append- in a loop), check out -findit touch-.

If you call -tempfile- twice with the same name, the first file is gone forever, replaced with a new pointer to a temporary file. This is good in loops.

tempfile c2file
use $xdrive/projects/SeattleStinks/countries_original, clear
rename ex0 ex0_original
sort iso
save `c2file'
use $xdrive/projects/SeattleStinks/countries_new, clear
sort iso
merge iso using `c2file'
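
The same pattern in a form you can run anywhere, as a sketch using the auto data that ships with Stata. It assumes Stata 11 or later for the merge m:1 syntax, and the name group_means is my own:

```stata
sysuse auto, clear

* Reserve a temporary file name; nothing exists on disk yet
tempfile group_means

* Build group-level means and save them to the temporary file
preserve
collapse (mean) mean_price = price, by(foreign)
save "`group_means'"
restore

* Attach the group means to every car
merge m:1 foreign using "`group_means'"
tabulate _merge
```

The quotes around the macro are there because temporary directories sometimes have spaces in their paths.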

Monday, January 31, 2011

STATA: Open partial files

A handy trick which works for any situation in which you can't open a file because it's too big for memory (or you just want massive files to load faster to get a sense of what the data's like before you begin analysis in earnest). You can combine -use- with an if or in statement, as in:

use in 1/50000 using hugefile.dta, clear

or

use if sex == 1 using hugefile.dta, clear
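
To try this without a huge file on hand, here is a small sketch that saves a copy of the auto data to a temporary file and then reads back only part of it. It uses the form of the syntax documented in -help use-, where the varlist and the if or in qualifier come before using; the name autocopy is just for illustration:

```stata
sysuse auto, clear
tempfile autocopy
save "`autocopy'"

* Load three variables for the first 10 observations only
use make price mpg in 1/10 using "`autocopy'", clear
count

* Load only the foreign cars
use if foreign == 1 using "`autocopy'", clear
count
```

The two counts should come back as 10 and 22. Listing just the variables you need is often the bigger memory saving.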

Sunday, January 30, 2011

Tab completion

Let's say your hands are aching from too much typing in of variables. What to do? Get a keyboard tray and learn proper ergonomics, of course.

But what if you just want to reduce the amount of typing in of variables you do for reasons of laziness...err...efficiency? Well, you can type the part of the variable that's unique and then hit Tab. Stata and R (and many other programming environments) will fill in the rest for you.

Suppose you have variables named:

If you want the first variable to show up, just type "A" and hit TAB.
But if you want the second, you'd have to type "Mara" and hit TAB, because until you hit the fourth letter it won't be sure which variable you want.

Saturday, January 29, 2011

STATA: Matrices

Have a loop that runs through a bunch of different, say, years of data sets, but at the end you only want to store a few summary values for each year? It can be a pain keeping track if the dataset is too large to load all the years into memory and do something with -by-. Plus there are things that are hard to store except as individual values. There's no collapse command for correlation, for instance.

What to do....

Well, other languages (especially R) handle this much more elegantly. But there are some work-arounds in Stata. Basically, we'll create an empty matrix with the number of rows for the number of loops we're going to run and the number of columns for the different types of things we want to store (in this case four correlations for each year).

Alternatively, you could use systematically-named locals (e.g. `cor`yr''), but those get moderately ugly when you get past a few variables, 'cause you still have to output them to a dataset.

Here goes:

local startyr = 1988
local endyr = 2001

local numyears = `endyr' - `startyr' + 1
matrix correlations = J(`numyears',4,.)
matrix colnames correlations = CVD_LC CVD_NONLC CVD_INJINT INJINT_INJACC

forvalues year = `startyr'/`endyr' {
    local yearnum = `year' - `startyr' + 1
    use "`year'data", clear

    corr drateCVD drateLC
    matrix correlations[`yearnum',1] = r(rho)
    corr drateCVD drateNONLC
    matrix correlations[`yearnum',2] = r(rho)
    corr drateCVD drateINJINT
    matrix correlations[`yearnum',3] = r(rho)
    corr drateINJINT drateINJACC
    matrix correlations[`yearnum',4] = r(rho)
}

drop _all
svmat correlations, names(col)
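
If you don't have yearly mortality files lying around, the same pattern can be sketched on the auto data by looping over the two values of foreign instead of over years. The matrix and column names here are my own:

```stata
sysuse auto, clear

* One row per group, one column per correlation we want to keep
matrix results = J(2, 2, .)
matrix colnames results = price_mpg price_weight

forvalues f = 0/1 {
    local row = `f' + 1
    quietly corr price mpg if foreign == `f'
    matrix results[`row', 1] = r(rho)
    quietly corr price weight if foreign == `f'
    matrix results[`row', 2] = r(rho)
}

matrix list results

* Turn the matrix into a dataset, using the column names as variable names
drop _all
svmat results, names(col)
list
```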


Friday, January 28, 2011

STATA: egen basics

-egen- has all sorts of cool things for you to play with. In particular, whenever you're thinking about doing something that spans multiple columns or rows, -egen- is usually the preferred solution. It's especially useful in combination with the -by:- prefix.

For instance:
* Want to sum across rows? egen poptotal = rowtotal(pop1-pop10) (rsum() is the older name for rowtotal())
* Want to figure out how many apples are in each household (assuming each row is a person and the apple variable contains the number of apples they own)?
bysort householdID: egen applestotal = total(apples)
egen tag = tag(householdID)
keep if tag == 1
drop tag
keep householdID applestotal
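
Here is a self-contained sketch of both ideas on a tiny made-up dataset (three households of two people each; all values are invented for illustration):

```stata
clear
set obs 6

* Fake data: people 1-2 are in household 1, 3-4 in household 2, and so on
gen householdID = ceil(_n/2)
gen apples = _n
gen pop1 = 1
gen pop2 = 2

* Sum across columns within each row
egen poptotal = rowtotal(pop1-pop2)

* Sum across rows within each household
bysort householdID: egen applestotal = total(apples)

* Flag one row per household and list the household totals
egen tag = tag(householdID)
list householdID applestotal if tag
```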

Thursday, January 27, 2011

STATA: Assert

Do you have trouble sleeping at night? Do you have a massive proliferation of -if- statements designed to check that all is well in the world? Have I got a product for you! And what a price, only $9.95. Please address all checks to: Me.

Err, right. -assert- is a simple command. Give it a logical statement (like you would an if option), and it will make your program fail if it's not true. Easy error checking. Now you can sleep.
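
A few examples on the auto data, as a minimal sketch. The last one is wrapped in capture so the demonstration doesn't stop the do file:

```stata
sysuse auto, clear

* These hold in the auto data, so nothing is printed and execution continues
assert price > 0
assert inlist(foreign, 0, 1)

* rep78 has some missing values, so this assertion is false;
* capture traps the error so we can look at the return code
capture assert !missing(rep78)
display "return code from the failed assert: " _rc
```

In real code you would normally leave off the capture, since stopping the do file is the whole point.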

STATA: Formatting display numbers


set obs 2
gen x = 1.1234567
gen y = 2
format * %09.3f
list
format * %9.3f
list
format * %9.3g
list

Just to clarify, the * in the format command is a varlist (the * means "all" in pretty much any language). You could give it x or y instead. See -help varlist- for more fun with varlists.

And to further clarify, the 0 before the rest of the format string (as in the zero in %09.3f) makes Stata pad the number with leading zeroes. Why would you ever want this? Well, say you had a state/county code for Autauga County, Alabama (these are known as FIPS codes). That's state 01, county 001. So the combined code is 01001. Now say you actually wanted it to export as 01001 instead of 1001, as it would if it were numeric....

set obs 1
gen fips = 01001
tostring fips, replace format(%05.0f)

Wednesday, January 26, 2011

STATA: Mass renaming variables

Have a bunch of variables with the same beginning of their name, but you want them to be named something else? E.g.

You could -reshape long-, then -rename-, then -reshape wide-, but that's ugly, takes forever, will generate missing values if you don't have all the years, doesn't work for things not years, etc. etc.

Instead, try:

E.g. -renpfix pop mom-

Now you've got:

Tuesday, January 25, 2011

STATA: Tokenizing locals

There's a command called -tokenize-. Some people use it a lot, some people only a little. You could do everything it does with regular expressions if you reeeally wanted to, but it makes the whole process a bit easier. It works like this:
* First you run -tokenize- on a string you want to break up into pieces. E.g.:
local States "AL MI TN FL"
tokenize `States'
* Now every word (i.e. something separated by a space in that string) is stored in a series of macros `1' `2' `3'. Try it:
di "`1' `2' `3' `4' `5'"
* But what use is that, you ask? Ah, well you can use them one at a time:
while "`*'" != "" {
local ifrace "`ifrace' | race == `1'"
macro shift
}
keep if `ifrace'
* What's this funny `*' thing? And what the heck is -macro shift-? We'll start with the latter. -macro shift- does exactly what it sounds like: it "shifts" the entire stack of tokenized locals over by one. So the one that used to be `2' will now be `1', and so forth all the way down the line. The one that used to be `1' vanishes into the ether. The `*' local contains all the remaining tokens that haven't been shifted off the end into the ether yet. So that loop will essentially keep looping over all the words/terms in `States' until they're exhausted, then be done. Within the loop, you can do whatever you want with the contents.
* Note that to actually make that loop produce a working if statement, you'll need to remove the first |. You could do that either by putting the first -local ifrace...- and -macro shift- outside of the loop, or you could use a regular expression to remove the first | once the local is created.
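
Here is a sketch of the first of those fixes (handling the first token outside the loop) that runs on the auto data. It uses rep78 values instead of states, and it tests `1' in the while condition, which is a small variation on the loop above; the local names levels and cond are mine:

```stata
sysuse auto, clear

local levels "3 4 5"
tokenize `levels'

* Start the condition with the first token so there is no leading |
local cond "rep78 == `1'"
macro shift

* Add the remaining tokens one at a time until none are left
while "`1'" != "" {
    local cond "`cond' | rep78 == `1'"
    macro shift
}

display "`cond'"
keep if `cond'
```

The display line shows the finished condition before it is used, which is handy when the list of tokens comes from somewhere else.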

Monday, January 24, 2011

STATA: Appending in loops

Say you have a loop and you want to add the results of the previous iteration on the end of one data file.

local PBFs "David1 David2 David3"
tempfile PBFdata
foreach PBF of local PBFs {
clear
set obs 5
gen currentPBF="`PBF'"
append using `PBFdata'
save `PBFdata', replace
}

This is all well and good, except that code above doesn't work. It fails the first time you run the loop, because there's no `PBFdata' file at that point, only the local pointing to an empty location.

What to do? You've got some options:

-if word("`PBFs'",1) != "`PBF'" append using `PBFdata'-
// This works because it checks whether this is the first time you're running the loop or not, but who wants to type all that? Still, that word function is pretty cool, huh?

-cap append using `PBFdata'-
//This works, but if it fails for other reasons, your program will keep on going and things mess up badly.

-touch `PBFdata'-
//This is what I do. I like it so much, that I had to write the -touch- command just to make it work. Note you have to put it outside the loop (I usually put it at the very top of my do file), otherwise you'll overwrite your file each time!

To install touch, type -findit touch- to locate it in the user-contributed repositories.
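
If you'd rather not install anything, one more option is a simple flag that skips the append on the first pass. This is only a sketch; the local first is my own name, not part of any command:

```stata
local PBFs "David1 David2 David3"
tempfile PBFdata
local first = 1

foreach PBF of local PBFs {
    clear
    set obs 5
    gen currentPBF = "`PBF'"

    * Nothing to append to on the first pass
    if `first' == 0 {
        append using "`PBFdata'"
    }
    save "`PBFdata'", replace
    local first = 0
}

* Check the result: 5 rows for each of the three names
use "`PBFdata'", clear
tabulate currentPBF
```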

Sunday, January 23, 2011

STATA: Regular expressions

A regular expression allows you to do a moderately fancy search (and replace if you want). So say you wanted to replace all the "Dennis"s in a variable with "Awesome"s, but only if they're at the end of the line. You could try:
-replace PBFnamevar = regexr(PBFnamevar,"Dennis$","Awesome")-
You could also replace any character, or just capitals, or just digits...there are lots of possibilities:

You can also use it for locals:
-local strata = regexr("agecat","age","")-

Or -if- commands:
if regexm("`strata'","age") {

On a related note (although not actually regular expressions), say that you've got a string variable that consists of a bunch of what should be separate variables, only lumped all into one, separated by a semicolon (e.g. a row might look like "1;15.2;89;hi;21"). Try -split-:
-split textvar, gen(newtextvars) parse(";")-

I should note that Stata's regular expressions are wimpy compared to what other languages support. R supports PERL regular expressions, which can do so many things it's scary.
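
To see regexm(), regexr(), and -split- side by side, here is a small sketch on two made-up rows. The variable names are invented for illustration:

```stata
clear
input str20 textvar
"1;15.2;89;hi;21"
"2;3.5;7;bye;4"
end

* regexm() returns 1 when the pattern matches, 0 otherwise
gen byte ends_in_21 = regexm(textvar, "21$")

* regexr() replaces the first match; here only a trailing 21 is changed
gen str30 replaced = regexr(textvar, "21$", "XX")

* split breaks the string at each semicolon into part1, part2, ...
split textvar, gen(part) parse(";")

list
```

The new part variables are strings; -destring- will convert the ones that hold numbers.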

Saturday, January 22, 2011

STATA: Locals in global names

> i have a series of globals with the names: strata_pop1 strata_pop2
> strata_pop3, etc. all the way up to strata_pop21
> i'm trying to reference the values stored in each with a loop
like this:
> forvalues i = 1/21 {
> display $strata_pop`i'
> }
> the problem is, stata seems to display the global $strata_pop first
> (which has nothing stored in it), and then the value of the local
> so all i get is the values of the local `i' spit back at me. is there
> a way to use a loop to reference the values stored in each of the
> strata_pop globals?

The solution is simple: Enclose the global in curly braces, like this:
display ${strata_pop`i'}
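
A quick sketch to convince yourself, using three made-up globals instead of 21:

```stata
* Create strata_pop1, strata_pop2, strata_pop3 with placeholder values
forvalues i = 1/3 {
    global strata_pop`i' = `i' * 100
}

* The braces tell Stata where the global's name ends, so the local
* is expanded first and becomes part of the name
forvalues i = 1/3 {
    display ${strata_pop`i'}
}
```

This should display 100, 200, and 300.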

Friday, January 21, 2011

STATA: Stata resources


Lifted from Marginal Revolution but reproduced fully here. And I totally agree on the Baum book: buy it now, it is awesome (go to Marginal Revolution for the links).

Stata Resources

Here are some Stata resources that I have found useful. Statistics with Stata by Hamilton is good for beginners although it is overpriced. For the basics I like German Rodriguez's free Stata tutorial best, good material can also be found at UCLA's Stata starter kit and UNC's Stata Tutorial; two page Stata is good for getting started quickly.

Christopher Baum's book An Introduction to Modern Econometrics using Stata is excellent and worth the price. The world is indebted to Baum for a number of Stata programs such as NBERCycles which shades in NBER recession dates on time series graphs--this was a big help in producing graphs for our textbooks!--so buy Baum's book and support a public good.

I have found it hugely useful to peruse the proceedings of Stata meetings where you can find professional guides to using Stata to do advanced econometrics. For example, here is Austin Nichols on Regression Discontinuity and related methods, Robert Guitierrez on Recent Developments in Multilevel Modeling, Colin Cameron on Panel Data Methods and David Drukker on Dynamic Panel Models.

I found A Visual Guide to Stata Graphics very useful and then I lent it to someone who never returned it. I suppose they found it very useful as well. I haven't bought another copy, since it is fairly easy to edit graphs in the newer versions of Stata. You can probably get by with this online guide.

German Rodriguez, mentioned earlier, has an attractively presented class on generalized linear models with lots of material. The LSE has a PhD class on Stata, here are the class notes: Introduction to Stata and Advanced Stata Topics.

Creating a map in Stata is painful since there are a host of incompatible file formats that have to be converted (I spent several hours yesterday working to convert a dBase IV to dBase III file just so I could convert the latter to dta). Still, when it works, it works well. Friedrich Huebler has some of the details.

The reshape command is often critical but difficult, here is a good guide.

Here are many more sources of links: Stata resources, Stata Links, Resources for Learning Stata, and Gabriel Rossman's blog Code and Culture.

Slash confusion

Windows uses backwards slashes to mark off directories (e.g. "c:\temp\PBFSRULE.dta"). UNIX uses forwards slashes (e.g. "c:/temp/PBFSRULE.dta"). Stata accepts either on Windows. However, since the back slash is also used as an escape character (e.g. if you want the ` that starts a local to actually appear as a ` instead of starting a local, you can type \` ), it is not a bad idea to get in the habit of using forward slashes.

That prevents problems with something like this:
local outfile "PBFSRULE.dta"
save "c:\temp\`outfile'"

Besides, if you switch between UNIX and Windows environments, your code will be usable in both environments this way.

For a little history:
And a more lyrical interpretation:

Try these in Stata:
local filename "blah.dta"
local directory "c:/temp"
di " `directory'\`filename' "
di "Oops. It doesn't work because the backslash is escaping the character that follows it instead of allowing Stata to interpret it as usual."
di " `directory'/`filename' "
di "Forward slashes don't have this problem."
di " `directory'\\`filename' "
di "This time the first backslash escapes the second one, preventing it from having its usual function of escaping the local quote mark \` "
di "Crazy huh?"

Thursday, January 20, 2011

STATA: Do file header

Recommended code to put at the top of every do file:

cap log close
set mem __m // Where __ is the memory size you want, obviously
set more off
pause on

STATA: Window management

Have you moved all your windows within Stata to the wrong place and can't figure out how to get them back? Try -window manage prefs default- .

Wednesday, January 19, 2011

STATA: Debugging with trace

Having a problem figuring out where a pesky line of code is hiding? Type -set trace on- and run your do file again. You'll get more ridiculously verbose output than you can shake a stick at, which will take twice as long to run but let you know in excruciating detail exactly what Stata was thinking. Just remember to -set trace off- when you're done.

Tuesday, January 18, 2011


Each time Stata loads, it runs the commands in the file profile.do if it finds one in the Stata installation directory (typically "c:\program files\stata11" on a Windows machine). This provides a handy way to configure your Stata environment just the way you like it. For instance, if you often find that 10MB of memory isn't enough, that you want a global pointing to a particular location*, and that your custom ADO files are in another location, your profile.do might look like:
set mem 50m
global xdrive "c:/xdrive"
adopath ++ "$xdrive/projects/ado"

* This is particularly useful when you routinely do your work on several computers with different directory structures.

Monday, January 17, 2011

STATA: Nifty commands (-expand- and -set obs-)

One of the common ways of getting things done in Stata is to add observations to the end of the dataset, then modify them in some way. The -expand- command makes this easy, by adding replicates of the observations in memory on to the end, after which you can modify them. You will likely want to save the current number of observations to a local so you know which are the new copies: - local originalN = _N -

Want to duplicate your dataset?
-expand 2-
Want to triplicate your dataset?
-expand 3-
Want to duplicate only the observations from year 1999?
-expand 2 if year==1999-

If you want to create blank rows at the end of a dataset, use -set obs- instead:
local numobs = _N + 1
set obs `numobs'
replace x = 10 in l
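
Putting the -expand- pieces together, here is a short sketch on the auto data that duplicates only the foreign cars and then flags the copies using the saved observation count. The variable name is_copy is mine:

```stata
sysuse auto, clear

* Remember how many observations we started with
local originalN = _N

* Duplicate only the foreign cars; the copies go on the end
expand 2 if foreign == 1

* Anything past the original count is a new copy
gen byte is_copy = _n > `originalN'
count if is_copy
```

The count should match the number of foreign cars in the data (22).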

Thursday, January 13, 2011

STATA: Executing code over multiple dataset with different variables

Many times panel datasets come as yearly data files, which may have variables that are dropped or added over time. This is especially true as the panel gets longer. There are times when you just want to automatically go over each yearly file and execute some code, but if you try to execute code on a variable that doesn't exist STATA will halt the execution of your do file. You can use a bunch of if statements, but here is an easy solution:

capture confirm variable varname
if _rc == 0 {
code block
}

The capture command allows a command to be executed even if it creates an error (in this case the command is confirm variable varname). If there is an error, capture puts a return code in the scalar _rc but allows the do file to keep running.

So in this case if the variable didn't exist the scalar _rc would be 111, if the variable does exist _rc is set to 0. So the only time that the code block is executed is if the variable actually exists, if it doesn't then the do file skips that step and then keeps going.
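
Here is the pattern in a runnable sketch on the auto data, where not_a_variable is a deliberately made-up name that doesn't exist in the dataset:

```stata
sysuse auto, clear

foreach v in price mpg not_a_variable {
    capture confirm variable `v'
    if _rc == 0 {
        * The variable exists, so it is safe to use it
        quietly summarize `v'
        display "`v': mean = " r(mean)
    }
    else {
        * The variable is missing; report it and move on
        display "`v' not found (return code " _rc ")"
    }
}
```

The else branch is optional, but printing the skipped names makes it easier to spot a typo in the list.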