The Data Monkey: May 2010

Friday, May 28, 2010

STATA: Did you know stata has all the ICD9 codes?

Yes, that's right, fellow dissertators: if you need to know what a specific ICD9 code is, or what falls within a range of ICD9 codes, you can just type 'icd9 lookup'. For example:

icd9 lookup 740*

This returns all ICD9 codes that begin with 740:
4 matches found:
740 anencephalus/simil anom*
740.0 anencephalus
740.1 craniorachischisis
740.2 iniencephaly

Interested in pneumonia but don't know the relevant codes? Type

icd9 search "pneumonia"

47 matches found:
003.22 salmonella pneumonia
011.6 tuberculous pneumonia*
011.60 tb pneumonia-unspec
011.61 tb pneumonia-no exam
011.62 tb pneumonia-exam unkn
011.63 tb pneumonia-micro dx
011.64 tb pneumonia-cult dx
011.65 tb pneumonia-histo dx
011.66 tb pneumonia-oth test
041.3 klebsiella pneumoniae
055.1 postmeasles pneumonia...

There are a number of other ICD9 tools (for example, for dealing with those pesky dots). To see them, just type 'help icd9'.
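As a rough sketch of what those other tools look like, the snippet below builds a tiny made-up data set (the variable names dx and dxdesc are hypothetical) holding the 740 codes from the lookup above without their dots, then checks them, adds the dots, and attaches descriptions. It assumes the icd9 check, icd9 clean, and icd9 generate subcommands behave as described in 'help icd9' for your version of Stata.

```stata
* Toy data: ICD9 codes stored without dots (dx is a hypothetical name)
clear
input str7 dx
"7400"
"7401"
"7402"
end

* Verify that every value is a validly formatted, defined ICD9 code
icd9 check dx

* Rewrite the codes in place with the dots added
icd9 clean dx, dots

* Create a new variable holding the text description of each code
icd9 generate dxdesc = dx, description

list
```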

Thursday, May 13, 2010

STATA: Adding a dot to a graph

I recently needed to put a single dot of a different color on a graph, and noted that Stata doesn't have an option, analogous to xline and yline for lines (see below), that places a single dot on a graph. I came up with the following workaround:

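* Load the example data set
sysuse auto, clear

* Store the mean of mpg in a local macro and in a variable
sum mpg
local avgmpg = r(mean)
gen avgmpg = `avgmpg'

* Do the same for price
sum price
local avgprice = r(mean)
gen avgprice = `avgprice'

* Main scatter plot, then a single red dot at (avgprice, avgmpg)
twoway (scatter mpg price) ///
(scatter avgmpg avgprice if _n==1, mcolor(red)), ///
legend(off) text(`avgmpg' `avgprice' "average")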

What I did was create the average price and mpg as variables, then draw my primary scatter plot and overlay a single dot. The restriction if _n==1 tells Stata to use only observation number one. Without it, Stata would draw _N (the total number of observations in the data set) dots right on top of each other, which uses more resources for no visible gain. The text option places text at a given location, with the y coordinate first and then the x coordinate.
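A possible alternative, offered only as a sketch: twoway also has a scatteri plot type that takes its coordinates directly on the command line (y first, then x, then an optional label), so the helper variables are not strictly needed. The version below assumes the same auto data set and reuses the local macro names from above.

```stata
* Load the example data and save the two means in local macros
sysuse auto, clear
sum mpg
local avgmpg = r(mean)
sum price
local avgprice = r(mean)

* scatteri plots one labeled point at the immediate coordinates (y x)
twoway (scatter mpg price) ///
(scatteri `avgmpg' `avgprice' "average", mcolor(red) mlabcolor(red)), ///
legend(off)
```

The trade-off is that the point exists only in the graph command, whereas the variable-based approach leaves avgmpg and avgprice in the data set for later use.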

If I wanted intersecting lines at the averages instead, I would have done

twoway (scatter mpg price, xline(`avgprice') yline(`avgmpg'))

Happy Coding!

Wednesday, May 12, 2010

STATA: Power calculations

While planning a study, you may need to determine what sample size will give you sufficient power to detect an effect. Formal power calculations are often used in clinical trials, but they can also be used for econometric studies. For instance, I recently had to do a series of power calculations to determine if we needed to buy the Medicare 100% file, or if we had enough observations in the 5% file. Additionally, if you are at the start of a research project, it may be helpful to know if the data that you have on hand are sufficient to detect your hypothesized effect.

In order to do a power calculation you will need four items:
alpha: 0.05 (the probability of rejecting the null hypothesis when it is true, i.e. a false positive)
power: 0.80 (this is 1 - beta, i.e. the probability of rejecting the null hypothesis when the alternative is true)
effect size: how large an effect you hypothesize, i.e. the difference in outcome between your treatment and control groups
variance: the variance of your outcome measure
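To see how the four items fit together, here is a minimal sketch of the textbook normal-approximation formula for comparing two group means with equal group sizes and a two-sided test. The effect size of 5 and standard deviation of 10 are made-up numbers for illustration, and the formula is a common approximation rather than necessarily what any particular power command computes.

```stata
* Hypothetical inputs: replace with values for your own study
local alpha = 0.05
local power = 0.80
local delta = 5      // hypothesized difference in means
local sd    = 10     // assumed common standard deviation

* Normal-approximation sample size per group for a two-sided test
local n = 2 * (invnormal(1 - `alpha'/2) + invnormal(`power'))^2 * `sd'^2 / `delta'^2
display "Observations needed per group: " ceil(`n')
```

With these inputs the formula gives about 63 observations per group. Halving the effect size roughly quadruples that number, which is why the hypothesized effect matters so much. Run this from a do-file, since the // comments are not accepted at the interactive prompt.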

Once you have these, you can use powercal in Stata. It is a user-written add-on, so you will have to download it first. To find it, type the following in Stata:

search powercal
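If you would rather not install an add-on, newer releases of Stata (13 and later, so after this post was written) include a built-in power command. The sketch below assumes that command is available and uses made-up inputs: group means of 0 and 5 with a common standard deviation of 10.

```stata
* Sample size for a two-sample comparison of means
* (hypothetical means 0 and 5, common sd 10, alpha 0.05, power 0.80)
power twomeans 0 5, sd(10) power(0.8) alpha(0.05)
```

Older releases offered the sampsi command for the same purpose; see 'help sampsi' or 'help power' for what your version supports.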

Wednesday, May 5, 2010

Two Way Graphs in STATA

From Rob Lieberthal:

If you are using the graph twoway command to create graphs, Stata will render the plots in the same order as they appear in your command. For example:

graph twoway (scatter mpg car_weight) (lfitci mpg car_weight)

This will produce a scatter plot of mpg as a function of car_weight, then overlay the fitted line with its 95% confidence interval on top of the scatter plot. The confidence interval will cover up most of your scatter plot points. If you want Stata to show the points over the confidence interval, switch the order of the scatter and lfitci plots like so:

graph twoway (lfitci mpg car_weight) (scatter mpg car_weight)
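For a version you can run as is, here is a minimal sketch using the auto data set that ships with Stata. It assumes the variable weight stands in for car_weight, which is not a variable in that data set.

```stata
* Load the example data set shipped with Stata
sysuse auto, clear

* Confidence band and fitted line first, points drawn on top of them
graph twoway (lfitci mpg weight) (scatter mpg weight)
```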
