Friday, June 24, 2011

STATA: Geographic data - cool new commands

I emailed this to most Wharton PhD health care students, but thought it was worth posting here for others. There are two new commands in Stata that let you link with Google Maps to turn addresses into latitudes and longitudes, as well as calculate distances and travel times.

First, type:

findit geocode

Then install the two commands, geocode and traveltime.

What do these two commands do?

First, geocode takes addresses in many sorts of formats and returns the latitude and longitude based on Google Geocoding. Because it uses Google, the matches can be pretty good and there is flexibility in how the addresses are written. geocode can also return the geoscore, which gives you an estimate of the accuracy of the match.

Once you have the latitudes and longitudes you can use the traveltime command to find the distance between points AND the travel time. What is really cool is that it can be driving, walking, or public transport time.

These are probably really useful for a lot of hospital-based studies, and for other work that depends on distance or travel time. Either way, check out the help documentation to learn more.

All the best!!! - Hat tip to Mike Harhay who put me onto this.

Thursday, June 23, 2011

Two cool new packages for R

Let's say you have some data stored in a primate-tive format like paper. But you'd like to get it into something a little more evolved. A new R package called digitize lets you do just that. Click a few points to calibrate the axes, and all your shiny new scatterplot points will be stored as real digital data. Not a tool you'll use often, but invaluable when you need it.

If you've ever monkeyed around with ArcGIS, you'll know that it produces pretty maps. Unfortunately its interface is terrible, it crashes frequently, and it's not very easy to automate. R, on the other hand, does not crash and is easy to automate, but its maps are pretty ugly. Enter rworldmap, a package which produces pretty world maps.

It's a marked improvement. Read more in the R Journal.

Thursday, June 9, 2011

We are now a part of R-bloggers

R-bloggers is a site that aggregates many of the best R blogs on the internet. We're glad they've allowed our R-related posts to be aggregated there. If you mainly write in R, it's worth checking them out.

R: Speeding things up

R is many things, but it's not exactly speedy like a Patas Monkey. In fact, while it is much faster than many other solutions, R is notably slower than Stata (even inspiring talks that it should be rewritten from scratch!).

Fortunately, Radford Neal has been hard at work speeding R up, and has released some new patches to play with if you find it too slow. You can also try writing key sections in C++, or using Revolution Analytics' offerings (free for academics).

For extreme speed needs, however, R can't be beat, as it has long offered graphics-card-based extreme parallelism that commercial solutions are only beginning to match.

Of course, for more prosaic needs, focusing on vectorizing key operations can solve speed troubles. And it's worth noting that the $1,000+ per copy that Stata costs can buy an awful lot of extra processing power to throw at the problem.
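To make the vectorizing point concrete, here is a minimal sketch in base R that computes a sum of squares twice: once with an explicit loop and once with a single vectorized call. The function names loop_sq_sum and vec_sq_sum are made up for illustration, and the timings will vary by machine and R version.

```r
# Hypothetical example: sum of squares, explicit loop vs. vectorized call.
set.seed(1)
x <- runif(1e5)

# Loop version: the interpreter handles one element at a time.
loop_sq_sum <- function(x) {
  total <- 0
  for (xi in x) {
    total <- total + xi^2
  }
  total
}

# Vectorized version: squaring and summing both run in compiled code.
vec_sq_sum <- function(x) sum(x^2)

# The two versions should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(loop_sq_sum(x), vec_sq_sum(x))))

# Compare elapsed time for each approach.
print(system.time(loop_sq_sum(x)))
print(system.time(vec_sq_sum(x)))
```

The same idea applies to most element-by-element loops: if an operation can be written as a call on the whole vector, that is usually the first thing to try before reaching for C++ or new hardware.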