Friday, June 24, 2011

Stata: Geographic data - cool new commands

I emailed this to most Wharton PhD health care students, but thought it was worth posting here for others. There are two new commands in Stata that link with Google Maps to turn addresses into latitudes and longitudes and to calculate distances and travel times.

First type:

findit geocode

And install the two commands, geocode and traveltime.

What do these two commands do?

First, geocode takes addresses in a variety of formats and returns the latitude and longitude based on Google Geocoding. Because it uses Google, the matches can be pretty good and there is flexibility in how the addresses are written. geocode can also return the geoscore, which gives you an estimate of the accuracy of the match.

Once you have the latitudes and longitudes you can use the traveltime command to find the distance between points AND the travel time. What is really cool is that it can be driving, walking, or public transport time.
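As a rough illustration of the two-step workflow described above, here is a minimal sketch. The dataset and its variable names (street, city, state, zip, hosp_lat, hosp_lon) are hypothetical. The option names and the names of the variables that geocode creates are written from memory of the help files and should be treated as assumptions, so confirm them with "help geocode" and "help traveltime" before relying on this.

```stata
* Hypothetical data: string variables street, city, state, zip,
* plus numeric hospital coordinates hosp_lat and hosp_lon.

* Step 1: turn addresses into coordinates.
* Option names are assumed; confirm with -help geocode-.
geocode, address(street) city(city) state(state) zip(zip)

* geocode should add coordinate and match-quality variables.
* The names latitude, longitude, and geoscore are assumptions;
* run -describe- to see what was actually created.
describe
rename latitude  home_lat
rename longitude home_lon

* Step 2: distance and travel time from each address to the hospital.
* Option names are assumed; confirm with -help traveltime-, which also
* documents how to choose driving, walking, or public transport.
traveltime, start_x(home_lat) start_y(home_lon) ///
    end_x(hosp_lat) end_y(hosp_lon)
```

Both commands send every row to Google, so it is worth trying them on a handful of observations first.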

These are probably really useful for a lot of hospital-based studies, and other things. Either way, check out the help documentation to learn more.

All the best!!! - Hat tip to Mike Harhay who put me onto this.


  1. Proceed with caution with Stata's traveltime if you are working with unmasked patient addresses or unmasked patient latitude/longitude. This is likely well out of compliance with federal and state patient privacy laws such as HIPAA. Essentially you are sending Google unencrypted protected health information (PHI) identifiers. (Yes, patient address and/or lat/long is PHI with or without name or ID.) The same goes for any online geocoding or geoprocessing tool. Check with your privacy officer and make sure the technology is approved by your IRB.

    1. Is it adequate to anonymize the street numbers so that the last 2 digits are randomly transformed? This would still be within about a block of the true location but not reveal the actual building. Alternatively, can a true address be geocoded along with a list of 100 fake addresses?

      Lastly, does it matter if the addresses being geocoded are ambulance request locations, not necessarily patients' home addresses?

    2. If you work with sensitive georeferenced data, a new Stata command is now available that lets you calculate travel time offline. The command is called 'osrmtime' and is available here:

      The paper can also be found here:

    3. The ado-files for the osrmtime command have moved to this place:

  2. I don't see why patient address or lat/long would be "well out of compliance". Data is considered PHI when personally identifiable information, such as patient address, is combined with health status, health payment, or health provision information. Only then is it PHI. An address alone, pushed out to Google, is not PHI.