If you ever need to download a lot of data, use iMacros (http://wiki.imacros.net/Main_Page)
Recently I wanted to download a large public data set (NHANES) for multiple years, but this would have required a lot of manual downloading. For example, the 2007-2008 NHANES wave has 113 individual files, and I wanted all the files from 1999-2010, so close to a thousand different files.
To do this I found iMacros, a free browser automation tool that can automate anything you do in a browser.
The other nice thing is that it can read data from a .csv file to update what it has to do on each pass. So I just copied and pasted the names of the data files into a .csv file, wrote eleven lines of code, and off the program went, resulting in rich repeated cross sections of NHANES with close to 7000 different variables. Here's the code:
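VERSION BUILD=7401110 RECORDER=FX
TAB T=1
TAB CLOSEALLOTHERS
' Open the FTP directory for one NHANES wave
URL GOTO=ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2001-2002/
' Allow long waits (in seconds) so large files do not time out
SET !TIMEOUT 500
' Read file names from the first column of the .csv file.
' {{!LOOP}} is the current pass when the macro is played in loop mode.
SET !DATASOURCE c:\nhanes_names.csv
SET !DATASOURCE_COLUMNS 1
SET !DATASOURCE_LINE {{!LOOP}}
' Save the next download under its listed name in the default folder
ONDOWNLOAD FOLDER=* FILE={{!COL1}} WAIT=YES
' Click the link whose text matches the file name
TAG POS=1 TYPE=A ATTR=TXT:{{!COL1}} CONTENT={{!COL1}}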
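For anyone who would rather see the loop spelled out, here is a rough Python sketch of the same idea: take the first column of the .csv file and turn each name into a link under the FTP directory. This is not iMacros and not the method used above, just an illustration of what the macro does on each pass. The function names read_names and build_urls are made up for this example, and the two file names in the sample are hypothetical placeholders for whatever is in your own list.

```python
import csv
import io
from urllib.parse import quote

# Directory taken from the macro's URL GOTO line.
BASE_URL = "ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/nhanes/2001-2002/"


def read_names(csv_text):
    """Return the first column of each non-empty row (the macro's {{!COL1}})."""
    rows = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in rows if row and row[0].strip()]


def build_urls(names, base_url=BASE_URL):
    """Join each file name onto the directory URL, escaping unsafe characters."""
    return [base_url + quote(name) for name in names]


if __name__ == "__main__":
    # Hypothetical file names, used only to show the shape of the .csv file.
    sample = "DEMO_B.XPT\nBMX_B.XPT\n"
    for url in build_urls(read_names(sample)):
        print(url)
```

Each resulting URL could then be handed to something like urllib.request.urlretrieve to save the file, which plays the role of the ONDOWNLOAD and TAG lines in the macro. The sketch stops short of downloading so that it can be run without touching the server.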
The iMacros tutorials are worth playing around with: it is a really easy tool to pick up, with minimal upfront cost and huge potential returns.
Friday, January 27, 2012