Tuesday, January 25, 2011

Stata: Tokenizing locals

There's a command called -tokenize-. Some people use it a lot, some people only a little. You could do everything it does with regular expressions if you reeeally wanted to, but -tokenize- makes the whole process a bit easier. It works like this:
* First you run -tokenize- on a string you want to break up into pieces. E.g.:
local States "AL MI TN FL"
tokenize `States'
* Now every word (i.e. anything separated by a space in that string) is stored in a series of numbered macros: `1', `2', `3', and so on. Try it:
di "`1' `2' `3' `4' `5'"
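* If you'd rather see which token landed in which macro, a counter works too. This is only a sketch: the counter name i is made up for the example, and it leans on the nested-quote trick ``i'' (expand `i' to a number first, then expand the macro with that number):

```stata
local States "AL MI TN FL"
tokenize `States'

* walk `1', `2', ... until we hit a numbered macro that is empty
local i = 1
while "``i''" != "" {
    di "token `i' is ``i''"
    local ++i
}
```

* Nothing gets thrown away here, so `1' through `4' are all still available afterwards.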
* But what use is that, you ask? Ah, well you can use them one at a time:
* (This assumes your data has a string variable called state that holds those abbreviations.)
local ifstate `"state == "`1'""'
macro shift
while "`*'" != "" {
    local ifstate `"`ifstate' | state == "`1'""'
    macro shift
}
keep if `ifstate'
* What's this funny `*' thing? And what the heck is -macro shift-? We'll start with the latter. -macro shift- does exactly what it sounds like: it "shifts" the entire stack of tokenized locals over by one. So the one that used to be `2' will now be `1', and so forth all the way down the line. The one that used to be `1' vanishes into the ether. The `*' local contains all the remaining tokens that haven't been shifted off into the ether yet. So that loop will keep looping over all the words in `States' until they're exhausted, then be done. Within the loop, you can do whatever you want with the contents.
* Note that the first -local ifstate...- and -macro shift- sit outside the loop. If you built every term inside the loop instead, the local would start with a stray | and -keep if- would choke on it. In that case you'd have to remove the first | once the local is created, e.g. with a regular expression.
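* The regular-expression route might look something like this. It's a sketch, not gospel: the local name cond and the string variable state are assumptions made up for the example, and I believe older versions of Stata truncate string expressions at 244 characters, so it only suits short lists.

```stata
local States "AL MI TN FL"
tokenize `States'

* build every term inside the loop, stray leading | and all
local cond ""
while "`*'" != "" {
    local cond `"`cond' | state == "`1'""'
    macro shift
}

* strip the leading spaces and the first | from the front of the local
local cond = regexr(`"`cond'"', "^ *[|] *", "")

* take a look at the condition before using it
di `"`cond'"'
keep if `cond'
```

* The compound quotes (`" and "') are there because the condition itself contains double quotes.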
