The “by” prefix

May 12, 2009 at 8:29 am 1 comment

| Gabriel |
In the referrer logs, somebody apparently reached us by googling “tag last record for each individual in stata”. This is pretty easy to do with the “by” prefix, which runs a command separately within each group of observations. Inside a “by” group, _n is the observation's position within the group and _N is the number of observations in the group, so _n==_N is true only for the group's last observation. This syntax is very useful for all sorts of cleaning tasks on multilevel data. Suppose we have data where “i” is nested within “j” and the values of “i” are ordinally meaningful. For instance, it could be longitudinal data where “j” is a person and “i” is a person-interview, so the highest value of “i” is the most recent interview of “j”.

. use "/Users/rossman/Documents/oscars/IMDB_tiny_sqrt.dta"
. browse
. sort film
. browse
. gen x=0
. by film: replace x=1 if _n==_N
(16392 real changes made)
Or, in terms of the generic “i” and “j”:
sort j i
gen lastobs=0
by j: replace lastobs=1 if _n==_N
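
To see what _n and _N are doing, here is a minimal sketch on a made-up six-row dataset. The values are invented purely for illustration, and “firstobs” is a hypothetical extra variable that is not part of the example above; it just shows that the same pattern can tag the first record instead of the last.

```stata
* Toy data (invented): j is the person, i is the interview number within person
clear
input j i
1 1
1 2
1 3
2 1
2 2
3 1
end

* Sort so that the last row within each j carries the highest i
sort j i

* Within each j, _n is the row's position and _N is the group size,
* so (_n == _N) evaluates to 1 on the last row of the group and 0 elsewhere
by j: gen lastobs = (_n == _N)

* Same idea for the first row of each group (hypothetical extra variable)
by j: gen firstobs = (_n == 1)

list, sepby(j)
```

Generating the 0/1 variable directly from the logical expression avoids the separate “gen” and “replace” steps, though the two-step version may be easier to read when the condition gets more complicated.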

1 Comment

  • 1. edwin  |  June 20, 2009 at 7:24 am

    can be a one-liner:

    . bysort j (i): gen lastobs = _n==_N
