Simulations, numlist, and order of operations

April 26, 2011 at 4:46 am 2 comments

| Gabriel |

I’ve been programming another simulation and as is typical am batching it through various combinations of parameter values, recording the results each time. In making such heavy (and recursive) use of the forvalues loop I noticed some issues with numlist and orders of operation in algorithms.

First, Stata’s numlist expression (as in the “forvalues” syntax) introduces weird rounding errors, especially if specified as fractions. Thus it is preferable to count by integers then scale down to the fractional value within the loop. This is also useful if you want to save each run of the simulation as a file as it lets you avoid fractional filenames.

So instead of this:

forvalues i=0(.01)1 {
	replace x=sin(`i')
	save sin`i'.dta, replace
}

Do this:

forvalues i=0/100 {
	local i_scaled=`i'/100
	replace x=sin(`i_scaled')
	save sin`i'.dta, replace
}

Another issue with numlist is that it can introduce infintessimal errors so that evaluating “1==1” comes back false. If you have a situation like this you need to make the comparison operator fuzzy. So instead of just writing the expression “if y==x” you would use the expression

if y>x-.0001 & y<x+.0001

Finally, I’ve noticed that when you are running nested loops the number of operations grows exponentially and so it makes a big difference in what order you do things. In particular, you want to arrange operations so they are repeated the least numbers of times. For instance, suppose you have batched a simulation over three parameters (x, y, and z) and saved each combination in its own dataset with the convention “results_x_y_z” and you wish to append the results in such a way that the parameter values are variables in the new appended dataset. The simple (but slow) way to run the append is like this:

clear
gen x=.
gen y=.
gen z=.
forvalues x=1/100 {
	forvalues y=1/100 {
		forvalues z=1/100 {
			append using results_`x'_`y'_`z'
			recode x .=`x'
			recode y .=`y'
			recode z .=`z'
		}
	}
}

Unfortunately this is really slow. The following code has the same number of lines but it involves about half as many operations for the computer to do. In the first version there are four commands that are each run 100^3 times. The second version has two commands that run 100^3 times, one command that runs 100^2 times, and one command that runs 100 times.

clear
gen x=.
gen y=.
gen z=.
forvalues x=1/100 {
	forvalues y=1/100 {
		forvalues z=1/100 {
			append using results_`x'_`y'_`z'
			recode z .=`z'
		}
		recode y .=`y'
	}
	recode x .=`x'
}

Entry filed under: Uncategorized. Tags: , .

Thanks Kate Another R bleg about loops and graph devices

2 Comments

  • 1. Nick Cox  |  April 26, 2011 at 12:14 pm

    The problem of looping over non-integers was discussed in

    SJ-10-1 pr0051 . . . . . . . . . . . . Stata tip 85: Looping over nonintegers
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
    Q1/10 SJ 10(1):160–163 (no commands)
    tip on using forvalues to loop over numbers

    The executive summary is: Don’t. That is, as Gabriel explains, loop over integers and map to desired non-integers within the loop.

    There is nothing weird about the precision errors if you think about them carefully. Stata cannot hold decimals _exactly_ if they do not have exact binary equivalents. That means .1, .2, .3, .4, .6, .7, .8, .9, etc. This is inevitable given that at machine level a computer is binary, as the first lesson on computers should explain!

    If you consider the code

    forvalues i=0/100 {
    local i_scaled=`i’/100
    replace x=sin(`i_scaled’)
    save sin`i’.dta, replace
    }

    you can slim this down to

    forvalues i=0/100 {
    replace x=sin(`i’/100)
    save sin`i’.dta, replace
    }

    This is called “cutting out the middle macro”.

    • 2. gabrielrossman  |  April 26, 2011 at 12:21 pm

      thanks Nick, i knew you’d written something on this but i forgot to look up the citation.


The Culture Geeks


%d bloggers like this: