Posts Tagged random variables

Probability distributions

| Gabriel |

I wrote this little demo for my stats class to show how normal distributions result from complex processes that sum their constituent parts, whereas count distributions result from complex processes where a single constituent failure is catastrophic.

*this do-file is a simple demo of how statistical distributions are built up from additive vs sudden-death causation
*this is entirely based on a simulated coin toss -- the function "round(uniform())"
*one either counts how many heads out of 10 tosses or how long a streak of heads lasts
*I'm building up from this simple function for pedagogical purposes; in actual programming there are much more direct functions like rnormal()

*1. The normal distribution
*Failure is an additive setback
clear
set obs 1000
*each of the 10 variables is one coin toss per row (1=heads, 0=tails)
*gen fills all 1000 rows at once, so no row-by-row loop is needed
forvalues var=1/10 {
	quietly gen x`var'=round(uniform())
}

gen sumheads=x1+x2+x3+x4+x5+x6+x7+x8+x9+x10
order sumheads
lab var sumheads "How Many Heads Out of 10 Flips"
*show five examples
list in 1/5
histogram sumheads, discrete normal
graph export sumheads.png, replace

*2. Count distribution
*Failure is catastrophic
clear
set obs 1000
gen streak=0
lab var streak "consecutive heads before first tails"
*fail switches to 1 at the first tails in a row and stays there
gen fail=0
forvalues var=1/30 {
	*toss the next coin for all 1000 rows at once
	quietly gen x`var'=round(uniform())
	quietly replace fail=1 if x`var'==0
	*the streak only grows in rows that have not yet hit tails
	quietly replace streak=`var' if fail==0
}

*show five partial examples
list streak x1 x2 x3 x4 x5 in 1/5
histogram streak, discrete
graph export streakheads.png, replace

*have a nice day
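
As a rough sketch of the "much more direct functions" mentioned in the do-file's opening comments, the same two variables can be drawn in one line each. This assumes a Stata recent enough to include rbinomial() and rnbinomial() (the same family of functions as rnormal()); the seed is arbitrary and only there for reproducibility. Note that, unlike the demo above, the streak here is not capped at 30 tosses.

```stata
*direct draws of the two distributions (a sketch, not part of the original demo)
clear
set seed 20091012
set obs 1000
*heads out of 10 fair tosses: binomial with n=10 and p=.5
gen sumheads=rbinomial(10,0.5)
lab var sumheads "How Many Heads Out of 10 Flips"
*heads before the first tails: failures (heads) before 1 success (tails)
gen streak=rnbinomial(1,0.5)
lab var streak "consecutive heads before first tails"
histogram sumheads, discrete normal
histogram streak, discrete
```

The histograms should look much like the ones from the coin-toss version, which is the point: the additive process is a binomial (approximately normal) and the sudden-death process is a geometric count.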

October 12, 2009

so random

| Gabriel |

Nate Silver at 538 has accused Strategic Vision of fudging their numbers, and his argument is simply that few of their estimates end in “0” or “5” and a lot of them end in “7.” The reason this is meaningful is that there’s a big difference between random and the perception of random. A true random number generator will give you nearly equal frequency of trailing digits “0” and “7,” but to a human being a number ending in “7” seems more random than one ending in “0.” Likewise, clusters occur in randomness, but human beings see clustering as suspicious. A scatterplot of two random variables drawn from a uniform has a lot of dense and sparse patches, but people expect it to look like a slightly off-kilter lattice. That is, we intuitively can’t understand that there is a difference between a uniform distribution and a finite sample of draws from a uniform distribution.
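
To see both points for yourself, here is a minimal sketch in Stata. The "estimates" are hypothetical whole percentages that I made up for illustration (they have nothing to do with Strategic Vision's actual data), and the variable names and seed are arbitrary. It assumes a Stata that has runiform().

```stata
*trailing digits and clustering in truly random data (illustrative sketch)
clear
set seed 20090928
set obs 10000
*hypothetical poll estimates: whole percentages from 0 to 99, each equally likely
gen estimate=floor(100*runiform())
gen lastdigit=mod(estimate,10)
*each trailing digit should land near 10 percent but rarely exactly on it
tabulate lastdigit
*two independent uniforms: expect dense and sparse patches, not a lattice
gen a=runiform()
gen b=runiform()
scatter a b in 1/500
```

The tabulation should show “0” and “7” coming up about equally often, and the scatterplot should look clumpier than most people expect from something called "uniform."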

This reminded me of two passages from literature. One is in The Silence of the Lambs when Hannibal Lecter tells Clarice that the locations of Buffalo Bill’s crime scenes are “desperately random, like the elaborations of a bad liar.” The other is from Stephenson’s Cryptonomicon, where a mathematician explains how he broke a theoretically perfect encryption scheme:

That is true in theory, … In practice, this is only true if the letters that make up the one-time pad are chosen perfectly randomly … An English speaker is accustomed to a certain frequency distribution of letters. He expects to see a great many e’s t’s, and a’s, and not so many z’s and q’s and x’s. So if such a person were using some supposedly random algorithm to generate the letters, he would be subconsciously irritated every time a z or an x came up, and, conversely, soothed by the appearance of e or t. Over time, this might skew the frequency distribution.

Going a little bit further afield, in a recent Bloggingheads, Knobe and Morewedge discuss the latter’s psych lab research on various issues, including how people tend to ascribe misfortune to malicious agency but fortune to chance. They then note that this is the opposite of how we tend to talk about God, seeing fortune as divine agency and misfortune as random. This is true for Americans, but this has less to do with human nature than with the unusual nature of the Abrahamic religions.*

Ironically, the lab research is pretty consistent with the modal human religious experience — animism organized around a “do ut des” relationship with innumerable spirits that control every aspect of the natural world. Most noteworthy is that much of this worship appears aimed not at some special positive favor but at getting the gods to leave you alone. So the Romans had sacrifices and festivals to appease gods like Robigus, the god of mold, and Cato the Elder’s De Agricultura explains things like how when you clear a grove of trees you need to sacrifice a pig to the fairies who lived in the trees so they don’t haunt the farm. These religious practices seem pretty clearly derived from a human tendency to treat misfortune as the result of agency and to generalize this to supernatural agency, absent cultural traditions to the contrary.

—————-

*I generally get pretty frustrated with people who talk about religion and human nature proceeding from the assumption that ethical monotheism and atheism are the basic alternatives. Appreciating that historically and pre-historically most human beings have been animists makes the spandrel theory of hyper-sensitive agency-detection much more plausible than the group-selectionist theory of solidarity and intra-group altruism.

September 28, 2009

St with shared frailty only

| Gabriel |

Several Stata commands in the xt family allow you to specify a random model (i.e., structured error terms) with no fixed model (i.e., independent variables). For instance:

xtreg y, re i(clustervar)
xtmixed y || clustervar:
gllamm y, i(clustervar)

This is very useful if the only thing you’re interested in is rho, the proportion of variance clustered within groups. To take a classic example of multilevel modeling, you might have test score data on students by classroom and you may be interested simply in how much good performance clusters by classroom (rho) before you get to independent variables like whether teacher credentials or class size matter.
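
As a hedged illustration of what rho looks like in practice, the sketch below simulates test scores for 50 classrooms of 20 students each, with equal variance between and within classrooms, so the true rho is .5. Everything in it is made up for the example: the seed, the variable u, and the sample sizes. I use xtset rather than the i() option only because I am sure it works in current versions, and I am relying on xtreg, re leaving its estimate in e(rho).

```stata
*simulated classroom data with a known rho of .5 (illustrative sketch)
clear
set seed 20090402
set obs 50
gen clustervar=_n
*u is the classroom-level effect, shared by every student in the class
gen u=rnormal()
*20 students per classroom
expand 20
*y is the classroom effect plus student-level noise of equal variance
gen y=u+rnormal()
*random effects only, no independent variables
xtset clustervar
xtreg y, re
display "rho = " e(rho)
```

The estimate will not be exactly .5 with only 50 classrooms, but it should be in the neighborhood.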

In the [st] syntax, shared frailty is closely analogous to random effects (and strata are analogous to fixed effects). However, unlike most xt commands, the st syntax expects there to be independent variables and it chokes if it doesn’t get them. Fortunately, this is not a limitation of the model, only of the syntax parsing, and you can trick Stata by feeding it a constant. It drops the constant from the model and estimates only the shared frailty. For instance, this model shows only the extent to which radio station adoptions of a particular song clustered by the stations’ corporate owners:

. gen x1=1

. streg x1, shared(owner_n) distribution(exponential)

Note: frailty(gamma) assumed.

         failure _d:  add
   analysis time _t:  (fpdate-origin)
             origin:  time firstevent
                 id:  station_n
note: x1 dropped because of collinearity

Fitting exponential model:

Iteration 0:   log likelihood =  -260.2503
Iteration 1:   log likelihood = -252.85211
Iteration 2:   log likelihood = -246.83872
Iteration 3:   log likelihood = -246.28281
Iteration 4:   log likelihood = -246.10157
Iteration 5:   log likelihood = -246.10117
Iteration 6:   log likelihood = -246.10117  

Exponential regression --
         log relative-hazard form               Number of obs      =       171
         Gamma shared frailty                   Number of groups   =        46
Group variable: owner_n

No. of subjects =          171                  Obs per group: min =         1
No. of failures =          164                                 avg =  3.717391
Time at risk    =         3739                                 max =        58

                                                F(   0,      .)    =         .
Log likelihood  =   -246.10117                  Prob > F           =         .

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
-------------+----------------------------------------------------------------
     /ln_the |  -1.792063   .4458721    -4.02   0.000    -2.665956   -.9181693
-------------+----------------------------------------------------------------
       theta |   .1666161   .0742895                      .0695329    .3992493
------------------------------------------------------------------------------
Likelihood-ratio test of theta=0: chibar2(01) =    15.60 Prob>=chibar2 = 0.000

April 2, 2009

