Archive for June, 2009

MDC code

| Gabriel |

A reader requested that I post some code relevant to the multilevel diffusion curve (MDC) method I published in Sociological Methodology. I have code for both the more primitive techniques we discuss in the lit review and our new MDC technique, but neither script is as elegant as it should be.

I’ve already posted code to do the precursor approach by Edwin Mansfield, though I recently learned some matrix syntax that will let me rewrite it to run much more cleanly when I find a chance to do so. The problem with the current version is that it makes extensive use of writing to disk and POSIX commands via “shell.” On Mac/Linux this is ugly but perfectly functional; on Windows it won’t work at all (at least not without Cygwin). I hope to rewrite it to be more elegant and completely self-contained in Stata, but this is a luxury as the current ugly version works on my Mac.

Likewise, I have code (posted below) to do MDC, but it’s also less than ideal. MDC doesn’t produce interpretable estimates from a single regression: it first runs a regression (“table 2” in the paper) and then uses the quadratic equation to make the results intelligible (“table 3” in the paper). The problem is that my Stata code only does the first step. To do the second half you need to take the output and put it in the Excel spreadsheet linked in the code’s output. I’m hoping to rewrite it so that the command produces useful output directly, but this is easier said than done as it requires a lot of saving returned results, matrix multiplication, and other things that are somewhat difficult to program.

Anyway, in the meantime, here’s the code. It follows a syntax similar to xtreg. In addition to i() you also specify nt(), which gives adoptions to date.

capture program drop mdcrun
program define mdcrun
	set more off
	syntax varlist , i(string asis) nt(string asis) 

	disp "This code gives information which must be interpreted"
	disp " with the spreadsheet at http://www.sscnet.ucla.edu/08F/soc210a-1/mdc.xls"
	disp "comments in this output give hints on how to use the spreadsheet"

	gettoken first varlist : varlist

	preserve
	gen cons=1
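	* cons is an explicit constant: interacting it with nt in the loop below
	* creates cons_1 (=nt) and cons_2 (=nt^2), i.e., the main effects of adoptions to date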
	foreach var in `varlist' cons {
		quietly gen `var'_1=`nt'*`var'
		quietly gen `var'_2=`nt'*`nt'*`var'
	}

	foreach var in `varlist' cons {
		local varlist_ext="`varlist_ext' `var' `var'_1 `var'_2"
	}

	* create `varlist_ext' as an alternate varlist macro that has the interactions

	disp "-------------------------------"
	disp "Columns M+J, mean and sd"
	sum `varlist'

	disp "-------------------------------"
	disp "put the baseline beta+sd model in J7:N5"
	xtreg `first' `varlist', re i(`i')
	disp "-------------------------------"
	disp "coefficients are vars + interactions with nt and nt^2"
	disp "additive beta+se in J and K"
	disp "var_1 beta+se in L and N"
	disp "var_2 beta+se in T and V"
	disp "Please see AC-AJ for interpretation"
	xtreg `first' `varlist_ext', re i(`i')
	disp "-------------------------------"
	disp "For citation and help with theory/interpretation, see"
	disp `"Rossman, Chiu, and Mol. 2008. "Modeling Diffusions of"'
	disp "Multiple Innovations Via Multilevel Diffusion Curves:"
	disp `"Payola in Pop Music Radio" Sociological Methodology"'
	disp "38:201-230."
	restore

end
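To make the syntax concrete, here’s a hypothetical call (the variable names are made up: an adoption dummy, a couple of covariates, a station identifier, and a running count of adoptions to date), followed by a rough sketch of one way to get an interpretable number out of the second regression without opening the spreadsheet. This is not the full “table 3” procedure from the paper, just an nlcom evaluation of a covariate’s combined effect at a chosen level of nt, which works because the last xtreg’s estimates are still in memory after mdcrun finishes.

* hypothetical usage: adopted is the DV, payola and corpowner are covariates,
* station is the grouping variable, and nt is adoptions to date
mdcrun adopted payola corpowner, i(station) nt(nt)

* combined effect of payola at nt=10, i.e. b[payola] + b[payola_1]*10 + b[payola_2]*10^2
nlcom _b[payola] + _b[payola_1]*10 + _b[payola_2]*10^2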

June 16, 2009 at 5:04 am 1 comment

Clara est la fille de Sophie

| Gabriel |

As referenced at Montclair and Contexts-Graphic, there’s been some interesting work on French baby names. Of course the application of name data to diffusion was seriously kicked off by Lieberson’s book on American baby names. One of the most basic findings of these studies is that names are so fashion-prone that you can practically use them to carbon-date a birth cohort, especially for women. For instance, I was born in 1977 and all the girls I grew up with had Hebrew names (Elizabeth, Rachel, Sarah). My daughter was born in 2007 and, at least in our social class, all the girls have Victorian names (Frances, Rose, Lillian). It seems like every girl born in the 1980s and 1990s has a Celtic name (Britney, Erin, Caitlin).

One piece of research I haven’t seen discussed in the soc-blogs is the recent Berger and LeMens PNAS article, which uses name data from both countries (France and the US). This article basically argues that names with an extremely rapid rise are stigmatized as faddish and are thereafter dropped from the culture’s active repertoire. I loved this article, and as I’ve argued before, we need more studies of abandonment.

June 15, 2009 at 1:03 am

Delenda Affectus Epistula Est

| Gabriel |

The new version of Skype has a fantastic “screen sharing” feature but also an incredibly annoying one called mood messages. This feature constantly gives you updates on your contacts, including even what they happen to be listening to on iTunes and other status updates they choose to post. Not only does it make this information accessible should you choose to look for it, but it shows up as a history event, so I’m constantly thinking I missed a phone call or something important, only to find out that one of my friends is listening to another song. Personally, I don’t feel the need for all my friends and colleagues to know what song I’m listening to or when I’m using the toilet, nor do I care to know the same about them. If I wanted a ubiquitous adolescent stream of narcissistic micro-banalities I’d already be using Twitter.

Anyway, this feature is turned on by default, so I’m describing how to turn it off and return Skype to being what it ought to be: a dignified and professional tool for video conferencing, not another technologically enabled manifestation of the erosion of personal space through incessant distraction. First, regain your own privacy by going to “Preferences” and deselecting “Enable Mood Message Chat.” Second, ignore the extroversion of your contacts by right-clicking within the mood message pseudo-chat window and selecting “Chat Notification Settings.” In this screen choose “Do Not Notify Me” and “Mark unread messages as read immediately.”

Your dignity and privacy have now returned; enjoy them.

June 14, 2009 at 9:27 pm 4 comments

The fat tail

| Gabriel |

At Slate XX, Virginia Postrel has an article explaining why women’s clothing sizing hasn’t kept up with the increasingly large American woman herself. This is often explained as an indulgence of taste (designers don’t like making clothes for people they find unattractive) or a Podolny-esque status thing (serving stigmatized customers kills your brand). Postrel doesn’t buy any of that, taking the Gary Becker economics-of-discrimination line that some entrepreneur should be filling this demand unless there is a good business reason not to. (I think the “taste” and “status” things are very plausible at the high end, but I agree with Postrel about “the customer is always right” for the mass market.) Her argument is that it’s about the cost of fabric (boring) and, much more interestingly, the right-skewed distribution of size. I think this is worth unpacking and ruminating on because it’s a good example of how it’s more useful to think about distributions than just central tendencies (which is what people usually find more natural).

While (within gender) height follows a normal distribution pretty closely, weight has a right skew. Here’s what that means. The medical people tell us that the ideal BMI is about 22, give or take a few points, which works out to about 130 lbs (+/- 20) for a 5’4″ woman. Now a woman at the first percentile for BMI is going to weigh about 90 lbs. On the other hand, a woman as fat (in percentile terms) as Victoria Beckham is thin is going to weigh at least 220 lbs, probably more.
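To check the arithmetic, here’s the back-of-the-envelope version in Stata (the only inputs are the standard conversion weight = BMI * height^2 / 703 and the 64-inch height; the percentile claims are still eyeballed):

* convert the figures in the text for a 5'4" (64 inch) woman
* weight in lbs = BMI * height_in_inches^2 / 703
display 22*64^2/703    // BMI 22 works out to roughly 128 lbs ("about 130")
display 90*703/64^2    // 90 lbs works out to a BMI of about 15.4
display 220*703/64^2   // 220 lbs works out to a BMI of about 37.8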

So relative to healthy, extremely thin is about 40 lbs off whereas extremely fat is at least 90 lbs off. If “healthy” is a reasonable approximation for the median, then at any given point of weight to the left of the median there’s going to be about twice as much density (in the statistical sense) as at a comparable point to the right of the median. In other words (to paraphrase Tolstoy), thin women are all alike; every fat woman is fat in her own way. Postrel’s argument is that to the extent that clothing is meant to be tailored to fit your body pretty closely, any given plus size will fit few people even if many people fit some plus size.

If you think of clothing sizes as analogous to binning a distribution to plot it as a histogram, any given bin on the skewed side of a distribution will have less density than a bin at the opposite percentile. Postrel notes that there are certain design and inventory costs associated with keeping a size in stock, so if it takes sizes 16 and 17 combined to equal the sales of size 5 alone, it’s rational for companies to consider dropping their plus sizes, even though in the aggregate those sizes serve a lot of paying customers.

This makes a fair amount of sense, but I wonder about the extent to which it relies on the assumption that the breadth of a size is always a constant range, say +/- 3 lbs from some target customer. For all I know this is how clothes are sized and ought to be sized, but I wonder if it is the practice for sizing to have a wider tolerance at higher weights. In quant work, when we have a right-skewed distribution we often log the variable. What this effectively does is make the raw-scale bin width a function of x, so as you get higher on x the bins get wider on the raw scale even though they are all the same width on the log scale. I can think of two substantive reasons why it might be appropriate to imagine any given plus clothing size encompassing a wider range of weight than any given petite clothing size.
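As a toy illustration of the logging point (simulated data, nothing to do with actual clothing sizes), you can draw a right-skewed “weight” variable and see that equal-width bins on the log scale translate into bins that get wider and wider in pounds:

* simulate a right-skewed weight distribution and bin it two ways
clear
set obs 10000
set seed 20090612
gen weight = exp(4.87 + .2*invnormal(uniform()))   // lognormal, median around 130 lbs
gen logweight = ln(weight)
histogram weight, bin(20) name(rawscale, replace)
histogram logweight, bin(20) name(logscale, replace)
* each logweight bin covers a constant ratio of weights, so in pounds the bins
* get wider as you move right -- the analogue of broader plus sizes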

First, there might be a taste difference where thin people tend to prefer tighter-fitting clothes and fat people looser-fitting clothes, especially for things like jeans. Since loose clothes are more forgiving of fit, it would make sense to have broader plus sizes. You see a similar thing in that people with short hair get it cut much more often than people with long hair. I have very short hair and when I think “I need a hair cut,” I’m thinking something closer to “my hair is 30% longer than it should be” rather than “my hair is one inch longer than it should be.”

The second reason is that maybe we shouldn’t be thinking about clothing sizes in pounds at all, but in something like inches (which is how men’s clothes are sized). In this case, geometry diminishes the skew. If you imagine the radius of a human being in cross-section, that person’s weight is approximately pi*r^2*height whereas that person’s waist size is approximately 2*pi*r. The squared term for weight means that weight will be more right-skewed than circumference. These are the same people, but depending on how you measure “size” the distribution may be skewed or it may be symmetrical. This is actually a big deal generally in statistics, since assuming the wrong distribution for a variable can lead to weird distributions for the error term. Hence good statistical practice either transforms skewed variables as part of the data cleaning or uses “count” analyses like Poisson and negative binomial that are designed to work with skewed distributions. There’s also a more basic theoretical question of whether the skewed variable is even the right operationalization. If we’re interested in the size of a person, is weight better than waist size? If we’re interested in the size of an organization, is the number of employees better than the length of the chain of command? In both cases the answer is that it depends on what you’re trying to explain.
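The geometry point can also be seen in a quick simulation (again with made-up numbers): give simulated people a symmetrically distributed cross-sectional radius and the implied circumference is symmetrical too, but the implied weight picks up a right skew from the squared term:

* the same simulated people, measured two ways
clear
set obs 10000
set seed 20090612
gen r = 1 + .15*invnormal(uniform())    // cross-sectional radius, arbitrary units
gen waist  = 2*_pi*r                    // circumference, linear in r
gen weight = _pi*r^2                    // weight (holding height constant), quadratic in r
sum waist weight, detail                // compare the skewness statistics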

Anyway, if, for reasons of either taste or geometry, a size 0 has less tolerance (as measured in pounds) than a size 8, which in turn has less tolerance than a size 16, then this could partially compensate for the dynamic Postrel is describing. Note, however, that unlike me she actually talked to clothiers, so take her data over my armchair speculation.

June 12, 2009 at 5:17 am 2 comments

Speaking of gold

| Gabriel |

This guy could have saved himself a lot of trouble if he had read Cryptonomicon (the boring parts that take place in the 90s, not the hilarious parts in WW2) and seen who the customer base is for an anonymous banking/currency system. This case raises some interesting questions about the role of the state in providing the infrastructure of the market economy and what happens when someone tries to privatize some of these functions (providing currency) while avoiding others (disclosure/surveillance).

June 11, 2009 at 1:33 pm 2 comments

Fool’s Gold and Organizational Inertia

| Pierre |

Daniel Beunza at Socializing Finance links to Donald MacKenzie’s LRB review of Fool’s Gold, Gillian Tett’s new book on the financial crisis. I just finished reading the book, and I can’t recommend it enough. Tett is an editor at the Financial Times; she also has a PhD in anthropology from Cambridge, which probably explains why the book somehow reads more like an econ-soc analysis of the crisis than like a journalistic account. In his review piece, MacKenzie gives a clear and detailed overview of the book’s main argument — and as such, his review is one of the best and most accessible accounts of the recent developments in structured finance I’ve read so far. But there’s a point that bothered me in the book, and which MacKenzie doesn’t seem to touch on:

Tett tells us about the crisis mostly from the standpoint of bankers and credit derivative specialists at J.P. Morgan — a bank which, by and large, stayed out of the mortgage-backed securities mess and emerged relatively unscathed from the crisis. There’s nothing surprising about this: it is probably easier to find people willing to be interviewed on the crisis at J.P. Morgan nowadays than at, say, Citigroup or AIG. But this angle is precisely what makes the story so fascinating. The book starts from a simple but puzzling observation: the credit instruments at the center of the crisis (those pools of loans that were sliced into multiple tranches with distinct risk levels, which were in turn sold with overly optimistic ratings, aka collateralized debt obligations or CDOs) originated not in the home mortgage market but in the corporate bond market. Arguably, the crisis was largely caused by the belief that the same structured products could easily be used to transfer home mortgage risk away from banks’ balance sheets — although estimating the risk of CDO tranches turned out to be much more complex for mortgages than for corporate debt (in particular, there wasn’t any reliable data with which to estimate the correlation of default probabilities). But surprisingly, the pioneer and leader in corporate debt CDOs, J.P. Morgan, decided not to press its advantage in structured finance: instead of moving into the mortgage-backed securities market and applying on a much wider scale the same recipes it had just developed for corporate debt, J.P. Morgan largely stayed out of the market. The incentives were there: J.P. Morgan had expertise in such structured products, investment banks could collect enormous fees for underwriting CDOs, and the market was booming; but J.P. Morgan executives, Tett observes, stayed on the sidelines and were even puzzled by the development of the market. So, what happened?

Tett’s account is essentially an organizational story. She argues that the prevailing culture at J.P. Morgan (a rather old-fashioned, boring, and elitist institution) favored more prudent risk management strategies and more effective oversight than at other banks. This is an interesting hypothesis, but it may be, in part, the product of Tett’s methodology: the evidence supporting this argument comes mostly from interviews with J.P. Morgan executives — which are of course subject to response bias, recall bias, and so forth. MacKenzie seems to generally agree with Tett’s thesis: what made J.P. Morgan different from other banks was the foresight of its management. Maybe reading too much organizational sociology has made me more cynical than I should be, but there’s an alternative explanation that Tett never fully explores or dismisses: organizational inertia. At the beginning of the housing bubble, J.P. Morgan specialized in corporate debt and, unlike BoA or Citi, had no experience with home mortgages (J.P. Morgan was later absorbed by Chase Manhattan — but even then, only Chase got involved in mortgage CDOs). I do not doubt that organizational culture played an important role, but I did not see much evidence in the book that J.P. Morgan’s corporate culture (coupled with its expertise in structured finance) made its management more aware of the fragility of mortgage CDOs than its competitors were — if this were truly the case, one would have expected the bank to bet against the market by massively shorting CDO indexes, as a few hedge funds did. The alternative hypothesis, of course, is that organizational culture contributes to organizational inertia — and while it did not necessarily make J.P. Morgan’s executives more prudent or more aware of the risks inherent in the mortgage market, it may have prevented the bank from taking positions (long or short) in a segment of the industry it did not belong to.

June 10, 2009 at 5:43 pm 1 comment

Field-tagged data

| Gabriel |

Most of the datasets we deal with are rectangular in that the variables are always in the same order (whether they are free or fixed) and the records are delimited with a carriage return. A data format that’s less familiar to us but actually quite common in other applications is the field-tagged format. An example is the BibTex citation database format. Likewise, some of the files in IMDB are a weird hybrid of rectangular and field-tagged. If data formats were human languages and sentences were data records, rectangular formats would be word-order syntax (like English) and field-tagged formats would be case-marker syntax (like Latin or German). (Yes, I have a bad habit of making overly complicated metaphors that make sense only to me.)

In rectangular formats like field-delimited data (.csv or .tsv) or fixed-width data (.prn) you have one record per row and the same variables in the same order for each row, with the variables either separated by a delimiter (usually comma or tab) or fixed-width, with each variable defined in the data dictionary as columns x-y (which was a really good idea back when we used punch cards, you know, to keep track of our dinosaur herds). In contrast, with a field-tagged format each record spans multiple rows and the first row contains the key that identifies the record. Subsequent rows usually begin with a tab, then a tag that identifies the name of the variable, followed by a delimiter and finally the actual content of the variable for that case. The beginning and end of the record are flagged with special characters. For example, here’s a BibTex entry:

@book{vogel_entertainment_2007,
	address = {Cambridge},
	edition = {7th ed.},
	title = {Entertainment Industry Economics: A Guide for Financial Analysis},
	isbn = {9780521874854},
	publisher = {Cambridge University Press},
	author = {Harold Vogel},
	year = {2007}
},

The first thought is, why would anyone want to organize data this way? It certainly doesn’t make it easier to load into Stata (and even if it’s less difficult in R it’s still going to be harder than reading a csv). Basically, people use field-tagged data because it’s more human-readable and human-editable (a lot of people write BibTex files by hand, although personally I find it easier to let Zotero do it). Not only do you not have to remember what the fifth variable is, but you have more flexibility with things like “comment” fields, which can be any length and have internal carriage returns. This is obviously a nice feature for a citation database as it means you can keep detailed notes directly in the file. Furthermore, the format is good for situations where you have a lot of “missing data.” BibTex entries can potentially have dozens of variables but most works only require a few of them. For instance the Vogel citation only has eight fields, and most of the other potential fields, things like translator, editor, journal title, series title, etc., are appropriately “missing” because they are simply not applicable to this book. It saves a lot of whitespace in the file just to omit these fields entirely rather than including them but coding them as missing (which is what you’d have to do to format BibTex as rectangular).

Nonetheless, if you want to get it into Stata, you need to shoehorn it into rectangular format. Perhaps this is all possible to handle with the “infile” command but last time I tried I couldn’t figure it out. (Comments are welcome if anyone actually knows how to do this.) The very clumsy hack I use for this kind of data is to use a text editor to do a regular expression search that first deletes everything but the record key and the variable I want. I then do another search to convert carriage returns to tabs for lines beginning with the record key. I now have a rectangular dataset with the key and one variable. I can save this and get it into Stata. This is a totally insane example, both because I can’t imagine why you’d want citation data in Stata and also because there are easier ways to do this (like export filters in citation software), but imagine that you wanted to get “year” and “author” out of a BibTex file and make it rectangular. For the first step you would run a pattern like the following (which deletes every field line except “year”) through a text editor (or write it into a perl script if you planned on doing it regularly):

^\t(?!year).+\r

Sometimes this is all you need, but what if you want several variables? Basically, rinse, wash, repeat until you have one file per variable; then you can merge them on the record key in Stata. The reason you need a separate file for each variable is that otherwise it’s really easy to get your variables switched around. Because field-tagged formats are so forgiving about having variables in arbitrary orders or missing altogether, when you try to turn them into rectangular format you’ll get a lot of values in the wrong column.
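For what it’s worth, here’s a sketch of a possible all-Stata route (this is not the workflow above, and the file name is hypothetical): read the raw BibTex file in line by line with infix, pull out the key and the fields you want with regexm(), and collapse down to one row per record. Lines longer than 244 characters get truncated and fields that spill across lines would need more work, but for simple entries like the one above it yields a rectangular key/author/year dataset.

* read each line of the .bib file as one long string (hypothetical file name)
infix str line 1-244 using vogel.bib, clear
* turn tabs into spaces and strip leading/trailing blanks
gen clean = trim(subinstr(line, char(9), " ", .))
* pull out the record key, author, and year with regular expressions
gen key    = regexs(1) if regexm(clean, "^@[a-z]+[{]([^,]+),")
gen author = regexs(1) if regexm(clean, "^author = [{](.+)[}]")
gen year   = regexs(1) if regexm(clean, "^year = [{]([0-9]+)[}]")
* carry the key down to the field lines, then keep one row per record
replace key = key[_n-1] if key=="" & _n>1
drop if key==""
collapse (firstnm) author year, by(key)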

June 10, 2009 at 5:52 am 5 comments

we can draw the line some other time

| Gabriel |

Lots of people have been talking about the NY Times article on Williamsburg trustafarians suddenly facing reality. A lot of the commentary has been of the “this is the world’s tiniest violin” variety. I have two thoughts.

1. It’s amazing how pretty much all of these people have completely unremunerative creative-sector careers supplemented by service sector work. The article describes a musician, a writer, and a “designer wallpaper” entrepreneur (good luck with that). Of course this actually shouldn’t be surprising to me since I lecture my soc of culture undergrads for about a half hour on the “starving artist” phenomenon. Here’s the elevator version of that lecture.

Lots more people want to be artists than there is demand for, and this shift in the supply schedule depresses the price, which is why the median artist makes less than you’d predict from his education. There are two theories as to why (unlike other workers in a similar situation) artists don’t then quit and get jobs that other people are actually willing to pay them for. The self-subsidy theory says that artistic work is really a form of leisure consumption enjoyed by those who can afford it, often by drawing on family resources. The tournament model theory says that the opportunity cost of low wages now is buying entry to a tournament, the winner of which enjoys the kind of decadent lifestyle that could only have been dreamt of by Caligula but is in fact enjoyed by the Rolling Stones even as they fade into vampiric living death. Interestingly, both theories make the (accurate) prediction that this is highly tied to the life course and that most (unsuccessful) artists will seek more mainstream employment when they enter prime fertility years.

2. Wow, check out Ms. Calvert’s polyester dress. The elaborate and very ironic backstory for this garment writes itself.

June 9, 2009 at 11:29 pm

Scraping 101

| Gabriel |

One of the great things about being a cultural sociologist is that there’s so much data out there for the taking. However much of it is ephemeral so you need to know how to get it quickly and effortlessly.

In grad school I spent hundreds of hours dragging and dropping radio data from IE to Excel. While this was relaxing and gave me a Sisyphean sense of accomplishment, it was otherwise a waste of time as there are much more efficient ways to do this. The two most basic things to know are the Unix commands “cron” (a scheduling daemon) and “curl” (a command-line tool for downloading web pages). This will be most effective if you have a computer that’s always on, such as a server. Also note that while I don’t think there’s anything dangerous involved here, it does involve going “sudo” (ie, taking the safety off of UNIX), so to be cautious I suggest bracketing it from your main computer, either by doing it inside a virtual machine or by putting it on a dedicated user account created only for this purpose. (Even though all this should work on my Mac, I created and debugged it in an Ubuntu virtual machine and plan to ultimately run it on my old Ubuntu desktop.)

First you need to make sure curl is installed. In Ubuntu go to the terminal and type

sudo apt-get install curl

Then you need to create a directory to store the data and keep the script. I decided to call the directory “~/scrape” and the script “~/scrape/scrapescript.sh”. You can create the shell script with any text editor. After the comments, the first thing in the script should be to create a timestamp variable. (I’m using “YYYYMMDD_HHMM” to make it easier to sort.) This will let you create new versions of each file rather than overwriting the old one each time. Then you use a series of cURL commands to scrape various websites.

#!/bin/sh
# this script scrapes the following URLs and saves them to timestamped files
TIMESTAMP=`date '+%Y%m%d_%H%M'`
curl -o ~/scrape/cc_$TIMESTAMP.htm https://codeandculture.wordpress.com
curl -o ~/scrape/se_$TIMESTAMP.htm http://soc2econ.wordpress.com

The next step is to use the terminal to make the shell script executable.

sudo chmod +x ~/scrape/scrapescript.sh

Finally, you need to cron it so it runs by itself at regular intervals. In Ubuntu you go to the terminal and type

crontab -e

You then get a choice of terminal-based text editors. If you’re not a masochist you’ll choose nano (which is very familiar if you’ve ever used pine mail). Then you have to add a line scheduling the job; the five fields are minute, hour, day of month, month, and day of week, followed by the command. For instance, if you want it to run once a day, just before midnight, you would enter

59 23 * * * ~/scrape/scrapescript.sh

That’s all you really need but if you don’t already back up the entire account you might want to add a cp or rsync command to cron to back up your scrapes every day or every week.

Once you have all this data collected you’ll need to clean it, probably with regular expressions in perl, though TextWrangler (on the Mac) is about as good as an interactive GUI program can be for such a thing. Also note that this is going to produce a lot of files, so after you clean them you’re going to want a Stata script that can recognize a large batch of files and loop over them.
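Here’s a sketch of what that last step might look like, assuming (hypothetically) that each scrape has already been cleaned into a csv named like cc_20090609_2359.csv. The dir extended macro function grabs every matching file name and the loop appends them into a single dataset, keeping the timestamp from the file name as a variable:

* pool all the cleaned scrapes into one Stata dataset
local files : dir "~/scrape" files "cc_*.csv"
tempfile pooled
local first = 1
foreach f of local files {
	insheet using "~/scrape/`f'", clear
	gen scraped = substr("`f'", 4, 13)   // the YYYYMMDD_HHMM chunk of the file name
	if `first'==0 {
		append using `pooled'
	}
	save `pooled', replace
	local first = 0
}
use `pooled', clear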

June 9, 2009 at 5:59 am 4 comments

Off by 50 or off by 10?

| Gabriel |

Via MR, the NY Times has an article noting that two models of swine flu drastically underestimated the spread of the epidemic. It notes that the actual number of American cases is something like 100,000, but the estimates were about 2,000. The natural inference is to think that they were off by a laughably wild factor of 50. However, this just shows how hard it is to think about nonlinearity. The authors of the original predictions blamed the error on an underestimate of the number of infections seeding the system.

This sounds very plausible to me, and in fact I can demonstrate just how sensitive contagion models are to assumptions like the number of seed cases. Here’s a graph of two contagions. Although the seed assumptions differ by only an order of magnitude, the projections diverge by even more than that.

[Figure: projection_pooled.png — projected saturation over time under the two seed assumptions]

Here are some of the simplifying assumptions:

  • the diffusion follows a Bass model (the difference equation is written out just below)
  • after the initial cases there are no exogenous infections (e.g., from foreign travel)
  • the population is homogeneous with no network structure
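For reference, the difference equation that the code iterates each period (and that the program echoes in its output) is

delta_Nt = (a + b*Nt) * (Nmax - Nt)

where Nt is the cumulative number of infections at time t, a is the rate of external (exogenous) influence, b is the rate of internal (contagious) influence, and Nmax is the saturation ceiling. In the projections below a is set to zero, so all of the growth comes from contagion off the initial seed cases.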

Here’s the Stata code so anyone with a copy of Stata should be able to replicate this and even fiddle with the parameters.

capture program drop bassproject
program define bassproject
	set more off
	syntax anything(name=commandinput) [, graphoff nosave]
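	* parse the pseudo-options a(), b(), nmax(), seed(), periods(), and model()
	* out of the free-form argument with regular expressions, falling back to
	* defaults when an option is omitted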
	local match=regexm("`commandinput'","a ?\(([^\)]+)\) ?")
	if `match'==1 {
		local a=regexs(1)
	}
	else {
		local a=0
	}
	local match=regexm("`commandinput'","b ?\(([0-9]*\.?[0-9]*)\) ?")
	if `match'==1 {
		local b=regexs(1)
	}
	else {
		local b=0
	}
	local match=regexm("`commandinput'","[nN]?max ?\(([0-9]*\.?[0-9]*)\) ?")
	if `match'==1 {
		local nmax=regexs(1)
	}
	else {
		local nmax=1
	}
	local match=regexm("`commandinput'","seed ?\(([0-9]*\.?[0-9]*)\) ?")
	if `match'==1 {
		local seed=regexs(1)
	}
	else {
		local seed=0.01
	}
	local match=regexm("`commandinput'","periods ?\(([0-9]*\.?[0-9]*)\) ?")
	if `match'==1 {
		local periods=regexs(1)
 	}
	else {
		local periods=20
	}
	local match=regexm("`commandinput'","modeln?a?m?e? ?\(([^\)]+)\)")
	if `match'==1 {
		local modelname=regexs(1)
	}
	else {
		local modelname="model"
	}
	disp "model named(`modelname')."
	disp "delta_Nt=(`a' + `b' * Nt) * (`nmax' - Nt)"
	disp "projected over `periods' spells with N_0=`seed'"
	preserve
	clear
	set obs `periods'
	quietly gen t=[_n]-1
	quietly gen Nt=.
	quietly gen deltan=.
	quietly replace Nt=`seed' in 1
	quietly replace deltan=(`a' + (`b' * Nt)) * (`nmax' - Nt) in 1
	forvalues period=2/`periods' {
		quietly replace Nt=Nt[_n-1]+deltan[_n-1] in `period'
		quietly replace deltan=(`a' + (`b' * Nt)) * (`nmax' - Nt) in `period'
	}
	sort t
	ren Nt nt_`modelname'
	drop deltan
	if "`save'"~="nosave" {
		save projection_`modelname', replace
	}
	if "`graphoff'"~="graphoff" {
		twoway (line nt_`modelname' t, lwidth(thick)), ytitle(Saturation to Date) xtitle(Time) /* ylabel(none, nolabels)  xlabel(none, nolabels) */
		graph export projection_`modelname'.png, replace
	}
	clear
	restore
end

cd ~/Documents/codeandculture/blackswine
*the assumptions go in the following two lines. aside from periods, all #s should be between 0 and 1
bassproject a(0) b(.5) seed(.00001) nmax(1) periods (20) model(smallseed)
bassproject a(0) b(.5) seed(.0001) nmax(1) periods (20) model(bigseed)

use projection_bigseed.dta, clear
append using projection_smallseed
lab var nt_bigseed "Assume many seed"
lab var nt_smallseed "Assume few seed"
twoway (line nt_bigseed t, lwidth(thick)) (line nt_smallseed t, lwidth(thick)), ytitle(Saturation to Date) xtitle(Time) /* ylabel(none, nolabels)  xlabel(none, nolabels) */
graph export projection_pooled.png, replace

June 8, 2009 at 9:22 am 1 comment
