Oscar Appeal
July 29, 2013 at 8:06 am gabrielrossman 5 comments
| Gabriel |
This post contains two Stata do-files for constructing the “Oscar appeal” variable at the center of Rossman & Schilke “Close But No Cigar.”
The first file takes IMDb files (cleaned and imported into Excel), reads them into Stata, and does various cleaning, most notably the calculation of lambda and tau.
*mergeorama.do *Stata do-file for Rossman & Schilke "Close But No Cigar" *Please direct any questions to rossman@soc.ucla.edu *This do-file imports raw data and precodes it, including coding lambda_it and tau_j *You can find the raw IMDb data at http://www.imdb.com/interfaces *This do-file assumes that you have pre-cleaned the IMDb files and stored them XLSX format *Note that IMDb files cannot be imported directly into Stata but require pre-cleaning for several reasons * 1. Many IMDb files have a field-tagged data structure whereas Stata expects rectangular (rows are cases, columns are variables) data. Even the rectangular artist-level datasets are odd, having a different format for the first row describing an artist than for subsequent rows * For Perl scripts useful for changing the data structure of IMDb files, please see * https://codeandculture.wordpress.com/2010/07/26/imdb_personnel-pl/ * https://codeandculture.wordpress.com/2010/03/31/gross-pl/ * 2. The IMDb files are extremely large and so for memory purposes it is desirable to drop as much data as possible using one-line-at-a-time approaches like Perl or Bash before using software like Stata or R that reads a file completely into memory *This one-liner drops most of the tv episodes, direct-to-video, video games, and pre-1970 films. It should be run *after* the step 1 scripts. *perl -ne '!/.+\) {.+/ && !/.+[0-9]\) \(T?VG?\)/ && !/.+\(19[0-6][0-9]\).*/ && print' foo.txt > bar.txt * 3. Stata's insheet command chokes on certain characters. (The usual error is to skip several hundred or thousand rows of data). Rather than identifying and eliminating all of these special characters, we find it easiest to pass the data through Excel as our tests show that this works reliably with no degradation in data quality or quantity. *The IMDb license prevents us from recirculating our copy of the data. Please contact rossman@soc.ucla.edu for more advice on pre-cleaning the data *Our file structure is unfortunately somewhat complicated but we always refer to paths through globals. There are no hard-coded paths so changing these globals to fit your own file system should solve everything global parentpath "~/Dropbox/oscarsperformativity/studiosystem" global oscarsfolder "~/Documents/oscars" global bigstuff "$oscarsfolder/studiosystemoverflow" global recent_dir "$oscarsfolder/dataapril2010/maria" global recent_eligibles "$recent_dir/filmtitles_Apr2010.txt" global recent_oscars "$recent_dir/oscars20062010.txt" global tables "$parentpath/tables" global rawimdbfiles "$oscarsfolder/dataapril2010" *These switches turn batches of code on and off. This both facilitates code-folding and debugging. For final run though make sure everything is turned on *switches local keywords 1 local genres 1 local mpaa 1 local pnomimport 1 local distributors 1 local releasedates 1 *options local minkwfreq 10 /*how many times must keyword occur throughout period to count*/ local window 5 /*how long is the time window for calculating y-hat, tau, etc */ local startdate 1985 /*when does the _analysis_ start ? (lagged covariates start earlier)*/ local enddate 2009 local prestart=`startdate'-`window' /*covariates like tau are lagged and so start before the analysis*/ capture log close log using $parentpath/mergeorama.log, replace cd $bigstuff *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- *distributors *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- if `distributors'==1 { import excel "$oscarsfolder/dataapril2010/distributors_1970andup_noTV.xlsx", sheet("distributors_1970andup_noTV.txt") clear ren A imdbtitle_raw ren B distributor ren C release gen nation=regexs(1) if regexm(distributor,"\[(.+)\]") keep if nation=="" | regexm(nation,"us")==1 gen releasedate=real(regexs(1)) if regexm(release,"^\(([1-2][0-9][0-9][0-9])") gen filmdate=real(regexs(1)) if regexm(imdbtitle_raw,"\(([1-2][0-9][0-9][0-9])") drop if filmdate<1950 drop if releasedate> filmdate+5 & releasedate!=. *eliminate release dates that clearly don't reflect the original theatrical release drop if regexm(release,"\(re\-release\)") drop if regexm(release,"\(laserdisc\)") drop if regexm(release,"\(video\)") drop if regexm(release,"\(laserdisc\)") drop if regexm(release,"\(non-USA\)") *among ambiguous cases choose the one that most closely reflects the US theatrical release gen priority=99 replace priority=1 if regexm(release,"\(USA\) \(theatrical\)") replace priority=1.5 if regexm(release,"\(North America\) \(theatrical\)") replace priority=2 if regexm(release,"\(USA\) \(all media\)") replace priority=2.5 if regexm(release,"\(North America\) \(all media\)") replace priority=3 if regexm(release,"\(worldwide\) \(all media\)") replace priority=3 if regexm(release,"\(worldwide\) \(theatrical\)") replace priority=10 if regexm(release,"^\([12][0-9]+\) \(USA\)") & priority==99 replace priority=10.5 if regexm(release,"^\([12][0-9]+\) \(North America\)") & priority==99 replace distributor=subinstr(distributor," [us]","",1) sort imdbtitle_raw priority *debug list if regexm(imdbtitle_raw ,"Girl in a") list if regexm(imdbtitle_raw ,"Pulp Fiction") by imdbtitle_raw : gen rownum=[_n] replace rownum=rownum[_n-1] if imdbtitle_raw==imdbtitle_raw[_n-1] & priority==priority[_n-1] replace rownum=rownum[_n-2] if imdbtitle_raw==imdbtitle_raw[_n-2] & priority==priority[_n-2] *debug list if regexm(imdbtitle_raw ,"Girl in a") list if regexm(imdbtitle_raw ,"Pulp Fiction") keep if rownum==1 drop rownum priority merge m:1 imdbtitle_raw using $parentpath/rawmerge2 keep if _merge==3 drop imdbtitle_raw imdbtitle_merge _merge releasedate nation filmdate order imdbtitle year compress gen major=0 /*core division of major studios*/ gen indymajor=0 */ "independent film" subsidiaries of major studios*/ *note, true independents are ommitted category foreach major in "Columbia Pictures" "Columbia TriStar Film Distributors International" "Metro-Goldwyn-Mayer (MGM)" "Orion Pictures Corporation" "Paramount Pictures" "Sony Pictures Entertainment (SPE)" "TriStar Pictures" "Twentieth Century Fox Film Corporation" "United Artists" "Universal Pictures" "Warner Bros. Pictures" "Warner Brothers" { replace major=1 if distributor=="`major'" } replace major=1 if regexm(distributor,"DreamWorks") /*regexm captures several variations*/ foreach indymajor in "Fox Searchlight Pictures" "Focus Features" "Sony Pictures Classics" "Warner Independent Pictures (WIP)" { /*"Focus" is part of Universal, in other cases relationship is obvious*/ replace indymajor=1 if distributor=="`indymajor'" } *for indymajor's that were acquired and/or divested during period, make coding time-varying replace indymajor=1 if distributor=="Dimension Films" & year>=1993 & year<2005 /* horror film division of Miramax, acquired by Disney as part of Miramax, but divested when the Weinstein's split to form "Weinstein Company" */ replace indymajor=1 if distributor=="Miramax Films" & year>=1993 /* Disney radically restructured/sold off in late 2010 */ replace indymajor=1 if distributor=="Samuel Goldwyn Company, The" & year>=1997 replace indymajor=1 if distributor=="New Line Cinema" & year>=1996 /*TW / Turner merger*/ replace indymajor=1 if distributor=="Fine Line Features" & year>=1996 /*TW / Turner merger*/ collapse (max) major indymajor, by (imdbtitle year) replace major=0 if indymajor==1 *debug list if regexm(imdbtitle,"Girl in a") list if regexm(imdbtitle,"Pulp Fiction") list if regexm(imdbtitle,"Left Foot") *note, "Girl" should not be "indymajor" but Pulp Fiction should sum *major* save $bigstuff/distributors.dta, replace } *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- *keywords *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- if `keywords'==1 { *just imports, lambda and tau are later import excel "$rawimdbfiles/keywords_1970andup_noTVG_a.xlsx", sheet("keywords_1970andup_noTVG_a.txt") clear save $bigstuff/keywords.dta, replace import excel "$rawimdbfiles/keywords_1970andup_noTVG_b.xlsx", sheet("keywords_1970andup_noTVG_b.txt") clear append using $bigstuff/keywords.dta ren A imdbtitle_raw ren B keyword drop if regexm(imdbtitle,"\) \(T?VG?\)") drop if regexm(keyword,"oscar-nom") drop if regexm(keyword,"oscar-win") drop if regexm(keyword,"academy-award") drop if regexm(keyword,"critically-acclaimed") sort keyword by keyword: drop if [_N]<`minkwfreq' /*eliminate extremely rare keywords*/ compress sort imdbtitle merge m:1 imdbtitle_raw using $parentpath/rawmerge2 keep if _merge==3 drop _merge imdbtitle_raw imdbtitle_merge sort keyword preserve contract keyword sum _freq, detail local maxkwfreq=`r(p99)' sort _freq list if _freq>=`maxkwfreq' restore by keyword: drop if [_N]<`minkwfreq' by keyword: drop if [_N]>`maxkwfreq' sort imdbtitle compress save $bigstuff/keywords.dta, replace contract keyword list if _freq>50 save $bigstuff/keywords_freqs, replace outsheet using $bigstuff/keywords_freqs.txt, replace } *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- *genres *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- if `genres'==1{ *just imports, lambda and tau are later import excel "$rawimdbfiles/genres_1970andup_noTVG.xlsx", sheet("genres_1970andup_noTVG.txt") clear ren A imdbtitle_raw ren B genre merge m:1 imdbtitle_raw using $parentpath/rawmerge2 keep if _merge==3 drop _merge imdbtitle_raw imdbtitle_merge drop if genre=="Adult" drop if genre=="Reality-TV" drop if genre=="Short" drop if regexm(imdbtitle,"\(T?VG?\)$") replace genre=lower(genre) replace genre=regexs(1)+regexs(2) if regexm(genre,"(.+)-(.+)") save $bigstuff/genre_long, replace foreach g in action adventure biography comedy crime drama family fantasy filmnoir history horror music musical mystery romance scifi sport thriller war western { gen g_`g'=0 replace g_`g'=1 if genre=="`g'" } sort imdbtitle gen rows=1 collapse (sum) rows (max) g_action g_adventure g_biography g_comedy g_crime g_drama g_family g_fantasy g_filmnoir g_history g_horror g_music g_musical g_mystery g_romance g_scifi g_sport g_thriller g_war g_western, by (imdbtitle year) sort imdbtitle save $bigstuff/genre.dta, replace } *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- *mpaa *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- if `mpaa'==1{ import excel "$parentpath/film_ratings_short.xlsx", sheet("film_ratings_short.csv") firstrow clear *for documentation: the additional data came from filmratings.com, inbaseline.com, and movies.yahoo.com keep imdbtitle mpaa ren imdbtitle imdbtitle_raw ren mpaa rating save $bigstuff/certificates_handcoded.dta, replace insheet using $oscarsfolder/dataapril2010/certificates.txt, tab clear ren v1 imdbtitle_raw ren v2 rating ren v3 notes append using $bigstuff/certificates_handcoded.dta merge m:1 imdbtitle_raw using $parentpath/rawmerge2 list if regexm(imdbtitle,"Imaginarium of Doctor Parnassus") /*debug*/ drop _merge imdbtitle_raw imdbtitle_merge gen priority=0 replace notes=regexs(1)+" "+regexs(2) if regexm(notes,"(.+)'(.+)") /*similar to "Rosemary's Baby" but for "director's cut*/ replace priority=-1 if regexm(notes,"^\(re-rating\) \([0-9][0-9][0-9][0-9]\)") replace priority=-1 if regexm(notes,"^\(DVD.*\)") replace priority=-1 if regexm(notes,"^\(VHS.*\)") replace priority=-1 if regexm(notes,"^\(video.*\)") replace priority=-1 if regexm(notes,"^\(TV rating\)") replace priority=-1 if regexm(notes,"^\(director s cut\)") gen mpaa=0 replace mpaa=1 if rating=="USA:G" | rating=="G" replace mpaa=2 if rating=="USA:PG" | rating=="PG" replace mpaa=3 if rating=="USA:PG-13" | rating=="PG13" replace mpaa=4 if rating=="USA:R" | rating=="R" replace mpaa=5 if rating=="USA:NC-17" | rating=="NC17" replace mpaa=6 if rating=="USA:X" | rating=="X" replace mpaa=7 if rating=="USA:Unrated" | rating=="USA:Not Rated" | rating=="NR" recode mpaa 6/7=5 /*fold X & unrated into NC17*/ capture lab drop mpaa lab def mpaa 1 "G" 2 "PG" 3 "PG13" 4 "R" 5 "else" notes mpaa: category 5 includes X, NC17, and unrated list if regexm(imdbtitle,"Imaginarium of Doctor Parnassus") /*debug*/ drop if mpaa==0 | priority==-1 collapse (min) mpaa, by (imdbtitle year) compress lab val mpaa mpaa sort imdbtitle list if regexm(imdbtitle,"Imaginarium of Doctor Parnassus") /*debug*/ save $bigstuff/mpaa.dta, replace } *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- *Oscar noms + wins *Past noms variables *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- if `pnomimport'==1 { *Oscars data isn't in the FTP version of IMDb, coded from Oscars.org *this batch of code: * 1. combines recent Oscars data w older data from Rossman, Esparza, and Bonacich 2010 data * 2. recodes the data to create noms to-date for workers, merge them onto films. insheet using $recent_oscars, clear replace filmtitle=regexs(2)+", "+regexs(1) if regexm(filmtitle,"(A) (.+)") gen imdbtitle=filmtitle+" ("+filmyear+")" replace imdbtitle="Squid and the Whale, The (2005)" if imdbtitle=="The Squid and the Whale (2005)" replace imdbtitle="Crash (2004)" if imdbtitle=="Crash (2004/I)" replace imdbtitle="I'm Not There. (2007)" if imdbtitle=="I'm Not There (2007)" replace imdbtitle="Mrs. Henderson Presents (2005)" if imdbtitle=="Mrs Henderson Presents (2005)" replace imdbtitle="A Serious Man (2009)" if imdbtitle=="Serious Man, A (2009)" replace imdbtitle="Constant Gardener, The (2005)" if imdbtitle=="The Constant Gardener (2005)" replace imdbtitle="A Single Man (2009)" if imdbtitle=="Single Man, A (2009)" save $bigstuff/recent_oscars.dta, replace *nominee var means redundant at j level drop nominee contract _all drop _freq filmtitle filmyear save $bigstuff/recent_oscars_jcat.dta, replace drop category gen noms_j=1 ren winner wins_j collapse (sum) noms_j wins_j, by (imdbtitle) sort imdbtitle save $bigstuff/recent_oscars_j.dta, replace *load each profession file, then "keep if" and append *best picture use $bigstuff/recent_oscars.dta, clear keep if category=="Best Picture" drop category append using "$oscarsfolder/bestpicture.dta" replace year=real(regexs(1)) if year==. & regexm(filmyear,"([0-9]+)")==1 replace bp_nom=1 if winner!=. replace bp_oscar=1 if winner==1 recode bp_oscar .=0 replace imdbtitle=film if imdbtitle=="" drop oscaryear filmtitle filmyear film winner compress replace year=1985 if imdbtitle=="Room with a View, A (1985)" save $bigstuff/bp.dta, replace drop nominee contract _all drop _freq save $bigstuff/bp_j.dta, replace *directors use $bigstuff/recent_oscars.dta, clear keep if category=="Best Director" drop category ren nominee director ren imdbtitle film append using "$oscarsfolder/directors/directors_gabey.dta" ren film imdbtitle replace Fdirector_nom=1 if winner!=. replace Fdirector_oscar=1 if winner==1 recode Fdirector_oscar .=0 replace year=real(regexs(1)) if year==. & regexm(filmyear,"([0-9]+)")==1 gen altyear=real(regexs(1)) if regexm(imdbtitle,"[^\(]+\(([0-9][0-9][0-9][0-9])") drop if altyear!=year drop winner filmtitle filmyear oscaryear altyear save $bigstuff/directors.dta, replace drop if Fdirector_nom==0 contract imdbtitle Fdirector_nom Fdirector_oscar year drop _freq drop if Fdirector_oscar==0 & imdbtitle=="Brokeback Mountain (2005)" sort imdbtitle save $bigstuff/directors_j.dta, replace *writers use $bigstuff/recent_oscars.dta, clear keep if category=="Best Writing Written Directly for Screen" | category=="Best Writing Previously Produced or Published" drop category ren nominee writer save $bigstuff/writers.dta, replace insheet using $oscarsfolder/writers/woscarnoms.txt, tab clear keep if confirm==1 drop confirm tempfile writers save `writers', replace ren film imdbtitle append using $bigstuff/writers.dta /* 2005-2010 */ gen Fwriter_nom=1 replace fwriter_oscar=winner if fwriter_oscar==. drop winner replace year=real(regexs(1)) if year==. & regexm(imdbtitle,"[\()]([0-9]+)")==1 drop oscaryear filmtitle filmyear sort imdbtitle save $bigstuff/writers.dta, replace contract fwriter_oscar Fwriter_nom imdbtitle year drop _freq sort imdbtitle save $bigstuff/writers_j.dta, replace *actors use $bigstuff/recent_oscars.dta, clear keep if category=="Best Actor Leading Role" | category=="Best Actor Supporting Role" | category=="Best Actress Leading Role" | category=="Best Actress Supporting Role" gen year=real(regexs(1)) if regexm(filmyear,"([0-9][0-9][0-9][0-9])") ren nominee name drop filmyear filmtitle oscaryear save $bigstuff/actors.dta, replace use "$oscarsfolder/oscar_noms4.18.06.dta", clear ren film imdbtitle decode category, gen(category_s) drop compid category ren category_s category ren actor_oscar winner append using $bigstuff/actors replace category="Best Actor" if category=="Best Actor Leading Role" replace category="Best Actress" if category=="Best Actress Leading Role" replace category="Supporting Actor" if category=="Best Actor Supporting Role" replace category="Supporting Actress" if category=="Best Actress Supporting Role" recode actor_nom .=1 contract _all drop _freq save $bigstuff/actors.dta, replace collapse (sum) winner actor_nom, by(imdbtitle year) ren winner actor_wins ren actor_nom actor_noms sort imdbtitle save $bigstuff/actors_j.dta, replace *attach all noms / wins use imdb19752010_mk, clear sort imdbtitle year merge 1:1 imdbtitle year using $bigstuff/actors_j tab _merge if year<`prestart' drop _merge sort imdbtitle year merge 1:1 imdbtitle year using $bigstuff/bp_j tab _merge if year<`prestart' drop _merge sort imdbtitle year merge 1:1 imdbtitle year using $bigstuff/directors_j tab _merge if year<`prestart' drop _merge sort imdbtitle year merge 1:1 imdbtitle year using $bigstuff/writers_j tab _merge if year<`prestart' drop _merge foreach var in actor_wins actor_noms bp_oscar bp_nom Fdirector_nom Fdirector_oscar fwriter_oscar Fwriter_nom { recode `var' .=0 } sum desc save $bigstuff/oscarsnomswins, replace lab var eligible "dummy AMPAS nom ballot" lab var sstitle "title in StudioSystem" lab var actor_wins "Count of acting wins, across sub-cats" lab var actor_noms "Count of acting noms, across sub-cats" lab var bp_oscar "Best Picture win" lab var bp_nom "Best Picture nom" lab var Fdirector_nom "Director nom" lab var Fdirector_oscar "Director win" lab var fwriter_oscar "Writing win" lab var Fwriter_nom "Writing nom" gen totalwins=actor_wins+bp_oscar+Fdirector_oscar+fwriter_oscar gen totalnoms=actor_noms+bp_nom+Fdirector_nom+Fwriter_nom /*notation: kappa_j */ gen pl_totwins=ln(totalwins+(totalwins^2+1)^0.5) gen pl_totnoms=ln(totalnoms+(totalnoms^2+1)^0.5) lab var pl_totwins "pseudo-log(total wins)" /*"pl" or "pseudo-log" is our shorthand for inverse hyperbolic sine*/ lab var pl_totnoms "pseudo-log(total noms)" save imdb19752010_oscars, replace drop if eligible==0 | eligible==.a desc compress imdbtitle save $bigstuff/imdb19752010_eligible_oscars, replace *=-=-=-=-=-=-=-=-=-=-=-=- *directors *=-=-=-=-=-=-=-=-=-=-=-=- *create running tally to date of oscar noms *create annual files forvalues t=1927/`enddate' { use $bigstuff/directors.dta, clear keep if year==`t' & Fdirector_nom==1 compress sort director gen noms=1 collapse (sum) nom , by (director) sort director compress save $bigstuff/annual/director_`t'.dta, replace } *create to-date director CVs clear set obs 1 gen director="" gen directornomsum=. gen year=. lab var directornomsum "n nominations to date" sort director forvalues t=1927/`enddate' { merge director using $bigstuff/annual/director_`t'.dta drop _merge recode noms .=0 recode directornomsum .=0 replace directornomsum=directornomsum+noms drop noms replace year=`t'+1 lab var year "year +1 (for lag)" sort director save $bigstuff/annual/directorsum_`t'.dta, replace } clear *create single director-year file forvalues t=1927/`enddate' { append using $bigstuff/annual/directorsum_`t'.dta } drop if director=="" sort director year save $bigstuff/directorsum_allt.dta, replace *get raw data import excel "$rawimdbfiles/directors_1970andup_noTVG.xlsx", sheet("directors_1970andup_noTVG.txt") firstrow clear drop C /*notes about what the person contributed or under what alias*/ ren Titles imdbtitle_raw ren Name director merge m:1 imdbtitle_raw using $parentpath/rawmerge2 drop if _merge==1 *note, not "keep if _merge==3" drop _merge imdbtitle_raw save $bigstuff/directorcredits, replace *combine with the director-film file *note that the merge is lagged one year by merging the current t onto t+1 use $bigstuff/directorsum_allt.dta, clear merge 1:m director year using $bigstuff/directorcredits drop if imdbtitle=="" drop imdbtitle_merge _merge recode directornomsum .=0 collapse (sum) directornomsum, by ( year imdbtitle) lab var directornomsum "n nominations on director's cv by t" order imdbtitle year directornomsum compress sort imdbtitle save $bigstuff/pastnoms_d.dta, replace *=-=-=-=-=-=-=-=-=-=-=-=- *writers *=-=-=-=-=-=-=-=-=-=-=-=- *create running tally to date of oscar noms *create annual files forvalues t=1927/`enddate' { use $bigstuff/writers.dta, clear keep if year==`t' & Fwriter_nom==1 compress sort writer gen noms=1 collapse (sum) nom , by (writer) sort writer compress save $bigstuff/annual/writer_`t'.dta, replace } *create to-date writer CVs clear set obs 1 gen writer="" gen writernomsum=. gen year=. lab var writernomsum "n nominations to date" sort writer forvalues t=1927/`enddate' { merge writer using $bigstuff/annual/writer_`t'.dta drop _merge recode noms .=0 recode writernomsum .=0 replace writernomsum=writernomsum+noms drop noms replace year=`t'+1 lab var year "year +1 (for lag)" sort writer save $bigstuff/annual/writersum_`t'.dta, replace } clear *create single writer-year file forvalues t=1927/`enddate' { append using $bigstuff/annual/writersum_`t'.dta } drop if writer=="" sort writer year save $bigstuff/writersum_allt.dta, replace *get raw data import excel "$rawimdbfiles/writers_1970andup_noTVG.xlsx", sheet("writers_1970andup_noTVG.txt") firstrow clear ren Titles imdbtitle_raw ren C credit ren Name writer *drop innappropriate credits gen suspect=0 gen screenplay=0 replace screenplay=1 if regexm(credit,"screenplay") replace suspect=1 if regexm(credit,"episode") & regexm(imdbtitle_raw,"^_") replace suspect=1 if regexm(credit,"\(staff writer") & regexm(imdbtitle_raw,"^_") replace suspect=1 if regexm(credit,"^\(play") replace suspect=1 if credit=="(play)" replace suspect=1 if credit=="(novel)" replace suspect=1 if regexm(credit,"\(book") replace suspect=1 if regexm(credit,"\(novel") replace suspect=1 if regexm(credit,"\(short story") replace suspect=1 if regexm(credit,"\(short stories") gen year=real(regexs(2)) if regexm(imdbtitle_raw,"(.+) \(([0-9][0-9][0-9][0-9])") *drop if year>2006 gen credityear=real(regexs(1)) if regexm(credit,"\(([0-9][0-9][0-9][0-9]) screenplay") drop if credit=="(earlier screenplay)" drop if year>credityear+3 /*dropped rather than "suspect" because "YYYY screenplay" is common credit for writer of the original on a remake */ tab suspect screenplay drop if suspect==1 & screenplay==0 if 1==1 { *drop "uncredited" and similar credits, but not "originally uncredited" gen uncredited=0 replace uncredited=1 if regexm(credit,"\(uncredited\)") replace uncredited=1 if credit=="(uncredited)" replace uncredited=1 if credit=="(additional dialogue)" replace uncredited=1 if credit=="(characters)" replace uncredited=1 if credit=="(book)" replace uncredited=1 if credit=="(contributing writer) (uncredited)" replace uncredited=1 if credit=="(story) (uncredited)" replace uncredited=1 if credit=="(continuity)" replace uncredited=1 if credit=="(character)" replace uncredited=1 if credit=="(contributor to screenplay construction) (uncredited)" replace uncredited=1 if credit=="(additional dialogue) (uncredited)" replace uncredited=1 if credit=="(screenplay) (uncredited)" replace uncredited=1 if credit=="(contributor to treatment) (uncredited)" replace uncredited=1 if credit=="(short story)" replace uncredited=1 if credit=="(adaptation) (uncredited)" replace uncredited=1 if credit=="(novel) (uncredited)" replace uncredited=1 if credit=="(dialogue) (uncredited)" replace uncredited=1 if credit=="(contributor to dialogue) (uncredited)" replace uncredited=1 if credit=="(additional material)" replace uncredited=1 if credit=="(comic strip characters)" replace uncredited=1 if credit=="(play) (uncredited)" replace uncredited=1 if credit=="(writer) (uncredited)" replace uncredited=1 if credit=="(titles) (uncredited)" replace uncredited=1 if credit=="(characters) (uncredited)" replace uncredited=1 if credit=="(comic strip)" replace uncredited=1 if credit=="(article)" replace uncredited=1 if credit=="(musical play)" replace uncredited=1 if credit=="(dialogue continuity)" replace uncredited=1 if credit=="(story and continuity)" replace uncredited=1 if credit=="(television series)" replace uncredited=1 if credit=="(uncredited) (unconfirmed)" replace uncredited=1 if credit=="(additional story material)" replace uncredited=1 if credit=="(poem)" replace uncredited=1 if credit=="(comic book)" replace uncredited=1 if credit=="(Liberty Magazine serial chapter)" replace uncredited=1 if credit=="(treatment contributor) (uncredited)" replace uncredited=1 if credit=="(contributor to screenplay) (uncredited)" replace uncredited=1 if credit=="(screenplay contributor) (uncredited)" replace uncredited=1 if credit=="(based on a screenplay by)" replace uncredited=1 if credit=="(treatment) (uncredited)" replace uncredited=1 if credit=="(television series Star Trek)" replace uncredited=1 if credit=="(story) (unconfirmed)" replace uncredited=1 if credit=="(contributing writer) (uncredited) (unconfirmed)" replace uncredited=1 if credit=="(contributor) (uncredited)" replace uncredited=1 if credit=="(additional writer: story)" replace uncredited=1 if credit=="(book) (uncredited)" replace uncredited=1 if credit=="(television series Police Squad)" replace uncredited=1 if credit=="(screenplay construction contributor) (uncredited)" replace uncredited=1 if credit=="(Marvel comic book)" replace uncredited=1 if credit=="(character The Bride)" drop if uncredited==1 drop uncredited } *drop great authors who did not directly participate in screen adaptation foreach author in "Shakespeare, William (I)" "Dickens, Charles" "Sophocles" "Twain, Mark" "Austen, Jane (I)" "Dostoyevsky, Fyodor" "Tolstoy, Leo" { drop if writer=="`author'" } drop suspect screenplay credityear credit D E F merge m:1 imdbtitle_raw using $parentpath/rawmerge2 drop if _merge==1 drop _merge compress save $bigstuff/writercredits, replace use $bigstuff/writersum_allt.dta, clear merge 1:m writer year using $bigstuff/writercredits drop if imdbtitle=="" drop imdbtitle_merge _merge recode writernomsum .=0 collapse (sum) writernomsum, by ( year imdbtitle) lab var writernomsum "n nominations on writer's cv by t" order imdbtitle year writernomsum compress sort imdbtitle save $bigstuff/pastnoms_w.dta, replace *=-=-=-=-=-=-=-=-=-=-=-=- *actors *=-=-=-=-=-=-=-=-=-=-=-=- *create running tally to date of oscar noms *create annual files forvalues t=1927/`enddate' { use $bigstuff/actors.dta, clear keep if year==`t' & actor_nom==1 compress sort name gen noms=1 collapse (sum) nom , by (name) sort name compress save $bigstuff/annual/actor_`t'.dta, replace } *create to-date actor CVs clear set obs 1 gen name="" gen actornomsum=. gen year=. lab var actornomsum "n nominations to date" sort name forvalues t=1927/`enddate' { merge name using $bigstuff/annual/actor_`t'.dta drop _merge recode noms .=0 recode actornomsum .=0 replace actornomsum=actornomsum+noms drop noms replace year=`t'+1 lab var year "year +1 (for lag)" sort name save $bigstuff/annual/actorsum_`t'.dta, replace } clear *create single actor-year file forvalues t=1927/`enddate' { append using $bigstuff/annual/actorsum_`t'.dta } drop if name=="" sort name year save $bigstuff/actorsum_allt.dta, replace *get raw data clear gen x=. save $bigstuff/actorcredits, replace foreach file in actors_1970andup_noTVG_a actors_1970andup_noTVG_b actresses_1970andup_noTVG { import excel "$rawimdbfiles/`file'.xlsx", sheet("`file'.txt") firstrow allstring clear ren Titles imdbtitle_raw ren C credit ren Name name keep name imdbtitle_raw merge m:1 imdbtitle_raw using $parentpath/rawmerge2 keep if _merge==3 append using $bigstuff/actorcredits compress save $bigstuff/actorcredits, replace } drop x imdbtitle_raw _merge save $bigstuff/actorcredits, replace *combine with the actor-film file *note that the merge is lagged one year by merging the current t onto t+1 use $bigstuff/actorsum_allt.dta, clear merge 1:m name year using $bigstuff/actorcredits drop if imdbtitle=="" drop imdbtitle_merge _merge recode actornomsum .=0 collapse (sum) actornomsum, by ( year imdbtitle) lab var actornomsum "n nominations on director's cv by t" order imdbtitle year actornomsum compress sort imdbtitle save $bigstuff/pastnoms_a.dta, replace *create the big file use $bigstuff/pastnoms_a.dta, clear merge 1:1 imdbtitle using $bigstuff/pastnoms_w drop _merge merge 1:1 imdbtitle using $bigstuff/pastnoms_d drop _merge gen pl_pnom_a=ln(actornomsum+(actornomsum^2+1)^0.5) gen pl_pnom_w=ln(writernomsum+(writernomsum^2+1)^0.5) gen pl_pnom_d=ln(directornomsum+(directornomsum^2+1)^0.5) foreach var in actornomsum writernomsum directornomsum { gen `var'_dummy=`var' recode `var'_dummy 0=0 1/1000=1 } save $bigstuff/pastnoms, replace } *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- *release dates *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- if `releasedates'==1 { import excel "$oscarsfolder/dataapril2010/releasedates_1970andup_noTVG.xlsx", sheet("releasedates_1970andup_noTVG.tx") clear ren A imdbtitle_raw ren B date_str ren C type gen d=date(subinstr(date_str,"USA:","",1),"DMY") head format d %td head codebook date_str keep if regexm(date_str,"^USA\:") sum d desc replace d=date(subinstr(date_str,"USA:","",1),"MY")+(runiform()*30) if d==. codebook imdbtitle_raw drop if regexm(type,"DVD premiere") gen festival=regexm(type,"[Ff]estival") tab festival , miss gen market=regexm(type,"[Mm]arket") tab market head merge m:1 imdbtitle_raw using $parentpath/rawmerge2.dta keep if _merge==3 *from genre_dates_ratings gen goodrow=0 foreach location in "(alternate version)" "(censored version)" "(theatrical release)" "(theatrical release) (limited)" "(silent version)" "(sound version)" "(sound version) (premiere)" "(premiere)" "(premiere) (Los Angeles, California)" "(limited)" "(complete verson)" "(Westwood, California)" "(Westwood, California) (premiere)" "(West Hollywood, California)" "(West Hollywood, California) (premiere)" "(Walnut Creek, Ca)" "(Venice, California)" "(USC premire screening)" "(USC)" "(USC) (premiere)" "(Thousand Oaks, California)" "(Thousand Oaks, California) (premiere)" "(San Fernando Valley, California)" "(South Pasadena, California) (premiere)" "(Santa Monica, California)" "(Santa Monica, California) (premiere)" "(Pasadena, California)" "(Pasadena, California) (premiere)" "(North Hollywood, California)" "(North Hollywood, California) (premiere)" "(Marina Del Rey, California)" "(Malibu, California)" "(Long Beach, California)" "(Long Beach, California) (premiere)" "(Los Angeles) (final cut)" "(Los Angeles) (premiere)" "(Los Angeles, California)" "(Los Angeles, California) (RKO roadshow version)" "(Los Angeles, California) (Redux version)" "(Los Angeles, California) (director's cut)" "(Los Angeles, California) (edited version) (premiere)" "(Los Angeles, California) (extended version)" "(Los Angeles, California) (final release version)" "(Los Angeles, California) (premeire)" "(Los Angeles, California) (premiere)" "(Los Angeles, California) (premiere) (Egyptian Theater, Hollywood)" "(La Canada, California) (premiere)" "(IMAX version)" "(IMAX version) (premiere)" "(IMAX version) (re-release)" "(Hollywood, California)" "(Hollywood, California) (premiere)" "(Hollywood, California) (re-release)" "(Hollywood, California) (silent version)" "(El Segundo, California)" "(Century City, California)" "(Century City, California) (premiere)" "(Burbank, California)" "(Burbank, California) (premiere)" "(Beverly Hills, California)" "(Beverly Hills, California) (premiere)" "(Academy of Television Arts and Sciences)" "(Academy of Television Arts and Sciences) (premiere)" "(Academy of Motion Picture Arts and Sciences)" "" { replace goodrow=1 if type=="`location'" } drop if d==. sort imdbtitle goodrow compress collapse (min) d, by(imdbtitle goodrow) replace d=. if goodrow==0 & imdbtitle==imdbtitle[_n+1] & goodrow[_n+1]==1 replace d=14289 if imdbtitle=="Windhorse (1998)" drop if d==. drop goodrow generate dm=month(d) generate dy=doy(d) mkspline dy_1 180 dy_2 320 dy_3 = dy lab var dy_1 "day of year, 1-180" lab var dy_2 "day of year, 180-320" lab var dy_3 "day of year, 320-365" sort imdbtitle save $bigstuff/releasedates.dta, replace } *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- *lamdba & tau *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- use $bigstuff/imdb19752010_eligible_oscars, clear merge 1:m imdbtitle using $bigstuff/keywords.dta keep if _merge==3 drop _merge save $bigstuff/keywords_oscars, replace sort imdbtitle by imdbtitle: gen nkeywords=[_N] gen pl_totwins_scaled=pl_totwins/(nkeywords^0.5) gen pl_totnoms_scaled=pl_totnoms/(nkeywords^0.5) gen weightedfreq=1/(nkeywords^0.5) gen keywordfreq=1 collapse (sum) pl_totwins_scaled pl_totnoms_scaled weightedfreq keywordfreq, by(keyword) lab var weightedfreq "keyword, down-weighted by how many other kw appear on same films" * notation for keywordfreq: n_i(j) gen keywordratio=pl_totnoms_scaled/keywordfreq gen keywordratio_w=pl_totwins_scaled/keywordfreq save keyword_weightedfreq.dta, replace sort keywordratio tail drop pl_totwins_scaled pl_totnoms_scaled save keyword_scores, replace /*illustrative only, time-varying scores used for calculation*/ *create tau_j with lagged lambda_i forvalues i=`prestart'/`enddate' { use $bigstuff/keywords_oscars, clear local past=`i'-`window' local i1=`i'-1 keep if year>=`past' & year<`i' note: data from `past' - `i1', meant to serve as lagged for `i' compress save $bigstuff/annual/kwo_`i', replace use $bigstuff/keywords_oscars, clear keep if year==`i' compress save $bigstuff/annual/targetfile_`i', replace use $bigstuff/annual/kwo_`i', clear sum year sort imdbtitle by imdbtitle: gen nkeywords=[_N] gen pl_totwins_scaled=pl_totwins/(nkeywords^0.5) gen pl_totnoms_scaled=pl_totnoms/(nkeywords^0.5) gen weightedfreq=1/(nkeywords^0.5) gen keywordfreq=1 collapse (sum) pl_totwins_scaled pl_totnoms_scaled weightedfreq keywordfreq, by(keyword) lab var weightedfreq "keyword, down-weighted by how many other kw appear on same films" gen keywordratio=pl_totnoms_scaled/keywordfreq /* notation: lamdba_i */ gen keywordratio_w=pl_totwins_scaled/keywordfreq sum *histogram keywordratio sort keywordratio tail drop pl_totwins_scaled pl_totnoms_scaled save $bigstuff/annual/keyword_scores`i', replace merge 1:m keyword using $bigstuff/annual/targetfile_`i' recode keywordratio .=0 /* keywords that have newly entered usage */ recode keywordratio_w .=0 /* keywords that have newly entered usage */ sum year drop _merge lab var keywordratio "Oscar bait keywords" /* notation: lambda_it */ lab var keywordratio_w "Oscar bait keywords (wins)" notes keywordratio : sum over keywords in film, where each keyword in film is scored as sum of pseudo-log(keyword appears in nom film) scaled by baseline keyword frequency sort imdbtitle by imdbtitle: gen nkeywords=[_N] drop if imdbtitle=="" /*keywords that have fallen out of use since lag*/ save $bigstuff/annual/imdb`i'_eligible_oscars_kw, replace collapse (sum) keywordratio keywordratio_w, by (imdbtitle year eligible sstitle actor_wins actor_noms bp_oscar bp_nom Fdirector_nom Fdirector_oscar fwriter_oscar Fwriter_nom totalwins totalnoms pl_totwins pl_totnoms nkeywords) lab var keywordratio "tau_j - sum of Oscar association for keywords" /* tau_j */ lab var keywordratio_w "sum of Oscar association for keywords (wins)" save $bigstuff/annual/imdb`i'_eligible_oscars, replace } clear forvalues i=`prestart'/`enddate' { append using $bigstuff/annual/imdb`i'_eligible_oscars } save $bigstuff/imdb_eligible_oscars_lambda_it, replace *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- *do tau for genre *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- use $bigstuff/imdb19752010_eligible_oscars, clear merge 1:m imdbtitle using $bigstuff/genre_long.dta keep if _merge==3 drop _merge save $bigstuff/genres_oscars, replace sum year sort imdbtitle by imdbtitle: gen ngenres=[_N] gen pl_totwins_scaled=pl_totwins/(ngenres^0.5) gen pl_totnoms_scaled=pl_totnoms/(ngenres^0.5) gen weightedfreq=1/(ngenres^0.5) gen genrefreq=1 collapse (sum) pl_totwins_scaled pl_totnoms_scaled weightedfreq genrefreq, by(genre) lab var genrefreq "genre, down-weighted by how many other genre appear on same films" gen genreratio=pl_totnoms_scaled/genrefreq gen genreratio_w=pl_totwins_scaled/genrefreq sum *histogram keywordratio sort genreratio tail drop pl_totwins_scaled pl_totnoms_scaled save $bigstuff/genre_scores, replace /*illustrative only, time-varying scores used for calculation*/ *create tau_j with lagged lambda_i forvalues i=`prestart'/`enddate' { use $bigstuff/genres_oscars, clear local past=`i'-`window' local i1=`i'-1 keep if year>=`past' & year<`i' note: data from `past' - `i1', meant to serve as lagged for `i' compress save $bigstuff/annual/go_`i', replace * "go"="genre + oscars" use $bigstuff/genres_oscars, clear keep if year==`i' compress save $bigstuff/annual/targetfile_g_`i', replace use $bigstuff/annual/go_`i', clear sum year sort imdbtitle by imdbtitle: gen ngenres=[_N] gen pl_totwins_scaled=pl_totwins/(ngenres^0.5) gen pl_totnoms_scaled=pl_totnoms/(ngenres^0.5) gen weightedfreq=1/(ngenres^0.5) gen genrefreq=1 collapse (sum) pl_totwins_scaled pl_totnoms_scaled weightedfreq genrefreq, by(genre) lab var genrefreq "genre, down-weighted by how many other genre appear on same films" gen genreratio=pl_totnoms_scaled/genrefreq /* notation: lamdba_i */ gen genreratio_w=pl_totwins_scaled/genrefreq sum *histogram keywordratio sort genreratio tail drop pl_totwins_scaled pl_totnoms_scaled save $bigstuff/annual/genre_scores`i', replace merge 1:m genre using $bigstuff/annual/targetfile_g_`i' recode genreratio .=0 /* keywords that have newly entered usage */ recode genreratio_w .=0 /* keywords that have newly entered usage */ sum year drop _merge sort imdbtitle by imdbtitle: gen ngenres=[_N] drop if imdbtitle=="" /*genres that have fallen out of use since lag*/ save $bigstuff/annual/imdb`i'_eligible_oscars_g, replace collapse (sum) genreratio genreratio_w, by (imdbtitle year eligible sstitle actor_wins actor_noms bp_oscar bp_nom Fdirector_nom Fdirector_oscar fwriter_oscar Fwriter_nom totalwins totalnoms pl_totwins pl_totnoms ngenres) lab var genreratio "sum of Oscar association for keywords" /*notation: tau_j */ save $bigstuff/annual/imdb`i'_eligible_oscars_g, replace } clear forvalues i=`prestart'/`enddate' { append using $bigstuff/annual/imdb`i'_eligible_oscars_g } save $bigstuff/imdb_eligible_oscars_lambda_g_it, replace *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- *stack the enchilada *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- use eligible_19752010.dta, clear *attach lambda -- keyword merge 1:1 imdbtitle using $bigstuff/imdb_eligible_oscars_lambda_it drop _merge *attach lambda -- genre merge 1:1 imdbtitle using $bigstuff/imdb_eligible_oscars_lambda_g_it drop _merge *attach genre merge 1:1 imdbtitle using $bigstuff/genre.dta drop _merge compress desc *attach past noms variables merge 1:1 imdbtitle using $bigstuff/pastnoms drop _merge *attach distributor (major coding only, not string for dist) merge 1:1 imdbtitle using $bigstuff/distributors drop _merge *attach mpaa merge 1:1 imdbtitle using $bigstuff/mpaa drop _merge *attach film noms merge 1:1 imdbtitle using $bigstuff/imdb19752010_eligible_oscars, update drop _merge *attach release dates merge 1:1 imdbtitle using $bigstuff/releasedates, update drop _merge *clean up variables recode mpaa .=5 recode major .=0 recode indymajor .=0 recode pl_totnoms .=0 recode pl_totwins .=0 drop if imdbtitle=="" gen mpaa_r=mpaa recode mpaa_r 0/3=0 4=1 5/100=0 save $bigstuff/enchilada, replace desc sum tab year *have a nice day
The second do-file takes the output of the first and uses it to calculate Oscar appeal as the predicted values of a regression based on lagged data.
*predictedvalues.do *Stata do-file for Rossman & Schilke "Close But No Cigar" *Please direct any questions to rossman@soc.ucla.edu *This do-file must be run after mergeorama.do *It calculate the "Oscar appeal" variable at the center of the paper as the predicted values from a regression over rolling windows with lagged betas * (ie, the data point being predicted is not included in the estimation of the betas used to calculate it) *Our file structure is unfortunately somewhat complicated but we always refer to paths through globals. There are no hard-coded paths so changing these globals to fit your own file system should solve everything global parentpath "~/Dropbox/oscarsperformativity/studiosystem" global oscarsfolder "~/Documents/oscars" global bigstuff "$oscarsfolder/studiosystemoverflow" global recent_dir "$oscarsfolder/dataapril2010/maria" global recent_eligibles "$recent_dir/filmtitles_Apr2010.txt" global recent_oscars "$recent_dir/oscars20062010.txt" global tables "$parentpath/tables" capture log close log using $parentpath/predictedvalues.log, replace *These switches turn batches of code on and off. This both facilitates code-folding and debugging. For final run though make sure everything is turned on local window 5 local startdate 1985 local enddate 2009 local mpaalist "mpaa_r" local pastnoms "actornomsum_dummy writernomsum_dummy directornomsum_dummy" local tauspec "gr_* kr_*" local majorspec "major indymajor" local datespec "dy_1 dy_2 dy_3" local xvector "`majorspec' `tauspec' `pastnoms' `mpaalist' `datespec' `fyc' `rtscore'" disp "`xvector'" capture program drop tauspline program define tauspline quietly sum genreratio, detail mkspline gr_lowmed `r(p50)' gr_high=genreratio quietly sum keywordratio, detail mkspline kr_lowmed `r(p50)' kr_high=keywordratio end use $bigstuff/enchilada, clear /* load the data cleaned by mergeorama.do */ local gooddatavarlist "major indymajor keywordratio genreratio actornomsum_dummy writernomsum_dummy directornomsum_dummy mpaa_r dy_1 dy_2 dy_3" sum `gooddatavarlist' foreach var in `gooddatavarlist' { disp "`var'" drop if `var'==. disp "-----" } keep imdbtitle save $bigstuff/gooddatalist.dta, replace *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= *estimation with the full sample, across all years *this estimation is not used for calculations of Oscar Appeal, but just for parsimonious illustration in Table 1 use $bigstuff/enchilada, clear sum `gooddatavarlist' merge m:1 imdbtitle using $bigstuff/gooddatalist keep if _merge==3 drop _merge tauspline sum `xvector' tab year *table 1 nbreg totalnoms `xvector', iterate(500) esttab using $tables/nbregtotalnoms_pooled.txt, se b(3) se(3) scalars(ll) nodepvars nomtitles label title(NEGATIVE BINOMIAL REGRESSION OF TOTAL NOMINATIONS) replace fixed *=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= *estimations based on rolling time windows *these are used to calculate Oscar appeal eststo clear forvalues i=`startdate'/`enddate' { /*loop over years*/ disp "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-" _n "`i'" _n "=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-" use $bigstuff/enchilada, clear local past=`i'-`window' keep if year>=`past' & year<=`i' /*keep only the focal year and it's recent past*/ gen totnoms=totalnoms /*make a backup of Y before recoding to missing for "predict" purposes*/ replace totalnoms=. if year==`i' /*recode the focal year to missing (so as to not fit on noise for y-hat betas)*/ tauspline disp _n _n "nbreg total number of nominations ..." nbreg totalnoms `xvector', iterate(500) eststo predict totalnoms_hat_xb, xb /*calculate Oscar Appeal*/ predict totalnoms_hat estat ic keep if year==`i' /*keep only the focal year (ie, drop the predictions for it's recent past)*/ save $bigstuff/annual/predictions`i', replace } *table of time-varying betas, summarized in Appendix C esttab using $tables/oscarbait.txt, se b(3) se(3) scalars(ll) nodepvars label title(NEGATIVE BINOMIAL REGRESSION OF OSCAR NOMS BY 5 YEAR WINDOWS) replace fixed nomtitles *stitch together the annual Oscar appeal files, combine w covariates clear forvalues i=`startdate'/`enddate' { append using $bigstuff/annual/predictions`i' } drop bp_oscar bp_nom totalwins totalnoms _est_est* save $parentpath/predictions, replace use $bigstuff/enchilada, clear keep bp_oscar bp_nom imdbtitle merge 1:1 imdbtitle using $parentpath/predictions drop _merge lab var totalnoms_hat_xb "Oscar Appeal" save $parentpath/predictions, replace sum *have a nice day
Entry filed under: Uncategorized. Tags: cleaning, IMDB, Stata.
1.
Jim | August 1, 2013 at 1:17 pm
Congratulations on a fine paper. I was hoping for this outcome. Will look forward to reading the final version.
2.
gabrielrossman | September 4, 2013 at 11:40 am
thanks much
3.
kavya | September 4, 2013 at 11:34 am
can u plz send me complete perl code for my project :
Comparison of term frequency and document frequency based feature selection methods in text categorization
4.
gabrielrossman | September 4, 2013 at 11:41 am
This is what I have documented, the rest isn’t sufficiently organized to share. However these two scripts establish the gist of it
https://codeandculture.wordpress.com/2010/07/26/imdb_personnel-pl/
https://codeandculture.wordpress.com/2010/03/31/gross-pl/
5.
kavya | September 4, 2013 at 11:43 am
sir plzzz help me . i have to sub met my project and save my life sir.
plz send me complete perl code for my project :
Comparison of term frequency and document frequency based feature selection methods in text categorization