
Strange Things Are Afoot at the IMDb

| Gabriel |

I was helping a friend check something on IMDb for a paper, so we went to the URL that gives you the raw data. We found it’s in a completely different format than it was the last time I checked, about a year ago.

The old data will be available until November 2017. I suggest you grab a complete copy while you still can.

Good news: The data is in a much simpler format: six wide tables, each a tab-separated row/column text file. You’ll no longer need my Perl scripts to convert them from a few dozen files that are a weird mishmash of field-tagged format and the weirdest tab-delimited text you’ve ever seen. Good riddance.
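To give a sense of how little code a plain tab-separated table needs, here is a minimal Python sketch using only the standard library. The column names, the sample rows, the `\N` missing-value marker, and the comma-separated genres field are all assumptions I made up for illustration; check the actual files before relying on any of them.

```python
import csv
import io


def read_tsv(handle, null_marker="\\N"):
    """Yield each row of a tab-separated file as a dict keyed by the header row.

    null_marker is an assumed placeholder for missing values; adjust it
    (or pass None) to match whatever the real files use.
    """
    # QUOTE_NONE treats quote characters as ordinary text, which is the
    # safer choice for tab-delimited data containing stray quotes.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == null_marker else value)
               for key, value in row.items()}


# Hypothetical sample data standing in for one of the wide tables.
sample = (
    "title_id\ttitle\tyear\tgenres\n"
    "t001\tExample Film\t2017\tComedy,Drama\n"
    "t002\tAnother Film\t\\N\tDrama\n"
)

rows = list(read_tsv(io.StringIO(sample)))

# Split the (assumed) comma-separated genres field into a list.
for row in rows:
    row["genres"] = row["genres"].split(",") if row["genres"] else []
```

For a real file you would pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` sample, or `gzip.open(path, "rt", newline="", encoding="utf-8")` if the file turns out to be compressed.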

Bad news: It’s hard to use. The new data is hosted on Amazon S3, which is designed for developers, not end users. You could download the old version with Chrome or “curl” from the command line. The new version requires you to create an S3 account and, as best I can tell, there’s no way to just use the S3 web interface to get it. There is sample Java code, but it requires supplying your account credentials, which gives me cold-sweat flashbacks to when Twitter changed its API and my R scrape broke. Anyway, the bottom line is that you’ll probably need IT to help you with this.

Really bad news: A lot of the files are gone. There are no country-by-country release dates, no box office, no plot keywords, no distributor or production company, etc., and there are only up to three genres. These are all things I’ve used in publications.


September 8, 2017 at 2:29 pm

