Strange Things Are Afoot at the IMDb

September 8, 2017 at 2:29 pm 3 comments

| Gabriel |

I was helping a friend check something on IMDb for a paper and so we went to the URL that gives you the raw data. We found it’s in a completely different format than it was last time I checked, about a year ago.

The old data will be available until November 2017. I suggest you grab a complete copy while you still can.

Good news: The data is in a much simpler format: six wide tables, each a tab-separated row/column text file. You’ll no longer need my Perl scripts to convert them from a few dozen files that are a weird mishmash of field-tagged format and the weirdest tab-delimited text you’ve ever seen. Good riddance.
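For what it's worth, here is a minimal sketch of how a plain tab-separated table like this can be read with nothing but Python's standard library. The column names and the two sample rows are made up for illustration, and the comma-separated genres field is an assumption on my part, not something taken from the actual files.

```python
import csv
import io

# Hypothetical sample standing in for one of the new tables.
# The column names and values are invented for illustration only.
SAMPLE = (
    "id\ttitle\tyear\tgenres\n"
    "t1\tExample Film\t1999\tComedy,Drama\n"
    "t2\tAnother Film\t2005\tHorror\n"
)


def read_tsv(handle):
    """Return the rows of a tab-separated file as a list of dicts keyed by header."""
    # QUOTE_NONE keeps stray quotation marks in titles as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


rows = read_tsv(io.StringIO(SAMPLE))
for row in rows:
    # Assumes multiple genres are packed into one comma-separated field.
    genres = row["genres"].split(",")
    print(row["title"], row["year"], genres)
```

With a real file you would pass an open file handle instead of the in-memory sample, for example open(path, encoding="utf-8", newline=""), where path is wherever you saved the download.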

Bad news: It’s hard to use. The new data is hosted on Amazon S3, which is designed for developers, not end users. You could download the old version with Chrome or “curl” from the command line. The new version requires you to create an S3 account and, as best I can tell, there’s no way to just use the S3 web interface to get it. There is sample Java code, but it requires supplying your account credentials, which gives me cold-sweat flashbacks to when Twitter changed its API and my R scrape broke. Anyway, the bottom line is you’ll probably need IT to help you with this.

Really bad news: A lot of the files are gone. There are no country-by-country release dates, no box office figures, no plot keywords, no distributor or production company, and only up to three genres per title, etc. These are all things I’ve used in publications.


3 Comments

  • 1. lseltzer  |  September 8, 2017 at 5:07 pm

    Related to this?
    SAG-AFTRA Seeks to Join IMDb Suit Over Actor-Age Law
    http://variety.com/2017/digital/news/sag-aftra-imdb-suit-actor-age-law-1201957242/

    • 2. gabrielrossman  |  September 8, 2017 at 5:10 pm

      I doubt it, especially as those variables are still in the S3 version. I assume it’s about some third party sites querying the data too often so they want to put the bandwidth on the meter.

      • 3. lseltzer  |  September 8, 2017 at 5:13 pm

        You’re right, I’m sure. I just saw ‘IMDb’ and remembered there was some sort of action in the age case recently.
