SNLDB Update: Now contains impressions, characters, sketches and more
Earlier this year I uploaded my project snldb to GitHub and wrote about it on my blog. It scrapes the web to build a database of information about the long-running TV show Saturday Night Live. The resulting dataset can be found on Kaggle. I try to update it as soon as possible after a new episode has aired.
Recently, Colin Morris forked the repository and iterated on the code. Among other things, he added new entities and refactored the scraping code to use Scrapy's standard project structure. He submitted a pull request to my repository, and now the snldb project is even better. So I encourage you to check out the new data. Fittingly, the new season of SNL has just started. Here is a list of the entities that the dataset currently includes (new entities are marked with a star):
- Appearances (*) (formerly actor_title)
- Casts (*)
- Characters (*)
- Episode Ratings
- Hosts (*)
- Impressions (*)
- Sketches (*)
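
To give a rough idea of how these entities might be combined, here is a minimal sketch in Python. It assumes the tables are exported as CSV files and that appearances reference impressions through a shared ID. The column names (`impid`, `aid`, `name`, `sketch_id`) and all rows are made up for illustration and may not match the actual dataset, so check the real files before reusing it.

```python
import csv
import io
from collections import Counter

# Made-up stand-ins for two exported tables. The real file layout and
# column names in the dataset may differ; these are assumptions.
IMPRESSIONS_CSV = """impid,aid,name
1,Actor A,Person X
2,Actor A,Person Y
3,Actor B,Person X
"""

APPEARANCES_CSV = """aid,sketch_id,impid
Actor A,101,1
Actor A,102,1
Actor A,103,2
Actor B,104,3
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_impression_appearances(impressions, appearances):
    """Count how often each (actor, impersonated person) pair appears."""
    # Look up each impression by its (hypothetical) ID.
    by_id = {row["impid"]: (row["aid"], row["name"]) for row in impressions}
    counts = Counter()
    for row in appearances:
        pair = by_id.get(row["impid"])
        if pair is not None:  # skip appearances without a known impression
            counts[pair] += 1
    return counts


impressions = read_rows(IMPRESSIONS_CSV)
appearances = read_rows(APPEARANCES_CSV)
counts = count_impression_appearances(impressions, appearances)

for (actor, target), n in counts.most_common():
    print(f"{actor} as {target}: {n}")
```

With the real data, the inline strings would be replaced by the downloaded CSV files, opened with `open(path, newline="")` and passed to `csv.DictReader` in the same way.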
Colin also analysed the data and uploaded a very interesting notebook. Check it out. It contains a lot more ideas for analyzing the data than the initial analysis I did.