Goblins, Ghosts and Ghouls


Yesterday a very fun competition over at kaggle.com finished: Goblins, Ghosts and Ghouls was this year's Halloween competition. It was targeted at beginners and therefore right up my alley. The task was to classify three types of monsters: goblins, ghosts and ghouls. In this blog post I will talk about how I went about predicting the type of each monster.

If you do not want to wait, you can open my notebooks directly and look at the code. You can find them over at my notebooks page.

The first thing I do when a new competition starts is look at the data. This should generally be your first step with every machine learning problem. Is the data useful? Which features are there? Are there enough examples? Of course, in this competition the data was useful because it was artificial. There are four features: bone_length, rotting_flesh, hair_length and has_soul. The following pairplot shows how the different classes correspond to the features.


As you can see, the features are pretty much spread out. The only thing I could observe was a slight correlation between hair_length and has_soul. It seems as if the longer your hair is, the more soul you have. That is something to take away from this competition. I tried to increase my score by using artificial features (multiplying hair_length and has_soul, for example), but that did not improve it.
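To make the idea of an artificial feature concrete, here is a minimal sketch of the hair_length times has_soul product using pandas. The three rows are made-up values for illustration only, and the column name hair_soul is my own choice, not something from the competition data.

```python
import pandas as pd

# Made-up example rows; the real values come from the competition's training file.
df = pd.DataFrame(
    {
        "bone_length": [0.35, 0.58, 0.47],
        "rotting_flesh": [0.35, 0.43, 0.35],
        "hair_length": [0.47, 0.53, 0.81],
        "has_soul": [0.78, 0.44, 0.79],
    }
)

# Interaction feature: element-wise product of the two correlated columns.
# "hair_soul" is a hypothetical name chosen for this sketch.
df["hair_soul"] = df["hair_length"] * df["has_soul"]

print(df[["hair_length", "has_soul", "hair_soul"]])
```

The new column can then be passed to a classifier alongside the original four features. Whether it helps depends on the model; as noted above, it did not help in my case.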

The first notebook I created was a data analysis and classifier comparison notebook. I prepared the data and tried out several algorithms provided by scikit-learn. At the end I submitted my results using a VotingClassifier and got a leaderboard score of around 0.73. This means that my classifier was correct on 73% of the previously unseen data. That was okay, but I saw people getting around 75%. A post on the discussion board gave me the idea to try a neural network approach. So I created a second notebook and played around with a multi-layer perceptron classifier. Using grid search and a little bit of luck with the initial parameters, I could increase my score to 0.74669. This was my final result, too.
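The two approaches can be sketched roughly as follows. This is not the code from my notebooks: the data is a synthetic stand-in generated with make_classification, and the choice of base estimators and the parameter grid are assumptions picked only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the monster data: 4 numeric features, 3 classes.
X, y = make_classification(
    n_samples=300,
    n_features=4,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Approach 1: combine several classifiers by averaging their class probabilities.
voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)
voter.fit(X_train, y_train)
voting_accuracy = voter.score(X_test, y_test)

# Approach 2: grid search over a small, assumed set of MLP hyperparameters.
param_grid = {"hidden_layer_sizes": [(5,), (10,)], "alpha": [1e-4, 1e-2]}
search = GridSearchCV(
    MLPClassifier(max_iter=2000, random_state=0), param_grid, cv=3
)
search.fit(X_train, y_train)
mlp_accuracy = search.score(X_test, y_test)

print("voting accuracy:", voting_accuracy)
print("best MLP params:", search.best_params_)
print("MLP accuracy:", mlp_accuracy)
```

The accuracy printed here is measured on a held-out split of the synthetic data, so it says nothing about the scores on the real leaderboard.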

Overfitting bazookas

In the last days of the competition, more and more people seemed to get very good scores (up to 1.0), which seemed very unlikely. The explanation for that is easy: the competition used only a public leaderboard, so the leaderboard score was calculated on the whole unseen dataset. This means that by changing only a single classification in your submission, you could observe the change in the score and learn from that. Here is an example: let's say I classify the first monster as a goblin and submit my results. In the next submission I change that classification to ghost and leave all the other predictions unchanged. There are three possible outcomes:

  1. the leaderboard score increases: ghost is correct
  2. the leaderboard score decreases: goblin was correct
  3. the leaderboard score stays the same: both ghost and goblin are wrong, so ghoul is correct
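The three outcomes above can be simulated in a few lines. This is a toy sketch, not a script for the real leaderboard: make_leaderboard and probe are hypothetical helpers, the hidden labels are random, and it assumes the score is reported precisely enough that a single changed row is visible.

```python
import random

CLASSES = ["Ghost", "Ghoul", "Goblin"]


def make_leaderboard(true_labels):
    """Return a scoring function that mimics a leaderboard scored on all test rows."""

    def score(submission):
        correct = sum(p == t for p, t in zip(submission, true_labels))
        return correct / len(true_labels)

    return score


def probe(score, n_rows):
    """Recover every hidden label by changing one prediction per submission."""
    baseline = ["Goblin"] * n_rows
    base_score = score(baseline)
    recovered = []
    for i in range(n_rows):
        trial = list(baseline)
        trial[i] = "Ghost"  # flip a single row from goblin to ghost
        new_score = score(trial)
        if new_score > base_score:
            recovered.append("Ghost")  # score went up: ghost is correct
        elif new_score < base_score:
            recovered.append("Goblin")  # score went down: goblin was correct
        else:
            recovered.append("Ghoul")  # no change: both wrong, so ghoul
    return recovered


# Hidden labels the "leaderboard" knows but the participant does not.
rng = random.Random(0)
hidden = [rng.choice(CLASSES) for _ in range(20)]

score = make_leaderboard(hidden)
recovered = probe(score, len(hidden))
print(score(recovered))  # prints 1.0
```

Note that this needs one submission per test row plus one baseline submission, and no model is trained at any point.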

Using this technique, a lot of Kagglers got really good scores on the leaderboard. I think this is sad, because the goal of machine learning is to predict the class and not to overfit as much as possible. Of course, this was a training competition, but I would really like Kaggle to prevent this kind of "cheating" in the future.