Multi-label classification

Growing Instability: Classifying Crisis Reports

  • 579

  • £20,000

  • Finished

Growing Instability: Meet our challenge winners

With all entries assessed and checked, the time has come to announce the winners of our inaugural data science competitions.

Our first challenge was titled Growing Instability: Classifying Crisis Reports. Here, competitors had to make use of enormous amounts of data to come up with a way to predict topic tags for news articles (no fewer than 1.6 million articles) in order to classify them and pick out the most relevant.

The real-world applications of this are enormous. With every major world event now generating gargantuan amounts of data through both traditional and social media, the need to cut through the noise and gain real insight has never been greater.

According to Leo Borrett of the Dstl:

“The diversity of text data in a variety of formats presents a huge challenge for Defence & Security, requiring a multifaceted approach. We were therefore really pleased that a number of different techniques were tested by competitors as part of this challenge. The outputs of this will help us to understand the most appropriate technologies to apply operationally in the future, to aid with the triage of large volumes of text data.”

Introducing our first place winner

Our competitions have attracted some of the brightest data science talent around, and the winners who ended up topping the leaderboard for this challenge demonstrated the kind of innovation and expertise needed to get the job done.

In first place is Qingchen Wang. With extensive data science experience, including a bachelor’s in computer science and a master’s in machine learning from UCL, he entered this competition while also pursuing a PhD in marketing analytics at the Amsterdam Business School.

He saw the competitions advertised on LinkedIn, and it was the combination of a big cash prize and huge creative freedom that piqued his interest:

“Two things were interesting to me. One was that the competition prize pool was quite significant, so I thought it would be really cool if I could win. And also, it's one of those competitions where there isn't a lot to start with, which means that for people who want to participate, they have to build a system from scratch.”

What kinds of approaches win data science challenges?

Building that system took a lot of fine-tuning, and that was the approach Qingchen took: his first few submissions (he estimates he ended up making between 50 and 60 in total) were used to gauge feedback and decide what was working best.

This trial-and-error approach obviously worked wonders for Qingchen, but what we’ve found particularly interesting is the scope that the challenges afforded competitors.

Our second-place winner, Mario Filho, used a binary approach, theorising that machine learning models perform well with only two classes. Meanwhile, our third-place winner, Chirag Mahapatra, blended conventional supervised learning approaches with his own way of correlating data with tags in order to get the job done.

It’s this blend of backgrounds and approaches that lent the challenge such a competitive element, and which our community was keen to embrace. For those looking to get involved with data science challenges like these in the future, Qingchen has this to say:

“Every day, I see people posting things relating to data science which are interesting, but those are things that I'd never thought of. I come from a computer science background, and therefore I have a very specific focus of how I approach data science problems.

But, for somebody who's come from a statistics background, then their approach would be completely different. I think in terms of doing a project, or working as a team, it's important to have people from different backgrounds collaborating, rather than doing things one specific way.”

Huge congratulations to all our winners, and we wish them all the best in their future data science adventures.

Competition Timeline

This competition has finished.

This competition started on Monday 3rd April 2017 and ran for 6 weeks.


Imagine this scenario: A region’s stability is in decline due to unrest, crime and terrorism. To decide how best to respond, we need a better understanding of this humanitarian crisis, gained from the information contained within a set of reports.

The challenge: We have acquired news articles containing potentially relevant information. We need you to use the historical reports to determine the topics of new articles so that they can be classified and prioritised. This will allow analysts to focus on only the most pertinent details of this developing crisis.

Challenge Data

The data for this challenge has been acquired from a major international news provider, the Guardian. The training data represents the past historical record, and the test data represents new documents that require classification.

The datasets consist of:

  • Training data: All the news articles published between 1999 and 2014. [2.3GB]
  • Test data: A sample of the news articles published between 2015 and 2016. [13.8MB]
  • Topic dictionary (topicDictionary.txt): A list of topics for classifying articles that improve awareness of the developing crisis. [2.1KB]
  • Sample submission (sampleSubmission.csv): A sample submission file with the correct format but random topic predictions. [2.4MB]

Your Solution

You are required to classify each test article by predicting its topics. Your solution must:

  • Classify each test article by predicting the presence or absence of only those topics that are provided in the topic dictionary.
  • For each test article, predict a ‘1’ or ‘0’ for each topic in the dictionary where ‘1’ predicts the topic is present and ‘0’ predicts it is absent.

Each article may be classified by predicting that it has multiple topics, only one topic, or no topics from the topic dictionary.
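The 0/1 encoding described above can be sketched as follows. This is a minimal illustration, not part of the official challenge materials: the function name `encode_topics` and the three topic names are hypothetical, and the real topic dictionary is much longer.

```python
def encode_topics(article_topics, topic_dictionary):
    """Return a list of 0/1 flags, one per topic, in dictionary order.

    Topics that are not in the dictionary are ignored, since only
    dictionary topics may be predicted.
    """
    present = set(article_topics)
    return [1 if topic in present else 0 for topic in topic_dictionary]


# Hypothetical topic names, for illustration only.
topic_dictionary = ["activism", "terrorism", "refugees"]

rows = {
    # An article with several dictionary topics.
    "multiple": encode_topics(["terrorism", "refugees"], topic_dictionary),
    # An article with exactly one dictionary topic.
    "single": encode_topics(["activism"], topic_dictionary),
    # An article whose only topic is outside the dictionary: all zeros.
    "none": encode_topics(["football"], topic_dictionary),
}
```

Each row has the same length as the topic dictionary, so the three cases (multiple topics, one topic, no topics) differ only in how many flags are set.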

The training data can be used in any way you wish (subject to the data terms in the Official Rules) in order to build your solution and predict topics for the test articles as accurately as you can.



Submitted solutions for the test articles will be evaluated with respect to the ground truth for those articles, which is exactly known. This assessment will be done using the well-known F1 score to produce an overall measure of performance.

The F1 score is defined as:

F1 = 2×TP / (2×TP + FP + FN)

where:

  • TP is the number of true positives
  • FP is the number of false positives
  • FN is the number of false negatives

There are two main methods in practice for averaging TPs, FPs and FNs to calculate an F1 score for multi-label classification problems. For this challenge we will use the micro-averaged F1 score. This is obtained by summing the TPs, FPs and FNs over each individual decision for the test examples, to produce a global average in which each test example (document) is weighted equally.
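As a rough sketch of the micro-averaged calculation, the code below pools TP, FP and FN over every (article, topic) decision before applying the F1 formula. It is not the organisers' scoring code. Two points are assumptions: returning 0.0 when there are no positive decisions at all, and taking macro-averaging (averaging per-topic F1 scores) as the other common method, which is included only for contrast.

```python
def count_outcomes(y_true, y_pred):
    """Count TP, FP and FN over every (article, topic) decision."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for truth, guess in zip(true_row, pred_row):
            if truth == 1 and guess == 1:
                tp += 1
            elif truth == 0 and guess == 1:
                fp += 1
            elif truth == 1 and guess == 0:
                fn += 1
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 if undefined (assumed convention)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_f1(y_true, y_pred):
    """Pool the counts across all decisions, then compute a single F1."""
    return f1_from_counts(*count_outcomes(y_true, y_pred))


def macro_f1(y_true, y_pred):
    """For contrast: compute F1 per topic column, then average the scores."""
    n_topics = len(y_true[0])
    scores = []
    for j in range(n_topics):
        column_true = [[row[j]] for row in y_true]
        column_pred = [[row[j]] for row in y_pred]
        scores.append(micro_f1(column_true, column_pred))
    return sum(scores) / n_topics


# Three articles, three topics (toy data).
y_true = [[1, 0, 1], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 0], [1, 1, 1], [0, 0, 0]]

micro = micro_f1(y_true, y_pred)  # TP=3, FP=1, FN=1 -> 6/8
macro = macro_f1(y_true, y_pred)  # per-topic F1 of 1, 1, 0 -> 2/3
```

On this toy data the pooled counts are TP=3, FP=1, FN=1, so the micro-averaged F1 is 0.75, while the macro average is 2/3 because the third topic is always predicted wrongly.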

Submission file

The submission file must be a CSV file (standardised for upload to the website), structured as follows:

  • A header row containing a label for the test article reference (labelled ‘id’) and column labels for each topic in the order they are listed in the dictionary
  • Id column: This contains the unique article reference ids for each article in the test dataset, e.g. TestData_000003
  • Topic columns: These contain your prediction for each topic. You must enter either a ‘1’ or ‘0’ for each topic with respect to each test article.

Submissions will need to be ordered by unique article reference ‘id’.

The predictions for each test article could contain any combination of ‘0’s and ‘1’s, including multiple ‘1’s or all ‘0’s.
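A file in the layout described above could be produced along these lines. This is a sketch under stated assumptions: `write_submission` is a hypothetical helper, the topic names are invented, and the predictions are assumed to be held as a mapping from article id to a set of predicted topics. Sorting the ids as strings gives the expected order only because the ids are zero-padded.

```python
import csv
import io


def write_submission(predictions, topic_dictionary, out_file):
    """Write the header row, then one 0/1 row per article, ordered by id.

    predictions: dict mapping article id -> set of predicted topics.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    # Header: 'id' followed by the topics in dictionary order.
    writer.writerow(["id"] + list(topic_dictionary))
    for article_id in sorted(predictions):
        predicted = predictions[article_id]
        flags = [1 if topic in predicted else 0 for topic in topic_dictionary]
        writer.writerow([article_id] + flags)


# Hypothetical topic names and predictions, for illustration only.
topic_dictionary = ["activism", "terrorism", "refugees"]
predictions = {
    "TestData_000007": {"terrorism", "refugees"},
    "TestData_000003": set(),  # no topics predicted: a row of all zeros
}

buffer = io.StringIO()
write_submission(predictions, topic_dictionary, buffer)
submission_text = buffer.getvalue()
```

To write to disk instead of an in-memory buffer, pass a file opened with `open(path, "w", newline="")`.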

Submissions will be limited to 3 per day.

Public/Private Leaderboard

The public leaderboard will display scores which have been calculated for a statistically representative subsample (30%) of the test articles.

A private leaderboard will calculate the score for the remaining test articles (70%). The private score will be used to assess the competition winner.
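One way to picture the 30%/70% split is a seeded random partition of the test article ids, as sketched below. How the organisers actually drew their "statistically representative" subsample is not stated, so the simple random sample, the function name `split_public_private` and the seed are all assumptions.

```python
import random


def split_public_private(article_ids, public_fraction=0.3, seed=0):
    """Randomly partition ids into public and private sets (assumed method)."""
    ids = sorted(article_ids)  # fixed starting order for reproducibility
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_public = round(len(ids) * public_fraction)
    return set(ids[:n_public]), set(ids[n_public:])


# Ten hypothetical article ids in the format used by the test data.
ids = [f"TestData_{i:06d}" for i in range(10)]
public_ids, private_ids = split_public_private(ids)
```

Each submission would then be scored twice, once on the articles in `public_ids` and once on those in `private_ids`. This explains why the public and private columns of a leaderboard differ slightly for the same entry.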


Top 10 entries

Rank  User              Public  Private  Date                       Entries
1     qingchenwang      0.6792  0.6724   17 May 2017, 10:31PM BST   52
2     mariofilho        0.6727  0.6677   17 May 2017, 4:08PM BST    63
3     chirag.mahapatra  0.6626  0.6498   17 May 2017, 11:40PM BST   81
4     abhishek          0.6409  0.6403   15 May 2017, 10:44PM BST   33
5     eyadsibai         0.6403  0.6377   17 May 2017, 6:24PM BST    2
6     DataGeek          0.6380  0.6334   17 May 2017, 5:08PM BST    26
7     ololo             0.6358  0.6313   17 May 2017, 9:35PM BST    20
8     petecog           0.6329  0.6294   17 May 2017, 8:39PM BST    18
9     Andras            0.6172  0.6173   17 May 2017, 3:11PM BST    22
10    acardoso          0.6172  0.6173   17 May 2017, 3:41PM BST    33