Frequently Asked Questions

Introducing Data Science Challenge

Who is driving this project?

This project has been driven by the Defence Science and Technology Laboratory (Dstl) and is being jointly sponsored by three other government departments – the Government Office for Science, MI5 and the Secret Intelligence Service.

The Strategic Defence and Security Review talked about ‘advantage through innovation.’ What does that mean?

In short, ‘advantage through innovation’ seeks to drive innovation that will help to keep the UK safe and prosperous in the future.

Advances in technology hold enormous potential for the United Kingdom’s security and prosperity whilst also posing risks as adversaries might seek to use them against us. The global landscape has shifted with the private sector driving today’s rapid pace of technological, social and cultural change. Innovation is therefore important to maintaining our defence and security advantages into the future. We must adapt to stay ahead and achieve our goal of maintaining strategic edge.

What challenges does the government currently face in terms of defence, intelligence and accessing data science expertise?

Modern technology generates huge amounts of data. This is particularly true in the defence and security sectors.

Take a developing humanitarian crisis, for example. The national and regional media will cover it, NGOs will start to think about how they can best react, satellites will take images, diplomats and various domestic and overseas governments will start discussing how they can help, and the list goes on. This generates huge amounts of material. We need to be able to analyse it quickly, sifting the relevant from the irrelevant so that we can focus our efforts on understanding and interpreting the most important material.

Dealing with big data isn’t, of course, a problem that is unique to governments, and the pool of data scientists is relatively small, so we are always competing with other organisations. This challenge will help us engage with new data scientists and organisations who haven’t so far worked with us. This will increase the expertise that we can call on now and in the future.

How will this site support the government's strategic goals in the long term?

The Data Science Challenge seeks to position the UK’s defence and security organisations to engage better with data scientists in the future. Through better engagement and an improved understanding of requirements, the two communities (defence and security, and data science) can collaborate and communicate more effectively.

The success of this project will be reviewed before a decision is made about future challenges.

What do you see as the biggest benefits of crowdsourcing data science expertise to support the government's strategic goals?

The challenges will:

  1. Help us expand our supplier base. Although we already work with many great data scientists in industry and in academia (undergrads, postgrads, postdocs, lecturers etc.), we are keen to grow our supplier base and harness new talent. We hope that new suppliers will be excited by the challenges and apply their skills to solving them.
  2. Help to drive innovation. We are supplying data based on real-world situations that participants might not normally have access to.
  3. Help to create an enduring community working on defence and security problems.

How would you sum this site up in one sentence?

This is an exciting, novel and unprecedented way to work with the UK’s defence and security organisations.

What do you see as the key benefits for participants in these challenges?

There are a number of potential benefits for participants in the challenges. The first is the opportunity to work on representative data: although the data has all been synthesised or procured, it is representative of data generated or used within the defence and security sectors. The second is the opportunity to build experience and contacts in the defence and security sectors that could help participants secure contracts in the future. Thirdly, there is prize money available for the top three solutions and the potential for future engagement and interaction with the sponsors. Finally, there is the kudos of finishing in the top three places in these difficult challenges.

Are the challenges new?

This is the first time Dstl and our government partners have launched and run their own challenges under the Data Science Challenge brand. Dstl has just run a separate challenge on Kaggle, a platform for data science competitions. This challenge was about Satellite Imagery Feature Detection.

What devices can I access the website from?

The website is designed to be accessible from the majority of commonly used mobile devices, i.e. smartphones and tablets. If you have any difficulty accessing or viewing the website on a mobile device, please ensure you are using the latest version of your operating system. We expect most participants to use a desktop computer rather than a mobile device, because of the size of the data downloads and uploads involved in participation and submission.

Is there a Data Science Challenge app that I can download?

There is no mobile app for the Data Science Challenge because the website is designed to be accessible from the majority of commonly used mobile devices, i.e. smartphones and tablets.

What are the challenges that the MoD face on a daily basis when dealing with data?

We have access to vast amounts of unstructured data from a variety of sources and we always need to find ways to improve our analysis.

What specific challenges are the MoD facing now?

The MoD faces a variety of specific challenges related to the processing and exploitation of information, which are being addressed by the wider science and technology (S&T) programme.

What sort of information feeds into operational decision making at the MoD?

We use a wide range of sources. This includes open sources such as media reports as well as data available to us from defence assets.

Do the MoD have the capabilities covered in the challenges already?

We aren’t going to discuss specific capabilities for obvious security reasons. The results of the challenges will hopefully help us improve and enhance our existing capabilities.

How much raw data does the MoD get on operations?

We get terabytes of data on operations. A single aircraft sortie, for example, can generate tens of terabytes of data. Realistically this cannot be analysed by humans so we need data science to triage data and focus our analysts on the highest priority data.

How data driven will future operations be?

The MoD recognises the importance of data to inform operations. Data is critical to decision making on operations and the amount of data that is available will only increase in the future.

General Challenge FAQs

Getting in touch

What should I do if I spot an error in the data?

First, please use the challenge forum to communicate with other participants and see if they have observed the same error. Our challenge experts will also be on the forum to respond to queries about the data.

If I have a particular concern about the content of the data, who should I speak to?

Please contact us using our Feedback form.

Can I get involved with forums?

Yes! We actively encourage you to participate in the community forum. There is a forum for each of the two challenges. Please add any queries or comments relating to challenge materials or objectives to share with the community.

What is a Challenge Master?

The Challenge Master is our expert for a specific challenge and will be available to offer guidance through the challenge forum. The Challenge Master is also responsible for leading the judging panel’s evaluation of candidates’ winning solutions, to validate that they comply with the challenge rules.

How will I stay up to date with the challenge developments?

Once registered at datasciencechallenge.org, you will be kept up to date via email about key milestones during the Data Science Challenge. You can also follow the Data Science Challenge on Twitter @D_S_challenge and LinkedIn.

Participation

Who can get involved?

This competition is open to individuals over the age of 18. See the Official Rules for further details.

Entries made by or on behalf of corporate entities will not be accepted. Additionally, officers, directors and employees of the Sponsoring Agencies, BAE Systems, Capgemini UK PLC, Roke Manor Research Limited and their respective group companies, contractors and agents, together with their immediate families, may not participate in Challenges.

Foreign nationals can also participate. However, it should be noted that no payment shall be made (whether directly or via a third party/country) to:

  • any bank account registered and maintained in any country with a score of 37 or less according to Transparency International's Corruption Perceptions Index 2014; or
  • an individual who is a national and/or resident of, or located in, any country with a score of 37 or less according to Transparency International's Corruption Perceptions Index 2014.

Solution

Can I keep the data after the challenge finishes and re-use it for other purposes, including commercial purposes?

No. Challenge data and materials must only be used for participation in the competition. They must be deleted at the end of a challenge. See the official rules for full details.

Do I need certain software to enter?

There is no specific restriction on the software used. The only restriction is that the final model (but not training of the model) must be a standalone solution capable of being discretely evaluated in an independent environment without internet access.

Can I use my university’s computer?

There is no specific restriction on IT infrastructure used. The only restriction is that the final model (but not training of the model) must be a standalone solution capable of being discretely evaluated in an independent environment without internet access.

Can I use my company’s software?

There is no specific restriction on commercial software used. However, please see the Official Rules for terms of use for commercial off-the-shelf (COTS) software and COTS IPR. All software must be authorised for use in the challenges in accordance with its licensing terms.

Can I use cloud computing services such as Amazon Web Services, Heroku, Microsoft Azure, Google Compute etc?

There is no specific restriction on the use of cloud computing. Online or cloud tools can be used for training models. The only restriction is that the final model (but not training of the model) must be a standalone solution capable of being discretely evaluated in an independent environment without internet access.

Do I have to code in a particular language to enter the challenges?

There is no specific restriction on software language. However, you must use technologies recognised in industry so that the solution can be validated by our experts.

Challenge Scoring

How will submissions be scored?

Submissions for each challenge are scored by the metric specified for that challenge. For the Growing Instability challenge it can be found here, and for the Safe Passage challenge it can be found here.

How will I know when my submission has been successfully scored?

You will see confirmation of your submission on the challenge page after each successful upload. You will then be able to see the position of your most recent submission on the public leader board.

What is the difference between the public and private leader board?

You can view your performance on the public leader board. This is generated from a subset of the test data (typically 30%) that is specific to each challenge. This leader board is visible to anyone who visits the website.

A private leader board is used to determine the winner and is not visible to challenge entrants.
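
The relationship between the two leader boards can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the rules: that the public subset is drawn at random, that the private score uses the remaining test items, and that the metric is simple accuracy (the real metric is specific to each challenge). The function names and the toy data are hypothetical.

```python
import random

def split_test_ids(test_ids, public_fraction=0.3, seed=0):
    """Partition test item ids into a public and a private subset.

    Assumption: a seeded random split; the organisers' actual method is not published here.
    """
    ids = sorted(test_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    cut = round(len(ids) * public_fraction)
    return set(ids[:cut]), set(ids[cut:])

def accuracy(predictions, truth, ids):
    """Placeholder metric: fraction of the given ids predicted correctly."""
    return sum(predictions[i] == truth[i] for i in ids) / len(ids)

# Toy data: ten test items with made-up labels and a constant prediction.
truth = {i: i % 2 for i in range(10)}
predictions = {i: 0 for i in range(10)}

public_ids, private_ids = split_test_ids(truth.keys())
public_score = accuracy(predictions, truth, public_ids)    # shown during the competition
private_score = accuracy(predictions, truth, private_ids)  # used to pick the winners
```

Because the two scores come from different items, a solution tuned closely to the public score may rank differently on the private leader board.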

Can I view the scores of my previous submissions?

Yes, this is available on the Submissions page of each challenge.

What happens if more than one competitor gets the same score?

Each submission will be ranked by score on the public leader board. For two submissions with equal scores, the one that was first submitted will be ranked higher.
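
The tie-break rule can be sketched in Python as a sort on two keys. This is an illustration only: the field names, entrant names and timestamps are made up, and it assumes a higher score is better, which depends on the metric for each challenge.

```python
from datetime import datetime

def rank_submissions(submissions, higher_is_better=True):
    """Order submissions by score, breaking ties by earliest submission time."""
    sign = -1 if higher_is_better else 1
    return sorted(submissions, key=lambda s: (sign * s["score"], s["submitted_at"]))

# Hypothetical submissions; alice and bob have equal scores.
submissions = [
    {"entrant": "alice", "score": 0.82, "submitted_at": datetime(2017, 4, 3, 12, 0)},
    {"entrant": "bob", "score": 0.82, "submitted_at": datetime(2017, 4, 2, 9, 30)},
    {"entrant": "carol", "score": 0.90, "submitted_at": datetime(2017, 4, 4, 8, 0)},
]

ranking = [s["entrant"] for s in rank_submissions(submissions)]
print(ranking)  # ['carol', 'bob', 'alice']: bob submitted before alice
```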

When can I see the leader board?

The public leader board is visible from the start of the challenge.

When can I enter my first submission?

You can upload submissions any time over the duration of the competition. We encourage you to upload the sample submission, which has a very low score, to trigger appearance on the leader board for the first time.

When will the final leader board be available?

The public leader board will be frozen at a competition’s end. The private leader board will be visible following evaluation of winning solutions to display the final ranking of entrants and announce the winner.

How many times can I enter?

You may submit up to three entries per day. This is to limit feedback on submissions and mitigate gaming the system.

What are the judging criteria for each of the challenges?

Submissions are scored according to a metric specific to each challenge which will be published on the challenge website. Winners will be selected according to the top scores on the private leader board. The public leader board will be visible to entrants during the competition, but the private leader board will determine the actual competition leaders. The top three leaders on the private leader board will be asked to submit their solution and documentation for validation, before confirming them as prize winners. Please view the judging criteria outlined in the Official Rules.

Winning and prizes

How are challenge winners selected?

Winners will be selected according to the top scores on the private leader board. The top three entrants on the private leader board will be asked to provide their solutions and documentation for validation, before being confirmed as prize winners.

Who will evaluate winning solutions?

A judging panel of subject matter experts, led by challenge masters, will assess the winning solution in an independent evaluation environment.

When will you announce the winners?

Winners will be announced within four weeks of the closing date of the relevant challenge.

What is the prize?

There will be a winner for each of the Data Science Challenges – one for the Safe Passage: Detecting and Classifying Vehicles in Aerial Imagery challenge and one for the Growing Instability: Classifying Crisis Reports challenge. Cash prizes will be awarded for each competition. The first-place prize is £20,000, the second-placed entrant will receive £12,000 and the third-placed entrant will receive £8,000.

Where solutions are particularly interesting or innovative, the UK Government may look to engage with the participants in the future.

Note that cash prizes will only be paid to a bank account that is not in a country with a score of 37 or less according to Transparency International’s Corruption Perceptions Index 2014. Similarly, they will not be paid to an individual who is a national of, or located in, one of these countries, or where the UK Government is not reasonably satisfied as to the potential recipient’s identity.

If I am a winner at the end of the competition, what happens next?

The top three entrants will be invited to reproduce the results they obtained during the competition in an independent evaluation environment. You will be required to commission and set up the evaluation environment and allow access to the judging panel.

Once you have installed your solution and provided access to the judging panel, they will follow your instructions for running the winning solution against the test dataset and compare these results to your submitted score. We will then run your solution against an unseen hold-out dataset and score your solution. Finally, we will perform a review of your solution documentation and code to complete validation before awarding the prize.

What happens with the winning solution, who owns it?

Please see the Competition Official Rules for terms of use for software and IPR. The winner will make all software and intellectual property in the winning solution (to the extent that it is not commercial off-the-shelf or open source) available either generally on an MIT open source licence or by granting the UK Government a permissive licence on the terms set out in the Official Rules.

How long does it take to get the challenge results?

Winners will be contacted within four weeks of the closing date of the relevant challenge.

Growing Instability: Classifying Crisis Reports FAQs

How much data is available?

There are 7581 test articles (in a 14MB zip file) and 1.6M training articles (in a 2.4GB zip file). The breakdown of the number of articles within each training file is as follows:

  • 1999a_TrainingData.json contains 25096 articles
  • 1999b_TrainingData.json contains 26958 articles
  • 2000a_TrainingData.json contains 32193 articles
  • 2000b_TrainingData.json contains 35244 articles
  • 2001a_TrainingData.json contains 40325 articles
  • 2001b_TrainingData.json contains 44132 articles
  • 2002a_TrainingData.json contains 44842 articles
  • 2002b_TrainingData.json contains 45159 articles
  • 2003a_TrainingData.json contains 47804 articles
  • 2003b_TrainingData.json contains 46822 articles
  • 2004a_TrainingData.json contains 48954 articles
  • 2004b_TrainingData.json contains 48924 articles
  • 2005a_TrainingData.json contains 50466 articles
  • 2005b_TrainingData.json contains 49338 articles
  • 2006a_TrainingData.json contains 52970 articles
  • 2006b_TrainingData.json contains 54181 articles
  • 2007a_TrainingData.json contains 60094 articles
  • 2007b_TrainingData.json contains 61985 articles
  • 2008a_TrainingData.json contains 65998 articles
  • 2008b_TrainingData.json contains 70312 articles
  • 2009a_TrainingData.json contains 67378 articles
  • 2009b_TrainingData.json contains 44232 articles
  • 2010a_TrainingData.json contains 44129 articles
  • 2010b_TrainingData.json contains 55071 articles
  • 2011a_TrainingData.json contains 55584 articles
  • 2011b_TrainingData.json contains 54224 articles
  • 2012a_TrainingData.json contains 55452 articles
  • 2012b_TrainingData.json contains 52452 articles
  • 2013a_TrainingData.json contains 52734 articles
  • 2013b_TrainingData.json contains 56211 articles
  • 2014a_TrainingData.json contains 58291 articles
  • 2014b_TrainingData.json contains 52907 articles
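
The file names above follow a regular pattern, two files per year from 1999 to 2014, so a short Python sketch can list the expected names and report any that are missing after unzipping. The function names and the directory passed to `missing_training_files` are hypothetical, and the sketch assumes all files sit directly in one directory.

```python
from pathlib import Path

def expected_training_files(first_year=1999, last_year=2014):
    """Return the training file names listed above, in chronological order."""
    return [
        f"{year}{half}_TrainingData.json"
        for year in range(first_year, last_year + 1)
        for half in ("a", "b")
    ]

def missing_training_files(data_dir):
    """Return the expected file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in expected_training_files()
            if not (data_dir / name).is_file()]

names = expected_training_files()
print(len(names))  # 32 files: 16 years, two files per year
```

Calling `missing_training_files` with the directory where the archive was extracted returns an empty list when every file is present.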

How much storage space do I need?

The total download is approximately 3GB. Additional storage may be required for processing, and will be dependent on your solution.

Can I use other data for training purposes?

Yes. You may use other additional data for training purposes, on the condition that:

  • you have permission from the data owner to use the data for these purposes.
  • the additional data is not acquired from the same news data provider (or any of its derivatives) that has been used for the challenge training and test data. This is so that additional metadata marked up in source articles (that has not been made available in the provided challenge datasets) is not used for unfair advantage.

When can I enter my first submission?

You can submit your solutions as soon as the competition starts and you have registered to enter at datasciencechallenge.org.

Safe Passage: Detecting and Classifying Vehicles in Aerial Imagery FAQs

How much data is available?

600 images are provided as training data. Each image covers 100m by 100m. A further 600 images (each covering 100m by 100m) are provided as the test dataset. The total dataset size on disk is approximately 1.3GB. Please see the official rules for full details on the use of this dataset.

How much storage space do I need?

The total download is approximately 1.3GB. Additional storage will be required for processing, and will be dependent on your solution.

I think there is an observation in a training image that should belong to a vehicle class but doesn’t. Why?

It may be classified as low confidence and therefore not provided as a training sample. However, you can use the training data in any way you see fit (subject to terms of use) to produce the best solution.

Can I use other data for training purposes?

Yes, any additional data can be used for training purposes on the condition that you have permission from the data owner to use the data for these purposes.

When can I enter my first submission?

You can submit your solutions as soon as the competition starts and you have registered to enter at datasciencechallenge.org.