Safe Passage: Spotlight on our winners
Following a phenomenal response to our data science challenges, it’s time to shine the spotlight on our first talented winners.
Safe Passage: Detecting and classifying vehicles in aerial imagery was the title of our second challenge. In this challenge, competitors had to use real aerial imagery to detect and classify different types of vehicles.
In a real-world security situation, this kind of solution might be used to guide people to safety through a dangerous conflict zone. It might also be used to scan the seas for suspect vessels, or even identify survivors of a capsized craft.
According to Leo Borrett of the Dstl:
“We were surprised with the high quality of entries for this challenge and the level of accuracy delivered by the competitors. The ability to not just classify feature types such as vehicles, but to achieve classification of vehicle types allows us to rapidly build situational awareness of a developing crisis.”
One thing uniting all these applications is the high-stakes nature of the work. So, whichever approach came out on top would need to be all but airtight.
Meet our first-place winner
Lucky for us, then, that our vibrant data science community contained world-class talent. Our first-place prize winner, Guillermo Barbadillo Villanueva, was perfectly equipped to handle this challenge.
Guillermo’s background is in computer vision, ideal for this brief, and he’s spent a long time sharpening his skills in other data science competitions run by the likes of Kaggle, as well as in previous Dstl challenges.
It’s this competitive element that seems to have driven Guillermo to success. He pulled into an early lead, but had to remain focused to stay there. In fact, he found himself bumped to the second-place spot with only two days left until the competition closed, only just managing to regain the lead.
He did this by making the most of our submission limits, estimating that he submitted around 90 entries to refine and perfect his score. To Guillermo, competitions like this are a fantastic way to develop yourself:
“Competitions are, more or less, a precise way to measure your abilities. They have a metric, and with that metric you can measure how you are in front of other people in the world. I think it's the most clear sign of how good you are at something.”
Meanwhile, second place was awarded to Vladimir Iglovikov, who came up with a flexible, scalable algorithm and a cunning use of network architecture. Third place was won by an entrant who wishes to remain anonymous, using a clever proprietary solution.
The fact that such diverse approaches could all get the job done is a brilliant testament to the depth of these challenges.
What made this challenge stand out?
Guillermo was quick to praise the quality of the data we used for our challenge: less ‘noisy’ data makes it easier to develop models and pinpoint what’s working. But it wasn’t all plain sailing. Tiny details like motorbikes proved far harder to define than cars or vans, and coming up with a unified way to distinguish between types of vehicles had him scratching his head.
So, what would he say to newcomers who might be drawn to this type of challenge?
“My advice is to enter as many competitions as possible. On the first competition, I ended up in a very bad position, but you learn from your mistakes. Each competition you do a little bit better than the previous, and in the end, if you're lucky, like me in this one, you can win a prize. It's just a matter of practice.”
Looking forward, Guillermo hopes to turn his attention to competitions which involve text, rather than his image-based comfort zone, in order to develop his skillset even further by testing it against this competitive community.
Enormous congratulations to all the winners of our data science challenges. It’s great to see such an enthusiastic response, and we wish you all the best for the future.
This competition has finished.
This competition started on Monday 3rd April 2017 and ran for 6 weeks.
Imagine this scenario: Circumstances in a foreign nation have become increasingly unstable, requiring safe passage for British nationals out of the crisis region.
The challenge: We have received aerial imagery data that allows us to identify friendly and suspicious vehicle types. Now, we need you to detect and classify vehicles in these images so that we can assess potential risks along key convoy routes.
The data for this challenge is high-resolution aerial imagery. For the purpose of this challenge, a UK city has been used as the geographical location to represent part of a foreign nation.
The datasets consist of:
- Training images (training.zip): A set of 600 images (JPEG) at 5cm resolution, each one covering an area of 100m by 100m. [580MB]
- Training observations (trainingObservations.csv): The label and location of the vehicles of interest corresponding to the training images. [252.9KB]
- Test images (test.zip): A separate set of 600 images (JPEG) at 5cm resolution, each one covering an area of 100m by 100m. [567MB]
- Sample submission (sampleSubmission.csv): A sample submission file with the correct format, but with random detections. [145.5KB]
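As a quick sanity check on these figures, the pixel dimensions implied by the stated resolution and ground coverage can be worked out directly. This is a minimal sketch: the function name is illustrative, and it assumes that "5cm resolution" means each pixel covers a 5cm by 5cm square on the ground.

```python
def pixels_per_side(ground_extent_m: float, resolution_cm: float) -> int:
    """Number of pixels along one side of a square image tile.

    Assumption: each pixel covers resolution_cm x resolution_cm on the ground.
    """
    # Convert the ground extent to centimetres, then divide by the pixel size.
    return round(ground_extent_m * 100 / resolution_cm)


side = pixels_per_side(100, 5)
print(f"{side} x {side} pixels per image")
```

Under that assumption, a 100m by 100m tile at 5cm resolution works out to 2000 by 2000 pixels, which is worth checking against the actual JPEG dimensions before training.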
You are required to detect and classify the vehicles of interest within high-resolution aerial images. Your submission must:
- Find and record all vehicles in the test images that match one of the 9 vehicle classes of interest. Vehicles cut off by the edge of the image can be ignored.
- Record the image, vehicle class, and centre pixel position of the vehicle for each observation of interest in the submission CSV format. The exact centre pixel position is not vital, but it must be within the ground truth acceptance boundary.
The training images and training observations can be used in any way you decide (subject to the data terms in the Official Rules) to help find the vehicles of interest in the test images.
The submitted results file will be scored using the Jaccard Index, defined as:
Jaccard Index = TP / (TP + FP + FN)
where:
- TP are the true positives
- FP are the false positives
- FN are the false negatives
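For illustration, the standard Jaccard Index in terms of these counts, TP / (TP + FP + FN), can be computed as below. This is a sketch rather than the organisers' scoring code: the function name is hypothetical, the handling of the all-zero case is an assumption, and how counts are aggregated across images and classes is not stated here.

```python
def jaccard_index(tp: int, fp: int, fn: int) -> float:
    """Jaccard Index from true positive, false positive and false negative counts."""
    total = tp + fp + fn
    if total == 0:
        # Assumption: nothing to find and nothing reported counts as a perfect score.
        return 1.0
    return tp / total


# Example: 80 correct detections, 10 spurious detections, 10 missed vehicles.
score = jaccard_index(tp=80, fp=10, fn=10)
print(f"Jaccard Index: {score:.4f}")
```

Note that false positives and false negatives are penalised equally, so reporting a doubtful detection only pays off if it is more likely right than wrong.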
The submission file format is similar to the training observations format. It is a CSV file (standardised for upload to the website), but requires only the Id and Detections columns. These columns are formatted in the same way as those in the training observations and are defined as:
- Id: The unique id for this row. This is a combination of the image name (e.g. TQ2378_0_0.jpg) with file extension (.jpg) removed, an underscore (_), and the vehicle class (e.g. A). For this example the Id would be TQ2378_0_0_A.
- Detections: The complete set of detections for the image and class combination. Each detection is the centre pixel position of the vehicle, written as its x and y coordinates separated by a colon (:), e.g. xPixel:yPixel. A pipe (|) is used to separate detections. When there are no detections, “None” is used. Pixel coordinates are integers and defined using standard convention (i.e. with the image origin at top left).
Submissions will need to be ordered by unique ‘id’ (image name then class).
All combinations of image name and vehicle classes need to be included, even if the images do not contain any vehicles of interest. ‘None’ is a valid and possible correct result.
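The format rules above can be sketched in code as follows. This is a minimal illustration, not an official tool. Several things are assumptions: the nine classes are labelled A to I (only "A" appears in the description), the header row is spelled `id,detections` (check sampleSubmission.csv for the exact spelling), and all function names are hypothetical.

```python
import csv
from itertools import product

# Assumption: the 9 vehicle classes are labelled A to I; only "A" is shown in the brief.
VEHICLE_CLASSES = "ABCDEFGHI"


def format_detections(points):
    """Join (x, y) pixel centres as 'x:y|x:y', or return 'None' when there are none."""
    if not points:
        return "None"
    return "|".join(f"{int(x)}:{int(y)}" for x, y in points)


def build_rows(image_names, detections):
    """Return (id, detections) rows for every image and class combination.

    `detections` maps (image_stem, vehicle_class) to a list of (x, y) centres.
    Rows are ordered by image name, then by class.
    """
    stems = sorted(n[:-4] if n.endswith(".jpg") else n for n in image_names)
    rows = []
    for stem, vehicle_class in product(stems, VEHICLE_CLASSES):
        points = detections.get((stem, vehicle_class), [])
        rows.append((f"{stem}_{vehicle_class}", format_detections(points)))
    return rows


def write_submission(rows, handle):
    """Write rows as CSV. The header spelling is an assumption."""
    writer = csv.writer(handle)
    writer.writerow(["id", "detections"])
    writer.writerows(rows)


# Example: one image with two class-A detections and nothing else.
rows = build_rows(
    ["TQ2378_0_0.jpg"],
    {("TQ2378_0_0", "A"): [(120, 45), (300, 1810)]},
)
print(rows[0])
print(rows[1])
```

Iterating over every image and class pair, rather than only over the detections found, is what guarantees that combinations with no vehicles still appear in the file with “None”.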
Submissions will be limited to 3 per day.
Public/Private Leader board
The public leader board will show results during the competition for a subset (33%) of the test images.
The private leader board will calculate the score for the remaining (67%) test images and will be used to assess the competition winner.
Top 10 entries
| 1 | gbarbadillo | 0.8662 | 0.8713 | 17 May 2017, 6:55PM BST | 84 |
| 2 | ternaus | 0.8581 | 0.8569 | 17 May 2017, 4:47PM BST | 31 |
| 3 | kit1 | 0.8633 | 0.8527 | 16 May 2017, 8:12PM BST | 36 |
| 4 | jane.ostin | 0.8490 | 0.8477 | 14 May 2017, 8:40PM BST | 22 |
| 5 | Xi_Lian | 0.8493 | 0.8455 | 16 May 2017, 11:33PM BST | 8 |
| 6 | Kyle | 0.8320 | 0.8369 | 16 May 2017, 4:05PM BST | 48 |
| 7 | vkassym | 0.8333 | 0.8325 | 14 May 2017, 8:22PM BST | 17 |
| 8 | dlrocks | 0.8142 | 0.8189 | 17 May 2017, 10:46PM BST | 20 |
| 9 | cogitae | 0.7964 | 0.8054 | 17 May 2017, 9:03PM BST | 48 |
| 10 | codewarrior | 0.8043 | 0.8000 | 04 May 2017, 11:09PM BST | 16 |