Challenge

MoReBikeS: 2015 ECML-PKDD Challenge on "Model Reuse with Bike rental Station data"

Welcome to the main page of the ECML PKDD 2015 Challenge on Model Reuse with Bike rental Station data!


NEWS AND RESULTS

FINAL RESULTS ON FULL TEST DATA

Of the 23 teams in the small test data challenge, 10 participated in the full test data challenge. The results are as follows.

Rank | Team | Team members | Mean absolute error
#1 | Hao | Hao Song, Peter Flach | 2.0143
#2 | Yu Chen | Yu Chen, Peter Flach | 2.0515
#3 | Arun B S | Arun Bala Subramaniyan, Rong Pan | 2.0667
#4 | DCC_UFLA_15 | Fernando Simeone, Diego S. Mendes, Ahmed A. A. Esmin | 2.0794
#5 | Farhan | Chowdhury Farhan Ahmed | 2.1027
#6 | Denis | Denis Moreira dos Reis | 2.1725
#7 | thelastone | Víctor Núñez Monsálvez | 2.2098
#8 | arp | Andrés Ramos, Francisco Rangel | 2.3092
#9 | Tom&Niall | Tom Diethe, Niall Twomey, Peter Flach | 2.3921
#10 | Dmlab | Gergo Barta | 2.4155

The top 3 participants each received a free registration to ECML-PKDD 2015 as their prize.

Full test data are available here: http://reframe-d2k.org/File:Morebikes_full_test_data.csv.zip

CHALLENGE WORKSHOP

The challenge workshop was held on September 11, 2015, in Porto at ECML-PKDD 2015.

The schedule is available here: http://users.dsic.upv.es/~flip/LMCE2015/MoReBikeS_Schedule.html.

The slides of Meelis Kull presenting the summary of the challenge are available here: http://reframe-d2k.org/File:2015_09_11_morebikes_summary.pdf

CHALLENGE WORKSHOP PAPERS

Here are the challenge workshop papers from all 10 full test data challenge participants.

Team | Authors | Title | PDF
arp | Andrés Ramos, Francisco Rangel | Autoritas participation at MoReBikeS: Model Reuse with Bike rental Station data | MoReBikeS 2015 paper 8.pdf
Arun B S | Arun Bala Subramaniyan, Rong Pan | Prediction of Bike Rental using Model Reuse Strategy | MoReBikeS 2015 paper 5.pdf
DCC_UFLA_15 | Fernando Simeone, Diego S. Mendes, Ahmed A. A. Esmin | Nearest-Neighbor Distance Method Applied to Model Reuse With Bike Rental Station Data | MoReBikeS 2015 paper 4.pdf
Denis | Denis Moreira dos Reis | Selecting Training Data By Evaluating Existing Models: Reusing models for the MoReBikeS Challenge | MoReBikeS 2015 paper 1.pdf
Dmlab | Gergo Barta | Bike sharing model reuse framework for tree-based ensembles | MoReBikeS 2015 paper 6.pdf
Farhan | Chowdhury Farhan Ahmed | Reframing Bike Challenge Problem using Model Selection | MoReBikeS 2015 paper 3.pdf
Hao | Hao Song, Peter Flach | Model Reuse with Subgroup Discovery | MoReBikeS 2015 paper 9.pdf
thelastone | Víctor Núñez Monsálvez | Reusing Models Using K-Nearest Neighbors And Weighted Arithmetic Mean To Predict Future Use Of Bike Stations For The MoReBikeS Challenge 2015 | MoReBikeS 2015 paper 7.pdf
Tom&Niall | Tom Diethe, Niall Twomey, Peter Flach | Gaussian Process Model Re-Use | MoReBikeS 2015 paper 10.pdf
Yu Chen | Yu Chen, Peter Flach | SVR-based Modelling for the MoReBikeS Challenge: Analysis, Visualisation and Prediction | MoReBikeS 2015 paper 2.pdf

BEST STUDENT AWARD

The MoReBikeS best student award was decided based on the final leaderboard on small test data (see below). The results are as follows:

Award | Rank among all participants | Name | University | Country
Best Student Award | 1 | Yu Chen | University of Bristol | UK
Best Student Runner-up | 2 | Victor Nuñez Monsalvez | Universitat Politècnica de València | Spain

The special prize of a free one-year Cyclocity subscription goes to Victor Nuñez Monsalvez: the winner kindly passed the prize to the runner-up because Cyclocity does not operate in the UK.

FINAL LEADERBOARD OF ALL PARTICIPANTS ON SMALL TEST DATA (June 8, 2015)

Earlier leaderboards are available at http://reframe-d2k.org/Challenge_Leaderboards

Rank | Name | Submission number | Mean absolute error
1 | Yu Chen | 10 | 2.376
2 | thelastone | 5 | 2.416
3 | Hao | 3 | 2.454
4 | Arun B S | 9 | 2.502
5 | DCC_UFLA_15 | 1 | 2.532
6 | lmontes | 2 | 2.536
7 | Farhan | 2 | 2.554
8 | arp | 2 | 2.556
9 | Denis | 7 | 2.604
10 | VLC8 | 3 | 2.690
11 | masfworld | 1 | 2.736
12 | Dmlab | 12 | 2.829
13 | ComUnTir | 1 | 2.922
14 | Reem | 2 | 3.068
15 | Bikes 3h ago | Baseline | 3.288
16 | emakumea | 2 | 3.288
17 | kafka | 3 | 3.288
18 | AEslava | 2 | 3.422
19 | LMAF | 2 | 3.422
20 | GLN | 4 | 3.458
21 | BigBones | 1 | 4.162
22 | Robin | 1 | 4.402
23 | yeha | 1 | 4.460
24 | iseddel | 1 | 4.518
25 | VMM | 1 | 4.612
26 | All zeros | Baseline | 7.550


FINAL LEADERBOARD OF ALL SUBMISSIONS ON SMALL TEST DATA (June 8, 2015)

Rank | Name | Submission number | Mean absolute error
1 | Yu Chen | 10 | 2.376
2 | thelastone | 5 | 2.416
3 | thelastone | 3 | 2.430
4 | thelastone | 6 | 2.430
5 | thelastone | 4 | 2.434
6 | thelastone | 7 | 2.444
7 | Hao | 3 | 2.454
8 | Yu Chen | 8 | 2.460
9 | Yu Chen | 7 | 2.461
10 | Yu Chen | 6 | 2.469
11 | Hao | 5 | 2.478
12 | Hao | 8 | 2.478
13 | Hao | 4 | 2.484
14 | Hao | 7 | 2.492
15 | Hao | 6 | 2.494
16 | Yu Chen | 3 | 2.496
17 | Arun B S | 9 | 2.502
18 | Arun B S | 12 | 2.514
19 | Hao | 9 | 2.514
20 | Arun B S | 8 | 2.519
21 | Yu Chen | 4 | 2.520
22 | Yu Chen | 9 | 2.520
23 | Arun B S | 11 | 2.526
24 | Yu Chen | 11 | 2.528
25 | DCC_UFLA_15 | 1 | 2.532
26 | lmontes | 2 | 2.536
27 | Hao | 12 | 2.540
28 | Farhan | 2 | 2.554
29 | arp | 2 | 2.556
30 | arp | 3 | 2.558
31 | Arun B S | 10 | 2.560
32 | Yu Chen | 12 | 2.562
33 | Hao | 10 | 2.564
34 | Farhan | 1 | 2.572
35 | Hao | 11 | 2.590
36 | Denis | 7 | 2.604
37 | Denis | 4 | 2.608
38 | Yu Chen | 5 | 2.612
39 | Yu Chen | 2 | 2.625
40 | Hao | 1 | 2.634
41 | Denis | 3 | 2.642
42 | Denis | 9 | 2.642
43 | Denis | 12 | 2.646
44 | Denis | 11 | 2.658
45 | Denis | 6 | 2.680
46 | VLC8 | 3 | 2.690
47 | Hao | 2 | 2.700
48 | thelastone | 2 | 2.722
49 | Arun B S | 4 | 2.724
50 | masfworld | 1 | 2.736
51 | Denis | 10 | 2.756
52 | Denis | 8 | 2.764
53 | Denis | 2 | 2.768
54 | Arun B S | 5 | 2.774
55 | Denis | 5 | 2.774
56 | Denis | 1 | 2.778
57 | Dmlab | 12 | 2.829
58 | lmontes | 1 | 2.912
59 | thelastone | 1 | 2.912
60 | ComUnTir | 1 | 2.922
61 | Dmlab | 10 | 2.966
62 | Arun B S | 6 | 3.068
63 | Arun B S | 7 | 3.068
64 | Reem | 2 | 3.068
65 | Dmlab | 5 | 3.099
66 | Dmlab | 9 | 3.199
67 | Bikes 3h ago | Baseline | 3.288
68 | emakumea | 2 | 3.288
69 | kafka | 3 | 3.288
70 | masfworld | 3 | 3.288
71 | Dmlab | 11 | 3.311
72 | Dmlab | 8 | 3.315
73 | Dmlab | 7 | 3.316
74 | Dmlab | 4 | 3.390
75 | Yu Chen | 1 | 3.414
76 | AEslava | 2 | 3.422
77 | LMAF | 2 | 3.422
78 | Dmlab | 3 | 3.451
79 | GLN | 4 | 3.458
80 | Dmlab | 2 | 3.471
81 | Arun B S | 3 | 3.556
82 | Dmlab | 1 | 3.606
83 | GLN | 3 | 3.632
84 | Arun B S | 1 | 3.636
85 | Arun B S | 2 | 3.640
86 | kafka | 6 | 3.652
87 | kafka | 5 | 3.678
88 | GLN | 2 | 3.762
89 | Dmlab | 6 | 3.791
90 | AEslava | 4 | 3.854
91 | LMAF | 4 | 3.854
92 | VLC8 | 2 | 3.924
93 | AEslava | 3 | 4.014
94 | LMAF | 3 | 4.014
95 | AEslava | 5 | 4.050
96 | LMAF | 5 | 4.050
97 | BigBones | 1 | 4.162
98 | Reem | 1 | 4.332
99 | Robin | 1 | 4.402
100 | kafka | 1 | 4.412
101 | yeha | 1 | 4.460
102 | GLN | 1 | 4.468
103 | iseddel | 1 | 4.518
104 | emakumea | 1 | 4.522
105 | VLC8 | 1 | 4.538
106 | kafka | 4 | 4.580
107 | VMM | 1 | 4.612
108 | arp | 1 | 4.640
109 | BigBones | 2 | 4.664
110 | BigBones | 3 | 4.674
111 | AEslava | 1 | 4.776
112 | LMAF | 1 | 4.776
113 | kafka | 2 | 4.846
114 | VMM | 2 | 5.166
115 | masfworld | 4 | 5.476
116 | yeha | 2 | 6.568
117 | masfworld | 2 | 6.898
118 | All zeros | Baseline | 7.550

ABOUT THE CHALLENGE

INTRODUCTION AND MOTIVATION

The task in this challenge is to predict the number of available bikes in every bike rental station 3 hours in advance. There are at least two use cases for such predictions. First, a user plans to rent (or return) a bike in 3 hours' time and wants to choose a bike station which is not empty (or full). Second, the company wants to avoid situations where a station is empty or full and therefore needs to move bikes between stations. For this purpose it needs to know which stations are more likely to be empty or full soon.

In both cases the prediction can be based on the time of day, week, or year and on the weather conditions. Information about the current status of the station can also be used. A successful predictor needs to take all of these aspects into account, as well as the profile of bike availability at the station, learned from historical information. The more historical information is available, the better the predictions can be.

In this challenge we explore a setting where there are 200 stations which have been running for more than 2 years and 75 stations which have been open for just a month. The task is to reuse the models learned on the 200 "old" stations in order to improve prediction performance on the 75 "new" stations. Hence, this challenge evaluates prediction performance on the 75 stations.

If we gave full historical data about the 200 stations, we would be evaluating model building and model reuse performance at the same time. Therefore, we have decided to build the models ourselves and provide them without the full data they were trained on. Still, full training data for 10 stations is provided to facilitate analysis of how a model can be reused at other stations and at later times. For the remaining 190 training stations we provide only one month of data, also to help in deciding how the models can be reused.

PARTICIPATION

This challenge is open to everyone: participate by submitting predictions to the public leaderboard, which is refreshed on May 4, 18, 25 and June 1. The results of the last leaderboard will be published immediately as the final results of the small test set challenge.

We encourage everyone to participate in the full test set challenge as well. This requires submitting the code and a paper describing the chosen prediction method by June 15 (changed from June 8), and the predictions on the full test data by June 22. The main focus of the paper should be to explain the solution to other participants and interested readers; comparison with other existing methods is not required. Accepted papers are presented at the challenge workshop at ECML PKDD 2015 on September 11, 2015. The winner of the MoReBikeS challenge is the presenting author whose predictions have the lowest mean absolute error on the full test data.

TASK

The task is to predict the number of bikes in the stations 3 hours in advance.
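As a point of reference, the leaderboards above include a baseline called "Bikes 3h ago". A minimal sketch of such a persistence baseline follows, assuming it predicts the count at a target hour to be the count observed 3 hours earlier. The function name and the hourly-list input are illustrative and not part of the challenge materials.

```python
HORIZON_HOURS = 3  # prediction horizon stated in the task


def persistence_forecast(hourly_counts, horizon=HORIZON_HOURS):
    """Map each target hour index to the bike count seen `horizon` hours earlier.

    `hourly_counts` is a hypothetical list of bike counts for one station,
    one entry per consecutive hour.
    """
    return {
        hour + horizon: count
        for hour, count in enumerate(hourly_counts)
        if hour + horizon < len(hourly_counts)
    }


# Made-up counts for five consecutive hours at one station.
counts = [4, 5, 6, 7, 8]
forecast = persistence_forecast(counts)
print(forecast)  # targets at hours 3 and 4 reuse the counts from hours 0 and 1
```

Any model reuse method can be compared against this kind of baseline, which needs no training data at all.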

DESCRIPTION OF DATA AND MODELS

The challenge is to reuse the models learned on 200 training stations (numbered 1 to 200) for 75 deployment stations (numbered 201 to 275). The linear models were trained on data from the training stations for the period June 2012 to September 2014. The deployment data covers all 275 stations for October 2014. The test data covers the 75 test stations for the period November 2014 to January 2015. The leaderboard test data covers 25 test stations for the period November 2014 to December 2014. Full test data for the other 50 test stations, covering November 2014 to January 2015, is given to participants after paper submission. The training and deployment datasets cover all hours of their respective periods; however, some time points have missing values, including in the target variable. All the data and models, together with detailed information, are available here: http://reframe-d2k.org/Challenge_Download.
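One simple way to reuse the provided models, shown here only as an illustration and not as the organisers' or any participant's method, is to pick for each new station the old-station model that has the lowest mean absolute error on that station's deployment-month data. The dictionary layout of a model and the feature name `bikes_now` are assumptions made for this sketch; the actual model files are described on the download page.

```python
def predict(model, features):
    """Apply a linear model: intercept plus the weighted sum of named features."""
    return model["intercept"] + sum(
        weight * features[name] for name, weight in model["weights"].items()
    )


def mae(model, examples):
    """Mean absolute error of `model` over (features, target) pairs."""
    return sum(abs(predict(model, x) - y) for x, y in examples) / len(examples)


def select_model(models, examples):
    """Return the key of the model with the lowest MAE on `examples`."""
    return min(models, key=lambda station: mae(models[station], examples))


# Two hypothetical linear models, keyed by training station number.
models = {
    1: {"intercept": 1.0, "weights": {"bikes_now": 0.5}},
    2: {"intercept": 0.0, "weights": {"bikes_now": 1.0}},
}
# Made-up deployment-month examples for one new station: (features, bikes 3h later).
examples = [({"bikes_now": 4.0}, 4.0), ({"bikes_now": 10.0}, 9.0)]

best = select_model(models, examples)
print(best)  # station 2: its errors are 0 and 1, against 1 and 3 for station 1
```

Selection is only one option; averaging or weighting several models is another, at the cost of more parameters to fit on a single month of data.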

CHALLENGE TIMELINE (UPDATED!)

  • March 31, 2015: Training and deployment data, linear models, and leaderboard test data on-line
  • May 4, 18, 25 and June 1, 2015: Leaderboard refreshed for submissions up to that time
  • June 8, 2015 (NEW!): Final leaderboard refreshed for submissions up to that time
  • June 15, 2015 (extended from June 8): Deadline to submit paper and source code
  • June 16, 2015 (extended from June 9): Full test data available
  • June 22, 2015: Deadline to submit predictions on the full test set
  • July 6, 2015: Notification of acceptance
  • August 3, 2015: Deadline to submit camera-ready version
  • September 11, 2015: Challenge workshop at ECML PKDD 2015, Final results announced

All deadlines are 11:59pm in the latest timezone (American Samoa).

SUBMISSIONS (UPDATED!)

  • A leaderboard submission is a single CSV file with 3 columns: station number, timestamp, and the predicted number of bikes; see the file example_leaderboard_submission.csv at http://reframe-d2k.org/Challenge_Download. This file has to be sent as an e-mail attachment to meelis DOT kull AT bristol DOT ac DOT uk with the subject 'Challenge leaderboard submission <1-12> from <Your Name>'. Each participant can submit up to 12 files (increased from 10 files) before or on June 8 (extended from June 1). Submissions after the 12th are ignored. On May 4, 18, 25, June 1 and June 8 all leaderboard submissions are evaluated for mean absolute error and the results are published on this site, together with the participant's name and submission number (1-12).
  • The paper, in PDF format, should be uploaded to EasyChair https://easychair.org/conferences/?conf=morebikes2015 and the source code sent as a single compressed file by e-mail attachment to meelis DOT kull AT bristol DOT ac DOT uk with the subject 'Challenge source code from <Your Name>'.
  • A full test prediction submission is a single CSV file, formatted the same as a leaderboard submission and submitted to the same e-mail address with the subject 'Challenge full test submission from <Your Name>'.
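The submission format above could be produced along the following lines. This is a sketch under stated assumptions: the header names, the presence of a header row, and the timestamp format are guesses, so the file example_leaderboard_submission.csv remains the authoritative reference. The subject helper reads `<1-12>` in the template as the submission number.

```python
import csv
import io

MAX_SUBMISSIONS = 12  # limit stated in the submission rules


def leaderboard_subject(number, name):
    """Build the e-mail subject line for leaderboard submission `number`."""
    if not 1 <= number <= MAX_SUBMISSIONS:
        raise ValueError("submission number must be between 1 and 12")
    return f"Challenge leaderboard submission {number} from {name}"


def write_submission(rows, out):
    """Write (station, timestamp, predicted_bikes) rows as CSV to `out`."""
    writer = csv.writer(out, lineterminator="\n")
    # Assumed header; check example_leaderboard_submission.csv for the real one.
    writer.writerow(["station", "timestamp", "bikes"])
    writer.writerows(rows)


# Made-up predictions; the integer timestamp format is an assumption.
rows = [(201, 1414800000, 7), (202, 1414800000, 3)]
buffer = io.StringIO()
write_submission(rows, buffer)
print(buffer.getvalue())
print(leaderboard_subject(3, "Your Name"))
```

Writing to an in-memory buffer keeps the example self-contained; passing an open file object instead produces the file to attach.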


EVALUATION AND RULES

The predictions are evaluated according to the mean absolute error between the predicted and true values. The winner is the participant who submitted the predictions with the lowest mean absolute error. In case of a tie, the approach (generality, efficiency) will be taken into account. All predictions have to be programmatically generated (not manually entered). The prediction for each test time-point is allowed to use only the given features of this instance in the test dataset (NOT THE FEATURES OF THE OTHER TEST TIME-POINTS) and all provided training and deployment data and the models. Other data sources are not allowed. The prediction on the full test data set must be obtained by running the submitted code without any changes and without any parameters other than the test file name.
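For reference, the mean absolute error of n predictions is the average of the absolute differences between predicted and true values. A minimal sketch with made-up numbers follows; the function name is illustrative, and the organisers' own evaluation script is not reproduced here.

```python
def mean_absolute_error(predicted, actual):
    """Return the mean of |predicted - actual| over paired values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical values, for illustration only.
predicted_bikes = [3, 10, 0, 7]
true_bikes = [5, 10, 2, 4]
print(mean_absolute_error(predicted_bikes, true_bikes))  # (2 + 0 + 2 + 3) / 4 = 1.75
```

Because each error enters linearly, a single badly mispredicted time-point weighs less than it would under a squared-error measure.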

PRIZE

The three participants who provide the best predictions on the full test set are each awarded one free registration to the ECML-PKDD 2015 conference. CATEDRA INNDEA, ECML-PKDD and REFRAME sponsor these free registrations (at the early rate).

A special prize is awarded to the best student in the final leaderboard: a free one-year subscription in one city with self-service bicycles operated by Cyclocity http://en.cyclocity.com/Cities/Cyclocity-in-the-world.

ORGANISING COMMITTEE

  • Nicolas Lachiche, University of Strasbourg, France (nicolas DOT lachiche AT unistra DOT fr)
  • Meelis Kull, University of Bristol, UK (meelis DOT kull AT bristol DOT ac DOT uk)
  • Adolfo Martínez-Usó, Universitat Politècnica de València, Spain (admarus AT upv DOT es)

ACKNOWLEDGEMENTS

The organising committee would like to thank the Altocumulo weather station for its help in collecting the weather information.

SPONSORS

INNDEA, Cyclocity, UPV, ECML PKDD 2015, REFRAME