By Jenny Shan, VI Form
Casual Bike Rental Volume Prediction via Artificial Neural Network
Aim: This study aimed to build a predictive model for casual bike rental volume using an artificial neural network and to compare its performance with a traditional regression method, linear regression.
Method: The data set under study is the two-year usage log of a bike sharing system, Capital Bike Sharing (CBS), in Washington, D.C., USA. Corresponding historical environmental values, such as weather conditions, weekdays, and holidays, were extracted from external sources. All records were randomly assigned to two groups: a training sample (50%) and a testing sample (50%). Two models were built using the training sample: an artificial neural network and a linear regression. The artificial neural network had an input layer with 11 inputs, two hidden layers with 3 and 2 neurons, and an output layer with a single output. Mean squared errors (MSE) were calculated and compared between the two models. Cross-validation was conducted using a loop for the neural network and the cv.glm() function in the boot package for the linear model. The “neuralnet” package in R was used to conduct the neural network analysis.
Results: In the testing sample, the MSE was 798 for the linear regression and 265 for the artificial neural network, so the artificial neural network clearly performed better. In cross-validation, the average MSE for the neural network (268) was lower than that of the linear model (806), although the MSEs varied somewhat across cross-validation runs. This variation may depend on the splitting of the data or the random initialization of the weights in the net.
Conclusions: In this study, we built a predictive model for casual bike rental volume using a neural network and compared its performance with a more popular approach, linear regression. This study suggests that it is possible to develop a reproducible and transportable predictive instrument for casual bike rental volume.
Key words: Artificial Neural Network, Bike Sharing, Prediction
Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles.
Today, there is great interest in these systems because of their important role in traffic, environmental, and health issues. Presently, the top three bike-friendly countries are Spain (132 programs), Italy (104 programs), and China (79 programs). The number of major cities that are becoming bike-friendly is growing day by day. It is expected that in the near future most major cities will provide this service along with their other public transport services. Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data these systems generate make them attractive for research. Unlike other transport services such as bus or subway, these systems explicitly record the duration of travel and the departure and arrival positions. This feature turns a bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data. A few studies have already addressed bike sharing data, mostly through spatiotemporal analysis to aid operation-oriented decisions.
An artificial neural network (ANN), often just called a “neural network” (NN), is a mathematical or computational model based on biological neural networks; in other words, it is an emulation of a biological neural system. This model has been used in other areas, such as medicine; here it is applied to predicting casual bike rental volume.
This study aimed to build a predictive model for casual bike rental volume using an artificial neural network and to compare its performance with a traditional regression method, linear regression.
2 Data and Methods:
The dataset under study is the usage log of a bike sharing system, Capital Bike Sharing (CBS), in Washington, D.C., USA.
In the CBS system, when a rental occurs, the operation software collects basic data about the trip such as duration, start date, end date, start station, end station, bike number, and member type. The historical data set of such trip transactions is available online. To avoid trend issues, we selected only the data for the year 2011. Several weather data sources exist; however, most of them provide only forecasts and do not contain historical weather reports. Another group of forecasting sources contains historical weather reports only for a specific number of recent days (e.g. 14 days). A further group contains historical weather reports, but only on a daily scale. The following attributes were obtained from a historical weather source for each hour from 1 January 2011 to 31 December 2011 for Washington, D.C., USA: temperature, apparent temperature, wind speed, wind gust, humidity, pressure, dew point, and visibility.
Table 1: Variables Available In This Data
|season||1: spring, 2: summer, 3: fall, 4: winter|
|mnth||month (1 to 12)|
|hr||hour (0 to 23)|
|holiday||whether the day is a holiday or not|
|weekday||day of the week|
|workingday||1 if the day is neither a weekend nor a holiday, otherwise 0|
|weathersit||1: Clear, Few clouds, Partly cloudy; 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist; 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds; 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog|
|temp||Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)|
|atemp||Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)|
|hum||Normalized humidity. The values are divided by 100 (max)|
|windspeed||Normalized wind speed. The values are divided by 67 (max)|
|casual||count of casual users|
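The normalization formulas in Table 1 can be illustrated with a short sketch. The study itself was carried out in R; the Python below is only an illustration, and the function names and the sample reading of 15.5 °C are hypothetical. The bounds are the ones listed in Table 1.

```python
def min_max_normalize(t, t_min, t_max):
    """Scale a raw reading t into [0, 1] via (t - t_min) / (t_max - t_min)."""
    if t_max == t_min:
        raise ValueError("t_max and t_min must differ")
    return (t - t_min) / (t_max - t_min)


def min_max_denormalize(x, t_min, t_max):
    """Invert the scaling to recover the raw reading."""
    return x * (t_max - t_min) + t_min


# Bounds taken from Table 1 (degrees Celsius).
TEMP_BOUNDS = (-8.0, 39.0)
ATEMP_BOUNDS = (-16.0, 50.0)

# Hypothetical raw readings, chosen only for illustration.
temp_norm = min_max_normalize(15.5, *TEMP_BOUNDS)    # (15.5 + 8) / 47 = 0.5
atemp_norm = min_max_normalize(17.0, *ATEMP_BOUNDS)  # (17 + 16) / 66 = 0.5

# Humidity and wind speed are simply divided by their maxima (100 and 67).
hum_norm = 80.0 / 100.0
windspeed_norm = 13.4 / 67.0
```

Scaling every input to a comparable range in this way is a common preparation step before training a neural network.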
Mean squared error (MSE)
In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors or deviations—that is, the difference between the estimator and what is estimated. MSE is a risk function, corresponding to the expected value of the squared error loss or quadratic loss. The difference occurs because of randomness or because the estimator doesn’t account for information that could produce a more accurate estimate. The MSE is a measure of the quality of an estimator—it is always non-negative, and values closer to zero are better. An MSE of zero, meaning that the estimator predicts observations of the parameter with perfect accuracy, is the ideal, but is typically not possible.
Values of MSE may be used for comparative purposes. Two or more statistical models may be compared using their MSEs as a measure of how well they explain a given set of observations: An unbiased estimator (estimated from a statistical model) with the smallest variance among all unbiased estimators is the best unbiased estimator or MVUE (Minimum Variance Unbiased Estimator).
Both analysis of variance and linear regression techniques estimate the MSE as part of the analysis and use the estimated MSE to determine the statistical significance of the factors or predictors under study. The goal of experimental design is to construct experiments in such a way that when the observations are analyzed, the MSE is close to zero relative to the magnitude of at least one of the estimated treatment effects. MSE is also used in several stepwise regression techniques to help determine how many predictors from a candidate set to include in a model for a given set of observations.
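As a concrete illustration of the definition above, the MSE of a set of predictions is the average of the squared differences between observed and predicted values. The sketch below is in Python rather than the R used in the study, and the toy numbers are hypothetical, not taken from the bike rental data.

```python
def mean_squared_error(observed, predicted):
    """Average of squared differences between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical hourly casual rental counts and model predictions.
observed = [3, 0, 12, 7]
predicted = [5, 1, 10, 7]

mse = mean_squared_error(observed, predicted)  # (4 + 1 + 4 + 0) / 4 = 2.25
```

Because the errors are squared, a few large misses raise the MSE much more than many small ones, which is worth keeping in mind when comparing two models on the same test sample.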
The sample size is 4322 in the test sample and 4323 in the training sample, for a total of 8645 records from the year 2011.
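A random 50/50 assignment of this kind can be sketched as follows. This is a minimal Python illustration, not the R code used in the study; the function name and the seed are assumptions, and integer indices stand in for the actual records.

```python
import random


def split_half(records, seed=0):
    """Shuffle the records and split them into training and test halves.

    When the number of records is odd, the extra record goes to training.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = (len(shuffled) + 1) // 2
    return shuffled[:n_train], shuffled[n_train:]


# Indices 0..8644 stand in for the 8645 hourly records from 2011.
train, test = split_half(range(8645))
# len(train) is 4323 and len(test) is 4322, matching the sizes reported above.
```

Fixing the seed makes the split repeatable; a different seed gives a different split and therefore slightly different MSE values.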
Table 2: Descriptive Information Of Test Sample And Training Sample
|Training sample||Test sample|
Figure 1: Distribution of Casual Bike Rentals in Training Sample
Figure 2: Distribution of Casual Bike Rentals in Test Sample
According to the linear regression, season, hour, holiday or not, working day or not, temperature and humidity were significant predictors for casual rental volume.
Table 3: Linear Regression To Predict The Volume Of Casual Bike Rental
|Estimate||Std. Error||t value||Pr(>|t|)|
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Figure 3: Neural Network of Casual Bike Rental Volume in Training Sample
The black lines show the connections between layers and the weight on each connection, while the blue lines show the bias term added at each step. The bias can be thought of as the intercept of a linear model. The net is essentially a black box, so we cannot say much about the fit, the weights, or the model. Suffice it to say that the training algorithm converged and the model is therefore ready to be used.
In the testing sample, the MSE was 798 for the linear regression and 265 for the artificial neural network, so the artificial neural network clearly performed better.
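To make the roles of the weights and bias terms concrete, the sketch below runs one forward pass through a network with the same shape as the one described (11 inputs, hidden layers of 3 and 2 neurons, one output). It is written in Python, not with the R neuralnet package used in the study. The logistic activation in the hidden layers and the linear output are assumptions, and the weights are random placeholders rather than the fitted values.

```python
import math
import random


def logistic(z):
    """Logistic (sigmoid) activation; assumed here for the hidden layers."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation=None):
    """One layer: each neuron adds its bias to a weighted sum of the inputs."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = b + sum(w * x for w, x in zip(w_row, inputs))
        outputs.append(activation(z) if activation else z)
    return outputs


def random_params(sizes=(11, 3, 2, 1), seed=0):
    """Placeholder weights and biases; a trained net would supply fitted values."""
    rng = random.Random(seed)
    params = {}
    for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:]), start=1):
        params[f"W{i}"] = [[rng.gauss(0, 1) for _ in range(n_in)]
                           for _ in range(n_out)]
        params[f"b{i}"] = [rng.gauss(0, 1) for _ in range(n_out)]
    return params


def forward(x, params):
    """Pass 11 inputs through hidden layers of 3 and 2 neurons to one output."""
    h1 = dense(x, params["W1"], params["b1"], logistic)
    h2 = dense(h1, params["W2"], params["b2"], logistic)
    return dense(h2, params["W3"], params["b3"])[0]  # linear output (assumed)


params = random_params()
x = [0.5] * 11  # hypothetical normalized predictor values
prediction = forward(x, params)

# Count of connection weights plus bias terms: (11*3+3) + (3*2+2) + (2*1+1) = 47.
n_parameters = sum(len(row) for k in ("W1", "W2", "W3") for row in params[k]) \
    + sum(len(params[k]) for k in ("b1", "b2", "b3"))
```

Even this small network has 47 adjustable numbers that interact through nonlinear activations, which is why the individual weights are hard to interpret directly.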
Figure 4: Observed vs Predicted Casual Bike Rental Volume In Artificial Neural Network And Linear Regression Model
By visually inspecting the plot, we can see that the predictions made by the neural network are, in general, more concentrated around the line than those made by the linear model (a perfect alignment with the line would indicate an MSE of 0 and thus a perfect prediction).
Cross-validation is another very important step in building predictive models. In cross-validation, the average MSE for the neural network (268) was lower than that of the linear model (806), although the MSEs varied somewhat across cross-validation runs. This variation may depend on the splitting of the data or the random initialization of the weights in the net.
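The cross-validation loop can be sketched generically as below. This is a Python illustration, not the R loop or the cv.glm() call used in the study. The number of folds (10), the seed, the toy data, and the trivial mean-predicting stand-in model are all assumptions; in practice the `fit` argument would train the neural network or the linear model.

```python
import random


def mean_squared_error(observed, predicted):
    """Average of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, fit, k=10, seed=0):
    """Return one test MSE per fold; fit(train_x, train_y) returns a predictor."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        predict = fit(train_x, train_y)
        predicted = [predict(xs[i]) for i in held_out]
        observed = [ys[i] for i in held_out]
        scores.append(mean_squared_error(observed, predicted))
    return scores


def fit_mean(train_x, train_y):
    """Stand-in model (hypothetical): always predicts the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


# Toy data, for illustration only.
xs = list(range(50))
ys = [2 * x + 1 for x in xs]

fold_mses = cross_validate(xs, ys, fit_mean, k=10)
average_mse = sum(fold_mses) / len(fold_mses)
```

Reporting the per-fold MSEs alongside their average shows how much the result depends on the particular split, which is the variation discussed above.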
Figure 5: MSE for Artificial Neural Network for Testing Group
The number of major cities that are becoming bike-friendly has been growing in recent years. It is expected that in the near future most major cities will provide this service along with their other public transport services. How to better predict casual rental volume is a key challenge for the business.
In this study, we built a predictive model for casual bike rental volume using a neural network and compared its performance with a more popular approach, linear regression. This study suggests that it is possible to develop a reproducible and transportable predictive instrument for casual bike rental volume.
The artificial neural network clearly performed better in the testing sample. In cross-validation, the average MSE for the neural network was much lower than that of the linear model, indicating better performance. The MSEs varied somewhat across cross-validation runs, which may depend on the splitting of the data or the random initialization of the weights in the net. Meanwhile, predictions made by the neural network were, in general, more concentrated around the line than those made by the linear model (a perfect alignment with the line would indicate an MSE of 0 and thus a perfect prediction).
According to the linear regression, season, holiday or not, working day or not, temperature, and humidity were significant predictors of casual rental volume. Although the neural network model is not as readily interpretable as the linear regression, its structure is ready to use. Given a particular combination of these predictors, a business owner could easily predict the casual bike rental volume and make arrangements ahead of time, if necessary, to ensure that all customer needs are met. This could be done with either the linear regression model or the neural network, with the neural network providing better performance.
This study has limitations. One is associated with the artificial neural network method. This method used a deep machine learning approach to explore the nonlinear association between casual bike rental volume and weather factors; however, the nonlinearity makes the results very hard to interpret, especially the association between rental volume and individual predictors. In addition, other potential predictors of casual bike rental volume were not available in this database.
In conclusion, we used both an artificial neural network and a linear regression model to predict casual bike rental volume. We found that the artificial neural network performed better than linear regression, the traditional method for building a predictive model. We believe that deep machine learning could be used to predict casual bike rental volume. This study is of great importance considering the fast growth of the bike rental business in major cities around the world.
Jenny Shan is a VI Form boarding student from Hangzhou, China. She enjoys reading news, playing squash, and visiting museums.