A deep learning architecture for energy service demand estimation in transport sector for Shared Socioeconomic Pathways

In this study we explored and analysed the application of TrebuNet by projecting transport demands for OECD countries. We designed transport demand projection models for the Air, Road, and Rail modes of transport for the passenger and freight classes. We further applied TrebuNet to the projection of global maritime freight transport demand. The accuracy of the TrebuNet architecture was evaluated against multivariate linear regression, an ANN model, Recurrent Neural Networks (RNN), and gradient boosted machine learning methods.

Demand and driver relationship

We developed individual TrebuNet models for each combination of transport mode and class to mitigate the complexity of representing different transport configurations in a single projection architecture. The historical transport demand data for OECD countries was collected for six data series: Aviation Passenger (AP) and Freight/Goods (AG), Road Passenger (RoP) and Freight/Goods (RoG), and Rail Passenger (RaP) and Freight/Goods (RaG). Historical transport demand data for countries worldwide was collected for the Maritime Goods data-series (MI) (Table 1). The spread of the collected data is shown in Fig. 2.

Table 1 Description of data-series and their sources.
Figure 2

Boxplots showing the spread of the collected transport demands. Each boxplot shows the minimum, maximum, mean and interquartile range of the transport demand data collected for the respective data-series. The unit on the x-axis is either PKM or TKM in billions, depending on the data-series. The number of observations per data-series is shown on top of the respective boxplot.

The historical demand data was collected from the United Nations and the International Transport Outlook 2018 reports. For drivers we used historical socioeconomic metrics in the form of Population31 and GDP32 (Purchasing Power Parity (PPP), 2005$). We chose these drivers because they have a sound methodology for their future estimation and have shown high correlation with historical transport demand17,33,34,35. The relationship between historical transport demand and drivers was established using the TrebuNet architecture. A visualisation of trends between transport demand, GDP and Population is shown in Supplementary Figs. S.1–S.3. A statistical description of the collected datasets is given in Supplementary Tables S.1 and S.2.

Future projections of transport demands were based on the Shared Socioeconomic Pathway (SSP)36,37,38 database. The SSPs39,40,41 were created by an international team of climate researchers, economists, and energy systems modelers to generate internally consistent pathways for future worlds across the physical sciences, impacts and mitigation research fields. SSPs are a framework to quantify and represent, via socioeconomic metrics, the different worlds that are possible in the future. The framework starts with a narrative defining five different worlds based on challenges to adaptation and mitigation. SSP1 is the sustainable world, SSP2 is the middle ground, SSP3 is the world under regional rivalry with the highest challenges to mitigation and adaptation, SSP4 is the world of inequality with the highest challenge to adaptation, and SSP5 is the fossil fuelled world with the highest challenge to mitigation. The SSP narratives are quantified into different socioeconomic driver metrics. Of these, we use the GDP and population metrics, which are proxies for economic and social development respectively, to project transport demands using the TrebuNet architecture. SSP pathways were also used to generate scenarios for the IPCC Sixth Assessment Report (AR6)42.

TrebuNet deep learning architecture

The Trebuchet Network deep learning architecture, or TrebuNet, was designed to deliver on two main goals: (1) to construct a model capable of learning from a pool of transport demand and driver data for a set of countries, and (2) to ensure that the trained model can accurately project transport demand for each country based on its drivers. In addition to these goals, the TrebuNet architecture was designed to work on sparse datasets that have incomplete temporal datapoints.

The inspiration for the TrebuNet architecture was drawn from the physical process of hitting a target using a trebuchet. With a trebuchet, the human operator first learns the different projectile pathways by trial and error and then chooses an optimal pathway to hit the target. To simplify, the physical process of firing a trebuchet can be divided into a learning phase, in which the operator learns different projectile firing pathways, and a firing phase, in which the operator uses the learned knowledge to choose the best pathway for the target. Correspondingly, the TrebuNet architecture was constructed in two cascading phases, Learning and Firing (Fig. 3).

Figure 3

TrebuNet Framework. (a) Process flow diagram depicting the configuration of the TrebuNet architecture along with the flow of independent and dependent variables. (b) Zoomed-in view of the configuration of the custom quantile loss function.

The Learning phase of TrebuNet comprised densely connected ANN layers, custom built for each data-series, e.g. Aviation Passenger (AP) and Aviation Freight (AG). The densely connected layers took historical socioeconomic driver metrics as inputs (independent variables) and produced nine outputs (dependent variables) representing transport demand at quantiles from 0.1 to 0.9. The independent variables in this phase were the GDP and Population metrics, and the corresponding dependent variable was the transport demand metric. The Learning phase had either five or six densely connected layers, each with between 100 and 3000 neurons. Each layer used the ‘ReLU’43 transfer function to capture non-linearities. Between consecutive densely connected layers, BatchNormalization44 layers were added to increase learning rates and provide basic generalising tendencies. The output layer had nine nodes corresponding to quantiles ranging from 0.1 to 0.9. The Learning phase was trained on historical driver and demand data to minimise a custom quantile loss function. The custom loss (\({E}_{\tau }\)) for the Learning phase is defined as

$${E}_{\tau }=\frac{1}{N}\sum_{t=1}^{N}{\rho }_{\tau }\left(y(t)-{\hat{y}}_{\tau }(t)\right)$$



$${\rho }_{\tau }\left(u\right)=\left\{\begin{array}{cc}\tau u& \text{if }u\ge 0\\ \left(\tau -1\right)u& \text{if }u<0\end{array}\right.$$

where \(y(t)\) is the observed value for datapoint \(t\), \({\hat{y}}_{\tau }(t)\) is the predicted value for that datapoint at quantile \(\tau\), and \(N\) is the number of datapoints.
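The loss above can be illustrated with a minimal sketch in plain Python. This is not the authors' Keras implementation: the function names are hypothetical, and summing the nine per-quantile losses into one value is an assumption, since the text does not state how the nine outputs are combined.

```python
def pinball(u, tau):
    """rho_tau(u): tau * u when u >= 0, otherwise (tau - 1) * u."""
    return tau * u if u >= 0 else (tau - 1.0) * u


def quantile_loss(y_true, y_pred, tau):
    """E_tau: mean pinball loss over the N datapoints for one quantile."""
    n = len(y_true)
    return sum(pinball(y - p, tau) for y, p in zip(y_true, y_pred)) / n


# The nine quantiles used by the Learning phase outputs: 0.1 ... 0.9.
QUANTILES = [q / 10 for q in range(1, 10)]


def total_quantile_loss(y_true, preds_by_quantile):
    """Sum of E_tau over the nine outputs.

    Assumption: the combination rule (a plain sum) is illustrative only.
    preds_by_quantile[i] holds the predictions for QUANTILES[i].
    """
    return sum(
        quantile_loss(y_true, preds_by_quantile[i], tau)
        for i, tau in enumerate(QUANTILES)
    )
```

With this definition an under-prediction costs more at a high quantile than at a low one, which is what pushes each output node towards its own quantile of the demand distribution.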

The Firing phase of TrebuNet was constructed as a five-fold cross-validated architecture that takes the output of the learning phase as its input. Its output was the projected transport demand. Here, the structure comprised four densely connected layers, each with between 100 and 3000 neurons. ReLU activation was used between consecutive densely connected layers. The densely connected layers in this phase did not include BatchNormalization layers, because of the processing overhead and the relative uniformity and likeness of the data points. Five-fold cross-validation helps to reduce overfitting and subsequently to select the best model for projection. The combined learning and firing phases were optimised in a single step to obtain the model with the lowest average Mean Squared Error (MSE) across the five folds of the firing phase. Thus, each data-series had two saved models, i.e. Learning and Firing, arranged virtually as a stacked model. A flowchart of the combined stacked TrebuNet model is shown in Fig. 4 and Supplementary Figs. S.4 and S.5.

Figure 4

TrebuNet Model flowchart. The Learning and Firing phases of TrebuNet are connected in a stacked architecture. Each densely connected layer in both phases has a ReLU transfer function. BatchNormalisation is implemented between the densely connected layers of the Learning phase. The inputs to the Learning phase are GDP and Population, and its output is constructed as nine nodes representing quantiles from 0.1 to 0.9. The Firing phase has nine inputs, and its output is a single node, which is the transport demand projection.

TrebuNet was implemented in the Python programming language with the Pandas45, Keras46 and Scipy47 modules. The Keras module was used to implement the densely connected layers, and Pandas was used to store, pre-process, and normalise the input driver and transport demand data. The physical process of firing a trebuchet was represented programmatically in TrebuNet using a feed-forward and back-propagating architecture48,49,50. This method of learning customised the internal weights and biases of the TrebuNet architecture for countries in different quantiles of the demand growth trajectory, and helped to overcome the problem of mean projections observed when learning to reduce the global Mean Squared Error (MSE) metric51. This method also helped to mitigate the problems caused by non-uniform variance in data across the time series and increased the dimensionality of features.

Hyper-parameter optimization and model selection

Model optimisation and selection were performed by iteratively testing hyperparameters and selecting those that showed the lowest MSE at the output of the firing phase. The entire TrebuNet architecture was optimised as a single entity, and the hyperparameters were selected using the Hyperopt module52 of Python to obtain the model with the lowest average Mean Squared Error across the five folds of the firing phase. In total, the combined stacked architecture was optimised for (1) two losses, the custom loss in the learning phase and MSE in the firing phase, (2) the number of layers in the learning phase, (3) the number of neurons in the learning and firing phases, and (4) the number of epochs for both the learning and firing phases. The Hyperopt module was run for 500 iterations for each data-series on two GPUs running in parallel, an Nvidia 2080Ti and an Nvidia 2060Ti (see Supplementary document). The firing phase of TrebuNet was designed for five-fold cross-validation, with the average MSE loss of the five folds acting as the metric to be minimised by the Hyperopt module. The dual GPU configuration aided memory sharing between the GPUs, which enabled the parallel five-fold cross-validation in the firing phase. As the firing phase loss was internally connected to the learning phase via virtual back propagation, the custom loss of the learning phase was also optimised along with the MSE. Details of the hyperparameters of the models are documented in Supplementary Tables S.3, S.4, S.5 and S.6.
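The selection criterion described here, the average MSE over five folds, can be sketched without any deep learning library. In the sketch below, a plain random search stands in for the Hyperopt module, and the one-parameter model built by `make_fit` (a shrunken slope through the origin with a hypothetical hyperparameter `alpha`) stands in for the stacked network. Both are assumptions made purely for illustration.

```python
import random


def k_fold_indices(n, k=5):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def mean_cv_mse(xs, ys, fit, k=5):
    """Average held-out MSE across k folds; this is the value to minimise."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        predict = fit(train_x, train_y)
        errors = [(ys[i] - predict(xs[i])) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return sum(scores) / k


def make_fit(alpha):
    """Hypothetical stand-in model: slope through the origin, shrunk by alpha."""
    def fit(train_x, train_y):
        num = sum(x * y for x, y in zip(train_x, train_y))
        den = sum(x * x for x in train_x) + alpha
        slope = num / den
        return lambda x: slope * x
    return fit


def random_search(xs, ys, n_trials=20, seed=17):
    """Random search over alpha (a stand-in for Hyperopt, not its algorithm)."""
    rng = random.Random(seed)
    best_alpha, best_score = None, float("inf")
    for _ in range(n_trials):
        alpha = 10 ** rng.uniform(-3, 1)  # log-uniform in [0.001, 10]
        score = mean_cv_mse(xs, ys, make_fit(alpha))
        if score < best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score
```

Calling `random_search(xs, ys)` returns the sampled `alpha` with the lowest five-fold average MSE, which mirrors the role of the metric handed to the optimiser in the text.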

TrebuNet model generation and evaluation

We generated six different models based on the TrebuNet framework for the AP, AG, RoP, RoG, RaP, and RaG data series, using the respective historical transport demand and historical driver datasets for OECD countries. Additionally, a global maritime freight model (MI) was created. Except for the aviation and marine freight data-series (see “Application in global marine freight projections”), which have 15 years (2000–2015) of historical data, all data-series had 35 years (1980–2015) of historical data to learn from. Each data-series yielded a custom TrebuNet model, chosen as the best model during the training and hyperparameter optimisation steps of the TrebuNet architecture. At this stage all the models had learned the historical relationship between their respective transport demand and driver data and stored it inside the architecture as custom weights and biases. The trained models were then evaluated for their short-term accuracy on unseen data for 2016 and 2017, followed by evaluation of their medium-term accuracy on unseen data from 2020 to 2050. To enable faster convergence during training, both the historical drivers and transport demands were normalised between 0 and 1 for their respective data-series. The same normalisation was applied to the SSP derived future drivers, using the upper and lower bounds from the historical drivers. During evaluation, the projections from the models were rescaled based on the bounds of their original demand data (Fig. 5).

Figure 5

Flow of data and hyperparameter optimisation. Flowchart depicting the data pre-processing, normalization, hyperparameter optimization and prediction steps in a TrebuNet architecture. The training process starts with normalisation of historical drivers and demands. The normalisation information from the historical data is saved and reused to normalise future driver values. Hyperparameter optimisation is performed to select the best performing architecture, which is saved as two models, one for the learning phase and one for the firing phase. The saved models are then used to project transport demands using normalised future drivers.
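The normalisation steps described above can be sketched as a min-max scaling whose bounds are fitted on historical data and then reused. This is a minimal illustration under the assumption of plain min-max scaling. The function names and the numbers in the usage note are hypothetical and are not taken from the study's datasets.

```python
def fit_bounds(values):
    """Store the lower and upper bounds of a historical series."""
    return min(values), max(values)


def normalise(values, bounds):
    """Scale values with previously saved bounds (0 = lower, 1 = upper)."""
    lo, hi = bounds
    return [(v - lo) / (hi - lo) for v in values]


def rescale(values, bounds):
    """Invert normalise(): map model outputs back to original units."""
    lo, hi = bounds
    return [v * (hi - lo) + lo for v in values]
```

For example, with hypothetical historical GDP values `[100.0, 200.0, 300.0]`, `fit_bounds` gives `(100.0, 300.0)`, and a future driver value of `400.0` normalises to `1.5`. Under this scheme, future drivers that exceed the historical range fall outside the 0 to 1 interval.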

Due care was taken to exclude countries from projections where the respective transport medium was not present, e.g. landlocked countries for marine freight transport projections. One caveat of this projection approach is that countries with limited historical data points during the learning phase will yield over- or under-projected results. One can circumvent this caveat by dropping such countries from projection aggregations, but for this study we kept them in the aggregation, as dropping them would have resulted in underestimation of regional transport metrics.

The Maritime Freight model (MI) was a variant of TrebuNet in which a global dataset, instead of the OECD dataset, was used for learning. Transport demand data for marine freight is not readily available, and its geographical resolution is often limited to the aggregated global level with yearly data points. The International Energy Agency (IEA)53 provides extended energy balances at regional and country level, which include an energy metric that tracks the total energy used in international marine freight transport. The IEA data is available in exajoule (EJ) and kilo-tonne of oil equivalent (ktoe) variants. The historical global transport demand was mapped down to country level transport demands using the relationship between global tonne nautical miles and global ktoe, where the tonne mile metric was provided by the United Nations Conference on Trade and Development (UNCTAD) and the ktoe metric by the IEA. Gaps are present in the IEA data, so the aggregation after down mapping does not add up to the global tonne mile metric. This model assumes that the down mapped aggregations are the ones that TrebuNet learns and projects from. The TrebuNet model for marine freight follows the same data pre-processing and hyperparameter optimization steps as the other six TrebuNet models used for short-term evaluations.

The transport demand projections generated by the TrebuNet architecture-based models were compared with the results of multivariate linear regression, a densely connected ANN, the RNN variants Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), BiDirectional LSTM (BiLSTM) and BiDirectional GRU (BiGRU), and finally the gradient boosted method XGBoost. Details of these models and their flowcharts are shown in Table 2 and Supplementary Fig. S.6.

Table 2 Attributes of models used for evaluation.

These comparisons were performed for (1) the ability of the models to learn from historical data and project historical transport demand, thereby objectively evaluating the training potential of the models, and (2) short-term projection for unseen future drivers for 2016 and 2017. To compare historical learning accuracy between methods, we used the \(R^2\), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE) metrics. For short-term projection comparisons, we used the Absolute Error (AE) and Absolute Percentage Error (APE) between recorded and projected transport demands, together with MAE. These evaluation metrics are defined as follows:

$$AE={\sum }_{i=1}^{n}abs\left({y}_{i}-\lambda \left({x}_{i}\right)\right)$$


$$MAE=\frac{{\sum }_{i=1}^{n}abs\left({y}_{i}-\lambda \left({x}_{i}\right)\right)}{n}$$


$$RMSE=\sqrt{\frac{{\sum }_{i=1}^{n}{\left({y}_{i}-\lambda \left({x}_{i}\right)\right)}^{2}}{n}}$$


$$APE=\frac{abs({\sum }_{i=1}^{n}\lambda \left({x}_{i}\right)-{\sum }_{i=1}^{n}{y}_{i})}{{\sum }_{i=1}^{n}{y}_{i}}$$


$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{\left({y}_{i}-\lambda \left({x}_{i}\right)\right)}^{2}}{{\sum }_{i=1}^{n}{\left({y}_{i}-\frac{1}{n}{\sum }_{i=1}^{n}{y}_{i}\right)}^{2}}$$


where \(y_i\) is the actual transport demand for test instance \(x_i\), \(\lambda(x_i)\) is the projected transport demand for test instance \(x_i\), and \(n\) is the number of test instances.
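The five metrics above translate directly into code. The following is a minimal sketch in plain Python that follows the definitions term by term. The function names are chosen for illustration, with `y` holding the actual demands and `p` the projected demands \(\lambda(x_i)\).

```python
import math


def ae(y, p):
    """Absolute Error: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(y, p))


def mae(y, p):
    """Mean Absolute Error: AE divided by the number of instances."""
    return ae(y, p) / len(y)


def rmse(y, p):
    """Root Mean Squared Error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def ape(y, p):
    """Absolute Percentage Error on the totals, as defined above."""
    return abs(sum(p) - sum(y)) / sum(y)


def r2(y, p):
    """Coefficient of determination."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

Note that APE, as defined, compares the summed projections with the summed recorded values, so over- and under-projections of individual instances can cancel out. AE and MAE do not have this property.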

Comparing TrebuNet based models with RNN based models is an involved task. The way RNN models are designed and trained allows for the projection of only a single country’s transport demand. Additionally, RNN based models require the training data and future drivers to share a uniform time step, e.g. when the training data has yearly temporal resolution, future predictions of transport demand can only be made on a per-year basis. This effectively means that if the future drivers are available at a five-year time step (which is the default case for SSPs), RNN based models cannot be used.

Nonetheless, RNNs are currently the state of the art in time-series projection, and to enable a comparison between TrebuNet and RNN based models we aggregated the country-wise transport demand and driver data into a single regional time series. This single regional time series was then used to train the RNN models. Thus, when comparing results with the RNNs, the transport demand projections from all the other models had to be aggregated on a per-year basis. For the regression-based models we used the Ordinary Least Squares multivariate linear regression model from the SKLearn54 module. For the RNN and ANN models we used the Keras module, and XGBoost was implemented via the XGBoost library for Python.
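The aggregation step used for the RNN comparison can be sketched as a per-year sum over countries. The record layout `(country, year, value)` and the function name are assumptions made for illustration. The study itself used Pandas for data handling.

```python
from collections import defaultdict


def aggregate_region(records):
    """Collapse country-level records into one regional series.

    records: iterable of (country, year, value) tuples.
    Returns a dict mapping each year to the sum over all countries,
    ordered by year.
    """
    totals = defaultdict(float)
    for _country, year, value in records:
        totals[year] += value
    return dict(sorted(totals.items()))
```

The same function applies to demand and to driver series, which gives the single regional time series on a uniform yearly step that the RNN models require.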

In addition to the previously created models, the TrebuNet architecture was used to construct three further models for the AP, RoP, and RaP data series to evaluate decadal projection accuracy. Here, historical drivers and energy service demand up to 2005, instead of 2015, were used to train and tune the three models. These three TrebuNet models followed the same data pre-processing and hyperparameter optimisation steps as the six TrebuNet models generated for short-term projection evaluation. Projections of transport demand for one decade (2006–2015) were made using unseen recorded GDP and Population drivers for that decade. The projections and actual data were then compared to evaluate the performance of the models. A second set of projections was also generated for the decade, covering all OECD countries for the entire decade with all transport modes aggregated, to compare the results with the IAM models. A summary of the models generated in this study and their evaluation paradigm is shown in Table 3.

Table 3 Models and their evaluation paradigm.

Relationship surface

The concept of a Relationship Surface was formulated to visually represent the inner workings of the TrebuNet architecture. The idea behind this visualization was to show what the saved models had learned about historical demand trends, overlaid on the real historical data. The relationship surface was generated using historical GDP and population driver data and the predictions of transport demand from a trained model based on those historical drivers. We generated a new series of input driver data points that covered the entire historical range of the GDP and population metrics by taking the minimum of each driver metric and incrementing it by one unit until the maximum of that metric was reached. We then ran these driver data points through the trained model and generated a 3D surface, called the relationship surface. The three axes of the relationship surface were the GDP, Population, and projected transport demand metrics. Historical transport demands from countries in different stages of economic development inside the OECD region were projected onto the relationship surface for visualization. These visualizations can help in understanding the approximation and smoothing that take place while learning from historical trends, and in comparing the model against trends expected from other modelling frameworks. This visualization also helps to counter the “black box” notion of deep learning algorithms by allowing trends to be monitored closely with respect to each of the input driver metrics.
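The grid construction described above can be sketched as follows. This is an illustration only: `model` is any callable that maps a (GDP, population) pair to a projected demand, standing in for the trained TrebuNet model, and the function names and the `step` parameter are hypothetical.

```python
def driver_axis(values, step=1.0):
    """Values from min(values) towards max(values) in increments of `step`."""
    lo, hi = min(values), max(values)
    n_steps = int((hi - lo) // step)
    return [lo + i * step for i in range(n_steps + 1)]


def relationship_surface(gdp_hist, pop_hist, model, step=1.0):
    """Evaluate `model` on every (GDP, population) grid point.

    Returns the two axes and a nested list where surface[i][j] is the
    projected demand at gdp_axis[i] and pop_axis[j].
    """
    gdp_axis = driver_axis(gdp_hist, step)
    pop_axis = driver_axis(pop_hist, step)
    surface = [[model(g, p) for p in pop_axis] for g in gdp_axis]
    return gdp_axis, pop_axis, surface
```

The three returned objects correspond to the three axes of the surface. Historical (GDP, population, demand) triples can then be plotted over it with any 3D plotting tool.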


Due to the computationally exhaustive nature of the machine learning training process, some randomness can always be attributed to the trained models. To minimise the uncertainties in the TrebuNet architecture, all coding steps involving the generation of random numbers, both implicit and explicit, were seeded with a constant value of 17 during the model saving part of the algorithm. Even with random number seeds set inside the code, variance was observed between subsequent runs of the model. To speed up the cross-validation part of the firing phase, each cross-validation fold was designed to run as a parallel copy of the original architecture rather than in the standard sequential manner. This method also introduced some randomness into the model, as all five neural networks representing the folds of the cross-validation were processed in parallel with simultaneous back propagation. During model retraining after the hyperparameter optimization rounds, multiple runs were performed and the model with the highest \(R^2\) value was chosen as the representative model. Given the nature of transport demands, this randomness gives us an opportunity to understand a spread of future values, which is more practical than a single set value that tends towards determinism. All other forms of uncertainty, e.g. those due to parallelization of the workload, limited precision GPU based multiplication, and random weight initialisation, have been kept out of the scope of this research and can form the basis of future research work.