Statistical Modeling: Ensembles: How Models Vote for a Final Output



 


What are Ensembles

Ensemble methods in predictive modeling are like gathering the opinions of multiple experts to make a final decision. Instead of relying on just one model to predict future market trends or ETF performance, ensemble techniques combine several different models to improve accuracy and reduce the risk of errors. Rather than trusting a single model, the ensemble lets its member models "vote" on the final output, leading to more reliable and balanced predictions. In regression, ensemble methods typically vote by averaging: the final prediction is the average of the individual predictions from all the models.

In boosting techniques, models are built sequentially, each improving upon the errors of the previous ones, and their predictions are combined to yield a final result. The member models may include decision trees, linear regressions, or even neural networks. By combining multiple models, ensembles smooth out the weaknesses and biases of any individual model, producing predictions that are typically more accurate and robust in tasks such as forecasting market trends or predicting asset prices.
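To make the "voting by averaging" idea concrete, here is a minimal sketch using scikit-learn's VotingRegressor, which simply averages the predictions of its member models. The data below is synthetic stand-in data (not the DXY series used later), and the model choices and settings are illustrative only.

```python
# A minimal sketch of ensemble averaging for regression.
# X stands in for features such as Open, High, Low; y stands in for Close.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import VotingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(365, 3))                                            # placeholder features
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=0.1, size=365)      # placeholder target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Each base model "votes"; VotingRegressor averages their predictions.
ensemble = VotingRegressor([
    ("linear", LinearRegression()),
    ("tree", DecisionTreeRegressor(max_depth=4)),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
])
ensemble.fit(X_train, y_train)
print("Averaged prediction for the first test row:", ensemble.predict(X_test[:1]))
```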


In the Table above, I use the prices (Open, High, Low, with Close as the output variable) of the US Dollar Index (ticker symbol DXY) to begin building a predictive model. I first needed to see which type of regression model suits the DXY data, using Mean Absolute Deviation (MAD) as my error measure. The Table shows the types of regression models that were trialled: Linear Regression, Bagging Linear Regression, Boosted Linear Regression, Nearest Neighbour, Random Trees, Decision Trees and Neural Network. Bagging Linear Regression, Boosted Linear Regression and Random Trees are ensembles. You can see that simple Linear Regression had the lowest MAD, followed by Bagging Linear Regression and Boosted Linear Regression.
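The sketch below shows roughly how such a trial could be run, assuming MAD is computed as the mean absolute error on held-out data. The DXY prices would replace the synthetic series used here, the model settings are illustrative, and the neural network is omitted for brevity since it needs more configuration.

```python
# A rough model-comparison loop: fit each candidate and report its MAD
# (mean absolute error) on a held-out test set.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor, RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
X = rng.normal(size=(365, 3))                                                  # Open, High, Low stand-ins
y = X @ np.array([0.6, 0.25, 0.15]) + rng.normal(scale=0.05, size=365)         # Close stand-in
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

candidates = {
    "Linear Regression": LinearRegression(),
    "Bagging Linear Regression": BaggingRegressor(LinearRegression(), n_estimators=50, random_state=1),
    "Boosted Regression (gradient boosting)": GradientBoostingRegressor(random_state=1),
    "Nearest Neighbour": KNeighborsRegressor(n_neighbors=5),
    "Random Trees (random forest)": RandomForestRegressor(n_estimators=100, random_state=1),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=1),
}
for name, model in candidates.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: MAD = {mean_absolute_error(y_te, model.predict(X_te)):.4f}")
```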

Neural Network (NN) was by a large margin the worst-performing model, probably because, while NNs are great for capturing subtle and complex non-linearities, in this case the inter-relationships between the variables reduce to simple linearities, so the NN overfits the data and has poor predictive value. Incidentally, the NN I used here didn't even have a hidden layer, and the activation functions were a simple sigmoid at the input and a hyperbolic tangent at the output; otherwise the NN's MAD would have been even worse.

We can show that in the case of DXY, with 365 data points in our model, the raw probability density function (PDF) already approximates a Normal (Gaussian) distribution, as shown below. (Actually, before I started trialling the various models and deciding which types should be boosted or bagged, I looked at this PDF and saw that Linear Regression, along with its bagged and boosted versions, would be ideal.) If the PDF had been more irregular, I might have used boosted and bagged versions of Neural Networks or Decision Trees.
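One quick way to check whether a series looks approximately Normal before choosing model families is a skew/kurtosis-based normality test. This is a rough sketch, assuming a variable `closes` holds the 365 DXY closing prices; the synthetic data below is only a placeholder.

```python
# D'Agostino-Pearson normality test: a large p-value is consistent with normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
closes = rng.normal(loc=100.0, scale=2.0, size=365)   # placeholder for the DXY closes

stat, p_value = stats.normaltest(closes)
print(f"skew = {stats.skew(closes):.3f}, kurtosis = {stats.kurtosis(closes):.3f}, p = {p_value:.3f}")
```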


What is Boosting

Boosting is a process that focuses on improving model accuracy by paying close attention to the errors made in previous predictions. In simple terms, boosting builds models sequentially, step by step, where each new model corrects the mistakes of the earlier ones. It works by giving more weight to the data points that were predicted incorrectly in previous rounds, so the model learns from its errors and improves over time. This approach helps create stronger, more refined predictions, and boosting is often used when an investor wants a model that is highly focused on accuracy, for example, predicting how an ETF might react to sudden market shifts. Our models use Gradient Boosting, which improves performance by optimizing a loss function, focusing on the difference (or gradient) between the actual and predicted values. Each new model is trained to reduce this error, leading to improved accuracy over successive stages.
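Here is a minimal sketch of gradient boosting for regression with scikit-learn: each new shallow tree is fit to the residual (the negative gradient of the squared-error loss) left by the current ensemble. The data and parameters are illustrative, not the ones used in the post.

```python
# Gradient boosting: sequential stages, each correcting the previous ones' errors.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(365, 3))
y = X @ np.array([0.6, 0.25, 0.15]) + rng.normal(scale=0.05, size=365)

gbr = GradientBoostingRegressor(
    n_estimators=200,     # number of sequential correction stages
    learning_rate=0.05,   # how much each stage contributes
    max_depth=2,          # shallow trees keep each stage a weak learner
    random_state=3,
)                          # default loss is squared error; each stage fits its residual
gbr.fit(X, y)
print("In-sample prediction for the first row:", gbr.predict(X[:1]))
```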

What is Bagging

Bagging, short for Bootstrap Aggregating, takes a different approach. Instead of building models one after another, bagging works by training several models independently using different subsets of the data. Each model is trained separately, and then their predictions are averaged or "voted" on to reach a result. The idea is that by using multiple models trained on different samples of data, the overall prediction becomes more stable and less sensitive to fluctuations in the data. Bagging is particularly useful when you want to reduce the chance of overfitting, which happens when a model becomes too focused on the specific patterns in the training data and doesn't generalize well to new data.
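A minimal bagging sketch, again with scikit-learn and illustrative synthetic data: the same base learner is fit on bootstrap resamples of the training data, and the predictions are averaged.

```python
# Bagging: independent models trained on bootstrap samples, predictions averaged.
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(365, 3))
y = X @ np.array([0.6, 0.25, 0.15]) + rng.normal(scale=0.05, size=365)

bagger = BaggingRegressor(
    DecisionTreeRegressor(),  # base learner; a linear regression works here too
    n_estimators=50,          # number of bootstrap-trained models
    bootstrap=True,           # sample rows with replacement
    random_state=4,
)
bagger.fit(X, y)
print("Averaged prediction for the first row:", bagger.predict(X[:1]))
```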

Benefits of Ensembles

Both boosting and bagging help improve model performance by reducing errors and making predictions more stable. Boosted ensembles, which correct errors over multiple steps, are highly accurate but can be more complex. Bagged ensembles, by averaging predictions across different models, provide more stable and generalizable forecasts. Together, these methods make ensemble modeling a powerful tool for predicting ETF performance, giving investors more reliable insights and reducing the risk of relying on a single, possibly flawed, model. By combining strengths from different models, these techniques allow for more informed and confident investment decisions.



