In 1962, Ed Thorp, famed hedge fund manager and mathematics professor, wrote the book Beat the Dealer, in which he proved that it was mathematically possible to overcome the house's advantage in blackjack by properly “counting cards.” Ever since, entrepreneurial would-be gamblers have learned his methods, and others, to tilt the odds in their favor when playing blackjack. When card counting is done properly, these “gamblers” have a true statistical edge^{1}.

Assuming proper bankroll management, they also know that the only way to monetize their edge is to keep playing even when recent results have been poor. Given enough time, the edge will work in their favor, allowing them to earn back their losses and generate profits. The key is time: the stress of mounting losses can make minutes or hours feel like days or weeks (or even years), and may lead a player to give up before earning back those losses.

One way to shorten the time needed to earn back losses is to play more hands. Because a single player can only play so many hands, a more effective strategy is to assemble a team of similarly skilled players who each play individually but share equally in gains and losses. This increases the chance that when one player is on a losing streak, those losses will be offset by another member's better-than-expected results, which should limit the depth of losses and shorten the time needed to recover lost capital.
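The pooling argument can be illustrated with a small simulation. The sketch below is purely illustrative: it assumes each hand's result is drawn independently from a normal distribution with a small positive mean (`edge=0.01` units) and a standard deviation of `sd=1.15` units, which is a convenient assumption rather than a model of real blackjack payouts. The helper names `max_drawdown` and `member_pnl_path` are hypothetical. Each round, every team member plays one hand and the pooled result is split evenly, so one member's share has the same expected value as a solo player's but a smaller spread.

```python
import random


def max_drawdown(path):
    """Largest peak-to-trough decline of a cumulative P&L path that starts at 0."""
    peak = 0.0
    worst = 0.0
    for value in path:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def member_pnl_path(team_size, rounds, edge=0.01, sd=1.15, seed=0):
    """Cumulative P&L of one member of a team that shares results equally.

    Each round, every member plays one hand whose result is drawn from a
    normal distribution with mean `edge` and standard deviation `sd`
    (an illustrative assumption). Pooled winnings are split evenly.
    """
    rng = random.Random(seed)
    total = 0.0
    path = []
    for _ in range(rounds):
        pooled = sum(rng.gauss(edge, sd) for _ in range(team_size))
        total += pooled / team_size
        path.append(total)
    return path


if __name__ == "__main__":
    seeds = range(20)
    for size in (1, 10):
        avg_dd = sum(max_drawdown(member_pnl_path(size, 2000, seed=s)) for s in seeds) / len(seeds)
        print(f"team of {size:2d}: average max drawdown per member = {avg_dd:.1f} units")
```

Because the per-round share of a team of `k` players has its standard deviation reduced by roughly a factor of the square root of `k` while the expected gain is unchanged, the simulated drawdowns for the larger team should be noticeably shallower.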

Similarly, properly constructed quantitative investment models provide a statistical edge to investors, allowing them to systematically take advantage of known market anomalies while reducing the emotional impact of needing to make constant discretionary decisions. However, the outcome of these models is also subject to uncertainty, as they are inherently probabilistic systems. This means that, like their card-counting counterparts, they are subject to the randomness of negative outcomes even when they hold a strong statistical advantage.

Just as blackjack players can spread risk by forming a team of similarly skilled players, quantitative models can reduce the probability of extended losses by forming a “team” of similar but separate models, a methodology we refer to as model ensembling.

As we continually challenge ourselves to identify ways to improve model prediction and minimize exposure to randomness, it seemed natural to research the idea of “ensembling”: smoothing out the potential randomness and idiosyncrasies of any one model by combining it with many similar models.

**Research Application**

In 1902, J. Willard Gibbs introduced the idea of an ensemble: a large number of virtual copies of a system, each representing a possible state of the real system, which together form a probability distribution over the state of the system. Following Gibbs, for the purposes of this research we define ensembling as grouping similar systems, or different states of the same system.

Broadly speaking, we have divided financial model ensembling into three non-exhaustive categories:

1. Multi-model ensembling

2. Parameter value ensembling

3. Model input ensembling

**Multi-Model Ensembling**

The main idea behind “multi-model” ensembling is that an investor may have a strong thesis that could be expressed in many different ways. For example, you may believe, with supporting evidence, that markets tend to overreact in the short term, which should lead to profitable mean-reversion opportunities. One way to take advantage of this inefficiency is to define the types of “overreactions” that can occur and then construct a suite of models, each homing in on a specific situation. We have implemented one such suite, or ensemble, in a portfolio of short-term counter-trend models across global markets, and the underlying models within this portfolio can be thought of as a multi-model ensemble. Each underlying model tries to capitalize on the thesis that global equity markets overreact in the short term, but each takes a different approach to identifying these overreactions. Spreading the risk across many varying models limits the impact of any one model underperforming.
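A minimal sketch of how several expressions of one thesis might be combined is shown below. The three rules are hypothetical stand-ins for “overreaction” detectors and are not the models in the portfolio described above. Their thresholds (`threshold`, `lookback`, `k`, `n`) are arbitrary illustrative choices, and equal-weight averaging is an assumed aggregation scheme. Each rule returns 1.0 (long) or 0.0 (flat) from a list of at least 10 daily closes, and the ensemble's exposure is the fraction of rules that are long.

```python
from statistics import mean, pstdev


def big_one_day_drop(closes, threshold=0.02):
    """Hypothetical rule: go long after a one-day decline larger than `threshold`."""
    return 1.0 if closes[-1] / closes[-2] - 1 < -threshold else 0.0


def below_lower_band(closes, lookback=10, k=2.0):
    """Hypothetical rule: go long when the close is k population std devs below its mean."""
    window = closes[-lookback:]
    return 1.0 if closes[-1] < mean(window) - k * pstdev(window) else 0.0


def consecutive_down_closes(closes, n=3):
    """Hypothetical rule: go long after n consecutive lower closes."""
    recent = closes[-(n + 1):]
    return 1.0 if all(b < a for a, b in zip(recent, recent[1:])) else 0.0


RULES = [big_one_day_drop, below_lower_band, consecutive_down_closes]


def ensemble_signal(closes, rules=RULES):
    """Equal-weight average of each rule's 0/1 signal, giving an exposure in [0, 1]."""
    return sum(rule(closes) for rule in rules) / len(rules)


if __name__ == "__main__":
    closes = [100, 100, 100, 100, 100, 100, 100, 99, 98, 97]
    for rule in RULES:
        print(rule.__name__, rule(closes))
    print("ensemble exposure:", ensemble_signal(closes))
```

For the sample series, the three-day slide trips the band and consecutive-decline rules but not the one-day-drop rule, so the ensemble takes a two-thirds position rather than an all-or-nothing one.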

**Parameter Value Ensembling**

Even within a single model, an investor still has options for ensembling. One such method is to use multiple parameter values to generate trading signals, rather than a single, potentially over-fit parameter set. For example, imagine a simple long-only trend-following moving average crossover model on high yield bonds with two parameters: a short-term value of one day and a long-term value of 10 days. With these values, the model is long whenever the current close is above the 10-day average close; otherwise it stays in cash. This parameter set may perform well in backtests and even be rather robust out-of-sample. However, the model remains highly sensitive to price near the crossover of these two averages, where a small change in price can be the difference between a long position and a cash position. One way to reduce this sensitivity is to use multiple parameter sets. For example, we constructed a simple ensemble using 5, 10, 15 and 20 days for the long-term moving average and 1 and 2 days for the short-term moving average. The ensemble's signals are then aggregated into an overall signal that becomes graduated near the crossover points, reducing reliance on the exact final price. The results of this ensemble are shown below. As can be seen, drawdowns are truncated while a similar overall return profile is maintained, making for a more palatable return sequence.
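The aggregation step in the parameter ensemble can be sketched as follows. The window sets (1 and 2 days short, 5, 10, 15 and 20 days long) come from the example in the text. Averaging the eight crossover signals with equal weights is an assumption, since the exact aggregation method is not specified, and the function names are hypothetical. The input is a list of at least 20 daily closes.

```python
from itertools import product

SHORT_WINDOWS = (1, 2)
LONG_WINDOWS = (5, 10, 15, 20)


def sma(closes, n):
    """Simple moving average of the last n closes."""
    return sum(closes[-n:]) / n


def crossover_signal(closes, short, long):
    """Long-only rule: 1.0 (long) if the short SMA is above the long SMA, else 0.0 (cash)."""
    return 1.0 if sma(closes, short) > sma(closes, long) else 0.0


def ensemble_exposure(closes, shorts=SHORT_WINDOWS, longs=LONG_WINDOWS):
    """Equal-weight average of crossover signals over every (short, long) pair."""
    pairs = list(product(shorts, longs))
    return sum(crossover_signal(closes, s, l) for s, l in pairs) / len(pairs)


if __name__ == "__main__":
    # A rally followed by a five-day pullback, ending near several crossover points.
    dip = list(range(1, 16)) + [15, 14, 13, 12, 11]
    print("single (1, 10) model:", crossover_signal(dip, 1, 10))
    print("8-pair ensemble:     ", ensemble_exposure(dip))
```

In the sample series, the single 1/10-day model has flipped fully to cash, while the ensemble holds a 3/8 position because the longer averages still sit below the recent prices. With eight pairs, exposure moves in steps of 1/8 instead of jumping between 0 and 1.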


**Model Input Ensembling**

Another way to ensemble a single model is to vary the inputs rather than (or in addition to) the parameters. This can be done in a variety of ways, but a simple example is to create a distribution of prices to use as inputs at the time of signal generation. For instance, using the high yield model above, instead of taking the single actual closing price and generating one signal, you could treat that price as the mean of a distribution of prices, sample prices from that distribution, and generate a signal from each sampled price. As with parameter ensembling, this creates a graduated signal as prices approach the boundary where the signal changes. Again, while the overall return profile did not change, the tail events (drawdowns) were smoothed.
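A minimal sketch of input ensembling is shown below, applied to the one-day/10-day rule from the high yield example (long when the close is above its 10-day average). Several details are assumptions, not the authors' specification. The sampling distribution is taken to be normal, centered on the actual close, with a standard deviation of `rel_sd` (0.5%) of the close. The number of draws is `n_samples=1000`. Each sampled price replaces the final close in both the price and the average. The function names are hypothetical.

```python
import random


def close_above_sma(closes, long=10):
    """Base rule: 1.0 (long) if the latest close is above its `long`-day average, else 0.0."""
    window = closes[-long:]
    return 1.0 if closes[-1] > sum(window) / len(window) else 0.0


def input_ensemble_exposure(closes, rel_sd=0.005, n_samples=1000, seed=None):
    """Average the base signal over copies of `closes` whose final price is resampled.

    Assumption: the final close is replaced by draws from a normal distribution
    centred on the actual close with standard deviation rel_sd * close.
    """
    rng = random.Random(seed)
    history, last = list(closes[:-1]), closes[-1]
    total = 0.0
    for _ in range(n_samples):
        sampled = rng.gauss(last, rel_sd * last)
        total += close_above_sma(history + [sampled])
    return total / n_samples


if __name__ == "__main__":
    near = [100.0] * 9 + [100.01]  # close sits just above its 10-day average
    print("single signal:  ", close_above_sma(near))
    print("input ensemble: ", input_ensemble_exposure(near, seed=42))
```

Near the boundary, the single signal is fully long, while the sampled version lands close to a half position. Far from the boundary, nearly every draw agrees, so the ensemble behaves like the original rule.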


**Conclusion**

As with card counting, the vagaries of day-to-day, hour-to-hour and minute-to-minute data can create substantial random variation in the results of any quantitative model. Even models with a large statistical edge can suffer (or benefit) from arbitrary minute-to-minute data movements. It takes time for this randomness to wash out of a given model's results, often leading to difficult periods of trying to decipher the model's true edge. Letting randomness drive decisions about a model's efficacy can lead to very poor conclusions, typically at precisely the wrong time^{2}.

One method for dealing with this problem is to implement some form of model ensembling. We have given a brief overview of three such methods, though there are certainly many other effective ways to ensemble models. Ultimately, no matter which method or combination of methods is used, our findings suggest that model ensembling can reduce the randomness that quantitative models are inevitably subject to, which should reduce large fluctuations in model performance and the decision-making stress that results from them.

^{1} Ed Thorp, Beat the Dealer, Knopf Doubleday, 1966.

^{2} If a model has maintained its edge but produced poor results recently, there should be some expectation that it will revert to its longer-term mean. However, if the model is discarded during a period of underperformance, the reversion that would have happened is never realized. This can lead to a cycle of discarding strong models and replacing them with models that have produced better recent results and are themselves likely to revert to more normal performance. The end result is a decay in realized returns.