NGDP Forecast Model Refresh – Taking the Forecast to “11”

We’ve recently overhauled our forecasting system. It was a good time to do this, as the recent GDP revisions could have altered the statistical relationships between NGDP and our market-derived factors.

Updating the forecast system

The starting point for our forecasting approach is to assume that Market Monetarism is true enough to be useful. We take it as given that a weak-form Efficient Markets Hypothesis is in play, and that an important determinant of many traded financial market prices is the expected path of US nominal GDP.

It is easy enough to point out that market prices assume a certain near-term path for the economy, but can we extract this signal? Our methodology is to pull the common statistical “messages” from a broad basket of market prices and yield spreads, using Principal Components Analysis (PCA). This yields, if you like, pseudo indices, or scores, which represent some common signal from a given basket. We pair these factors with a time series of NGDP growth and estimate a time series model, a Vector Autoregression (VAR), which recursively forecasts both.
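As a rough illustration of the two steps just described, here is a minimal sketch in Python. It is not our production code: the data are synthetic, the function names (pca_scores, fit_var, forecast_var) are made up for this example, and it assumes NumPy is available. It extracts principal-component scores from a standardized panel of prices, stacks them next to NGDP growth, fits a VAR by ordinary least squares, and iterates the fitted system forward four quarters.

```python
import numpy as np

def pca_scores(prices, n_components):
    """Return the first n_components principal-component scores of a (T x N) panel."""
    z = (prices - prices.mean(axis=0)) / prices.std(axis=0)  # standardize each series
    u, s, _ = np.linalg.svd(z, full_matrices=False)          # SVD of the standardized panel
    return u[:, :n_components] * s[:n_components]            # scores = U * singular values

def fit_var(y, lags):
    """Fit a VAR(lags) with intercept by OLS. y is (T x K); returns B of shape (1 + K*lags, K)."""
    T = y.shape[0]
    # Each regressor row is [1, y[t-1], y[t-2], ..., y[t-lags]]
    X = np.array([np.concatenate([[1.0]] + [y[t - l] for l in range(1, lags + 1)])
                  for t in range(lags, T)])
    Y = y[lags:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B

def forecast_var(y, B, lags, steps):
    """Iterate the fitted VAR forward, feeding each forecast back in as a lag."""
    history = [row for row in y[-lags:]]
    out = []
    for _ in range(steps):
        x = np.concatenate([[1.0]] + [history[-l] for l in range(1, lags + 1)])
        nxt = x @ B
        out.append(nxt)
        history.append(nxt)
    return np.array(out)

# Synthetic quarterly data (assumption: purely illustrative, not real prices or NGDP)
rng = np.random.default_rng(0)
T = 80
common = rng.standard_normal(T).cumsum() * 0.1
prices = np.column_stack([common * w + 0.05 * rng.standard_normal(T)
                          for w in (1.0, 0.8, -0.6, 0.5, 1.2)])
ngdp_growth = 4.0 + 0.5 * common + 0.2 * rng.standard_normal(T)

scores = pca_scores(prices, n_components=2)
system = np.column_stack([ngdp_growth, scores])   # NGDP growth first, then the factors
B = fit_var(system, lags=2)
path = forecast_var(system, B, lags=2, steps=4)   # column 0 is the NGDP growth forecast
print(path[:, 0])
```

The VAR forecasts NGDP growth and the factors jointly, which is what lets the forecast run more than one quarter ahead without needing future market prices.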

There are an enormous number of variations on this basic approach: we could vary the number of lags in the model, we could change the number of Principal Components scores included, we could alter the mix of prices and we could pair companion macro time series along with NGDP to give perhaps more information to the model. Given this multiplicity of options, how do we pick a model?
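To give a feel for how quickly these choices multiply, here is a small sketch that enumerates a specification grid. The series names, the lag and component options, and the minimum basket size are all hypothetical, chosen only to keep the example short.

```python
from itertools import combinations, product

# Hypothetical option lists; the real search space is much larger
lag_options = [1, 2, 4, 6]
component_options = [1, 2, 3]
price_pool = ["sp500", "oil", "copper", "10y_2y_spread", "baa_aaa_spread", "dollar_index"]
companion_options = [None, "nominal_domestic_spending"]

def price_baskets(pool, min_size):
    """Yield every subset of the pool with at least min_size members."""
    for size in range(min_size, len(pool) + 1):
        yield from combinations(pool, size)

def enumerate_specs():
    """Yield one dict per candidate model specification."""
    for basket, lags, n_comp, companion in product(
        price_baskets(price_pool, 4), lag_options, component_options, companion_options
    ):
        yield {"basket": basket, "lags": lags, "n_components": n_comp, "companion": companion}

specs = list(enumerate_specs())
print(len(specs))  # 22 baskets x 4 lag choices x 3 component counts x 2 companion choices = 528
```

Even this toy grid, with only six candidate series, produces several hundred specifications; a realistic pool of prices grows the count combinatorially.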

In the world of statistics, there rarely exists a single “perfect model”. Pointing this out causes distress at times, but it is simply true. Even if there were a perfect model specification, NGDP data are “lumpy”: they come at quarterly frequency, so we only get four observations per year. This limits our ability to be confident in a model’s forecast performance. We can tell if a model is total rubbish, but it’s not a good idea to make much out of small differences in forecast performance.

The solution to this problem is to average over a large number of simple models. We coded a model search algorithm that fits several tens of thousands of potential model specifications. It then does an iterative set of forecast performance tests.

These tests work as follows. We first hide all data after 2014Q1, estimate the model, forecast ahead to 2015Q1, and record the year-ahead forecast performance by comparing to measured NGDP for that quarter. Next, we add a quarter of data, estimate the model up to 2014Q2 and forecast to 2015Q2. Repeating this process produces a set of year-ahead forecast performance stats for a huge number of models. Note that we are always a quarter behind in terms of NGDP data: for 2014Q1 we have market prices but only have NGDP up to 2013Q4, mirroring how our forecast dataset looks in the last two months of a quarter.
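The expanding-window test described above can be sketched as follows. This is a simplified illustration, not our actual test harness: quarters are plain integer indices, the data are invented, and naive_forecast is a stand-in placeholder for a fitted model. The key detail it shows is the one-quarter publication lag: at each forecast origin, the model only sees NGDP through the previous quarter.

```python
def naive_forecast(known_ngdp):
    """Placeholder model (assumption): average of the last four known observations."""
    recent = known_ngdp[-4:]
    return sum(recent) / len(recent)

def backtest(ngdp, first_origin, horizon=4, model=naive_forecast):
    """Expanding-window, year-ahead backtest.

    ngdp[i] is NGDP growth for quarter i. At origin t, market prices would run
    through t, but NGDP is only known through t - 1 (one-quarter publication lag).
    """
    errors = []
    for origin in range(first_origin, len(ngdp) - horizon):
        known = ngdp[:origin]            # quarters 0 .. origin - 1 only
        prediction = model(known)
        actual = ngdp[origin + horizon]  # outcome four quarters after the origin
        errors.append(prediction - actual)
    return errors

def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

# Invented series: growth steps up from 4% to 5% halfway through
ngdp = [4.0] * 8 + [5.0] * 8
errors = backtest(ngdp, first_origin=4)
print(len(errors), rmse(errors))
```

Each pass through the loop adds one more quarter of data before re-forecasting, so every model ends up with one year-ahead error per origin quarter.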

This approach is vulnerable to being “Fooled by Randomness”. It’s entirely possible that within a set of over 100,000 models, a few truly poor models will perform well for a few years just by chance.

Moreover, it is entirely likely that a few mediocre models will perform well by chance. This means that the top-ranked models over our three-and-a-half-year test time frame are not necessarily the best models. To address this possibility, we allowed ourselves to pick through the top models using human judgment. We reviewed the models for basic reasonableness: did their data baskets have a reasonable mix of financial prices (not just all commodities, for example)? Did the statistical diagnostics look good? It turned out that this was the case for the top 30 or so models; they all looked credible. It also turned out that the top 16 models shared a few features: they all used six lags of NGDP, and none of them used nominal domestic spending as a companion variable to NGDP (something we thought could be useful).
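The selection-by-chance problem is easy to demonstrate with a toy simulation. In the sketch below, every "model" has exactly the same true forecast accuracy (errors drawn independently from a standard normal, which is an assumption made purely for illustration), and each is scored over 14 quarters, matching a three-and-a-half-year test window. The function name and the count of 5,000 models are arbitrary.

```python
import random

def simulate_rankings(n_models, n_quarters, seed=0):
    """Score n_models identical-skill models; return (best RMSE, median RMSE)."""
    rng = random.Random(seed)

    def rmse_of_noise():
        # Every model's errors come from the same distribution, so true RMSE is 1.0
        errs = [rng.gauss(0.0, 1.0) for _ in range(n_quarters)]
        return (sum(e * e for e in errs) / n_quarters) ** 0.5

    scores = sorted(rmse_of_noise() for _ in range(n_models))
    return scores[0], scores[len(scores) // 2]

best, median = simulate_rankings(n_models=5000, n_quarters=14)
print(best, median)
```

The "best" model posts a measured error well below the true value of 1.0 even though it is no better than any other, which is why a ranking over a short window cannot be taken at face value.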

The main difference between this latest batch of models and our previous array is that we only considered the period starting with 2014Q1 in our forecast performance tests, whereas before we were after models that did well through the 2008Q1 to 2015Q1 time frame.

This is probably the reason for the lack of diversity in lag structure this time around. Markets did a poor job of forecasting the 2008-2009 NGDP crash. Even though they were clearly highly interested in NGDP, they were mostly reacting in real time to data releases; no one thought the Fed could be so incompetent and/or malicious. As a result, models with a shorter lag structure performed better in that period, which greatly influenced their overall performance, and thus which models we picked.

With the current batch, we are more interested in smaller changes in NGDP; we wanted to catch the swings from 2014 through 2018, especially the slump in 2015-2016 and the “Trump Bounce” in 2017/2018.

There’s no obvious way to know how many models to pick and how to weight them. We have hundreds of specifications that look reasonable and perform more or less as well as one another. We decided to pick the top 10 models but, due to a coding error, selected the top 11.

We decided to keep the coding “error” and go with 11 models, evenly weighted, both as an homage to the 1984 comedy film This is Spinal Tap and to highlight, for ourselves and our readers, the inherent arbitrariness of any choice here.

It’s not that we don’t take this effort seriously, rather that macroeconomics is not a precision science. The datasets are small and noisy, precluding the more sophisticated model ensembling approaches that might be used in a conventional Machine Learning context. To bolster your confidence, we can say that the current forecast of 4.7% NGDP growth does not change much regardless of whether we use 10 models, 11, 15, 20 or 30.
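For concreteness, an evenly weighted average over the top k models, and a check of how sensitive it is to k, might look like the sketch below. The forecast values are made up for illustration (they are not our models' output), and equal_weight_forecast is a hypothetical helper name.

```python
def equal_weight_forecast(ranked_forecasts, k):
    """Average the forecasts of the k best-ranked models with equal weights."""
    top = ranked_forecasts[:k]
    return sum(top) / len(top)

# Made-up forecasts, ordered from best to worst backtest rank (not real model output)
ranked = [4.7 + 0.1 * ((i % 5) - 2) for i in range(30)]

for k in (10, 11, 15, 20, 30):
    print(k, round(equal_weight_forecast(ranked, k), 2))
```

When the individual forecasts cluster tightly, as they do in this toy series, adding or dropping a model at the margin barely moves the average.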

The current forecast

The forecast now stands at 4.7%, having soared nearly a full percentage point following the model update and the introduction of the monster Q2 NGDP growth rate of 7.3%, annualized.

Many have pointed out that tariff-induced, one-time rush deliveries boosted exports in Q2, but this was also associated with a drop in inventory investment from the previous quarter of similar magnitude.

On top of this, the sum of fixed investment and consumption rose at an annualized pace of over 5%, suggesting a good deal of momentum behind NGDP. In light of this, 4.7% seems a reasonable number, despite being the highest we’ve ever forecast. As a result, we are not making any effort to normalize the signal from the latest NGDP print, which we had contemplated doing and have done in the past when natural disasters distorted NGDP.

