Learning Risk and the "Limits to Forecasting and Prediction" With the Santa Fe Institute

Last October, I had the privilege of attending the Santa Fe Institute and Morgan Stanley's Risk Conference, and it was one of my most inspiring learning experiences of the year (read last year's post on the conference, and separately, my writeup of Ed Thorp's talk about the Kelly Criterion). It's hard not to marvel at the brainpower concentrated in a room with some of the best practitioners from fields ranging from finance to physics to computer science and beyond. I would like to thank Casey Cox and Chris Wood for inviting me to these special events.

I first learned about the Santa Fe Institute (SFI) from Justin Fox's The Myth of the Rational Market. Fox concludes his historical narrative of economics and the role the efficient market hypothesis played in leading the field astray with a note of optimism about the SFI's application of physics to financial markets. Fox highlights the initial resistance of economists to the idea of physics-based models (including Paul Krugman's lament about "Santa Fe Syndrome") before explaining how the profession has in fact taken a tangible shift towards thinking about markets in a complex, adaptive way.  As Fox explains:

These models tend to be populated by rational but half-informed actors who make flawed decisions, but are capable of learning and adapting. The result is a market that never settles down into a calmly perfect equilibrium, but is constantly seeking and changing and occasionally going bonkers. To name just a few such market models...: "adaptive rational equilibrium," "efficient learning," "adaptive markets hypothesis," "rational belief equilibria." That, and Bill Sharpe now runs agent-based market simulations...to see how they play out.

The fact that Bill Sharpe has moved from an equilibrium-based perspective on markets to a dynamic one, and that Morgan Stanley now hosts a conference in conjunction with SFI, shows how far this amazing multi-disciplinary organization has pushed the field of economics. Importantly, SFI's contributions extend well beyond economics to areas including anthropology, biology, linguistics, data analytics, and much more.

Last year's focus on behavioral economics provided a nice foundation upon which to learn about the "limits to forecasting and prediction." The conference once again commenced with John Rundle, a physics professor at UC-Davis with a specialty in earthquake prediction, speaking about some successful and some failed natural disaster forecasts (Rundle operates a great site called OpenHazards). Rundle first offered a distinction between forecasting and prediction. Whereas a prediction is a statement validated by a single observation, a forecast is a statement that requires multiple observations to establish a confidence level.

He then offered a decomposition of risk into its two subcomponents: Risk = Hazard x Exposure. The hazard component relates to your forecast (i.e. the potential for being wrong) while the exposure relates to the magnitude of your risk (i.e. how much you stand to lose should your forecast be wrong). I find this a particularly meaningful breakdown considering how many people colloquially conflate hazard with risk, while ignoring the multiplier effect of exposure.
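To make Rundle's breakdown concrete, here is a minimal sketch in Python. It assumes hazard can be read as the probability of the adverse outcome and exposure as the loss if it happens; the function name and the dollar figures are invented for illustration and did not come from the talk.

```python
def risk(hazard: float, exposure: float) -> float:
    """Expected loss: probability of the adverse outcome times the loss if it occurs."""
    if not 0.0 <= hazard <= 1.0:
        raise ValueError("hazard must be a probability between 0 and 1")
    return hazard * exposure

# Hypothetical positions (illustrative numbers only).
likely_but_small = risk(hazard=0.30, exposure=10_000)      # high hazard, low exposure
unlikely_but_large = risk(hazard=0.02, exposure=500_000)   # low hazard, high exposure

print(likely_but_small, unlikely_but_large)
```

In this made-up case the position with the far lower hazard carries the larger risk, which is exactly what gets missed when hazard and risk are treated as the same thing.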

As I did last year, I'll share my notes from the presentations below. Again, I want to make clear that my notes are geared towards my practical needs and are not meant as a comprehensive summation of each presentation. I will also look to do a second post which sums up some of the questions and thoughts that have been inspired by my attendance at the conference, for the truly great learning experiences tend to raise even more questions than they do offer answers.

Antti Ilmanen, AQR Capital

With Forecasting, Strategic Beats Tactical, and Many Beats Few

Small but persistent edges can be magnified by diversification (and to a lesser extent, time). The bad news is that near-term predictability is limited (and humility is needed) and long-term forecasts which are right might not set up for good trades. I interpret this to mean that the short-term is the domain of randomness, while in the long-term even when we can make an accurate prediction, the market most likely has priced this in.

Intuitive predictions inherently take longer time-frames. Further, there is performance decay whereby good strategies fade over time. In order to properly diversify, investors must combine some degree of leverage with shorting. Ilmanen likes to combine momentum and contrarian strategies, and prefers forecasting cross-sectional trades rather than directional ones.

When we make long-term forecasts for financial markets, we have three main anchors upon which to build: history, theory, and current conditions. For history, we can use average returns over time; for theory, we can use the capital asset pricing model (CAPM); and for current conditions, we can apply the dividend discount model (DDM). Such forecasts are as much art as they are science, and the relative weights of each input depend on your time-horizon (i.e. the longer your timeframe, the less current conditions matter for the eventual accuracy of your forecast).

Historically the Equity Risk Premium (ERP) has averaged approximately 5%, and in today's environment the inverse Shiller CAPE (aka the cyclically adjusted earnings yield) is approximately 5%, meaning that 4-5% long-run returns in equity markets are justifiable, though ERPs have varied over time. Another way to look at projected returns is through the expected return of a 60/40 (60% equities / 40% bonds) portfolio. This is Ilmanen's preferred methodology and in today's low-rate environment the prospects are for a 2.6% long-run return.
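The arithmetic behind these two yardsticks is simple enough to sketch. The snippet below assumes a CAPE of 20, which is just the value implied by the roughly 5% earnings yield above, and it uses a placeholder bond return that is not a figure from the talk. It does not attempt to reproduce Ilmanen's 2.6% estimate, since his inputs were not in my notes.

```python
def earnings_yield(cape: float) -> float:
    """Cyclically adjusted earnings yield: the inverse of the CAPE."""
    if cape <= 0:
        raise ValueError("CAPE must be positive")
    return 1.0 / cape

def blended_return(equity_return: float, bond_return: float,
                   equity_weight: float = 0.60) -> float:
    """Weighted-average expected return of a stock/bond portfolio."""
    return equity_weight * equity_return + (1.0 - equity_weight) * bond_return

# A CAPE of 20 is an assumption, back-calculated from the ~5% yield quoted above.
equity_yield = earnings_yield(20.0)
# The 1% bond return is a placeholder for illustration, not a number from the talk.
portfolio = blended_return(equity_yield, bond_return=0.01)

print(f"{equity_yield:.1%} equity yield, {portfolio:.1%} blended 60/40")
```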

In forecasting and market positioning, "strategic beats tactical." People are attracted to contrarian signals, though the reality of contrarian forecasting is disappointing. The key is to try and get the long-term right, while humbly approaching the tactical part of it. Value signals like the CAPE tend to be very useful for forecasting. To highlight this, Ilmanen shared a chart of the 1/CAPE vs. the next five-year real return.

Market timing strategies have "sucked" in recent decades. In equity, bond and commodity markets alike, Sharpe Ratios have been negative for timing strategies. In contrast, value + momentum strategies have exhibited success in timing US equities in particular, though most of the returns happened early in the sample and were driven more by the momentum coefficient than value. Cheap starting valuations have resulted in better long-run returns due to the dual forces of yield capture (getting the earnings yield) and mean reversion (value reverting to longer-term averages). 

Since the 1980s, trend-following strategies have exhibited positive long-run returns. Such strategies work best over 1-12 month periods, but not longer. Cliff Asness of AQR says one of the biggest problems with momentum strategies is how people don't embrace them until too late in each investment cycle, at which point they are least likely to succeed. However, even in down market cycles, momentum strategies provided better tail-risk protection than did other theoretically safe assets like gold or Treasuries.  This was true in eight of the past 10 "tail-risk periods," including the Great Recession.

In an ode to diversification, Ilmanen suggested that investors "harvest many premia you believe in," including alternative asset classes and traditional capital markets. Stocks, bonds and commodities exhibit similar Sharpe Ratios over long time-frames, and thus equal-weighting an allocation to each asset class would result in a higher Sharpe than the average of the constituent parts. We can take this one step further and diversify amongst strategies, in addition to asset classes, with the four main strategies being value, momentum, carry (aka high yield) and defensive.
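The Sharpe Ratio claim can be checked with a little algebra. Under the simplifying assumptions that every asset has the same volatility, the same stand-alone Sharpe Ratio S, and a single pairwise correlation rho (assumptions of mine, not figures from the talk), an equal-weighted mix of N assets has a Sharpe Ratio of S x sqrt(N / (1 + (N - 1) x rho)). A minimal sketch:

```python
import math

def equal_weight_sharpe(asset_sharpe: float, n_assets: int, correlation: float) -> float:
    """Sharpe ratio of an equal-weighted mix of n assets that share the same
    volatility, the same stand-alone Sharpe ratio and one pairwise correlation."""
    if n_assets < 1:
        raise ValueError("need at least one asset")
    denominator = 1.0 + (n_assets - 1) * correlation
    if denominator <= 0:
        raise ValueError("correlation too negative for this many assets")
    return asset_sharpe * math.sqrt(n_assets / denominator)

# Three asset classes with a hypothetical stand-alone Sharpe of 0.3 each.
for rho in (1.0, 0.5, 0.0):
    print(rho, round(equal_weight_sharpe(0.3, 3, rho), 3))
```

Only when the assets are perfectly correlated does the mix match the stand-alone figure; any correlation below 1 lifts the combined Sharpe above the average of the parts.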

Over the long-run, low beta strategies in equities have exhibited high returns, though at the moment low betas appear historically expensive relative to normal times.  That being said, value as a signal has not been useful historically in market-timing.

If there are some strategies that exhibit persistently better returns, why don't all investors use them? Ilmanen highlighted the "4 c's" of conviction, constraints, conventionality and capacity as reasons for opting out of successful investment paths.

 

Henry Kaufman, Henry Kaufman & Company

The Forecasting Frenzy

Forecasting is a long-term human endeavor, and the forecaster in the business/economics arena is in the same vein as soothsayers and palm readers. In recent years, the number of forecasters and forecasts alike has grown tremendously. Sadly, forecasting continues to fail due to the following four behavioral biases:

  1. Herding--forecasts minimally fluctuate around a mean, and few are ever able to anticipate dramatic changes. When too many do anticipate dramatic changes, the path itself can change preventing such predictions from coming true.
  2. Historical bias--forecasts rest on the assumption that the future will look like the past. While economies and markets have exhibited broad repetitive patterns, history "rhymes, but does not repeat."
  3. Bias against bad news--No one institutionally predicts negative events, as optimism is a key biological mechanism for survival. Plus, negative predictions are often hard to act upon. When Kaufman warned of interest rate spikes and inflation in the 1970s, people chose to tune him out rather than embrace the uncomfortable reality. 
  4. Growth bias--stakeholders in all arenas want continued expansion and growth at all times, even when it is impractical.

Collectively, the frenzy of forecasts has far outpaced our ability to forecast. With long-term forecasting, there is no scientific process for making such predictions. An attempt to project future geopolitical events based on the past is a futile exercise. In economics, fashions contribute to unsustainable momentum, both up and down, that leads to considerable challenges in producing accurate forecasts.

Right now, Kaufman sees some worrying trends in finance. First is the politicization of monetary policy, which he fears will not reverse soon. The tactics the Fed is undertaking today are unprecedented and becoming entrenched. The idea of forward guidance in particular is very dangerous, for it relies entirely upon forecasts. Since it's well established that even expert forecasts are often wrong, the entire concept of forward guidance rests on a shaky foundation. Second, monetary policy has eclipsed fiscal policy as our go-to remedy for economic troubles. This is so because people like the quick and easy fixes offered by monetary solutions, as opposed to the much slower fiscal ones. In reality, the two (fiscal and monetary policy) should be coordinated. Third, economists are not paying enough attention to increasing financial concentration. There are fewer key financial institutions, and each is bigger than what used to be regarded as big. If/when the next one fails, and the government runs it through the wind-down process, those assets will end up in the hands of the remaining survivors, further concentrating the industry.

The economics profession should simply focus on whether we as a society will have more or less freedom going forward. Too much of the profession instead focuses on what the next datapoint will be. In the grand scheme of things, the next datapoint is completely irrelevant, especially when the "next" completely ignores any revisions to prior data. There is really no functional or useful purpose for this type of activity.

 

Bruce Bueno de Mesquita, New York University

The Predictioneer's Game

The standard approach to making predictions or designing policy around questions on the future is to "ask the expert." Experts today are simply dressed-up oracles. They know facts, history and details, but forecasts require insight and methods that experts simply don't have. The accuracy of experts is no better than throwing darts.

Good predictions should use logic and evidence, and a better way to do this is using game theory. This works because people are rationally self-interested, have values and beliefs, and face constraints. Experts simply cannot analyze emotions or account for skills and clout in answering tough geopolitical questions. That being said, game theory is not a substitute for good judgment and it cannot replace good internal debate.

People in positions of power have influencers (like a president and his/her cabinet). In a situation with 10 influencers, there are 3.6 million possible interactions that exist in a complex adaptive situation (meaning what one person says can change what another thinks and does). In any single game, there are 16 x (N^2-N) possible predictions, where N is the number of players.
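Both counts are easy to reproduce. The 3.6 million figure matches 10 factorial (3,628,800), so I am assuming that is where it comes from; the second function is just the 16 x (N^2 - N) formula as quoted, where N^2 - N is the number of ordered pairs of distinct players. The function names are mine.

```python
import math

def possible_orderings(n_influencers: int) -> int:
    """n factorial: my assumed source of the '3.6 million' figure for 10 influencers."""
    return math.factorial(n_influencers)

def possible_predictions(n_players: int) -> int:
    """16 x (N^2 - N), the count quoted in the talk for a single game."""
    return 16 * (n_players ** 2 - n_players)

print(possible_orderings(10))    # 3628800
print(possible_predictions(10))  # 1440
```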

In order to build a model that can make informed predictions, you need to know who the key influencers are. Once you know this, you must then figure out: 1) what they want on the issue; 2) how focused they are on that particular problem; 3) how influential each player could be, and to what degree they will exert that influence; and, 4) how resolved each player is to find an answer to the problem.  Once this information is gathered, you can build a model that can predict with a high degree of accuracy what people will do.  To make good predictions, contrary to what many say, you do not need to know history. It is much like a chessmaster who can walk up to a board in the middle of a game and still know what to do next.
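To show how the four inputs might be organized, here is a deliberately crude sketch. It is not Bueno de Mesquita's model, which plays out rounds of bargaining between the players; it is only a static baseline that averages the players' positions, weighted by influence times salience. The 0-100 issue scale, the 0-1 scales, the player names and all the numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Player:
    name: str
    position: float   # preferred outcome on a hypothetical 0-100 issue scale
    salience: float   # 0-1, how focused the player is on this issue
    influence: float  # 0-1, potential clout relative to the others
    resolve: float    # 0-1, commitment to reaching an answer (not used by this baseline)

def weighted_position(players: list[Player]) -> float:
    """Average position, weighting each player by influence x salience."""
    weights = [p.influence * p.salience for p in players]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one player needs non-zero influence and salience")
    return sum(w * p.position for w, p in zip(weights, players)) / total

# Hypothetical players, for illustration only.
players = [
    Player("A", position=20, salience=0.9, influence=0.5, resolve=0.7),
    Player("B", position=80, salience=0.4, influence=1.0, resolve=0.3),
    Player("C", position=50, salience=0.6, influence=0.5, resolve=0.5),
]
print(round(weighted_position(players), 1))
```

Even this toy version shows why salience matters: player B has twice the clout of player A, yet A pulls the outcome almost as hard because A cares far more about the issue.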

With this information, people can make better, more accurate predictions on identified issues, while also gaining a better grasp of timing. This can help people in a game-theory situation come up with strategies to overcome impediments in order to reach desired objectives.

Bueno de Mesquita then shared the following current predictions:

  • Senkaku Islands dispute between China and Japan - As a relevant aside, Xi Jinping's power will shrink over the next three years. Japan should let its claims rest for now, rather than push. It will take two years to find a resolution, which will most likely include a joint venture between Japan and China for exploitation of the natural gas reserves.
  • Argentina - The "improvements" in today's business behavior are merely aesthetic in advance of the key mid-term elections. Kirchner is marginalizing political rivals, and could make a serious move to consolidate power for the long-term.
  • Mexico - There is a 55% chance of a Constitutional amendment to open up energy, a 10% chance of no reform, and a 35% chance for international oil companies to get deep water drilling rights.  Mexico is likely to push through reforms in fiscal policy, social security, energy, labor and education, and looks to have a constructive backdrop for economic growth.
  • Syria with or without Assad will be hostile to the Western world.
  • China will look increasingly inward, with modest liberalization on local levels of governance and a strengthening Yuan.
  • The Eurozone will have an improving Spain and a higher likelihood that the Euro currency will be here to last.
  • Egypt is on the path to autocracy.
  • South Africa is at risk of turning into a rigged autocracy.

 

Aaron Clauset, University of Colorado and SFI

Challenges of Forecasting with Fat-Tailed Data

(Please note: statistics is most definitely not my strong suit. The content in Clauset's talk was very interesting, though some of it was over my head. I will therefore try my best to summarize the substance based on my understanding of it)

In attempting to predict fat-tail events, we are essentially trying to "predict the unpredictable." Fat-tailed data exhibit high variance, so the average of a sample is not representative of what is typically observed. In such samples, there is a substantial gap between the two extremes of the data, and we see these distributions in book sales (best-sellers like Harry Potter), earthquakes (power law distributions), market crashes, terror attacks and wars. With earthquakes, we know a lot about the physics behind them, and how they are distributed, whereas with war we know it follows some statistical pattern, but the data is dynamic instead of fixed, because certain events influence subsequent ones.
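A quick simulation helps me see why the average misleads here. The sketch below compares a fat-tailed sample with a thin-tailed one that has the same theoretical mean, and asks how much of each sample's total comes from its single largest observation. The distributions and parameters are my own choices for illustration, not anything Clauset presented.

```python
import random

def largest_share(sample):
    """Fraction of the sample total contributed by the single largest observation."""
    return max(sample) / sum(sample)

rng = random.Random(42)  # fixed seed so the run is repeatable
n = 10_000

# Pareto with shape 1.5: the mean is 3 but the variance is infinite (fat tail).
fat = [rng.paretovariate(1.5) for _ in range(n)]
# Exponential with the same mean of 3 (thin tail).
thin = [rng.expovariate(1 / 3.0) for _ in range(n)]

print(f"fat-tailed:  largest observation is {largest_share(fat):.2%} of the total")
print(f"thin-tailed: largest observation is {largest_share(thin):.2%} of the total")
```

In the fat-tailed sample a single draw accounts for a far larger slice of the total, so the sample mean depends heavily on whether a Harry Potter happens to be in the data.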

Clauset approached the question of modeling rare events through an attempt to ascertain how probable 9/11 was, and how likely another one is. The two sides of answering this question are building a model (to discover how probable it was) and making a prediction (to forecast how likely another would be). For the purposes of the model, one would care only about large events because they have disproportionate consequences. When analyzing the data, we don't know what the distribution of the upper tail will look like because there simply are not enough datapoints. In order to overcome these problems, the modeler needs to separate the tail from the body, build a multiple tail model, bootstrap the data and repeat.
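As best I understood the recipe, it can be sketched roughly as follows. This is a simplified illustration and not Clauset's actual procedure: it fits a single power-law tail rather than multiple tail models, it takes the tail threshold (xmin) as given instead of estimating it, and it runs on synthetic data. All names and parameters are mine.

```python
import math
import random

def fit_tail(data, xmin):
    """Return (alpha, n_tail): the maximum-likelihood exponent of a continuous
    power law fitted to the observations >= xmin, and how many there were."""
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if len(tail) < 2 or log_sum == 0.0:
        raise ValueError("too few distinct tail observations to fit")
    return 1.0 + len(tail) / log_sum, len(tail)

def prob_at_least_one(data, xmin, big):
    """Estimated chance that at least one of len(data) events is >= big."""
    alpha, n_tail = fit_tail(data, xmin)
    p_tail = n_tail / len(data)                        # share of events in the tail
    p_big = p_tail * (big / xmin) ** (-(alpha - 1.0))  # one event reaching `big`
    return 1.0 - (1.0 - p_big) ** len(data)

def bootstrap_interval(data, xmin, big, rounds=200, seed=0):
    """Resample with replacement, refit each time, return the 5th/95th percentiles."""
    rng = random.Random(seed)
    estimates = sorted(
        prob_at_least_one(rng.choices(data, k=len(data)), xmin, big)
        for _ in range(rounds)
    )
    return estimates[int(0.05 * rounds)], estimates[int(0.95 * rounds)]

# Synthetic event sizes only: Pareto draws whose density exponent is 2.4.
rng = random.Random(1)
events = [rng.paretovariate(1.4) for _ in range(5000)]
low, high = bootstrap_interval(events, xmin=10.0, big=1000.0)
print(f"5%-95% bootstrap range: {low:.3f} to {high:.3f}")
```

The width of the bootstrap range is the useful part: with so few observations in the tail, the estimate moves a lot from one resample to the next, which is the statistical uncertainty the talk kept returning to.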

In Clauset's analysis of the likelihood of 9/11, he found that it was not an outlier based on both the model and the prediction. There is a greater than 1% chance of such an event happening. While this may sound small, it is within the realm of possible outcomes, and as such it deserves some attention. This has implications for policymakers: because such an event is a statistical possibility, we should pursue our response within a context that acknowledges this reality.

There are some caveats to this model, however. An important one is that terrorism is not a stationary process, and events can create feedback loops which drive ensuing events. Further, events that appear independent in the data are not actually so. When forecasting fat tails, model uncertainty is always a big problem. Statistical uncertainty is a second one, due to the lack of enough data points and the large fluctuations in the tails themselves. Yet still, there is useful information within the fat tails which can inform our understanding of them.

 

Philip Tetlock, University of Pennsylvania

Geopolitical Forecasting Tournaments Test the Limits of Judgment and Stretch the Boundaries of Science

I summarized Tetlock's talk at last year's SFI Risk Conference, so I suggest checking out those notes on the IARPA Forecasting Tournament as well. The tournament has several goals/benefits: 1) making explicit one's implicit theories of good judgment; 2) getting people in the habit of treating beliefs like testable hypotheses; and, 3) helping people discover the drivers of probabilistic accuracy. (All of the above are reasons I would love to participate in the next round). With regard to each area there are important lessons.

There is a spectrum that runs from perfectly predictable on the left to perfectly unpredictable on the right, and no person or system can perfectly predict everything. In any prediction, there is a trade-off between false positives and correct hits. This is called the accuracy function. 

With the forecasting tournament, people get to put their pet theories to the test. This can help improve the "assertion-to-evidence" ratios in debates between opposing schools of thought (for example, the Keynesians vs the Hayekians). Predictions would be a great way to hold opposing schools of thought accountable to their predictions, while also eliciting evidence as to why events are expected to transpire in a given way.

In the tournament, the participants are judged using a Brier Score, a measure that originated in weather forecasting to determine accuracy on probabilistic predictions over time. The people who perform best tend to have a persistence in good performance. The top 2% of performers from one year demonstrated minimal regression to the mean, leading to the conclusion that predictions are 60% skill and 40% luck on the luck/skill spectrum.
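For anyone unfamiliar with the Brier Score, here is a minimal sketch of the common binary form: the mean squared difference between the forecast probability and what actually happened (1 if the event occurred, 0 if not), where lower is better. The tournament's exact scoring variant was not spelled out in my notes, so treat this version as an assumption; the forecasts and outcomes are made up.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally sized, non-empty lists")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

outcomes = [1, 0, 1, 1, 0]               # hypothetical: what actually happened
confident = [0.9, 0.1, 0.8, 0.7, 0.2]    # hypothetical forecaster who commits
hedger = [0.5] * 5                       # forecaster who always says 50/50

print(round(brier_score(confident, outcomes), 3))  # 0.038
print(round(brier_score(hedger, outcomes), 3))     # 0.25
```

The score rewards forecasters who are both confident and right, and a permanent 50/50 hedge earns a mediocre 0.25 no matter what happens.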

There are tangible benefits of interaction and collaboration. The groups with the smartest, most open-minded participants consistently outperformed all others. Those who used probabilistic reasoning in making predictions were amongst the best performers. IARPA concentrated the talent of some of the best performers in order to see if these "super teams" could beat the "wisdom of crowds." Super teams did win quite handily. Ability homogeneity, rather than being a problem, enhanced success. Elitist algorithms were used to generate forecasts by "extremizing" the forecasts from the best forecasters and weighting those most heavily (5 people with a .7 Brier would upgrade to approximate a .85 based on the non-correlation of their success). Slight digression: it was interesting sitting behind Ilmanen during this lecture and seeing him nod his head, as this theme resonated perfectly with his points on diversification in a portfolio resulting in the portfolio's Sharpe Ratio being above the average of its constituent parts.
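The algorithm itself was not described in detail, so the following is only a sketch of one way "weight the best forecasters, then extremize" could work. It takes a weighted average of the probabilities and then pushes the result away from 50% with a power transform. The transform, the exponent of 2, the weights and the forecasts are all assumptions of mine, not the tournament's actual settings.

```python
def extremize(p: float, a: float) -> float:
    """Push probability p away from 0.5; a > 1 extremizes, a = 1 leaves p unchanged."""
    return p ** a / (p ** a + (1.0 - p) ** a)

def weighted_extremized(forecasts, weights, a=2.0):
    """Weighted average of the forecasts, followed by extremizing."""
    total = sum(weights)
    if total <= 0 or len(forecasts) != len(weights):
        raise ValueError("need matching lists and a positive total weight")
    mean = sum(w * p for w, p in zip(weights, forecasts)) / total
    return extremize(mean, a)

# Five hypothetical forecasters who independently lean the same way.
forecasts = [0.70, 0.65, 0.75, 0.60, 0.70]
weights = [3, 2, 3, 1, 1]  # heavier weight on (hypothetically) better track records

print(round(weighted_extremized(forecasts, weights), 3))
```

The intuition is the same as with diversification: if several good forecasters arrive at roughly 70% for largely independent reasons, the combined evidence justifies more confidence than any one of them expressed.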

There are three challenges when thinking about the value of a forecasting tournament. First, automation from machines is getting better, so why bother with people? While this is important, human judgment is still a very valuable tool and can actually improve the performance of these algorithms. Second, the efficient market theory argues that what can be anticipated is already "priced in" so there should be little economic value to a good prediction anyway. Yet markets and people alike have very poor peripheral vision and good prediction can in fact be valuable in that context. Last, game theory models like Bueno de Mesquita's can distill inputs from their own framework. While this may be a challenge, it's probably even better as a complementary endeavor.