OPINION

Bottom-up gets two thumbs down

Providers of multi-factor indices have recently been debating the respective merits of the top-down and bottom-up approaches to multi-factor portfolio construction. Top-down approaches assemble multi-factor portfolios by combining distinct silos for each factor. Bottom-up methods build multi-factor portfolios in a single pass by choosing and/or weighting securities based on a composite measure of multi-factor exposures.

The top-down approach is simple and transparent and investors can control allocations across factors easily. Top-down multi-factor portfolios usually avoid being concentrated in a few stocks because they are typically assembled from reasonably diversified factor sleeves.

The bottom-up approach, in contrast, has been used to concentrate portfolios in ‘factor champions’, emphasising stocks that score highly, on average, across multiple factors. This allows interactions across factors to be taken into account and avoids diluting exposures (such as to value when tilting to high profitability).

It has been argued that bottom-up approaches produce additional performance; however, studies that document such increased returns are typically based on selected combinations of factors. They also do not test for significance or robustness, and do not scrutinise risks, stability of exposures, or implementation issues such as heightened turnover. Moreover, in a recent study, researchers have shown that accounting for the cross-sectional interaction effects of factors does not necessarily require a bottom-up approach but can be addressed in a top-down framework.

Here, we’ll contrast the claims of the proponents of bottom-up approaches with relevant findings in the academic literature. First, we’ll review general insights on return estimation and factor models that are relevant for multi-factor portfolio construction. Then, we’ll discuss recent literature that specifically addresses issues with bottom-up approaches.

Does it make sense to account for fine-grain differences in factor exposures?

A key idea behind bottom-up approaches is to account precisely for stock-level differences in terms of exposure to multiple factors. While it is understandable for computational technicians to try to account for factor exposures with the highest possible precision, there are two findings in empirical asset pricing that question the relevance of the type of over-engineering present in bottom-up approaches.

Empirical evidence on factor premia overwhelmingly suggests the relationships between factor exposures and expected returns do not hold with precision at the individual stock level. Indeed, factor scores are used as proxies for expected returns, which are notoriously difficult to estimate and inherently noisy at the stock level.

Rather than trying to determine differences in returns between individual stocks, researchers have created groups of stocks and tested broad differences in returns across these. This ‘portfolio method’ ensures robustness by ignoring stock-level differences and refraining from modelling multivariate interactions. For this reason, studies that document factor premia rely on portfolio-sorting approaches.

Former Goldman Sachs & Co. partner Fischer Black emphasised, “I am especially fond of the ‘portfolio method’ […]. Nothing I have seen […] leads me to believe that we can gain much by varying this method.”

There is ample evidence suggesting that factor characteristics do not provide an exact link with individual stock returns. Thus, fine-grain differences in factor exposures may not translate into return differences.

To illustrate the lack of precision in the relationship between factor exposure and returns, we provide results for fine-grain portfolio sorts. We first sort stocks into quintiles by a factor characteristic (such as book-to-market for value), then subdivide each quintile into sub-quintiles according to the same factor score. If the relationship between factor exposure and returns were highly precise, the second sort among stocks with broadly similar characteristics should lead to meaningful return differences. More specifically, even among stocks in the same book-to-market quintile, the distinction by sub-quintile should lead to a positive value premium for those stocks that are more value-oriented (higher book-to-market ratio) within their respective quintile.

Instead, as can be seen from Table 1, the sub-quintile premia are often negative (in 12 of the 30 cases). Especially in the winning quintile (Q5), distinguishing between stocks based on factor scores does not add any value. In fact, for four out of the six factors we analysed, selecting the highest-exposure stocks in the top quintile leads to lower returns than selecting the stocks with the lowest exposures in the top quintile. In other words, among stocks with high exposure to a given factor (top-quintile stocks), making a finer distinction between those that are most strongly exposed and the rest does not lead to higher returns. This clearly shows that even though the risk premium appears in broadly diversified portfolios, it disappears if we start accounting for differences at the stock level or create narrow portfolios according to precise differences in exposures.
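The double sort described above can be sketched in a few lines of Python. The sketch is illustrative only: it uses a synthetic universe of 500 stocks in which returns depend weakly on a factor score plus a large noise term. The function names (`split_into_groups`, `intra_quintile_premia`) and all parameter values are our assumptions, not the code or data behind Table 1.

```python
import random
from statistics import mean


def split_into_groups(items, n_groups):
    """Split a list into n_groups contiguous groups of near-equal size."""
    size, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def intra_quintile_premia(stocks, n_groups=5):
    """stocks: list of (factor_score, return) pairs for one period.

    First sort: quintiles by score. Second sort: sub-quintiles by the same
    score within each quintile. Returns, for each first-sort quintile, the
    equally weighted return of its top sub-quintile minus its bottom one.
    """
    ranked = sorted(stocks, key=lambda s: s[0])
    premia = []
    for quintile in split_into_groups(ranked, n_groups):
        subs = split_into_groups(quintile, n_groups)  # still sorted by score
        top = mean(ret for _, ret in subs[-1])
        bottom = mean(ret for _, ret in subs[0])
        premia.append(top - bottom)
    return premia


# Synthetic universe (assumption): weak linear premium, large stock-level noise.
rng = random.Random(0)
universe = []
for _ in range(500):
    score = rng.gauss(0.0, 1.0)
    ret = 0.02 * score + rng.gauss(0.0, 0.30)
    universe.append((score, ret))

for q, premium in enumerate(intra_quintile_premia(universe), start=1):
    print(f"Q{q}: sub-quintile Q5-Q1 = {premium:+.2%}")
```

With a noise term this large relative to the slope, the sign of the sub-quintile spread can vary from one quintile to the next, even though returns depend on the score by construction.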

Table 1: Intra-quintile premiums of each factor:

The analysis is based on daily total returns in US$ from December 31, 1975 to December 31, 2015, for the 500 largest stocks in the US. For each factor, the universe is double-sorted 5-by-5 on the corresponding factor score, forming 25 equally weighted portfolios. For each first-sort quintile, the table reports the difference in returns between the fifth and first sub-quintiles of the second sort.

31/12/1975 to 31/12/2015 (Q5-Q1)

                               Size     Value      Mom.   Low Vol.   Low Investment   High Profitability
Q1 (Low-exposure stocks)      0.53%     0.11%     6.23%      5.50%            9.06%                4.38%
Q2                           -0.91%    -0.31%     4.45%      1.20%            1.11%               -3.08%
Q3                           -0.77%    -0.39%    -1.58%      1.58%            1.63%                2.49%
Q4                           -1.18%     2.68%     0.25%      0.75%           -0.05%                0.93%
Q5 (High-exposure stocks)    -0.38%    -0.06%     1.62%     -0.94%            1.61%               -0.28%

Single factor relationships may break down at the multi-factor level

While there is ample evidence that portfolios sorted on a single characteristic are related to robust patterns in expected returns, such patterns may break down when incorporating many different exposures at the same time.

Hedge fund manager Cliff Asness, for example, observes: “Value works, in general, but largely fails for firms with strong momentum. Momentum works, in general, but is particularly strong for expensive firms.” As a result, “Increasing both momentum and value simultaneously has a significantly weaker effect on stock returns than the average of the marginal effects of increasing them separately.”

This weakening would affect securities favoured by composite scoring methods.

A more drastic failure is discussed in a study where the authors show that, even though the low volatility anomaly exists in the broad cross-section of stocks, low-volatility stocks underperform when considering only stocks that rank well on a composite multi-factor score. Building bottom-up multi-factor portfolios on the basis of factors that have been documented in a top-down framework thus lacks relevance.

Ultimately, engineering multi-factor portfolios under the assumption of a deterministic dependence of returns on security-level multi-factor scores means attempting to exploit information that is not reliable.

Could the backtest performance of bottom-up approaches be over-stated?

A backtest is a simulation of a portfolio’s performance as if it were implemented historically. It is not rare to find strategies that provide stellar performance in backtests but fail to deliver robust live performance. There are several reasons for this. First, backtests are sensitive to the sample period of the tests. This problem arises simply because returns are highly sample-specific. Second, the results of backtests could be contaminated by data mining and over-fitting.

Andrew Lo and Craig MacKinlay wrote in 1990 that, “[…] The more scrutiny a collection of data is subjected to, the more likely will interesting (spurious) patterns emerge.”

Over-fitting occurs when more and more degrees of freedom are added to the model until it might be capturing sample-specific noise, rather than structural information. Over-fitted models tend to fail miserably out of sample. If one generates and tests enough strategies, one will eventually find a strategy that works well in the backtest.
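A small simulation illustrates this point. The sketch below assumes strategies with zero true expected return and normally distributed monthly returns, and reports the best annualised Sharpe ratio found among a growing number of candidates. These are simplifying assumptions, as are the 4% monthly volatility, the 15-year sample and the function names. The risk-free rate is ignored.

```python
import random
from statistics import mean, stdev


def annualised_sharpe(monthly_returns):
    """Annualised Sharpe ratio from monthly returns (risk-free rate ignored)."""
    return mean(monthly_returns) / stdev(monthly_returns) * 12 ** 0.5


def best_backtest_sharpe(n_strategies, n_months, rng):
    """Best Sharpe ratio among n_strategies that have NO true premium."""
    best = float("-inf")
    for _ in range(n_strategies):
        # Assumption: i.i.d. normal monthly returns, zero mean, 4% volatility.
        returns = [rng.gauss(0.0, 0.04) for _ in range(n_months)]
        best = max(best, annualised_sharpe(returns))
    return best


rng = random.Random(42)
for n in (1, 10, 100, 1000):
    sharpe = best_backtest_sharpe(n, n_months=180, rng=rng)  # 15 years
    print(f"best of {n:>4} random strategies: Sharpe = {sharpe:.2f}")
```

Every strategy here is pure noise, so any attractive Sharpe ratio that appears as the number of candidates grows is the product of selection alone.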

The bottom-up approach to multi-factor investing gives computational technicians many ways of selecting and weighting factor metrics in multivariate composite scores. Choosing among such combinations after the fact exacerbates data-mining problems by introducing over-fitting and selection biases. Since bottom-up approaches are, by design, prone to selection bias, an important question is whether the claims of bottom-up proponents could be due to statistical flukes. A simple way to find out is to adjust the results for the inherent biases. The discussion below explores this question in detail by summarising results from a study published last year in European Financial Management, “The Mixed vs the Integrated Approach to Style Investing: Much Ado About Nothing?”, by Markus Leippold and Roger Rüegg.

Even though the multiple testing bias has been analysed extensively in the literature, studies claiming that bottom-up approaches provide better risk-adjusted returns than top-down approaches do not account for this issue. Moreover, tests are done on short time periods, such as 15 years, while a reasonable empirical assessment of factor investing approaches requires a substantially longer time period (40 years or more) to account for the cyclical nature of risk factors.

Leippold and Rüegg’s work re-assesses claims that a bottom-up approach to multi-factor portfolio construction leads to superior results. When applying proper checks of statistical robustness, and adjusting for relative risk, they find that there is no such superiority.

The authors account for the fact that there are numerous variations one could employ to conduct such tests and any reported superiority of the bottom-up approach could be the result of picking a favourable combination that happens to work simply due to chance. The authors test a large variety of factor combinations and portfolio construction methods, and compare the bottom-up and top-down approach in each case. They use advanced statistical tools to adjust for the fact that a fluke can easily result in apparently significant benefits if the number of combinations is large enough.

This analysis shows there is no evidence that bottom-up approaches perform better than corresponding top-down approaches. Thus, the findings reported by promoters of bottom-up approaches do not withstand rigorous analysis and could instead be explained by the choice of a particular selection of factors, and failure to adjust for data-mining possibilities.
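To give a flavour of what an adjustment for multiple testing involves, the sketch below applies Holm's step-down correction to a set of p-values. This is a simple textbook procedure chosen for illustration. We do not claim it is the method used by Leippold and Rüegg, and the p-values are hypothetical.

```python
def holm_rejections(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans, True where the null hypothesis is rejected
    while controlling the family-wise error rate at level alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values for four bottom-up vs top-down comparisons.
p_values = [0.001, 0.02, 0.04, 0.30]

naive = sum(p <= 0.05 for p in p_values)
adjusted = sum(holm_rejections(p_values))
print(naive, adjusted)  # 3 1
```

Three of the four hypothetical comparisons look significant when each is tested in isolation, but only one survives once the number of tests is taken into account.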

Table 2 presents a summary of the results. Leippold and Rüegg created 78 different multi-factor portfolios using all possible combinations of two to five popular factors (value, momentum, low investment, profitability and low volatility) and three different portfolio construction methods. When adjusting for multiple testing, only 13 per cent of the possible variations led bottom-up portfolios to have significantly higher Sharpe ratios than top-down approaches. Moreover, when adjusting the top-down portfolios to match the levels of relative risk of the bottom-up portfolios, none of the bottom-up portfolios has a significantly higher Sharpe ratio than its top-down counterpart. This finding invalidates the claims of superiority made by proponents of bottom-up approaches.

Table 2: Bottom-up vs. top-down approaches – Sharpe ratio comparison. The table presents the summary of results discussed in Leippold and Rüegg [2017] based on a long history of US stock returns (1963 to 2014).

78 Portfolios (3 Portfolio Construction Methods * 26 possible combinations of 5 Factors)

Bottom-up portfolios with higher Sharpe ratio          Number of portfolios   Statistically significant at 5% when adjusted for multiple hypotheses
Difference in Sharpe ratio                             67 (86%)               10 (13%)
Difference in Sharpe ratio at similar relative risk    35 (45%)               0 (0%)
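The portfolio count in Table 2 follows from simple combinatorics, assuming each multi-factor portfolio combines at least two of the five factors (our reading, since that is what yields 26 combinations). The construction-method labels below are placeholders, not the names used in the study.

```python
from itertools import combinations

factors = ["value", "momentum", "low investment", "profitability", "low volatility"]
methods = ["method A", "method B", "method C"]  # placeholder labels (assumption)

# All factor subsets containing between two and five factors.
factor_sets = [
    subset
    for k in range(2, len(factors) + 1)
    for subset in combinations(factors, k)
]

# One portfolio per (construction method, factor subset) pair.
portfolios = [(method, subset) for method in methods for subset in factor_sets]

print(len(factor_sets), len(portfolios))  # 26 78
```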

For investors, it is important to keep in mind the potential data-mining pitfalls associated with backtests. Leippold and Rüegg note: “Given the increasing computational power for conducting multiple backtests and given the fact that financial institutions have incentives to deliver extraordinary results, it is crucial to apply the most advanced statistical testing frameworks. Ignoring the available tools can lead to hasty conclusions and misallocation of capital to investment strategies that are false discoveries.”

While providers are entitled to rely on short-term backtests to support the superiority of their approach, investors would be well advised to consider the findings in the academic finance literature and to use advanced statistical tools when they evaluate the benefits of bottom-up approaches.

Felix Goltz is research director, and Sivagaminathan Sivasubramanian is quantitative analyst, at ERI Scientific Beta.