Much of modern financial theory is based upon the assumption that a portfolio containing a diversified set of equities can be used to control risk while achieving a good rate of return. The basic idea is to choose equities that have high expected returns but are unlikely to move together. In practice, however, identifying a portfolio of equities that remains well diversified over a future investment period is difficult.
In our work, we investigate how to use machine learning and data mining techniques to learn cross-sectional patterns that can be used to design diversified portfolios. Specifically, we model the connections among equities from different perspectives, and propose three different methods that capture the connections at different time scales. Using the "correlation" structure learned by our models, we show how to build selective but well-diversified portfolios. We show that these portfolios perform well on out-of-sample data in terms of minimizing risk and achieving high returns.
We provide a method to address the shortcomings of correlation in capturing events such as large losses (tail risk). Portfolios constructed using our method significantly reduce tail risk without sacrificing overall returns. We show that our method reduces the worst-day performance from -15% to -9% and increases the Sharpe ratio from 0.79 to 0.83. We also provide a method to model the relationship between the equity return that is unexplained by the market return (excess return) and the amount of sentiment in news releases that has not already been reflected in the price of equities (excess sentiment). We show that a portfolio built using this method generates an annualized return of 34% over a 10-year period. In comparison, the S&P 500 index generated a 5% return over the same period.
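As an illustration of the quantities named above, the following is a minimal sketch of how the Sharpe ratio, the worst-day performance, and the excess return could be computed from daily return series. It is not the method developed in this work. The function names, the 252-trading-day annualization, the zero default risk-free rate, and the choice to define excess return as the residual of a one-factor least-squares fit against the market (with the intercept removed) are all assumptions made for the example.

```python
import math
import statistics


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from daily simple returns.

    Assumption: 252 trading days per year and a constant daily risk-free rate.
    """
    adjusted = [r - risk_free_daily for r in daily_returns]
    mean = statistics.mean(adjusted)
    stdev = statistics.stdev(adjusted)  # sample standard deviation
    return mean / stdev * math.sqrt(periods_per_year)


def worst_day(daily_returns):
    """Worst single-day return in the series (most negative value)."""
    return min(daily_returns)


def excess_returns(equity_returns, market_returns):
    """Part of the equity return not explained by the market return.

    Assumption: a one-factor model equity = alpha + beta * market fitted by
    ordinary least squares; the residuals are returned as the excess return.
    """
    mean_e = statistics.mean(equity_returns)
    mean_m = statistics.mean(market_returns)
    cov = sum((m - mean_m) * (e - mean_e)
              for m, e in zip(market_returns, equity_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    beta = cov / var
    alpha = mean_e - beta * mean_m
    return [e - (alpha + beta * m)
            for e, m in zip(equity_returns, market_returns)]


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only.
    market = [0.010, -0.020, 0.015, 0.000, 0.005]
    equity = [0.018, -0.031, 0.020, 0.004, 0.002]
    print("Sharpe ratio:", sharpe_ratio(equity))
    print("Worst day:", worst_day(equity))
    print("Excess returns:", excess_returns(equity, market))
```

Under this definition, an equity whose return is an exact linear function of the market return has an excess return of zero on every day, so any sentiment signal would be compared only against the part of the return that the market does not account for.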
Thesis Supervisors: Profs. John Guttag and Andrew Lo