MIS: Sharpe Ratio Maximization
A well-known application of QUBO solving is the Maximum Independent Set (MIS) graph problem - the task of finding the largest independent set in a graph, where an independent set is a set of vertices such that no two vertices are adjacent. Here, we explore a similar method of solving a QUBO, but with a different application.
In this notebook, we will demonstrate an example portfolio optimization problem by looking at Sharpe ratio maximization. To that, we will formulate the problem as a QUBO and try to find optimal weights for assets in a given portfolio. We will get many results using simulated annealing for our QUBO and then classically post-process to find the one that gives the actual highest Sharpe ratio.
Begin by importing the necessary packages:
[1]:
try:
import general_superstaq as gss
import pypfopt
import qubovert as qv
import yfinance as yf
except ImportError:
print("Installing missing packages...")
%pip install --quiet general-superstaq[examples]
print("You may need to restart the kernel to import newly installed packages.")
import general_superstaq as gss
import pypfopt
import qubovert as qv
import yfinance as yf
[2]:
from datetime import date, timedelta
import numpy as np
import pandas as pd
Users should either store their API token in an environment variable or run the following cell below, replacing "key"
with their own token retrieved from https://superstaq.infleqtion.com
.
[3]:
# Run this to use your token as an environment variable
# import os
# os.environ["SUPERSTAQ_API_KEY"] = "key"
Portfolio Risk Background
Portfolio risk is measured by the standard deviation, \(\sigma\). The greater the standard deviation, the greater the risk. Given a portfolio, \(P\), with two assets, \(A\) and \(B\), we represent the weights of the assets in the portfolio with \(w_{i}\) (with \(i =\) {A,B}), and the corresponding portfolio standard deviation as:
\(\sigma_{P} = \sqrt{w_{A}^{2}σ_{A}^{2} + w_{B}^{2}σ_{B}^{2} + 2w_{A}w_{B}σ_{A}σ_{B}ρ_{{AB}}}\)
Given three assets (\(A,B,\) and \(C\)) in \(P\), our portfolio standard deviation would be:
\(\sigma_{P} = \sqrt{w_{A}^{2}σ_{A}^{2} + w_{B}^{2}σ_{B}^{2} + w_{C}^{2}σ_{C}^{2} + 2w_{A}w_{B}σ_{A}σ_{B}ρ_{{AB}} + 2w_{B}w_{C}σ_{B}σ_{C}ρ_{{BC}} + 2w_{A}w_{C}σ_{A}σ_{C}ρ_{{AC}}}\)
where \(\rho\) is a symmetric matrix that contains the correlation coefficient between an asset \(i\) and asset \(j\). The product \({\sigma_{a_i}}{\sigma_{a_j}}\rho_{ij}\) = \(\text{Cov}_{ij}\) is also called the covariance of the assets \(a_{i}\) and \(a_{j}\).
In general, for \(N\) assets = {\(a_1,a_2,...,a_N\)} in a portfolio \(P\), the square of the portfolio standard deviation, in other words, the variance, is given by the following formula:
subject to \(\sum_{i=1}^{N} w_{i} = 1\) and \(0 \leq w_{i}\).
If however, we wanted to choose a subset, M (with \(0 < M \leq N\)), from the N assets, our portfolio standard deviation would then be:
where binary variables, \(y_{a_i}\), are introduced to control which assets are included in the portfolio.
Sharpe Ratio
A useful measure to consider then, is the Sharpe Ratio – which measures a portfolio’s “reward to risk” ratio. To do this, we need to define the portfolio return, \(R_{P}\):
where \(r_{a_i}\) is the return on asset \(a_i\). Given that, a portfolio manager who is reliant on the Sharpe ratio seeks to find a portfolio where the weights of each asset (i.e. the percent of the portfolio that each asset represents of the portfolio) maximize the ratio.
The Sharpe ratio is traditionally given in the form:
where \(E[R_{P}]\) is the expectation of \(R_{P}\). However, this is problematic to formulate as a QUBO. Dividing with binary variables means dividing by zero, so we’ll seek to minimize, instead, the following expression:
such that
Given that understanding, we will go through an example of Sharpe ratio optimization. As an example dataset, we can use stocks from the S & P 500 Companies (available on Wikipedia) for our portfolio:
[4]:
pd.options.display.float_format = "{:,.2f}".format
# Get dataset to sample some companies
table = pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
table[0] # Show dataset in a dataframe format
[4]:
Symbol | Security | GICS Sector | GICS Sub-Industry | Headquarters Location | Date added | CIK | Founded | |
---|---|---|---|---|---|---|---|---|
0 | MMM | 3M | Industrials | Industrial Conglomerates | Saint Paul, Minnesota | 1957-03-04 | 66740 | 1902 |
1 | AOS | A. O. Smith | Industrials | Building Products | Milwaukee, Wisconsin | 2017-07-26 | 91142 | 1916 |
2 | ABT | Abbott Laboratories | Health Care | Health Care Equipment | North Chicago, Illinois | 1957-03-04 | 1800 | 1888 |
3 | ABBV | AbbVie | Health Care | Biotechnology | North Chicago, Illinois | 2012-12-31 | 1551152 | 2013 (1888) |
4 | ACN | Accenture | Information Technology | IT Consulting & Other Services | Dublin, Ireland | 2011-07-06 | 1467373 | 1989 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
497 | XYL | Xylem Inc. | Industrials | Industrial Machinery & Supplies & Components | White Plains, New York | 2011-11-01 | 1524472 | 2011 |
498 | YUM | Yum! Brands | Consumer Discretionary | Restaurants | Louisville, Kentucky | 1997-10-06 | 1041061 | 1997 |
499 | ZBRA | Zebra Technologies | Information Technology | Electronic Equipment & Instruments | Lincolnshire, Illinois | 2019-12-23 | 877212 | 1969 |
500 | ZBH | Zimmer Biomet | Health Care | Health Care Equipment | Warsaw, Indiana | 2001-08-07 | 1136869 | 1927 |
501 | ZTS | Zoetis | Health Care | Pharmaceuticals | Parsippany, New Jersey | 2013-06-21 | 1555280 | 1952 |
502 rows × 8 columns
For demonstrative purposes, we can consider a portfolio of \(N=10\) stocks that are a pre-defined subset of the S & P 500 dataset for repeatability:
[5]:
NUM_STOCKS = 10
assets = [
"NEM",
"KEYS",
"WM",
"CE",
"SYF",
"GIS",
"AAL",
"D",
"APH",
"AMGN",
] # List of s & p tickers for assets
Getting the Correlation Matrix
Given the selection of assets, we can then get the correlation matrix, \(\rho_{ij}\), between the chosen assets. In this case, we take a list of stock tickers as input and return the matrix for the monthly data in the last 2000 days:
[6]:
time = 2000
# Set date range for historical prices
end_time = date.today()
start_time = end_time - timedelta(days=time)
# Reformat date range
end = end_time.strftime("%Y-%m-%d")
start = start_time.strftime("%Y-%m-%d")
df = pd.DataFrame()
print("Loading correlation matrix:")
for ticker in assets:
# Get monthly stock prices over date range
prices_yf = pd.DataFrame(
yf.download(ticker, start=start, end=end, interval="1mo", auto_adjust=False, progress=False)
)
df[ticker] = prices_yf["Adj Close"]
df = df.reindex(sorted(df.columns), axis=1)
corr_matrix_df = df.corr(method="pearson")
corr_matrix_df
Loading correlation matrix:
[6]:
AAL | AMGN | APH | CE | D | GIS | KEYS | NEM | SYF | WM | |
---|---|---|---|---|---|---|---|---|---|---|
AAL | 1.00 | -0.52 | -0.40 | 0.15 | 0.35 | -0.55 | -0.14 | -0.09 | 0.08 | -0.39 |
AMGN | -0.52 | 1.00 | 0.87 | 0.28 | -0.54 | 0.55 | 0.31 | -0.12 | 0.55 | 0.83 |
APH | -0.40 | 0.87 | 1.00 | 0.36 | -0.52 | 0.54 | 0.52 | -0.15 | 0.80 | 0.95 |
CE | 0.15 | 0.28 | 0.36 | 1.00 | -0.12 | 0.08 | 0.47 | 0.21 | 0.45 | 0.40 |
D | 0.35 | -0.54 | -0.52 | -0.12 | 1.00 | -0.41 | -0.16 | 0.57 | -0.17 | -0.51 |
GIS | -0.55 | 0.55 | 0.54 | 0.08 | -0.41 | 1.00 | 0.63 | -0.03 | 0.23 | 0.67 |
KEYS | -0.14 | 0.31 | 0.52 | 0.47 | -0.16 | 0.63 | 1.00 | 0.12 | 0.60 | 0.61 |
NEM | -0.09 | -0.12 | -0.15 | 0.21 | 0.57 | -0.03 | 0.12 | 1.00 | 0.01 | -0.19 |
SYF | 0.08 | 0.55 | 0.80 | 0.45 | -0.17 | 0.23 | 0.60 | 0.01 | 1.00 | 0.74 |
WM | -0.39 | 0.83 | 0.95 | 0.40 | -0.51 | 0.67 | 0.61 | -0.19 | 0.74 | 1.00 |
Getting expected return and volatility
Next, we get the expected return and the corresponding volatility for our sample of assets for a given time frame (for example, 1 month). They are put into a dataframe as columns:
[7]:
returns_and_vols = {}
time = 2000
# Set date range for historical prices
end_time = date.today()
start_time = end_time - timedelta(days=time)
end = end_time.strftime("%Y-%m-%d")
start = start_time.strftime("%Y-%m-%d")
print("Loading expected returns and risks:")
for ticker in assets:
prices_yf = pd.DataFrame(
yf.download(ticker, start=start, end=end, interval="1mo", auto_adjust=False, progress=False)
)
prices_yf.rename(columns={"Adj Close": ticker}, inplace=True)
returns_yf = pypfopt.expected_returns.returns_from_prices(prices_yf)
exp_return = pypfopt.expected_returns.mean_historical_return(
returns_yf, returns_data=True, compounding=True, frequency=12
)
vol = returns_yf.std() * np.sqrt(12)
returns_and_vols[ticker] = {"ret": exp_return.iloc[0], "vol": vol.iloc[0]}
ret_and_vol = pd.DataFrame(returns_and_vols).transpose()
pd.DataFrame(ret_and_vol)
Loading expected returns and risks:
[7]:
ret | vol | |
---|---|---|
NEM | 0.02 | 0.35 |
KEYS | 0.10 | 0.28 |
WM | 0.13 | 0.20 |
CE | -0.07 | 0.38 |
SYF | 0.17 | 0.41 |
GIS | 0.07 | 0.18 |
AAL | -0.08 | 0.45 |
D | -0.02 | 0.22 |
APH | 0.26 | 0.26 |
AMGN | 0.08 | 0.25 |
Integration of Weight Discretization and Constraints
For formulation as a QUBO, we introduce discretization of our weights:
[8]:
N = len(assets)
discretization = 9 # Parameter used in the discretization process
x = {}
for ticker in assets:
for power in range(discretization):
x[f"w_{ticker}_{power}"] = qv.boolean_var(f"w_{ticker}_{power}")
We then reformulate the expected return and volatility with the binary variables, imposing the weight constraint stated in the beginning:
[9]:
N = len(assets)
expected_return = 0
weight_constraint = 0
print("Creating weight constraint...")
for variable in x:
ticker, power = variable.split("_")[1], int(variable.split("_")[2])
weight = 1 / (2 ** (power + 2))
expected_return += weight * x[variable] * ret_and_vol["ret"][ticker]
weight_constraint += weight * x[variable]
weight_constraint -= 1
volatility = 0
print("Creating objective function...")
for i in range(N):
asset_i_weight_expression = 0
asset_i_vol = ret_and_vol["vol"][assets[i]]
for variable in x:
ticker, power = variable.split("_")[1], int(variable.split("_")[2])
if ticker == assets[i]:
weight = 1 / (2 ** (power + 2))
asset_i_weight_expression += weight * x[variable]
volatility += asset_i_weight_expression**2 * asset_i_vol**2
for j in range(i + 1, N):
asset_i_weight_expression = 0
asset_i_vol = ret_and_vol["vol"][assets[i]]
asset_j_weight_expression = 0
asset_j_vol = ret_and_vol["vol"][assets[j]]
correlation = corr_matrix_df[assets[i]][assets[j]]
for variable in x:
ticker, power = variable.split("_")[1], int(variable.split("_")[2])
weight = 1 / (2 ** (power + 2))
if ticker == assets[i]:
asset_i_weight_expression += weight * x[variable]
elif ticker == assets[j]:
asset_j_weight_expression += weight * x[variable]
volatility += (
2
* asset_i_weight_expression
* asset_j_weight_expression
* asset_i_vol
* asset_j_vol
* correlation
)
print("Done.")
Creating weight constraint...
Creating objective function...
Done.
Using the submit_qubo()
Function
For our Sharpe ratio optimization as a QUBO, we create an objective function of the form:
with a hyperparameter, \(k\), that is preset and tuned to find the optimal Sharpe ratio.
[10]:
k = 0.8
obj = k * expected_return - (1 - k) * volatility
obj *= -1
M = max(obj.values())
# M = 1
obj = obj.add_constraint_eq_zero(weight_constraint, lam=10)
obj_QUBO = (
obj.to_qubo()
) # Converts our objective function into a QUBO. We then set up the client connection in the following cell.
We then set up the client connection in the following cell:
[11]:
# Setting up client connection
client = gss.Service()
Finally, we utilize the functionality in our submit_qubo()
function to solve the optimization problem of our objective function as a QUBO using multiple techniques. In this case, we employ Superstaq’s internal simulator (which uses the simulated annealing technique) by specifying target="ss_unconstrained_simulator"
. Use client.get_targets(supports_submit_qubo=True)
to see a complete list of available targets supporting QUBO submission.
[12]:
res = client.submit_qubo(obj_QUBO, target="ss_unconstrained_simulator")
Portfolio Construction: Best Sharpe Ratio
Based on the solution set obtained from simulated annealing, we update the best Sharpe ratio using the portfolio information,
[13]:
best_sharpe_ratio = float("-inf") # Set Sharpe ratio at the lowest at the start to compare
best_portfolio = None
for (
sol
) in (
res
): # Loops over the simulated annealing solutions to classically post-process the best Sharpe ratio
solution = obj.convert_solution(sol)
portfolio_weights = {ticker: 0 for ticker in assets}
N = len(assets)
for variable in solution:
if solution[variable] == 1:
ticker, power = variable.split("_")[1], int(variable.split("_")[2])
weight = 1 / (2 ** (power + 2))
portfolio_weights[ticker] += weight
total = sum(portfolio_weights.values())
for stock in portfolio_weights:
portfolio_weights[stock] /= total
portfolio_expected_return = 0
portfolio_risk = 0
for ticker in portfolio_weights:
portfolio_expected_return += ret_and_vol["ret"][ticker] * portfolio_weights[ticker]
for i in range(N):
asset_i_vol = ret_and_vol["vol"][assets[i]]
portfolio_risk += (portfolio_weights[assets[i]]) ** 2 * (asset_i_vol) ** 2
for j in range(i + 1, N):
asset_j_vol = ret_and_vol["vol"][assets[j]]
correlation = corr_matrix_df[assets[i]][assets[j]]
portfolio_risk += (
2
* portfolio_weights[assets[i]]
* portfolio_weights[assets[j]]
* asset_i_vol
* asset_j_vol
* correlation
)
portfolio_risk = np.sqrt(portfolio_risk)
portfolio_sharpe_ratio = portfolio_expected_return / portfolio_risk
portfolio = {
"weights": portfolio_weights,
"return": portfolio_expected_return,
"risk": portfolio_risk,
"sharpe": portfolio_sharpe_ratio,
}
if portfolio["sharpe"] > best_sharpe_ratio:
best_sharpe_ratio = portfolio["sharpe"]
best_portfolio = portfolio
and output the value of the expected return, risk, and value of the Sharpe ratio:
[14]:
print("The expected return value is:", round(best_portfolio["return"], 2))
print("The portfolio risk is:", round(best_portfolio["risk"], 2))
print("The Sharpe ratio for the best portfolio", round(best_portfolio["sharpe"], 2))
The expected return value is: 0.16
The portfolio risk is: 0.19
The Sharpe ratio for the best portfolio 0.84
Lastly, we can view what the optimal stocks and their corresponding weights are in our portfolio:
[15]:
fin_portfolio = pd.DataFrame(
best_portfolio["weights"], ["weights"]
) # Output portfolio as a dictionary with asset tickers as keys with corresponding asset weight in portfolio
[16]:
# Printing our output portfolio
fin_portfolio
[16]:
NEM | KEYS | WM | CE | SYF | GIS | AAL | D | APH | AMGN | |
---|---|---|---|---|---|---|---|---|---|---|
weights | 0.00 | 0.11 | 0.09 | 0.00 | 0.06 | 0.06 | 0.12 | 0.00 | 0.50 | 0.05 |
[17]:
sum(best_portfolio["weights"].values()) # Check to see the weight constraint is satisfied
[17]:
1.0