MIS: Sharpe Ratio Maximization

A well-known application of QUBO solving is the Maximum Independent Set (MIS) graph problem - the task of finding the largest independent set in a graph, where an independent set is a set of vertices such that no two vertices are adjacent. Here, we explore a similar method of solving a QUBO, but with a different application.

In this notebook, we will demonstrate an example portfolio optimization problem by looking at Sharpe ratio maximization. To that, we will formulate the problem as a QUBO and try to find optimal weights for assets in a given portfolio. We will get many results using simulated annealing for our QUBO and then classically post-process to find the one that gives the actual highest Sharpe ratio.

Begin by importing the necessary packages:

[1]:

try:
    import pypfopt
except ImportError:
    print("Installing PyPortfolioOpt...")
    %pip install --quiet PyPortfolioOpt
    print("Installed PyPortfolioOpt.")
    print("You may need to restart the kernel to import newly installed packages.")
    import pypfopt

try:
    import yfinance as yf
except ImportError:
    print("Installing yfinance...")
    %pip install --quiet yfinance
    print("Installed yfinance.")
    print("You may need to restart the kernel to import newly installed packages.")
    import yfinance as yf

try:
    import qubovert as qv
except ImportError:
    print("Installing qubovert...")
    %pip install --quiet qubovert
    print("Installed qubovert.")
    print("You may need to restart the kernel to import newly installed packages.")
    import qubovert as qv

[2]:

from datetime import date, timedelta
import warnings
import numpy as np
import os
import pandas as pd
import sympy
from tqdm import tqdm

warnings.filterwarnings("ignore", category=DeprecationWarning)

Users should either store their API token in an environment variable or run the following cell below, replacing "key" with their own token retrieved from https://superstaq.infleqtion.com.

[3]:

# Run this to use token as an environment variable
# os.environ["SUPERSTAQ_API_KEY"] = "key"

Portfolio Risk Background

Portfolio risk is measured by the standard deviation, \(\sigma\). The greater the standard deviation, the greater the risk. Given a portfolio, \(P\), with two assets, \(A\) and \(B\), we represent the weights of the assets in the portfolio with \(w_{i}\) (with \(i =\) {A,B}), and the corresponding portfolio standard deviation as:

\(\sigma_{P} = \sqrt{w_{A}^{2}σ_{A}^{2} + w_{B}^{2}σ_{B}^{2} + 2w_{A}w_{B}σ_{A}σ_{B}ρ_{{AB}}}\)

Given three assets (\(A,B,\) and \(C\)) in \(P\), our portfolio standard deviation would be:

\(\sigma_{P} = \sqrt{w_{A}^{2}σ_{A}^{2} + w_{B}^{2}σ_{B}^{2} + w_{C}^{2}σ_{C}^{2} + 2w_{A}w_{B}σ_{A}σ_{B}ρ_{{AB}} + 2w_{B}w_{C}σ_{B}σ_{C}ρ_{{BC}} + 2w_{A}w_{C}σ_{A}σ_{C}ρ_{{AC}}}\)

where \(\rho\) is a symmetric matrix that contains the correlation coefficient between an asset \(i\) and asset \(j\). The product \({\sigma_{a_i}}{\sigma_{a_j}}\rho_{ij}\) = \(\text{Cov}_{ij}\) is also called the covariance of the assets \(a_{i}\) and \(a_{j}\).

In general, for \(N\) assets = {\(a_1,a_2,...,a_N\)} in a portfolio \(P\), the square of the portfolio standard deviation, in other words, the variance, is given by the following formula:

\[\begin{split}\begin{align} \sigma_{P}^2 &= \sum_{i=1}^{N} {w_{a_i}}^2{\sigma_{a_i}}^2 + 2\sum_{j=1}^{N}\sum_{i<j}^{N}{w_{a_i}}{w_{a_j}}{\sigma_{a_i}}{\sigma_{a_j}}\rho_{ij} \\ &= \sum_{i=1}^{N} {w_{a_i}}^2{\sigma_{a_i}}^2 + 2\sum_{j=1}^{N}\sum_{i<j}^{N}{w_{a_i}}{w_{a_j}}\text{Cov}_{ij} \end{align}\end{split}\]

subject to \(\sum_{i=1}^{N} w_{i} = 1\) and \(0 \leq w_{i}\).

If however, we wanted to choose a subset, M (with \(0 < M \leq N\)), from the N assets, our portfolio standard deviation would then be:

\[\sigma_{P}^2 = \sum_{i=1}^N y_{a_i}w_{{a_i}}^2\sigma_{a_i}^2 + 2\sum_ {i=1}^N \sum_{i<j}^N y_{a_i} y_{a_j} w_{{a_i}}w_{{a_j}}\text{Cov}_{ij}\]

where binary variables, \(y_{a_i}\), are introduced to control which assets are included in the portfolio.

Sharpe Ratio

A useful measure to consider then, is the Sharpe Ratio – which measures a portfolio’s “reward to risk” ratio. To do this, we need to define the portfolio return, \(R_{P}\):

\[R_{P} = \sum_{i=1}^N w_{a_i}r_{a_{i}}\]

where \(r_{a_i}\) is the return on asset \(a_i\). Given that, a portfolio manager who is reliant on the Sharpe ratio seeks to find a portfolio where the weights of each asset (i.e. the percent of the portfolio that each asset represents of the portfolio) maximize the ratio.

The Sharpe ratio is traditionally given in the form:

\[\frac{E[R_{P}]}{\sigma_{P}}\]

where \(E[R_{P}]\) is the expectation of \(R_{P}\). However, this is problematic to formulate as a QUBO. Dividing with binary variables means dividing by zero, so we’ll seek to minimize, instead, the following expression:

\[\sigma_{P}^2 - E[R_{P}]\]

such that

\[\sum_{i=1}^{N} y_{a_i} = M\]

Given that understanding, we will go through an example of Sharpe ratio optimization. As an example dataset, we can use stocks from the S & P 500 Companies (available on Wikipedia) for our portfolio:

[4]:

pd.options.display.float_format = "{:,.2f}".format
# Get dataset to sample some companies
table = pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
df = table[0]
df  # Show dataset in a dataframe format

[4]:

	Symbol	Security	GICS Sector	GICS Sub-Industry	Headquarters Location	Date added	CIK	Founded
0	MMM	3M	Industrials	Industrial Conglomerates	Saint Paul, Minnesota	1957-03-04	66740	1902
1	AOS	A. O. Smith	Industrials	Building Products	Milwaukee, Wisconsin	2017-07-26	91142	1916
2	ABT	Abbott	Health Care	Health Care Equipment	North Chicago, Illinois	1957-03-04	1800	1888
3	ABBV	AbbVie	Health Care	Pharmaceuticals	North Chicago, Illinois	2012-12-31	1551152	2013 (1888)
4	ACN	Accenture	Information Technology	IT Consulting & Other Services	Dublin, Ireland	2011-07-06	1467373	1989
...	...	...	...	...	...	...	...	...
498	YUM	Yum! Brands	Consumer Discretionary	Restaurants	Louisville, Kentucky	1997-10-06	1041061	1997
499	ZBRA	Zebra Technologies	Information Technology	Electronic Equipment & Instruments	Lincolnshire, Illinois	2019-12-23	877212	1969
500	ZBH	Zimmer Biomet	Health Care	Health Care Equipment	Warsaw, Indiana	2001-08-07	1136869	1927
501	ZION	Zions Bancorporation	Financials	Regional Banks	Salt Lake City, Utah	2001-06-22	109380	1873
502	ZTS	Zoetis	Health Care	Pharmaceuticals	Parsippany, New Jersey	2013-06-21	1555280	1952

503 rows × 8 columns

For demonstrative purposes, we can consider a portfolio of \(N=10\) stocks that are a pre-defined subset of the S & P 500 dataset for repeatability:

[5]:

NUM_STOCKS = 10
assets = [
    "NEM",
    "KEYS",
    "WM",
    "CE",
    "SYF",
    "GIS",
    "AAL",
    "D",
    "APH",
    "AMGN",
]  # List of s & p tickers for assets

Getting the Correlation Matrix

Given the selection of assets, we can then get the correlation matrix, \(\rho_{ij}\), between the chosen assets. In this case, we take a list of stock tickers as input and return the matrix for the monthly data in the last 2000 days:

[6]:

time = 2000

# Set date range for historical prices
end_time = date.today()
start_time = end_time - timedelta(days=time)

# Reformat date range
end = end_time.strftime("%Y-%m-%d")
start = start_time.strftime("%Y-%m-%d")

df = pd.DataFrame()
print("Loading correlation matrix:")
for ticker in tqdm(assets):
    # Get monthly stock prices over date range
    prices_yf = pd.DataFrame(
        yf.download(ticker, start=start, end=end, interval="1mo", progress=False)["Adj Close"]
    )
    df[ticker] = prices_yf["Adj Close"]

df = df.reindex(sorted(df.columns), axis=1)
corr_matrix_df = df.corr(method="pearson")
corr_matrix_df

Loading correlation matrix:

100%|█| 10/1

[6]:

	AAL	AMGN	APH	CE	D	GIS	KEYS	NEM	SYF	WM
AAL	1.00	-0.78	-0.61	-0.23	-0.45	-0.75	-0.71	-0.69	-0.07	-0.67
AMGN	-0.78	1.00	0.76	0.39	0.35	0.85	0.78	0.68	0.28	0.77
APH	-0.61	0.76	1.00	0.68	0.17	0.86	0.95	0.58	0.66	0.94
CE	-0.23	0.39	0.68	1.00	0.40	0.31	0.68	0.61	0.88	0.58
D	-0.45	0.35	0.17	0.40	1.00	0.12	0.30	0.60	0.19	0.25
GIS	-0.75	0.85	0.86	0.31	0.12	1.00	0.83	0.52	0.29	0.90
KEYS	-0.71	0.78	0.95	0.68	0.30	0.83	1.00	0.62	0.67	0.93
NEM	-0.69	0.68	0.58	0.61	0.60	0.52	0.62	1.00	0.36	0.54
SYF	-0.07	0.28	0.66	0.88	0.19	0.29	0.67	0.36	1.00	0.57
WM	-0.67	0.77	0.94	0.58	0.25	0.90	0.93	0.54	0.57	1.00

Getting expected return and volatility

Next, we get the expected return and the corresponding volatility for our sample of assets for a given time frame (for example, 1 month). They are put into a dataframe as columns:

[7]:

returns_and_vols = {}
time = 2000

# Set date range for historical prices
end_time = date.today()
start_time = end_time - timedelta(days=time)
end = end_time.strftime("%Y-%m-%d")
start = start_time.strftime("%Y-%m-%d")
print("Loading expected returns and risks:")
for ticker in tqdm(assets):
    prices_yf = pd.DataFrame(
        yf.download(ticker, start=start, end=end, interval="1mo", progress=False)["Adj Close"]
    )
    prices_yf.rename(columns={"Adj Close": ticker}, inplace=True)
    returns_yf = pypfopt.expected_returns.returns_from_prices(prices_yf)
    exp_return = pypfopt.expected_returns.mean_historical_return(
        returns_yf, returns_data=True, compounding=True, frequency=12
    )
    vol = returns_yf.std() * np.sqrt(12)
    returns_and_vols[ticker] = {"ret": exp_return[0], "vol": vol[0]}

ret_and_vol = pd.DataFrame(returns_and_vols).transpose()
pd.DataFrame(ret_and_vol)

Loading expected returns and risks:

100%|█| 10/1

[7]:

	ret	vol
NEM	0.05	0.33
KEYS	0.27	0.29
WM	0.16	0.19
CE	0.05	0.32
SYF	0.01	0.42
GIS	0.11	0.18
AAL	-0.18	0.46
D	-0.02	0.20
APH	0.13	0.26
AMGN	0.07	0.24

Integration of Weight Discretization and Constraints

For formulation as a QUBO, we introduce discretization of our weights:

\[w_{a_i} = \sum_{j=1}^{N} d_j x_{ij}\]

[8]:

N = len(assets)
discretization = 9  # Parameter used in the discretization process

x = {}
for ticker in assets:
    for power in range(discretization):
        x[f"w_{ticker}_{power}"] = qv.boolean_var(f"w_{ticker}_{power}")

We then reformulate the expected return and volatility with the binary variables, imposing the weight constraint stated in the beginning:

[9]:

N = len(assets)
expected_return = 0
weight_constraint = 0
print("Creating weight constraint...")
for variable in tqdm(x):
    ticker, power = variable.split("_")[1], int(variable.split("_")[2])
    weight = 1 / (2 ** (power + 2))
    expected_return += weight * x[variable] * ret_and_vol["ret"][ticker]
    weight_constraint += weight * x[variable]
weight_constraint -= 1
volatility = 0
print("Creating objective function...")
for i in tqdm(range(N)):
    asset_i_weight_expression = 0
    asset_i_vol = ret_and_vol["vol"][assets[i]]
    for variable in x:
        ticker, power = variable.split("_")[1], int(variable.split("_")[2])
        if ticker == assets[i]:
            weight = 1 / (2 ** (power + 2))
            asset_i_weight_expression += weight * x[variable]

    volatility += asset_i_weight_expression**2 * asset_i_vol**2

    for j in range(i + 1, N):
        asset_i_weight_expression = 0
        asset_i_vol = ret_and_vol["vol"][assets[i]]
        asset_j_weight_expression = 0
        asset_j_vol = ret_and_vol["vol"][assets[j]]
        correlation = corr_matrix_df[assets[i]][assets[j]]
        for variable in x:
            ticker, power = variable.split("_")[1], int(variable.split("_")[2])
            weight = 1 / (2 ** (power + 2))
            if ticker == assets[i]:
                asset_i_weight_expression += weight * x[variable]
            elif ticker == assets[j]:
                asset_j_weight_expression += weight * x[variable]
        volatility += (
            2
            * asset_i_weight_expression
            * asset_j_weight_expression
            * asset_i_vol
            * asset_j_vol
            * correlation
        )

Creating weight constraint...

100%|█| 90/9

Creating objective function...

100%|█| 10/1

Using the `submit_qubo()` Function

For our Sharpe ratio optimization as a QUBO, we create an objective function of the form:

\[\text{obj} = kE[R_{P}] - (1-k)\sigma_{P}^2\]

with a hyperparameter, \(k\), that is preset and tuned to find the optimal Sharpe ratio.

[10]:

k = 0.8

obj = k * expected_return - (1 - k) * volatility
obj *= -1
M = max(obj.values())
# M = 1
lam = sympy.Symbol("lam")
obj = obj.add_constraint_eq_zero(weight_constraint, lam=10)

obj_QUBO = (
    obj.to_qubo()
)  # Converts our objective function into a QUBO. We then set up the client connection in the following cell.

We then set up the client connection in the following cell:

[11]:

# Setting up client connection

from general_superstaq import service
import general_superstaq as gss

client = service.Service()

Finally, we utilize the functionality in our submit_qubo() function to solve the optimization problem of our objective function as a QUBO using multiple techniques. In this case, we employ Superstaq’s internal simulator (which uses the simulated annealing technique) by specifying target="ss_unconstrained_simulator". Use client.get_targets(supports_submit_qubo=True) to see a complete list of available targets supporting QUBO submission.

[12]:

from qubovert.sim import anneal_qubo

res = client.submit_qubo(obj_QUBO, target="ss_unconstrained_simulator")

Portfolio Construction: Best Sharpe Ratio

Based on the solution set obtained from simulated annealing, we update the best Sharpe ratio using the portfolio information,

[13]:

best_sharpe_ratio = float("-inf")  # Set Sharpe ratio at the lowest at the start to compare
best_portfolio = None
for (
    sol
) in (
    res
):  # Loops over the simulated annealing solutions to classically post-process the best Sharpe ratio
    print(sol)
    solution = obj.convert_solution(sol)

    portfolio_weights = {ticker: 0 for ticker in assets}
    N = len(assets)

    for variable in solution:
        if solution[variable] == 1:
            ticker, power = variable.split("_")[1], int(variable.split("_")[2])
            weight = 1 / (2 ** (power + 2))
            portfolio_weights[ticker] += weight
    total = sum(portfolio_weights.values())
    for stock in portfolio_weights:
        portfolio_weights[stock] /= total
    portfolio_expected_return = 0
    portfolio_risk = 0

    for ticker in portfolio_weights:
        portfolio_expected_return += ret_and_vol["ret"][ticker] * portfolio_weights[ticker]

    for i in range(N):
        asset_i_vol = ret_and_vol["vol"][assets[i]]

        portfolio_risk += (portfolio_weights[assets[i]]) ** 2 * (asset_i_vol) ** 2

        for j in range(i + 1, N):
            asset_j_vol = ret_and_vol["vol"][assets[j]]
            correlation = corr_matrix_df[assets[i]][assets[j]]

            portfolio_risk += (
                2
                * portfolio_weights[assets[i]]
                * portfolio_weights[assets[j]]
                * asset_i_vol
                * asset_j_vol
                * correlation
            )

    portfolio_risk = np.sqrt(portfolio_risk)

    portfolio_sharpe_ratio = portfolio_expected_return / portfolio_risk

    portfolio = {
        "weights": portfolio_weights,
        "return": portfolio_expected_return,
        "risk": portfolio_risk,
        "sharpe": portfolio_sharpe_ratio,
    }

    if portfolio["sharpe"] > best_sharpe_ratio:
        best_sharpe_ratio = portfolio["sharpe"]
        best_portfolio = portfolio

{0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 1, 10: 1, 11: 0, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1, 20: 1, 21: 1, 22: 1, 23: 1, 24: 1, 25: 1, 26: 1, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 1, 51: 0, 52: 0, 53: 0, 54: 0, 55: 0, 56: 0, 57: 0, 58: 0, 59: 0, 60: 0, 61: 0, 62: 0, 63: 0, 64: 0, 65: 0, 66: 0, 67: 0, 68: 0, 69: 0, 70: 0, 71: 0, 72: 0, 73: 0, 74: 0, 75: 1, 76: 1, 77: 1, 78: 1, 79: 1, 80: 1, 81: 0, 82: 0, 83: 0, 84: 0, 85: 0, 86: 0, 87: 0, 88: 0, 89: 0}

and output the value of the expected return, risk, and value of the Sharpe ratio:

[14]:

print("The expected return value is:", round(best_portfolio["return"], 2))
print("The portfolio risk is:", round(best_portfolio["risk"], 2))
print("The Sharpe ratio for the best portfolio", round(best_portfolio["sharpe"], 2))

The expected return value is: 0.2
The portfolio risk is: 0.24
The Sharpe ratio for the best portfolio 0.85

Lastly, we can view what the optimal stocks and their corresponding weights are in our portfolio:

[15]:

fin_portfolio = pd.DataFrame(
    best_portfolio["weights"], ["weights"]
)  # Output portfolio as a dictionary with asset tickers as keys with corresponding asset weight in portfolio

[16]:

# Printing our output portfolio
fin_portfolio

[16]:

	NEM	KEYS	WM	CE	SYF	GIS	AAL	D	APH	AMGN
weights	0.00	0.43	0.50	0.00	0.00	0.01	0.00	0.00	0.06	0.00

[17]:

sum(best_portfolio["weights"].values())  # Check to see the weight constraint is satisfied

[17]:

1.0