MIS: Sharpe Ratio Maximization

Open in Colab Launch Binder

A well-known application of QUBO solving is the Maximum Independent Set (MIS) graph problem - the task of finding the largest independent set in a graph, where an independent set is a set of vertices such that no two vertices are adjacent. Here, we explore a similar method of solving a QUBO, but with a different application.

In this notebook, we will demonstrate an example portfolio optimization problem by looking at Sharpe ratio maximization. To that, we will formulate the problem as a QUBO and try to find optimal weights for assets in a given portoflio. We will get many results using simulated annealing for our QUBO and then classically post-process to find the one that gives the actual highest Sharpe ratio.

Begin by importing the necessary packages:

[1]:
try:
    import pypfopt
except ImportError:
    print("Installing PyPortfolioOpt...")
    %pip install --quiet PyPortfolioOpt
    print("Installed PyPortfolioOpt.")
    print("You may need to restart the kernel to import newly installed packages.")
    import pypfopt

try:
    import yfinance as yf
except ImportError:
    print("Installing yfinance...")
    %pip install --quiet yfinance
    print("Installed yfinance.")
    print("You may need to restart the kernel to import newly installed packages.")
    import yfinance as yf
[2]:
from datetime import date, timedelta
import warnings
import numpy as np
import os
import pandas as pd
import sympy
from tqdm import tqdm
import qubovert as qv

warnings.filterwarnings("ignore", category=DeprecationWarning)

Users should either store their API token in an environment variable or run the following cell below, replacing "key" with their own token retrieved from https://superstaq.infleqtion.com.

[3]:
# Run this to use token as an environment variable
# os.environ["SUPERSTAQ_API_KEY"] = "key"

Portfolio Risk Background

Portfolio risk is measured by the standard deviation, \(\sigma\). The greater the standard deviation, the greater the risk. Given a portfolio, \(P\), with two assets, \(A\) and \(B\), we represent the weights of the assets in the portfolio with \(w_{i}\) (with \(i =\) {A,B}), and the corresponding portfolio standard deviation as:

\(\sigma_{P} = \sqrt{w_{A}^{2}σ_{A}^{2} + w_{B}^{2}σ_{B}^{2} + 2w_{A}w_{B}σ_{A}σ_{B}ρ_{{AB}}}\)

Given three assets (\(A,B,\) and \(C\)) in \(P\), our portfolio standard deviation would be:

\(\sigma_{P} = \sqrt{w_{A}^{2}σ_{A}^{2} + w_{B}^{2}σ_{B}^{2} + w_{C}^{2}σ_{C}^{2} + 2w_{A}w_{B}σ_{A}σ_{B}ρ_{{AB}} + 2w_{B}w_{C}σ_{B}σ_{C}ρ_{{BC}} + 2w_{A}w_{C}σ_{A}σ_{C}ρ_{{AC}}}\)

where \(\rho\) is a symmetric matrix that contains the correlation coefficient between an asset \(i\) and asset \(j\). The product \({\sigma_{a_i}}{\sigma_{a_j}}\rho_{ij}\) = \(\text{Cov}_{ij}\) is also called the covariance of the assets \(a_{i}\) and \(a_{j}\).

In general, for \(N\) assets = {\(a_1,a_2,...,a_N\)} in a portfolio \(P\), the square of the portfolio standard deviation, in other words the variance, is given by the following formula:

\[\begin{split}\begin{align} \sigma_{P}^2 &= \sum_{i=1}^{N} {w_{a_i}}^2{\sigma_{a_i}}^2 + 2\sum_{j=1}^{N}\sum_{i<j}^{N}{w_{a_i}}{w_{a_j}}{\sigma_{a_i}}{\sigma_{a_j}}\rho_{ij} \\ &= \sum_{i=1}^{N} {w_{a_i}}^2{\sigma_{a_i}}^2 + 2\sum_{j=1}^{N}\sum_{i<j}^{N}{w_{a_i}}{w_{a_j}}\text{Cov}_{ij} \end{align}\end{split}\]

subject to \(\sum_{i=1}^{N} w_{i} = 1\) and \(0 \leq w_{i}\).

If however, we wanted to choose a subset, M (with \(0 < M \leq N\)), from the N assets, our portfolio standard deviation would then be:

\[\sigma_{P}^2 = \sum_{i=1}^N y_{a_i}w_{{a_i}}^2\sigma_{a_i}^2 + 2\sum_ {i=1}^N \sum_{i<j}^N y_{a_i} y_{a_j} w_{{a_i}}w_{{a_j}}\text{Cov}_{ij}\]

where binary variables, \(y_{a_i}\), are introduced to control which assets are included in the portfolio.

Sharpe Ratio

A useful measure to consider then, is the Sharpe Ratio – which measures a portfolio’s “reward to risk” ratio. To do this, we need to define the portfolio return, \(R_{P}\):

\[R_{P} = \sum_{i=1}^N w_{a_i}r_{a_{i}}\]

where \(r_{a_i}\) is the return on asset \(a_i\). Given that, a portfolio manager who is reliant on the Sharpe ratio seeks to find a portfolio where the weights of each asset (i.e. the percent of the portfolio that each asset represents of the portfolio) maximize the ratio.

The Sharpe ratio is traditionally given in the form:

\[\frac{E[R_{P}]}{\sigma_{P}}\]

where \(E[R_{P}]\) is the expectation of \(R_{P}\). However, this is problematic to formulate as a QUBO. Dividing with binary variables means dividing by zero, so we’ll seek to minimize, instead, the following expression:

\[\sigma_{P}^2 - E[R_{P}]\]

such that

\[\sum_{i=1}^{N} y_{a_i} = M\]

Given that understanding we will go through an example Sharpe ratio optimization. As an example dataset, we can use stocks from the S & P 500 Companies (available on Wikipedia) for our portfolio:

[4]:
pd.options.display.float_format = "{:,.2f}".format
# Get dataset to sample some companies
table = pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
df = table[0]
df  # Show dataset in a dataframe format
[4]:
Symbol Security GICS Sector GICS Sub-Industry Headquarters Location Date added CIK Founded
0 MMM 3M Industrials Industrial Conglomerates Saint Paul, Minnesota 1957-03-04 66740 1902
1 AOS A. O. Smith Industrials Building Products Milwaukee, Wisconsin 2017-07-26 91142 1916
2 ABT Abbott Health Care Health Care Equipment North Chicago, Illinois 1957-03-04 1800 1888
3 ABBV AbbVie Health Care Pharmaceuticals North Chicago, Illinois 2012-12-31 1551152 2013 (1888)
4 ACN Accenture Information Technology IT Consulting & Other Services Dublin, Ireland 2011-07-06 1467373 1989
... ... ... ... ... ... ... ... ...
498 YUM Yum! Brands Consumer Discretionary Restaurants Louisville, Kentucky 1997-10-06 1041061 1997
499 ZBRA Zebra Technologies Information Technology Electronic Equipment & Instruments Lincolnshire, Illinois 2019-12-23 877212 1969
500 ZBH Zimmer Biomet Health Care Health Care Equipment Warsaw, Indiana 2001-08-07 1136869 1927
501 ZION Zions Bancorporation Financials Regional Banks Salt Lake City, Utah 2001-06-22 109380 1873
502 ZTS Zoetis Health Care Pharmaceuticals Parsippany, New Jersey 2013-06-21 1555280 1952

503 rows × 8 columns

For demonstrative purposes, we can consider a portfolio of \(N=10\) stocks that are a pre-defined subset of the S & P 500 dataset for repeatability:

[5]:
NUM_STOCKS = 10
assets = [
    "NEM",
    "KEYS",
    "WM",
    "CE",
    "SYF",
    "GIS",
    "AAL",
    "D",
    "APH",
    "AMGN",
]  # List of s & p tickers for assets

Getting the Correlation Matrix

Given the selection of assets, we can then get the correlation matrix, \(\rho_{ij}\), between the choosen assets. In this case, we take a list of stock tickers as input and return the matrix for the monthly data in the last 2000 days:

[6]:
time = 2000

# Set date range for historical prices
end_time = date.today()
start_time = end_time - timedelta(days=time)

# Reformat date range
end = end_time.strftime("%Y-%m-%d")
start = start_time.strftime("%Y-%m-%d")

df = pd.DataFrame()
print("Loading correlation matrix:")
for ticker in tqdm(assets):
    # Get monthly stock prices over date range
    prices_yf = pd.DataFrame(
        yf.download(ticker, start=start, end=end, interval="1mo", progress=False)["Adj Close"]
    )
    df[ticker] = prices_yf["Adj Close"]

df = df.reindex(sorted(df.columns), axis=1)
corr_matrix_df = df.corr(method="pearson")
corr_matrix_df
Loading correlation matrix:
100%|█| 10/1
[6]:
AAL AMGN APH CE D GIS KEYS NEM SYF WM
AAL 1.00 -0.78 -0.61 -0.23 -0.45 -0.75 -0.71 -0.69 -0.07 -0.67
AMGN -0.78 1.00 0.76 0.39 0.35 0.85 0.78 0.68 0.28 0.77
APH -0.61 0.76 1.00 0.68 0.17 0.86 0.95 0.58 0.66 0.94
CE -0.23 0.39 0.68 1.00 0.40 0.31 0.68 0.61 0.88 0.58
D -0.45 0.35 0.17 0.40 1.00 0.12 0.30 0.60 0.19 0.25
GIS -0.75 0.85 0.86 0.31 0.12 1.00 0.83 0.52 0.29 0.90
KEYS -0.71 0.78 0.95 0.68 0.30 0.83 1.00 0.62 0.67 0.93
NEM -0.69 0.68 0.58 0.61 0.60 0.52 0.62 1.00 0.36 0.54
SYF -0.07 0.28 0.66 0.88 0.19 0.29 0.67 0.36 1.00 0.57
WM -0.67 0.77 0.94 0.58 0.25 0.90 0.93 0.54 0.57 1.00

Getting expected return and volatility

Next, we get the expected return and the corresponding volatility for our sample of assets for a given time frame (for example, 1 month). They are put into a dataframe as columns:

[7]:
returns_and_vols = {}
time = 2000

# Set date range for historical prices
end_time = date.today()
start_time = end_time - timedelta(days=time)
end = end_time.strftime("%Y-%m-%d")
start = start_time.strftime("%Y-%m-%d")
print("Loading expected returns and risks:")
for ticker in tqdm(assets):
    prices_yf = pd.DataFrame(
        yf.download(ticker, start=start, end=end, interval="1mo", progress=False)["Adj Close"]
    )
    prices_yf.rename(columns={"Adj Close": ticker}, inplace=True)
    returns_yf = pypfopt.expected_returns.returns_from_prices(prices_yf)
    exp_return = pypfopt.expected_returns.mean_historical_return(
        returns_yf, returns_data=True, compounding=True, frequency=12
    )
    vol = returns_yf.std() * np.sqrt(12)
    returns_and_vols[ticker] = {"ret": exp_return[0], "vol": vol[0]}

ret_and_vol = pd.DataFrame(returns_and_vols).transpose()
pd.DataFrame(ret_and_vol)
Loading expected returns and risks:
100%|█| 10/1
[7]:
ret vol
NEM 0.05 0.33
KEYS 0.27 0.29
WM 0.16 0.19
CE 0.05 0.32
SYF 0.01 0.42
GIS 0.11 0.18
AAL -0.18 0.46
D -0.02 0.20
APH 0.13 0.26
AMGN 0.07 0.24

Integration of Weight Discretization and Constraints

For formulation as a QUBO, we introduce discretization of our weights:

\[w_{a_i} = \sum_{j=1}^{N} d_j x_{ij}\]
[8]:
N = len(assets)
discretization = 9  # Parameter used in the discretization process

x = {}
for ticker in assets:
    for power in range(discretization):
        x[f"w_{ticker}_{power}"] = qv.boolean_var(f"w_{ticker}_{power}")

We then reformulate the expected return and volatility with the binary variables, imposing the weight constraint stated in the beginning:

[9]:
N = len(assets)
expected_return = 0
weight_constraint = 0
print("Creating weight constraint...")
for variable in tqdm(x):
    ticker, power = variable.split("_")[1], int(variable.split("_")[2])
    weight = 1 / (2 ** (power + 2))
    expected_return += weight * x[variable] * ret_and_vol["ret"][ticker]
    weight_constraint += weight * x[variable]
weight_constraint -= 1
volatility = 0
print("Creating objective function...")
for i in tqdm(range(N)):
    asset_i_weight_expression = 0
    asset_i_vol = ret_and_vol["vol"][assets[i]]
    for variable in x:
        ticker, power = variable.split("_")[1], int(variable.split("_")[2])
        if ticker == assets[i]:
            weight = 1 / (2 ** (power + 2))
            asset_i_weight_expression += weight * x[variable]

    volatility += asset_i_weight_expression**2 * asset_i_vol**2

    for j in range(i + 1, N):
        asset_i_weight_expression = 0
        asset_i_vol = ret_and_vol["vol"][assets[i]]
        asset_j_weight_expression = 0
        asset_j_vol = ret_and_vol["vol"][assets[j]]
        correlation = corr_matrix_df[assets[i]][assets[j]]
        for variable in x:
            ticker, power = variable.split("_")[1], int(variable.split("_")[2])
            weight = 1 / (2 ** (power + 2))
            if ticker == assets[i]:
                asset_i_weight_expression += weight * x[variable]
            elif ticker == assets[j]:
                asset_j_weight_expression += weight * x[variable]
        volatility += (
            2
            * asset_i_weight_expression
            * asset_j_weight_expression
            * asset_i_vol
            * asset_j_vol
            * correlation
        )
Creating weight constraint...
100%|█| 90/9
Creating objective function...
100%|█| 10/1

Using the submit_qubo() Function

For our Sharpe ratio optimization as a QUBO, we create an objective function of the form:

\[\text{obj} = kE[R_{P}] - (1-k)\sigma_{P}^2\]

with a hyperparameter, \(k\), that is preset and tuned to find the optimal Sharpe ratio.

[10]:
k = 0.8

obj = k * expected_return - (1 - k) * volatility
obj *= -1
M = max(obj.values())
# M = 1
lam = sympy.Symbol("lam")
obj = obj.add_constraint_eq_zero(weight_constraint, lam=10)

obj_QUBO = (
    obj.to_qubo()
)  # Converts our objective function into a QUBO. We then set up the client connection in the following cell.

We then set up the client connection in the following cell:

[11]:
# Setting up client connection

from general_superstaq import service
import general_superstaq as gss

client = service.Service()

Finally, we utilize the functionality in our submit_qubo() function to solve the optimization problem of our objective function as a QUBO using multiple techniques. To employ the simulated annealing technique, we can add a method="dry-run" parameter. Without this argument, the client will use the target specified (in this case, the Toshiba Bifurcation Machine) as shown in the line commented out. Otherwise, we can specify our target; in this case, the Toshiba Bifurcation Machine:

[12]:
from qubovert.sim import anneal_qubo

res = client.submit_qubo(obj_QUBO, target="toshiba_bifurcation_simulator", method="dry-run")
# res = client.submit_qubo(obj_QUBO, target="toshiba_bifurcation_simulator")
res = gss.qubo.read_json_qubo_result(res)

Portfolio Construction: Best Sharpe Ratio

Based on the solution set obtained from simulated annealing, we update the best Sharpe ratio using the portfolio information,

[13]:
best_sharpe_ratio = float("-inf")  # Set Sharpe ratio at the lowest at the start to compare
best_portfolio = None
for (
    sol
) in (
    res
):  # Loops over the simulated annealing solutions to classically post-process the best Sharpe ratio
    print(sol)
    solution = obj.convert_solution(sol)

    portfolio_weights = {ticker: 0 for ticker in assets}
    N = len(assets)

    for variable in solution:
        if solution[variable] == 1:
            ticker, power = variable.split("_")[1], int(variable.split("_")[2])
            weight = 1 / (2 ** (power + 2))
            portfolio_weights[ticker] += weight
    total = sum(portfolio_weights.values())
    for stock in portfolio_weights:
        portfolio_weights[stock] /= total
    portfolio_expected_return = 0
    portfolio_risk = 0

    for ticker in portfolio_weights:
        portfolio_expected_return += ret_and_vol["ret"][ticker] * portfolio_weights[ticker]

    for i in range(N):
        asset_i_vol = ret_and_vol["vol"][assets[i]]

        portfolio_risk += (portfolio_weights[assets[i]]) ** 2 * (asset_i_vol) ** 2

        for j in range(i + 1, N):
            asset_j_vol = ret_and_vol["vol"][assets[j]]
            correlation = corr_matrix_df[assets[i]][assets[j]]

            portfolio_risk += (
                2
                * portfolio_weights[assets[i]]
                * portfolio_weights[assets[j]]
                * asset_i_vol
                * asset_j_vol
                * correlation
            )

    portfolio_risk = np.sqrt(portfolio_risk)

    portfolio_sharpe_ratio = portfolio_expected_return / portfolio_risk

    portfolio = {
        "weights": portfolio_weights,
        "return": portfolio_expected_return,
        "risk": portfolio_risk,
        "sharpe": portfolio_sharpe_ratio,
    }

    if portfolio["sharpe"] > best_sharpe_ratio:
        best_sharpe_ratio = portfolio["sharpe"]
        best_portfolio = portfolio
{0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 1, 10: 1, 11: 0, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1, 19: 1, 20: 1, 21: 1, 22: 1, 23: 1, 24: 1, 25: 1, 26: 1, 27: 0, 28: 0, 29: 0, 30: 0, 31: 0, 32: 0, 33: 0, 34: 0, 35: 0, 36: 0, 37: 0, 38: 0, 39: 0, 40: 0, 41: 0, 42: 0, 43: 0, 44: 0, 45: 0, 46: 0, 47: 0, 48: 0, 49: 0, 50: 1, 51: 0, 52: 0, 53: 0, 54: 0, 55: 0, 56: 0, 57: 0, 58: 0, 59: 0, 60: 0, 61: 0, 62: 0, 63: 0, 64: 0, 65: 0, 66: 0, 67: 0, 68: 0, 69: 0, 70: 0, 71: 0, 72: 0, 73: 0, 74: 0, 75: 1, 76: 1, 77: 1, 78: 1, 79: 1, 80: 1, 81: 0, 82: 0, 83: 0, 84: 0, 85: 0, 86: 0, 87: 0, 88: 0, 89: 0}

and output the value of the expected return, risk, and value of the Sharpe ratio:

[14]:
print("The expected return value is:", round(best_portfolio["return"], 2))
print("The portfolio risk is:", round(best_portfolio["risk"], 2))
print("The Sharpe ratio for the best portfolio", round(best_portfolio["sharpe"], 2))
The expected return value is: 0.2
The portfolio risk is: 0.24
The Sharpe ratio for the best portfolio 0.85

Lastly, we can view what the optimal stocks and their corresponding weights are in our portfolio:

[15]:
fin_portfolio = pd.DataFrame(
    best_portfolio["weights"], ["weights"]
)  # Output portfolio as a dictionary with asset tickers as keys with corresponding asset weight in portfolio
[16]:
# Printing our output portfolio
fin_portfolio
[16]:
NEM KEYS WM CE SYF GIS AAL D APH AMGN
weights 0.00 0.43 0.50 0.00 0.00 0.01 0.00 0.00 0.06 0.00
[17]:
sum(best_portfolio["weights"].values())  # Check to see the weight constraint is satisfied
[17]:
1.0