Pair Trading in Python
A market-neutral strategy that simultaneously goes long on an underperforming asset and short on an outperforming correlated asset, profiting from the convergence of their price spread.
Definition
Pair Trading is the canonical implementation of statistical arbitrage, first formalized by Morgan Stanley quants Nunzio Tartaglia and colleagues in the 1980s. It exploits the cointegration relationship between two historically correlated assets — when the spread between them deviates beyond a statistical threshold, the strategy enters a long/short position expecting mean reversion to the equilibrium spread. True pair trading requires cointegration (a stable long-run linear relationship), not merely correlation. Correlated assets can diverge permanently; cointegrated assets are bound by an error-correction mechanism that prevents permanent divergence.
Quantitative Formula
$$y_t = \alpha + \beta x_t + \varepsilon_t, \qquad \varepsilon_t \sim I(0)$$
Where $y_t$ and $x_t$ are the log prices of the two assets, $\beta$ is the cointegrating vector (hedge ratio), $\alpha$ is a constant, and $\varepsilon_t$ is the stationary residual spread. The condition $\varepsilon_t \sim I(0)$, that the residuals are integrated of order zero (stationary), is the mathematical definition of cointegration (Engle-Granger, 1987). Stationarity of $\varepsilon_t$ is verified via the Augmented Dickey-Fuller test on the regression residuals.
Why It Matters in Backtesting
The most common backtesting error in pair trading is using the full historical period to estimate the hedge ratio $\beta$ and then applying it as if it were known in real time, which is a form of lookahead bias. The correct approach is a rolling or expanding-window OLS regression where $\beta$ is re-estimated at each step using only past data. Additionally, cointegration relationships are not stable over long horizons: pairs that were cointegrated from 2010–2015 may have structurally diverged by 2020, so the relationship must be re-validated regularly throughout the backtest.
Python Implementation
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import coint
from statsmodels.regression.linear_model import OLS
from statsmodels.tools import add_constant


def pair_trading_backtest(price_y: pd.Series, price_x: pd.Series,
                          lookback: int = 60, entry_z: float = 2.0,
                          exit_z: float = 0.5) -> dict:
    """
    Walk-forward pair trading backtest with a rolling hedge-ratio estimate.
    Avoids lookahead bias by re-estimating the hedge ratio at each timestep
    from the preceding `lookback` observations only.
    """
    log_y = np.log(price_y)
    log_x = np.log(price_x)

    # Full-period cointegration test (for initial validation only).
    # It uses the whole sample, so it is a diagnostic and never feeds the signals.
    _, coint_p, _ = coint(log_y, log_x)

    signals = pd.Series(0.0, index=price_y.index)
    spreads = pd.Series(np.nan, index=price_y.index)
    hedge_ratios = pd.Series(np.nan, index=price_y.index)

    for i in range(lookback, len(price_y)):
        # Estimation window covers rows [i - lookback, i), i.e. strictly before i.
        window_y = log_y.iloc[i - lookback:i]
        window_x = log_x.iloc[i - lookback:i]

        model = OLS(window_y, add_constant(window_x)).fit()
        beta = model.params.iloc[1]  # slope; params.iloc[0] is the constant

        # Z-score of today's spread against the window's spread distribution.
        spread_history = window_y - beta * window_x
        current_spread = log_y.iloc[i] - beta * log_x.iloc[i]
        z = (current_spread - spread_history.mean()) / (spread_history.std() + 1e-9)

        spreads.iloc[i] = current_spread
        hedge_ratios.iloc[i] = beta

        if z > entry_z:
            signals.iloc[i] = -1.0   # spread rich: short y, long x
        elif z < -entry_z:
            signals.iloc[i] = 1.0    # spread cheap: long y, short x
        elif abs(z) < exit_z:
            signals.iloc[i] = 0.0    # spread has reverted: go flat
        else:
            signals.iloc[i] = signals.iloc[i - 1]  # otherwise hold the prior position

    # Yesterday's signal and hedge ratio earn today's spread return.
    returns = signals.shift(1) * (log_y.diff() - hedge_ratios.shift(1) * log_x.diff())

    return {
        "returns": returns.dropna(),
        "signals": signals,
        "spreads": spreads,
        "hedge_ratios": hedge_ratios,
        "cointegration_p_value": coint_p,
        "cointegrated": coint_p < 0.05
    }