Pair Trading in Python

A market-neutral strategy that simultaneously goes long on an underperforming asset and short on an outperforming correlated asset, profiting from the convergence of their price spread.

Definition

Pair Trading is the canonical implementation of statistical arbitrage, first formalized by Morgan Stanley quants Nunzio Tartaglia and colleagues in the 1980s. It exploits the cointegration relationship between two historically correlated assets — when the spread between them deviates beyond a statistical threshold, the strategy enters a long/short position expecting mean reversion to the equilibrium spread. True pair trading requires cointegration (a stable long-run linear relationship), not merely correlation. Correlated assets can diverge permanently; cointegrated assets are bound by an error-correction mechanism that prevents permanent divergence.
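A minimal simulation shows why correlation is not enough. It uses synthetic log prices generated with NumPy; every series and parameter below is illustrative, not market data. The first pair's daily returns are almost perfectly correlated, but a small persistent drift makes its spread wander off permanently. The second pair is built as $X$ plus a stationary AR(1) deviation, so its returns are less correlated, yet its spread stays bounded.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Shared driver: a synthetic log-price random walk with 1% daily volatility
log_x = np.cumsum(rng.normal(0.0, 0.01, n))

# Pair 1: correlated but NOT cointegrated. Y copies X's daily moves, plus a small
# persistent drift and noise that accumulate in the spread without bound.
log_y_drift = log_x + np.cumsum(0.001 + rng.normal(0.0, 0.002, n))

# Pair 2: cointegrated. Y equals X plus a stationary AR(1) deviation (phi = 0.9)
# that is pulled back toward zero every period, mimicking error correction.
phi = 0.9
shocks = rng.normal(0.0, 0.01, n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = phi * u[t - 1] + shocks[t]
log_y_coint = log_x + u


def return_corr(a: np.ndarray, b: np.ndarray) -> float:
    """Correlation of one-period log returns."""
    return float(np.corrcoef(np.diff(a), np.diff(b))[0, 1])


spread_drift = log_y_drift - log_x
spread_coint = log_y_coint - log_x

print(f"return corr, drifting pair:      {return_corr(log_y_drift, log_x):.2f}")
print(f"return corr, cointegrated pair:  {return_corr(log_y_coint, log_x):.2f}")
print(f"final spread, drifting pair:     {spread_drift[-1]:.3f}")
print(f"max |spread|, cointegrated pair: {np.abs(spread_coint).max():.3f}")
```

With these parameters the drifting pair has the higher return correlation (around 0.98 versus around 0.70 in expectation). Yet it is the pair whose spread never comes back, which is exactly the trap the definition warns about.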

Quantitative Formula

$$Y_t = \alpha + \beta X_t + \epsilon_t, \quad \epsilon_t \sim I(0)$$

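Where $Y_t$ and $X_t$ are the log prices of the two assets, $\beta$ is the hedge ratio (the cointegrating vector is $(1, -\beta)$), $\alpha$ is a constant, and $\epsilon_t$ is the residual spread. The condition $\epsilon_t \sim I(0)$ means the residuals are integrated of order zero (stationary) even though $Y_t$ and $X_t$ are themselves $I(1)$. This is the definition of cointegration used by Engle and Granger (1987). In their two-step procedure, stationarity of $\epsilon_t$ is checked with an Augmented Dickey-Fuller test on the OLS regression residuals. Because those residuals come from an estimated regression, the test statistic must be compared with Engle-Granger (MacKinnon) critical values rather than the standard ADF table.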
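The two Engle-Granger steps can be sketched with plain NumPy on simulated data with a known $\alpha = 0.5$ and $\beta = 1.5$. The helper `engle_granger_two_step` is hypothetical. It runs the OLS of step one and then a plain Dickey-Fuller regression on the residuals, with no lag augmentation. The constant `-3.34` is roughly MacKinnon's asymptotic 5% Engle-Granger critical value for two variables with a constant. In practice, `statsmodels.tsa.stattools.coint` runs the augmented version and returns a p-value directly.

```python
import numpy as np

EG_CRIT_5PCT = -3.34  # approx. MacKinnon asymptotic 5% value, 2 variables + constant


def engle_granger_two_step(log_y: np.ndarray, log_x: np.ndarray):
    """Hypothetical helper: OLS for (alpha, beta), then a Dickey-Fuller t-stat on residuals.

    Simplification: no lagged differences (plain DF, not ADF).
    """
    # Step 1: regress Y_t on [1, X_t]
    design = np.column_stack([np.ones_like(log_x), log_x])
    (alpha, beta), *_ = np.linalg.lstsq(design, log_y, rcond=None)
    resid = log_y - alpha - beta * log_x

    # Step 2: delta_e_t = gamma * e_{t-1} + noise; a strongly negative t-stat
    # on gamma means the residual spread mean-reverts (is stationary)
    lagged, delta = resid[:-1], np.diff(resid)
    gamma = (lagged @ delta) / (lagged @ lagged)
    noise = delta - gamma * lagged
    sigma2 = (noise @ noise) / (len(delta) - 1)
    t_stat = gamma / np.sqrt(sigma2 / (lagged @ lagged))
    return alpha, beta, t_stat


# Synthetic cointegrated pair with known alpha = 0.5, beta = 1.5
rng = np.random.default_rng(7)
n = 1500
log_x = 4.0 + np.cumsum(rng.normal(0.0, 0.02, n))
eps = np.zeros(n)
shocks = rng.normal(0.0, 0.01, n)
for t in range(1, n):
    eps[t] = 0.5 * eps[t - 1] + shocks[t]  # stationary AR(1) residual spread
log_y = 0.5 + 1.5 * log_x + eps

alpha_hat, beta_hat, t_stat = engle_granger_two_step(log_y, log_x)
print(f"alpha={alpha_hat:.3f} beta={beta_hat:.3f} DF t-stat={t_stat:.2f}")
print("cointegrated at ~5%:", t_stat < EG_CRIT_5PCT)
```

The plain DF statistic is fine for this AR(1) toy spread. On real residuals with richer autocorrelation, the augmented test (lagged differences in step two) is the safer choice.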

Why It Matters in Backtesting

The most common backtesting error in pair trading is estimating the hedge ratio $\beta$ over the full historical period and then applying it as if it had been known in real time, which is a form of lookahead bias. The correct approach is a rolling or expanding-window OLS regression in which $\beta$ is re-estimated at each step using only past data. Cointegration relationships are also unstable over long horizons. A pair that was cointegrated from 2010 to 2015 may have structurally diverged by 2020, so the relationship must be re-validated periodically throughout the backtest rather than tested once on the full sample.
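A rolling hedge ratio without lookahead can be sketched with pandas. The example assumes simulated data whose true $\beta$ jumps from 1.0 to 2.0 halfway through, as a stand-in for the structural divergence described above. The helper `rolling_hedge_ratio` is hypothetical. It relies on the fact that the OLS slope with an intercept equals the sample covariance divided by the sample variance. It also shifts the result by one bar, so the value stamped at bar $t$ only uses bars $t - w$ through $t - 1$.

```python
import numpy as np
import pandas as pd


def rolling_hedge_ratio(log_y: pd.Series, log_x: pd.Series, window: int = 60) -> pd.Series:
    """Hypothetical helper: rolling OLS slope of log_y on log_x (with intercept).

    OLS slope = cov(y, x) / var(x). shift(1) makes the value at bar t depend
    only on bars t-window .. t-1, so it could have been computed before bar t.
    """
    slope = log_y.rolling(window).cov(log_x) / log_x.rolling(window).var()
    return slope.shift(1)


# Synthetic pair whose true hedge ratio breaks from 1.0 to 2.0 at bar 250
rng = np.random.default_rng(0)
n, brk, w = 500, 250, 60
log_x = pd.Series(np.cumsum(rng.normal(0.0, 0.02, n)))
true_beta = np.where(np.arange(n) < brk, 1.0, 2.0)
log_y = pd.Series(true_beta * log_x.to_numpy() + rng.normal(0.0, 0.002, n))

# Lookahead: a single beta fitted on the whole sample, including bars after the break
full_sample_beta = np.polyfit(log_x.to_numpy(), log_y.to_numpy(), 1)[0]
beta_t = rolling_hedge_ratio(log_y, log_x, w)

print(f"full-sample beta:               {full_sample_beta:.2f}")
print(f"rolling beta just before break: {beta_t.iloc[brk]:.2f}")
print(f"rolling beta at end:            {beta_t.iloc[-1]:.2f}")
```

The full-sample number is one value fitted across both regimes, using data no trader could have had at the start. The rolling estimate tracks each regime and adapts to the break with a lag of roughly one window.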

Python Implementation

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import coint
    from statsmodels.regression.linear_model import OLS
    from statsmodels.tools import add_constant

    def pair_trading_backtest(price_y: pd.Series, price_x: pd.Series,
                              lookback: int = 60, entry_z: float = 2.0,
                              exit_z: float = 0.5) -> dict:
        """
        Walk-forward pair trading backtest with a rolling OLS hedge ratio.

        At each bar i, beta and the spread mean/std are estimated from bars
        i-lookback .. i-1 only, so the signal at bar i uses no later data.
        The full-sample cointegration p-value is an in-sample diagnostic: it sees
        the whole history and should not drive pair selection inside the backtest.
        """
        # Keep only dates where both prices exist
        price_y, price_x = price_y.align(price_x, join="inner")
        log_y = np.log(price_y)
        log_x = np.log(price_x)
        # Full-period Engle-Granger test (diagnostic only; contains lookahead)
        _, coint_p, _ = coint(log_y, log_x)
        signals = pd.Series(0.0, index=price_y.index)
        spreads = pd.Series(np.nan, index=price_y.index)
        hedge_ratios = pd.Series(np.nan, index=price_y.index)
        for i in range(lookback, len(price_y)):
            # Estimation window excludes the current bar i
            window_y = log_y.iloc[i - lookback:i]
            window_x = log_x.iloc[i - lookback:i]
            model = OLS(window_y, add_constant(window_x, has_constant="add")).fit()
            beta = model.params.iloc[1]
            # z-score of today's spread against the spread inside the window
            spread_history = window_y - beta * window_x
            current_spread = log_y.iloc[i] - beta * log_x.iloc[i]
            z = (current_spread - spread_history.mean()) / (spread_history.std() + 1e-9)
            spreads.iloc[i] = current_spread
            hedge_ratios.iloc[i] = beta
            if z > entry_z:
                signals.iloc[i] = -1.0  # spread rich: short Y, long beta-weighted X
            elif z < -entry_z:
                signals.iloc[i] = 1.0   # spread cheap: long Y, short beta-weighted X
            elif abs(z) < exit_z:
                signals.iloc[i] = 0.0   # spread has reverted: go flat
            else:
                signals.iloc[i] = signals.iloc[i - 1]  # dead band: hold position
        # The position set at bar i earns the next bar's spread move, with beta fixed at i.
        # Approximate log return per $1 held in the Y leg, before transaction costs.
        returns = signals.shift(1) * (log_y.diff() - hedge_ratios.shift(1) * log_x.diff())
        return {
            "returns": returns.dropna(),
            "signals": signals,
            "spreads": spreads,
            "hedge_ratios": hedge_ratios,
            "cointegration_p_value": coint_p,
            "cointegrated": bool(coint_p < 0.05),
        }
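The entry/exit rules inside the loop form a hysteresis band. A position opens when the z-score moves beyond `entry_z`, closes when it falls inside `exit_z`, and is held in between. Pulling the rule into a standalone helper makes it easy to test on a hand-written z-score path. The name `hysteresis_positions` is hypothetical, and the sketch assumes the same default thresholds of 2.0 and 0.5.

```python
import numpy as np


def hysteresis_positions(z, entry_z: float = 2.0, exit_z: float = 0.5) -> np.ndarray:
    """Map a z-score path to spread positions (+1 long, -1 short, 0 flat).

    Enter beyond +/- entry_z, exit inside +/- exit_z, otherwise hold the
    previous position (the dead band between the two thresholds).
    """
    positions = np.zeros(len(z))
    prev = 0.0
    for i, zi in enumerate(z):
        if zi > entry_z:
            prev = -1.0   # spread rich: short Y, long beta-weighted X
        elif zi < -entry_z:
            prev = 1.0    # spread cheap: long Y, short beta-weighted X
        elif abs(zi) < exit_z:
            prev = 0.0    # spread back near its mean: flatten
        positions[i] = prev  # any other z keeps the previous position
    return positions


z_path = [0.0, 1.5, 2.3, 1.8, 1.0, 0.4, -0.3, -2.1, -1.2, 0.2]
print(hysteresis_positions(z_path).tolist())
# [0.0, 0.0, -1.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 0.0]
```

The dead band between 0.5 and 2.0 keeps the strategy from flipping in and out every bar while the z-score hovers near a single threshold. A jump straight from above +2.0 to below -2.0 would reverse the position without passing through flat.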
