Curve Fitting in Python
The process of over-optimizing strategy parameters to match historical data so precisely that the model loses predictive power.
Definition
Curve fitting (also called data dredging or over-optimization) is the pathological extreme of parameter optimization in quantitative strategy development. It occurs when a researcher iterates through hundreds or thousands of parameter combinations (stop losses, lookback periods, thresholds, indicator choices) until one configuration produces exceptional historical performance. The resulting strategy does not exploit a genuine market inefficiency; it has been reverse-engineered to fit the noise in past data. With enough free parameters, a strategy can be made to fit almost any finite dataset nearly perfectly in-sample while having no predictive validity out-of-sample.
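The effect can be reproduced on data that contains no edge at all. The following is a minimal sketch, not a reference implementation: the synthetic price series, the helper names (sharpe, crossover_returns) and the 50-point parameter grid are all assumptions made for illustration. It sweeps a long/flat moving average crossover over pure-noise prices, picks the best in-sample configuration, and then evaluates the same parameters on unseen data.

```python
import numpy as np
import pandas as pd

def sharpe(returns: pd.Series, trading_days: int = 252) -> float:
    """Annualized Sharpe ratio; 0.0 when volatility is zero or undefined."""
    std = returns.std()
    if std == 0 or np.isnan(std):
        return 0.0
    return float(returns.mean() / std * np.sqrt(trading_days))

def crossover_returns(prices: pd.Series, fast: int, slow: int) -> pd.Series:
    """Daily returns of a long/flat moving average crossover (hypothetical helper)."""
    signal = (prices.rolling(fast).mean() > prices.rolling(slow).mean()).astype(float)
    # Shift by one day so each position uses only the previous day's signal.
    return (signal.shift(1) * prices.pct_change()).dropna()

rng = np.random.default_rng(0)
# Synthetic prices built from i.i.d. noise: no timing edge exists by construction.
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 2000))))
in_sample, out_sample = prices.iloc[:1000], prices.iloc[1000:]

# 10 fast windows x 5 slow windows = 50 parameter combinations.
grid = [(fast, slow) for fast in range(5, 55, 5) for slow in range(60, 260, 40)]
results = {params: sharpe(crossover_returns(in_sample, *params)) for params in grid}
best = max(results, key=results.get)

print("best in-sample params:", best, "Sharpe:", round(results[best], 2))
print("same params out-of-sample Sharpe:",
      round(sharpe(crossover_returns(out_sample, *best)), 2))
```

Because the data is noise, any gap between the two printed numbers reflects selection from the grid rather than skill. The out-of-sample figure will typically look unrelated to the in-sample one, although a single random seed can land either way.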
Quantitative Formula
The Bias-Variance decomposition reveals the fundamental tradeoff in model complexity. As the number of parameters increases (greater model flexibility), bias decreases but variance increases sharply. Curve fitting sits at the pathological high-variance extreme: the model has near-zero training error but enormous out-of-sample error. The irreducible noise cannot be learned — any model that appears to 'learn' it is memorizing randomness.
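For reference, the decomposition described above can be written out in standard textbook notation. The symbols are assumptions rather than notation taken from this page: the data follow y = f(x) + ε with E[ε] = 0 and Var(ε) = σ², and f̂ is the fitted model, with expectations taken over training samples and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Adding parameters shrinks the first term and inflates the second, while the third term is unaffected by any choice of model.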
Why It Matters in Backtesting
The most common form of curve fitting in backtesting is optimizing a moving average crossover over, say, 50 parameter combinations and reporting only the best result. This is the backtesting analogue of p-hacking: the best of many trials is presented as if it were the only test run. The Deflated Sharpe Ratio (Bailey & López de Prado, 2014) corrects for this by comparing the reported Sharpe Ratio with the highest Sharpe Ratio that the same number of trials would be expected to produce by chance, and expressing the result as a probability. A strategy that achieves a Sharpe of 2.0 after testing 200 parameter sets may have a Deflated Sharpe Ratio below 0.5, which is consistent with pure noise.
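To make the size of the selection effect concrete, the sketch below simulates 200 strategies whose daily returns are pure noise and records the best annualized Sharpe Ratio among them, then compares the average of that maximum with the closed-form approximation for the expected maximum of N trials. The setup is an assumption for illustration only: one year of daily data, independent trials, normally distributed returns, and a hypothetical helper named expected_max_sharpe. Real parameter sweeps produce correlated trials, so the effective number of independent trials is usually smaller.

```python
import numpy as np
from scipy.stats import norm

def expected_max_sharpe(n_trials: int, sr_std: float) -> float:
    """Approximate expected maximum of n_trials Sharpe estimates whose true
    Sharpe is zero and whose standard deviation across trials is sr_std."""
    gamma = 0.5772156649  # Euler-Mascheroni constant
    z = ((1 - gamma) * norm.ppf(1 - 1 / n_trials)
         + gamma * norm.ppf(1 - 1 / (n_trials * np.e)))
    return sr_std * z

rng = np.random.default_rng(42)
n_trials, n_days, n_sims = 200, 252, 500

# Each simulation draws 200 independent zero-edge strategies and keeps the best Sharpe.
max_sharpes = np.empty(n_sims)
for i in range(n_sims):
    r = rng.normal(0.0, 0.01, size=(n_trials, n_days))
    sr = r.mean(axis=1) / r.std(axis=1, ddof=1) * np.sqrt(252)
    max_sharpes[i] = sr.max()

simulated = max_sharpes.mean()
# Under the null, an annualized Sharpe from one year of daily data has a std close to 1.
approx = expected_max_sharpe(n_trials, sr_std=np.sqrt(252 / (n_days - 1)))
print(f"simulated E[max Sharpe]: {simulated:.2f}, approximation: {approx:.2f}")
```

Under these assumptions both figures should come out well above 2.0, which is why a Sharpe of 2.0 selected from 200 trials on a short sample is not, by itself, evidence of an edge.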
Python Implementation
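from typing import Optional

import numpy as np
import pandas as pd
from scipy.stats import norm


def deflated_sharpe_ratio(returns: pd.Series, n_trials: int,
                          trading_days: int = 252,
                          sr_trials_std: Optional[float] = None) -> dict:
    """
    Computes the Deflated Sharpe Ratio (DSR) to correct for multiple testing.
    DSR accounts for the number of strategy configurations tested (n_trials).
    Reference: Bailey & López de Prado (2014).

    sr_trials_std is the standard deviation of the per-period Sharpe Ratios
    across all trials. If omitted, it is approximated by the standard error
    of a zero Sharpe Ratio, sqrt(1 / (n - 1)); this default is an assumption.
    """
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    n = len(returns)
    # The DSR statistic is defined on the per-period (non-annualized) Sharpe Ratio.
    sr = returns.mean() / returns.std()
    skew = returns.skew()
    kurt = returns.kurtosis()  # Excess kurtosis (Fisher)

    # Expected maximum Sharpe Ratio under the null hypothesis after n_trials tests
    euler_mascheroni = 0.5772156649
    if sr_trials_std is None:
        sr_trials_std = np.sqrt(1 / (n - 1))
    expected_max_sr = sr_trials_std * (
        (1 - euler_mascheroni) * norm.ppf(1 - 1 / n_trials)
        + euler_mascheroni * norm.ppf(1 - 1 / (n_trials * np.e))
    )

    # Standard error of the Sharpe Ratio estimate, adjusted for skewness and
    # kurtosis: the coefficient is (gamma_4 - 1) / 4 with gamma_4 = kurt + 3.
    sr_std = np.sqrt((1 - skew * sr + ((kurt + 2) / 4) * sr**2) / (n - 1))
    dsr = norm.cdf((sr - expected_max_sr) / sr_std)

    return {
        "observed_sharpe": sr * np.sqrt(trading_days),  # annualized for reporting
        "expected_max_sr_null": expected_max_sr * np.sqrt(trading_days),  # annualized
        "deflated_sharpe_ratio": dsr,
        "n_trials_tested": n_trials,
        "likely_curve_fitted": bool(dsr < 0.95)
    }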