Curve Fitting in Python

The process of over-optimizing strategy parameters to match historical data so precisely that the model loses predictive power.

Definition

Curve Fitting (also called data dredging or in-sample optimization) is the pathological extreme of parameter optimization in quantitative strategy development. It occurs when a researcher iterates through hundreds or thousands of parameter combinations (stop losses, lookback periods, thresholds, indicators) until one configuration produces exceptional historical performance. The resulting strategy does not exploit a genuine market inefficiency; it has been reverse-engineered to fit the noise in past data. With enough free parameters, a strategy can be made to fit almost any historical dataset in-sample while having little or no predictive validity.
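
The effect described above can be reproduced with a minimal sketch. It assumes each "parameter configuration" is simply a stream of pure-noise daily returns, so none has a real edge; the counts `n_configs` and `n_days`, the 1% daily volatility, and the helper `annualized_sharpe` are illustrative choices, not part of any particular strategy.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_configs = 1000   # hypothetical number of parameter combinations tried
n_days = 504       # two years of daily returns: one in-sample, one out-of-sample
split = n_days // 2

# Each row stands in for one parameter configuration. Returns are pure noise,
# so no configuration has any genuine edge.
returns = rng.normal(loc=0.0, scale=0.01, size=(n_configs, n_days))

def annualized_sharpe(r: np.ndarray, trading_days: int = 252) -> np.ndarray:
    """Annualized Sharpe Ratio computed along the last axis."""
    return r.mean(axis=-1) / r.std(axis=-1, ddof=1) * np.sqrt(trading_days)

in_sample = annualized_sharpe(returns[:, :split])
out_of_sample = annualized_sharpe(returns[:, split:])

# Pick the configuration that looked best on the first year only.
best = int(np.argmax(in_sample))
print(f"Best in-sample Sharpe:      {in_sample[best]:.2f}")
print(f"Same config, out-of-sample: {out_of_sample[best]:.2f}")
```

The selected configuration looks strong in-sample only because it was the luckiest of many draws; its out-of-sample Sharpe is just another draw from the same zero-edge distribution.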

Quantitative Formula

$$\text{Bias-Variance: } E[(y - \hat{f})^2] = \text{Bias}^2 + \text{Variance} + \sigma^2$$

The Bias-Variance decomposition reveals the fundamental tradeoff in model complexity. As the number of parameters increases (greater model flexibility), bias decreases but variance increases sharply. Curve fitting sits at the pathological high-variance extreme: the model has near-zero training error but enormous out-of-sample error. The irreducible noise $\sigma^2$ cannot be learned — any model that appears to 'learn' it is memorizing randomness.
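
To make the tradeoff concrete, here is a small sketch that fits polynomials of increasing degree to noisy samples of a known signal. The sine signal, the noise level of 0.3, the sample sizes, and the degrees 1, 3, and 12 are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def true_signal(x: np.ndarray) -> np.ndarray:
    """The learnable part of the data; everything else is noise."""
    return np.sin(2 * np.pi * x)

# Small noisy training set and a larger held-out test set from the same process.
x_train = np.linspace(0.0, 1.0, 15)
y_train = true_signal(x_train) + rng.normal(scale=0.3, size=x_train.size)
x_test = rng.uniform(0.0, 1.0, 200)
y_test = true_signal(x_test) + rng.normal(scale=0.3, size=x_test.size)

results = {}
for degree in (1, 3, 12):
    # Polynomial.fit rescales x internally, which helps numerical stability
    # at high degrees.
    model = np.polynomial.Polynomial.fit(x_train, y_train, deg=degree)
    train_mse = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse = float(np.mean((model(x_test) - y_test) ** 2))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Training error can only fall as the degree rises, because each lower-degree model is a special case of the higher-degree one. Test error typically falls at first and then rises once the extra flexibility starts fitting the noise term.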

Why It Matters in Backtesting

The most common form of curve fitting in backtesting is optimizing a moving average crossover over 50 different parameter combinations and reporting only the best result. This is the backtesting analogue of p-hacking. The Deflated Sharpe Ratio (Bailey & López de Prado, 2014) corrects for this by comparing the reported Sharpe Ratio against the best Sharpe Ratio expected from pure noise given the number of parameter combinations tested, and expressing the result as a probability between 0 and 1. A strategy that achieves a Sharpe of 2.0 after testing 200 parameter sets may have a Deflated Sharpe below 0.5, which is consistent with pure noise.
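
A rough sketch of how the noise hurdle grows with the number of trials follows. It uses the expected-maximum approximation from Bailey & López de Prado (2014). The helper name `expected_max_sharpe` is hypothetical, and the default `sr_std = 1.0` is an assumption: it is roughly the spread of annualized Sharpe Ratios across zero-skill trials on about one year of daily data.

```python
import numpy as np
from scipy.stats import norm

EULER_MASCHERONI = 0.5772156649

def expected_max_sharpe(n_trials: int, sr_std: float = 1.0) -> float:
    """Approximate expected maximum Sharpe Ratio across n_trials zero-skill trials.

    sr_std is the standard deviation of Sharpe Ratios across trials. The
    default of 1.0 is an illustrative assumption, not an estimated value.
    """
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    hurdle = sr_std * (
        (1 - EULER_MASCHERONI) * norm.ppf(1 - 1 / n_trials)
        + EULER_MASCHERONI * norm.ppf(1 - 1 / (n_trials * np.e))
    )
    return float(hurdle)

for n in (10, 50, 200, 1000):
    print(f"{n:5d} trials -> expected best Sharpe from noise: {expected_max_sharpe(n):.2f}")
```

The hurdle rises with every additional configuration tested, so a backtest report is hard to interpret unless it also states how many configurations were tried.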

Python Implementation

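    from typing import Optional

    import numpy as np
    import pandas as pd
    from scipy.stats import norm

    def deflated_sharpe_ratio(returns: pd.Series, n_trials: int,
                              trading_days: int = 252,
                              trials_sr_std: Optional[float] = None) -> dict:
        """
        Computes the Deflated Sharpe Ratio (DSR) to correct for multiple testing.
        DSR accounts for the number of strategy configurations tested (n_trials).
        Reference: Bailey & López de Prado (2014).
        """
        if n_trials < 2:
            raise ValueError("n_trials must be at least 2")
        n = len(returns)
        # Per-period Sharpe Ratio: the test statistic is built at the same
        # frequency as the returns and annualized only for reporting.
        sr = returns.mean() / returns.std()
        skew = returns.skew()
        kurt = returns.kurtosis()  # Excess kurtosis (Fisher)
        # Standard deviation of per-period Sharpe Ratios across the trials.
        # If it is not supplied, fall back to the standard error under the null,
        # sqrt(1 / (n - 1)). This fallback is an assumption (independent trials,
        # true Sharpe Ratio of zero), not part of the original formula.
        if trials_sr_std is None:
            trials_sr_std = np.sqrt(1 / (n - 1))
        # Expected maximum Sharpe Ratio under null hypothesis after n_trials tests
        euler_mascheroni = 0.5772156649
        expected_max_sr = trials_sr_std * (
            (1 - euler_mascheroni) * norm.ppf(1 - 1 / n_trials)
            + euler_mascheroni * norm.ppf(1 - 1 / (n_trials * np.e))
        )
        # Standard error of the Sharpe Ratio estimate under non-normal returns.
        # (kurt + 2) / 4 equals (raw kurtosis - 1) / 4 because kurt is excess kurtosis.
        sr_std = np.sqrt((1 - skew * sr + ((kurt + 2) / 4) * sr**2) / (n - 1))
        dsr = norm.cdf((sr - expected_max_sr) / sr_std)
        return {
            "observed_sharpe": sr * np.sqrt(trading_days),
            "expected_max_sr_null": expected_max_sr * np.sqrt(trading_days),
            "deflated_sharpe_ratio": dsr,
            "n_trials_tested": n_trials,
            "likely_curve_fitted": dsr < 0.95
        }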
