Engineering project
QuantLab
Research-to-production ML backtesting framework with leakage-safe validation and transaction-cost-aware evaluation.
Tags: Python · pandas · scikit-learn · XGBoost · Financial ML · Backtesting
Problem
Financial ML experiments are easy to overfit and hard to trust without careful temporal validation, cost assumptions, and baseline comparisons.
Current status
Portfolio skeleton with planned reproducible examples.
What I built
- Designed a pipeline from market data caching through features, labels, walk-forward splits, training, and backtesting.
- Added comparison points for simple baselines and ML-driven strategies.
- Structured reports around risk, turnover, transaction costs, and reproducibility.
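The market-data caching step could look like the following sketch. The `load_prices` name and the `fetch` callback are illustrative, not part of the current codebase, and CSV is used here only so the example runs without optional parquet dependencies:

```python
from pathlib import Path

import pandas as pd


def load_prices(symbol: str, cache_dir: str = "cache", fetch=None) -> pd.DataFrame:
    """Return price data for `symbol`, fetching and caching it on a cache miss.

    `fetch` is a user-supplied downloader: symbol -> DataFrame. Caching keeps
    repeated experiments reproducible and avoids re-downloading market data.
    """
    path = Path(cache_dir) / f"{symbol}.csv"
    if path.exists():
        # Cache hit: read the stored copy instead of calling the downloader.
        return pd.read_csv(path, index_col=0)
    df = fetch(symbol)
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path)
    return df
```

A second call for the same symbol reads from disk and never touches the network, which is the property the pipeline relies on for reproducible reports.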
Architecture / system design
1. Market Data
2. Feature Engineering
3. Label Generation
4. Walk-Forward Split
5. Model Training
6. Backtest
7. Metrics / Report
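The walk-forward split stage (step 4) could be sketched as a small generator. The `walk_forward_splits` name and its parameters are illustrative, not an existing API; the `gap` parameter skips rows between train and test so forward-looking labels computed at the end of training cannot overlap the test window:

```python
import numpy as np


def walk_forward_splits(n_samples: int, train_size: int, test_size: int, gap: int = 0):
    """Yield (train_idx, test_idx) index pairs that only ever look forward in time.

    Each split trains on a contiguous window, skips `gap` rows, then tests on
    the next `test_size` rows; the window then advances by `test_size`.
    """
    start = 0
    while start + train_size + gap + test_size <= n_samples:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size + gap,
                         start + train_size + gap + test_size)
        yield train, test
        start += test_size
```

Every test index is strictly later than every train index, which is the invariant a leakage-safe backtest depends on. scikit-learn's `TimeSeriesSplit` offers similar behavior with an expanding window, if a library implementation is preferred.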
Technical highlights
- Validation design is treated as part of the system architecture.
- Transaction costs and risk metrics are included early instead of after model selection.
- Reports are built to make failed experiments useful rather than invisible.
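As a sketch of what "costs included early" can mean in practice, a hypothetical `net_returns` helper charges turnover at a fixed basis-point rate before any model comparison happens, so a high-churn strategy cannot look good on gross returns alone:

```python
import numpy as np


def net_returns(gross_returns, positions, cost_bps: float = 10.0):
    """Subtract turnover-proportional transaction costs from strategy returns.

    `positions` is the target position per period (e.g. -1/0/+1). Turnover is
    the absolute change in position, charged at `cost_bps` basis points.
    """
    positions = np.asarray(positions, dtype=float)
    # Absolute position change per period; prepend 0 so entering the first
    # position is also charged.
    turnover = np.abs(np.diff(positions, prepend=0.0))
    costs = turnover * cost_bps / 1e4
    return np.asarray(gross_returns, dtype=float) * positions - costs
```

Running risk metrics (Sharpe, drawdown) on these net series rather than the gross ones is what keeps cost assumptions inside model selection instead of after it.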
Future work
- Add public toy datasets and deterministic example reports.
- Compare tree-based models, linear baselines, and simple rule-based strategies.
- Document leakage checks and experiment review criteria.
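A minimal leakage check along the lines of the last bullet might assert that training label windows never reach into the test period. The `assert_no_overlap` helper is hypothetical, assuming labels look `label_horizon` forward in time:

```python
import pandas as pd


def assert_no_overlap(train_times, test_times, label_horizon: pd.Timedelta) -> None:
    """Raise if any training label window overlaps the test period.

    With labels computed `label_horizon` forward, the last training timestamp
    plus the horizon must still precede the first test timestamp.
    """
    last_train = max(train_times)
    first_test = min(test_times)
    if last_train + label_horizon >= first_test:
        raise ValueError(
            f"leakage: train label window ends at {last_train + label_horizon}, "
            f"test starts at {first_test}"
        )
```

Checks like this can run inside the split generator or as pytest assertions over every saved experiment, so a leaking configuration fails loudly instead of producing an optimistic report.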
Tech stack
Python, pandas, NumPy, scikit-learn, XGBoost, Matplotlib, pytest
Demo / screenshots
Example reports will use public or synthetic data only.
Resume bullet draft
- Built a financial ML backtesting framework with data caching, feature engineering, walk-forward validation, transaction costs, and reproducible reports.
- Compared baseline and ML strategies on risk-aware metrics, using walk-forward validation to limit leakage and overfitting risk.