Do you have any recommended resources (or personal insights) on backtesting? Specifically, I'm interested in subjects like out-of-sample backtesting, choosing the out-of-sample period(s), total length of backtest, weighting different intervals differently, etc. Also interested in understanding the many statistical gotchas therein.
If you haven't picked a backtester yet, Amibroker is powerful for the price imho. Free to try and has a user community that may be able to help. I use it with Norgate data. https://www.amibroker.com/
Advances in Financial Machine Learning by Marcos López de Prado has several chapters on backtesting. https://www.quantresearch.org/Lectures.htm has free lecture notes and papers from the same author, some of which cover backtesting.
You are basically doing nonlinear regression on a set of data, hoping the resulting function offers some predictive power in the future. But you have no idea how much of the data you used to fit the function was noise and how much was the signal you are trying to model. That's gotcha #1. So you try to solve this problem by setting half your data aside for "testing", exactly as you would with deep learning (neural networks, etc.). Most of your seemingly brilliant shower ideas will fit the in-sample data just great, only to fail in "testing", because you fit a curve to random data and that offers no utility.

So you repeat the loop of hypothesis, algorithm design, and "testing". Over and over. Then one day your "testing" turns up some decent results. If you go at it long enough, you are guaranteed to reach this point; it is mathematically certain, even if you are just playing with a big set of random data you created yourself. That's gotcha #2: your testing wasn't really testing at all. It was just a way to filter through more noise and set aside a few outliers.

Be sure to calculate the standard deviation of your results and the standard error of the mean. That's a start, but it does not solve every problem. You'll also need to decide what you are measuring against. Don't assume SPY or some other index is a sufficient benchmark unless you are trading only stocks in that index and weighting them accordingly. Even then you can fool yourself, because groups of stocks in an index sometimes correlate more closely than they do at other times. It would be easy to write a whole book on the subject. Lots to consider.
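A small simulation makes gotcha #2 concrete. This is a sketch, not anyone's actual workflow, and all the numbers (5,000 strategies, a Sharpe > 1.0 screen) are made up for illustration: every "strategy" below is pure coin-flip noise with zero true edge, yet some of them pass both the in-sample screen and the out-of-sample "test" by luck alone. It also shows the standard deviation / standard error of the mean calculation mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

n_strategies = 5000   # how many "shower ideas" we try (hypothetical)
n_days = 500          # ~2 years of daily returns per strategy
train = slice(0, 250)
test = slice(250, 500)

# Pure noise: every "strategy" is random daily returns, zero true edge.
returns = rng.normal(loc=0.0, scale=0.01, size=(n_strategies, n_days))

def sharpe(r):
    """Annualized Sharpe ratio of a daily return series (risk-free = 0)."""
    return np.sqrt(252) * r.mean() / r.std(ddof=1)

train_sharpes = np.array([sharpe(r[train]) for r in returns])
test_sharpes = np.array([sharpe(r[test]) for r in returns])

# "Discoveries": strategies that look good in-sample AND out-of-sample,
# despite containing no signal at all.
passed_both = (train_sharpes > 1.0) & (test_sharpes > 1.0)
print(f"noise strategies passing train AND test: {passed_both.sum()}")

# Basic sanity stats for one strategy's returns, as the post recommends.
r = returns[0]
sem = r.std(ddof=1) / np.sqrt(len(r))
print(f"mean={r.mean():.5f}  std={r.std(ddof=1):.5f}  SEM={sem:.5f}")
```

The more hypotheses you cycle through, the more of these false discoveries accumulate; the out-of-sample split alone does not protect you from that.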
Your solution to all this is mean and standard deviation. But when it comes to stocks, everything correlates to some degree, and that math assumes all events are independent. When market volatility picks up, all correlations move towards 1. "Backtesting" without knowing ahead of time what you are looking for isn't likely to yield valuable results, imo. You kind of need to know something about a quirk in the market before you start. Then do your statistical modeling to refine a trading strategy based on something you already understood to be true.
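The "correlations move towards 1" point can be illustrated with a toy factor model (my own sketch, with made-up volatility numbers, not real market data): two stocks share a common market factor plus independent idiosyncratic noise. When the factor's volatility spikes, it dominates both stocks' returns and their measured correlation jumps, even though nothing about the stocks themselves changed.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 2000
# Common market factor: calm in the first half, volatile in the second.
factor_vol = np.where(np.arange(n) < n // 2, 0.005, 0.03)
factor = rng.normal(0, 1, n) * factor_vol

# Two stocks = shared market factor + independent idiosyncratic noise.
idio = rng.normal(0, 0.01, (2, n))
stock_a = factor + idio[0]
stock_b = factor + idio[1]

calm = slice(0, n // 2)
stress = slice(n // 2, n)
corr_calm = np.corrcoef(stock_a[calm], stock_b[calm])[0, 1]
corr_stress = np.corrcoef(stock_a[stress], stock_b[stress])[0, 1]
print(f"correlation in calm regime:   {corr_calm:.2f}")
print(f"correlation in stress regime: {corr_stress:.2f}")
```

This is why independence assumptions that look fine in a quiet backtest period can fall apart exactly when you need them most.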
For some reason the message board didn't post the Amazon thumbnail correctly. I believe you mean this book: https://www.amazon.com/Advances-Financial-Machine-Learning-Marcos-ebook/product-reviews/B079KLDW21 Looks like a good one.
A few years ago I created a huge number of indicators, tens of thousands. Is GLD over or under the 200 dma? Did unemployment fall last month? etc., etc., etc. And lo and behold, a few indicators had ridiculously high win rates, like 75% or more. Obviously they had no predictive power.
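You can reproduce that effect without any market data at all. The sketch below (my own illustration, with hypothetical counts) generates 10,000 indicators whose "signals" are literal coin flips, each firing ~20 times; by chance alone, a bunch of them show win rates of 75% or better.

```python
import numpy as np

rng = np.random.default_rng(2)

n_indicators = 10_000  # "Is GLD over the 200 dma?", etc. (hypothetical)
n_signals = 20         # times each indicator fired during the backtest

# Every signal outcome is a coin flip: no indicator has any real edge.
wins = rng.random((n_indicators, n_signals)) < 0.5
win_rates = wins.mean(axis=1)

lucky = (win_rates >= 0.75).sum()
print(f"indicators with >=75% win rate by pure chance: {lucky}")
print(f"best win rate found: {win_rates.max():.0%}")
```

With 20 coin flips, a single indicator hits 75%+ only about 2% of the time, but screen 10,000 of them and you should expect roughly 200 such "winners". The more indicators you mine, the more impressive the best one looks.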
Yes, that's the book. You might not see it displayed in my post if your web browser has an ad blocker enabled.
Suppose you have a very good idea, but one of the parameters was not chosen properly. You run out-of-sample testing and fail. You have two choices: throw away your good idea (which is a pity) or correct the wrong parameter (which is curve fitting). That is why I don't use out-of-sample testing. Instead:

1) I try to have as much data as possible (10 years at minimum).
2) The number of trades should be more than 100 (my own rule of thumb).
3) I use the minimum number of parameters.
4) I divide the whole testing period into 3 approximately equal periods and try to optimize my very small number of parameters to get approximately similar results across all 3 time periods.

"Optimize" for me means arriving at some decent rate of return while keeping the maximum drawdown at 15%. Why 15%? Because from personal experience I know that I can "swallow" drawdowns of 30%, so I optimize for a 15% maximum, assuming the future will be twice as bad as my backtesting predicts.
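The three-period consistency check and the drawdown cap can be sketched in a few lines. This is only an illustration of the mechanics; the strategy returns below are simulated with made-up drift and volatility, and `period_stats` / `max_drawdown` are hypothetical helper names, not part of any library.

```python
import numpy as np

def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction."""
    peaks = np.maximum.accumulate(equity)
    return ((peaks - equity) / peaks).max()

def period_stats(daily_returns, n_periods=3):
    """Split returns into equal periods; report total return and max DD."""
    stats = []
    for chunk in np.array_split(daily_returns, n_periods):
        equity = np.cumprod(1.0 + chunk)
        stats.append((equity[-1] - 1.0, max_drawdown(equity)))
    return stats

rng = np.random.default_rng(3)
# Hypothetical strategy: ~10 years of daily returns, small drift + noise.
returns = rng.normal(0.0004, 0.01, size=2520)

for i, (ret, dd) in enumerate(period_stats(returns), start=1):
    # Rule of thumb from the post: reject the parameter set if any
    # period's drawdown exceeds 15%, on the assumption that live
    # results may be roughly twice as bad as the backtest.
    print(f"period {i}: return {ret:+.1%}, max drawdown {dd:.1%}")
```

Requiring roughly similar return and drawdown figures across all three periods is a cheap way to catch parameter sets that only worked in one market regime.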
All very interesting and appreciated. Your point 4) in particular is something I hadn’t thought of before.