Hello, what would you use as a benchmark for your strategy's performance? I can think of these:
a) daily returns of the market over the whole period
b) daily returns while in the trade
c) buying the market and staying in it (holding period similar to your trade length)
d) returns of some generic standard strategy (e.g. a moving-average crossover for trends)
Based on these, we could calculate $-return, %-return, average trade return and %-profitability. What do you think?
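The four metrics listed above can be computed from a list of closed-trade P&Ls. A minimal Python sketch, assuming each trade's P&L is already in dollars net of costs and that %-return is measured against a fixed starting capital (both are assumptions; `trade_stats` and `TradeStats` are hypothetical names):

```python
from dataclasses import dataclass


@dataclass
class TradeStats:
    dollar_return: float     # total $ P&L over all trades
    pct_return: float        # total $ P&L relative to starting capital, in %
    avg_trade_return: float  # mean $ P&L per trade
    pct_profitable: float    # share of trades with positive P&L, in %


def trade_stats(trade_pnls: list[float], starting_capital: float) -> TradeStats:
    """Summarise closed-trade P&Ls (assumed to be in dollars, net of costs)."""
    if not trade_pnls:
        raise ValueError("need at least one trade")
    total = sum(trade_pnls)
    winners = sum(1 for p in trade_pnls if p > 0)
    return TradeStats(
        dollar_return=total,
        pct_return=100.0 * total / starting_capital,
        avg_trade_return=total / len(trade_pnls),
        pct_profitable=100.0 * winners / len(trade_pnls),
    )


# Hypothetical trades on a hypothetical $10,000 account.
stats = trade_stats([150.0, -50.0, 100.0, -25.0], starting_capital=10_000.0)
print(stats)
```

Running the same function on the benchmark's trades gives numbers that can be compared line by line.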
Trade efficiency is secondary to profitability. In futures day trading, efficiency means earning 2 ticks of profit per trade plus round-trip costs ($5). So for NQ and YM that is $15 gross profit per contract per trade; it is $25 for GC and CL, and $30 for ES. I define overall profitability (also called financial security, earning a second income, etc.) as net earnings of $50K per year, i.e. $1000/wk or $200/day. If you have good trade efficiency but too few trades per day, overall profits suffer. You can trade alone, with a human or with a robot.
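The dollar figures follow from 2 ticks × tick value + $5 round trip. The sketch below reproduces them and shows why the number of trades matters. It assumes the standard full-size contract tick values (NQ and YM $5, GC and CL $10, ES $12.50) and, hypothetically, that every trade nets exactly the 2-tick target; the function names are illustrative only.

```python
# Assumed tick values (USD per tick, per contract) for the full-size contracts;
# verify against your exchange/broker specs before relying on them.
TICK_VALUE = {"NQ": 5.00, "YM": 5.00, "GC": 10.00, "CL": 10.00, "ES": 12.50}
ROUND_TRIP_COST = 5.00  # commission + fees per contract, as stated in the post
TARGET_TICKS = 2        # "efficiency" = 2 ticks of net profit per trade
DAILY_GOAL = 200.00     # $200/day ~ $1000/week ~ $50K/year


def gross_target(symbol: str) -> float:
    """Gross profit per contract needed so that 2 ticks remain after costs."""
    return TARGET_TICKS * TICK_VALUE[symbol] + ROUND_TRIP_COST


def trades_needed_per_day(symbol: str, contracts: int = 1) -> float:
    """Trades/day to net DAILY_GOAL, hypothetically assuming every trade hits the target."""
    net_per_trade = TARGET_TICKS * TICK_VALUE[symbol] * contracts
    return DAILY_GOAL / net_per_trade


for sym in TICK_VALUE:
    print(f"{sym}: gross target ${gross_target(sym):.2f}, "
          f"{trades_needed_per_day(sym):.0f} trades/day for ${DAILY_GOAL:.0f}")
```

With one contract, NQ or YM would need 20 target trades a day and ES 8, which is the "too few trades per day" problem in numbers.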
What I'm looking for is an objective measure against which to compare the performance of a strategy. If, e.g., simple buy-and-hold performed better, there would be no need for a sophisticated strategy. This is what I can think of:
a) daily returns of the market over the whole period
b) daily returns while in the trade: up and down days would level out performance, so this should be worse than a longer-held trade
c) buying the market and staying in it (holding period similar to your trade length)
d) returns of some generic standard strategy (e.g. a moving-average crossover for trends)
Any ideas?
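Benchmarks a) to c) can be computed from a daily close series and a per-day "in a trade" flag. A minimal Python sketch, assuming a long-only strategy, daily bars and simple (non-log) returns. Reading c) as "the average return of buying on any day and holding for your typical trade length" is an interpretation, and all function names are hypothetical. The strategy's own returns over the same window are what these numbers would be compared against.

```python
from statistics import mean


def daily_returns(closes: list[float]) -> list[float]:
    """Close-to-close simple returns; element i is the return from day i to day i+1."""
    return [closes[i + 1] / closes[i] - 1.0 for i in range(len(closes) - 1)]


def mean_hold_return(closes: list[float], hold_days: int) -> float:
    """Average return from buying on every day and holding for hold_days days."""
    if len(closes) <= hold_days:
        raise ValueError("price series shorter than the holding period")
    windows = [closes[i + hold_days] / closes[i] - 1.0
               for i in range(len(closes) - hold_days)]
    return mean(windows)


def benchmarks(closes: list[float], in_trade: list[bool], hold_days: int) -> dict[str, float]:
    """Benchmarks a) to c); in_trade[i] says whether the strategy held a position over return i."""
    rets = daily_returns(closes)
    if len(in_trade) != len(rets):
        raise ValueError("need exactly one in_trade flag per daily return")
    held = [r for r, flag in zip(rets, in_trade) if flag]
    return {
        "a_market_mean_daily": mean(rets),                     # a) whole period
        "b_in_trade_mean_daily": mean(held) if held else 0.0,  # b) only while in a trade
        "c_hold_mean": mean_hold_return(closes, hold_days),    # c) trade-length holds
    }


closes = [100.0, 110.0, 99.0, 108.9]  # hypothetical daily closes
in_trade = [True, False, True]        # hypothetical position flags, one per return
print(benchmarks(closes, in_trade, hold_days=2))
```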
Erm, don't you want to include some measure of risk and associated reward before you even start talking about "second income"?
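One way to add the risk side is a pair of common measures: the annualised Sharpe ratio and maximum drawdown. A minimal Python sketch, assuming daily simple returns, 252 trading days per year and a constant per-day risk-free rate (conventions, not something stated in this thread); the function names are hypothetical.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(daily_returns: list[float], risk_free_daily: float = 0.0,
                 periods_per_year: int = 252) -> float:
    """Annualised Sharpe ratio from daily simple returns (sample standard deviation)."""
    excess = [r - risk_free_daily for r in daily_returns]
    sd = stdev(excess)
    if sd == 0:
        raise ValueError("zero volatility: Sharpe ratio undefined")
    return mean(excess) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline of an equity curve, as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


rets = [0.02, 0.0, 0.02, 0.0]                # hypothetical daily strategy returns
equity = [100.0, 120.0, 90.0, 130.0, 117.0]  # hypothetical equity curve
print(sharpe_ratio(rets), max_drawdown(equity))
```

Replacing the standard deviation with the downside deviation (volatility of negative returns only) gives the Sortino ratio.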
Those are fitness functions, but they offer no comparison to the underlying market. A strategy can score well on a fitness function and still be worse than a simple strategy on the same market.
You are testing/comparing a strategy; the underlying market is not a strategy. Using one of your examples, you can compare your strategy to a buy-and-hold strategy: backtest both strategies on the same market and use a fitness metric to compare the backtest results. This is how you compare strategies. Using your other example, you can do the same with the "simple" strategy to determine whether it is better than yours. In sum: markets aren't strategies and can't be compared to strategies. To compare strategies, you'll need to backtest/forward-test them over the same market data, and to evaluate the backtests/forward-tests of two or more strategies, you'll need a fitness metric.
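To make "backtest both on the same data, then apply one fitness metric" concrete, here is a minimal Python sketch comparing buy-and-hold with a moving-average crossover (benchmark d from the original question) on one price series. Assumptions: daily closes, long/flat positions only, no costs or slippage, a 3/5-bar SMA pair chosen purely for illustration, and total return as the fitness metric. All function names are hypothetical.

```python
def sma(values: list[float], n: int, i: int) -> float:
    """Simple moving average of the n values ending at index i (inclusive)."""
    return sum(values[i - n + 1 : i + 1]) / n


def backtest(closes: list[float], positions: list[int]) -> list[float]:
    """Equity curve starting at 1.0; positions[i] in {0, 1} is held over return i -> i+1."""
    equity = [1.0]
    for i, pos in enumerate(positions):
        r = closes[i + 1] / closes[i] - 1.0
        equity.append(equity[-1] * (1.0 + pos * r))
    return equity


def buy_and_hold_positions(closes: list[float]) -> list[int]:
    return [1] * (len(closes) - 1)


def ma_crossover_positions(closes: list[float], fast: int = 3, slow: int = 5) -> list[int]:
    """Long when the fast SMA is above the slow SMA at close i, else flat.
    The decision uses data up to close i and is held over the next return (no look-ahead)."""
    positions = []
    for i in range(len(closes) - 1):
        if i + 1 < slow:
            positions.append(0)  # not enough history for the slow SMA yet
        else:
            positions.append(1 if sma(closes, fast, i) > sma(closes, slow, i) else 0)
    return positions


def total_return(equity: list[float]) -> float:
    """Fitness metric used here: total return over the test window."""
    return equity[-1] / equity[0] - 1.0


closes = [float(p) for p in range(100, 120)]  # hypothetical steadily rising prices
bh = total_return(backtest(closes, buy_and_hold_positions(closes)))
ma = total_return(backtest(closes, ma_crossover_positions(closes)))
print(f"buy-and-hold: {bh:.2%}  MA crossover: {ma:.2%}")
```

On this steady uptrend buy-and-hold wins because the crossover sits out its warm-up bars. On real data you would run both over the same out-of-sample window with identical costs, and you could swap `total_return` for any other fitness metric.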
Sounds like you're looking for a market-specific benchmark to compare your strategy to. The benchmark could be random trading, simulated by taking entries/exits whenever a random number is above or below a threshold. If your strategy only goes long or only goes short, you could match that behavior in the random trading. If your strategy only trades intraday, you could match that too. The same goes for trade frequency and holding times, by adjusting the entry/exit thresholds. You'd also want slippage/commission to be the same. If you save, for example, 1000 runs, the output of the random trading is a distribution. You could then say your strategy's net profit is better than the 85th percentile of random trading, or that your Sortino ratio is better than the 90th percentile of the 1000 runs. You could turn this into a p-value under two conditions: 1. your strategy's performance is truly out-of-sample, and 2. you're comparing just one strategy, once. (You can't go back and change something. If you want to compare another strategy or a modification of it, you'd need something like the Bonferroni correction.)
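A minimal Python sketch of the random-trading benchmark described above, assuming a long-only strategy, one decision per bar at the close, and P&L measured in price points per unit (multiply by the contract's point value for dollars). The entry/exit probabilities (`p_entry`, `p_exit`) are hypothetical knobs: the average holding time is roughly `1 / p_exit` bars, so they can be tuned to match the strategy's trade frequency and holding times. The p-value is the usual empirical one, which counts the strategy as one extra draw, and the Bonferroni correction simply divides the significance level by the number of strategies or variants tested.

```python
import random


def random_trader_pnl(closes: list[float], p_entry: float, p_exit: float,
                      cost_per_round_trip: float, rng: random.Random) -> float:
    """One run of a long-only random trader; enters/exits when a uniform draw is below a threshold."""
    pnl = 0.0
    entry_price = None
    for price in closes:
        u = rng.random()
        if entry_price is None and u < p_entry:
            entry_price = price
        elif entry_price is not None and u < p_exit:
            pnl += price - entry_price - cost_per_round_trip
            entry_price = None
    if entry_price is not None:  # close any open position at the last bar
        pnl += closes[-1] - entry_price - cost_per_round_trip
    return pnl


def random_benchmark(closes: list[float], strategy_pnl: float, runs: int = 1000,
                     p_entry: float = 0.1, p_exit: float = 0.2, cost: float = 0.0,
                     seed: int = 0) -> tuple[float, float]:
    """Return (percentile of the strategy within the random runs, empirical one-sided p-value)."""
    rng = random.Random(seed)
    sims = [random_trader_pnl(closes, p_entry, p_exit, cost, rng) for _ in range(runs)]
    percentile = 100.0 * sum(s < strategy_pnl for s in sims) / runs
    p_value = (1 + sum(s >= strategy_pnl for s in sims)) / (runs + 1)
    return percentile, p_value


def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test significance level after a Bonferroni correction."""
    return alpha / num_tests


gen = random.Random(42)
prices = [100.0]
for _ in range(499):
    prices.append(prices[-1] + gen.gauss(0.0, 1.0))  # hypothetical random-walk prices
# strategy_pnl=25.0 and cost=0.5 are hypothetical values for illustration only.
pct, p = random_benchmark(prices, strategy_pnl=25.0, cost=0.5)
print(f"beats {pct:.1f}% of random runs, empirical p = {p:.3f}")
print("Bonferroni level for 5 variants at alpha=0.05:", bonferroni_alpha(0.05, 5))
```

The same loop works for any other statistic (e.g. the Sortino ratio) by recording that statistic per run instead of net P&L.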
FWIW, I am not sure the multiple-testing error truly applies if you are just making small adjustments, as you're still testing a single hypothesis. In any case, as long as you are making manual adjustments or forming priors based on your experience (as opposed to iterating), multiple-testing error is unlikely to be relevant.