I recently completed a research project on generating synthetic market data with Generative Adversarial Networks (GANs). The approach has shown promise for trading strategy development because it expands the available dataset and helps address overfitting. Why synthetic data works: synthetic data mimics the patterns of real-world data and can be used in the same way as your regular OHLC data. With GANs, we can generate diverse, high-quality datasets that improve the robustness and accuracy of trading models. Why it's useful: you can generate a practically unlimited amount of additional data for any market and any timeframe. By reducing overfitting, synthetic data also helps create more reliable and profitable trading strategies. In my research I saw a significant increase in trading profits and Sharpe ratio. I'm now looking to connect with others interested in exploring the benefits of synthetic data further, so please feel free to respond.
I found synthetic data very helpful in dealing with path-dependent aspects of a strategy, for example placing stop-loss and take-profit levels. My main gripe is that it's hard to convince yourself that synthetic data is truly similar to real market data.
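To make "path-dependent" concrete, here is a minimal sketch of checking which level a long position hits first on each path. The random-walk generator is only a placeholder standing in for whatever produces your synthetic paths, and the names `first_exit` and `random_walk_path` and all parameter values are made up for illustration.

```python
import random

def first_exit(path, entry, stop_loss, take_profit):
    """Return 'stop', 'target', or 'open' for a long position on one price path."""
    for price in path:
        if price <= entry - stop_loss:
            return "stop"      # stop-loss level touched first
        if price >= entry + take_profit:
            return "target"    # take-profit level touched first
    return "open"              # neither level reached within the path

def random_walk_path(rng, start=100.0, steps=250, step_vol=0.5):
    # Placeholder generator (assumption): a plain Gaussian random walk,
    # standing in for paths sampled from a trained generative model.
    price, path = start, []
    for _ in range(steps):
        price += rng.gauss(0.0, step_vol)
        path.append(price)
    return path

rng = random.Random(7)
paths = [random_walk_path(rng) for _ in range(2000)]

counts = {"stop": 0, "target": 0, "open": 0}
for p in paths:
    counts[first_exit(p, entry=100.0, stop_loss=2.0, take_profit=4.0)] += 1
print(counts)
```

Two paths that end at the same price can give different outcomes here, which is why a large set of plausible paths is more useful for tuning these levels than a single historical one.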
Supposedly, using something like a GAN lets you preserve all kinds of latent features, such as autocorrelation and the response to discontinuities.
30 years ago there were several freeware programs that created big sets of data for backtesting. I used them, and they were useful and very novel at the time. However, today a trader needs to think ahead of the game, not react to it, and that requires actual patterns, which are not random. Rather, each highly liquid market has a personality of its own that can be relied upon and exploited as an edge. If you are reacting to price, you're far too late. The chart below uses 10-point range bars, and the sub-graph is a study like yours where the close dictates the center-line and all the data revolves around it. See how consistent the size of the moves is, and note that they are actual price data, not a synthetic indicator.
Oddly enough, it is. I use the TimeGAN package and built my own “similarity” metrics based on a variety of characteristics, and by those metrics the synthetic data does look similar. That said, I only use it (obviously) for things that are heavily path-dependent and not for alpha research: thresholds for delta hedging, hysteresis bands, stop losses and take profits. Still, it's a worthwhile investment.
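For anyone wondering what such "similarity" checks might look like, here is a rough sketch that compares a few common summary statistics of two return series. These are not the metrics from the post above, which were not described; the choice of statistics, the function names, and the toy data are all assumptions for illustration.

```python
import random
from statistics import fmean, pstdev

def autocorr(xs, lag=1):
    """Sample autocorrelation of xs at the given lag."""
    m = fmean(xs)
    denom = sum((x - m) ** 2 for x in xs)
    num = sum((xs[t] - m) * (xs[t - lag] - m) for t in range(lag, len(xs)))
    return num / denom

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (zero for a normal distribution)."""
    m = fmean(xs)
    var = fmean([(x - m) ** 2 for x in xs])
    fourth = fmean([(x - m) ** 4 for x in xs])
    return fourth / var ** 2 - 3.0

def stylized_facts(returns):
    """A few summary statistics often checked on return series."""
    abs_r = [abs(r) for r in returns]
    return {
        "volatility": pstdev(returns),
        "excess_kurtosis": excess_kurtosis(returns),
        "acf1_returns": autocorr(returns, 1),
        "acf1_abs_returns": autocorr(abs_r, 1),  # rough proxy for vol clustering
    }

def similarity_gaps(real, synthetic):
    """Absolute difference of each statistic; smaller means more alike."""
    a, b = stylized_facts(real), stylized_facts(synthetic)
    return {k: abs(a[k] - b[k]) for k in a}

# Toy usage: both series are simulated placeholders, not market data.
rng = random.Random(42)
real = [rng.gauss(0.0, 0.010) for _ in range(2000)]
synthetic = [rng.gauss(0.0, 0.012) for _ in range(2000)]
for name, gap in similarity_gaps(real, synthetic).items():
    print(f"{name}: {gap:.4f}")
```

Matching a handful of moments does not prove two series behave alike, so a check like this is better at rejecting bad synthetic data than at certifying good data.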
I always start with synthetic data, but just a dataframe with 100 columns and 250 rows of uniform random values between -1 and 1. That is just so I don't have to bother with handling real data. TimeGAN looks interesting. There are also transformer time series models that I haven't looked into much yet. Salesforce has Uni2TS, a universal time series transformer. My intuition, though, is that there are many underlying assumptions that don't hold for what I am trying to do. I think I am mostly trying to figure out the presence or absence of path-dependent change points. My intuition may be totally wrong, but I need to level up my deep NN skills to find out.
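A placeholder frame like that takes only a few lines. This sketch assumes pandas and NumPy; the seed and the `col_*` column names are arbitrary choices, not something from the post.

```python
import numpy as np
import pandas as pd

# Seeded generator so the placeholder data is reproducible.
rng = np.random.default_rng(seed=0)

# 250 rows by 100 columns of uniform random values in [-1, 1).
df = pd.DataFrame(
    rng.uniform(-1.0, 1.0, size=(250, 100)),
    columns=[f"col_{i}" for i in range(100)],  # hypothetical column names
)

print(df.shape)  # (250, 100)
```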
In real market data, the price variance (vol) is always changing. In statistics, a random variable is typically generated from a probability density function using the probability integral transform. The problem with this approach to data generation is that, in the basic models, the variance is a simple constant. If you allow the density function to use functional or stochastic variances, things very quickly become too complex for basic statistics to deal with. Even simple and natural assumptions about how real market price variance evolves can have wildly complex outcomes in terms of the data being generated. You can experiment with it in Excel. You simply put a uniform variable into the inverse cumulative distribution function of a given pdf, and then let the variance be any combination of things. The statistics become unwieldy, but the data looks much more like market data. https://en.wikipedia.org/wiki/Doubly_stochastic_model https://en.wikipedia.org/wiki/Compound_probability_distribution
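The same experiment can be sketched in Python instead of a spreadsheet. This is a minimal example under assumptions: the base distribution is normal, and the volatility process (a mean-reverting AR(1) on log-vol with made-up parameters) is just one of many possible choices. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist(0.0, 1.0)

def inverse_transform_returns(n, sigma_fn, seed=0):
    """Push uniform draws through the inverse normal CDF, scaled by sigma_t."""
    rng = random.Random(seed)
    returns, sigma = [], None
    for t in range(n):
        sigma = sigma_fn(t, sigma, rng)
        # Clamp away from 0 and 1, where the inverse CDF is undefined.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        returns.append(sigma * STD_NORMAL.inv_cdf(u))
    return returns

def constant_sigma(t, prev_sigma, rng):
    # Textbook case: the variance is a fixed constant.
    return 0.01

def ar1_log_sigma(t, prev_sigma, rng, base=0.01, phi=0.95, vol_of_vol=0.25):
    # Assumed volatility process: log-vol mean-reverts to log(base).
    if prev_sigma is None:
        return base
    log_s = math.log(base) + phi * (math.log(prev_sigma) - math.log(base))
    return math.exp(log_s + rng.gauss(0.0, vol_of_vol))

def excess_kurtosis(xs):
    m = fmean(xs)
    var = fmean([(x - m) ** 2 for x in xs])
    return fmean([(x - m) ** 4 for x in xs]) / var ** 2 - 3.0

const = inverse_transform_returns(5000, constant_sigma)
stoch = inverse_transform_returns(5000, ar1_log_sigma)
print("constant variance, excess kurtosis:", round(excess_kurtosis(const), 2))
print("stochastic variance, excess kurtosis:", round(excess_kurtosis(stoch), 2))
```

With a constant variance the excess kurtosis should sit near zero, while the stochastic-variance series should show fat tails, which is one of the ways it looks more like market data.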