Exploring the Potential of Synthetic Data in Trading Strategies

Discussion in 'Automated Trading' started by SjoerdAlgo, Jun 20, 2024.

  1. I recently completed a research project focused on creating synthetic data using Generative Adversarial Networks (GANs). This innovative approach has shown significant promise in enhancing trading strategy development by addressing overfitting challenges and expanding datasets.

    Why Synthetic Data Works: Synthetic data mimics real-world data patterns and can be used in the same way as your regular OHLC data. By using GANs, we can generate high-quality, diverse datasets that improve the robustness and accuracy of trading models.

    Why It’s Useful: You can generate an infinite amount of additional data for each market and any specific timeframe. By reducing overfitting, synthetic data also improves model performance and helps create more reliable and profitable trading strategies.

    I’ve seen impressive results, including a significant increase in trading profits and Sharpe ratio during my research.

    I’m now looking to connect with others interested in exploring the benefits of synthetic data further, so please feel free to respond.
     
  2. 2rosy

    I just use real historical time series returns and shuffle them. You get the same mean and variance.
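    A minimal sketch of that shuffle approach (assuming numpy; the return series here is simulated purely for illustration). One caveat worth keeping in mind: a permutation preserves the empirical mean and variance exactly, but destroys any ordering-dependent structure such as autocorrelation or volatility clustering.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for real historical daily returns (illustrative only).
returns = rng.normal(loc=0.0005, scale=0.01, size=1000)

# A shuffle is just a permutation of the same numbers, so the
# empirical mean and variance are preserved exactly -- only the
# ordering (and thus path-dependent structure) changes.
shuffled = rng.permutation(returns)

# Rebuild a synthetic price path from the shuffled returns.
prices = 100 * np.cumprod(1 + shuffled)
```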
     
    SjoerdAlgo likes this.
    I found synthetic data very helpful in dealing with path-dependent aspects of a strategy, for example placing stop-loss and take-profit levels. My main gripe is that it’s hard to convince yourself that synthetic data is truly similar to real life.
     
    EdgeHunter and SjoerdAlgo like this.
    Supposedly using something like a GAN allows you to keep all kinds of latent features, like autocorrelation and response to discontinuity.
     
    SjoerdAlgo likes this.
  5. MarkBrown

    30 years ago there were several freeware programs that created big sets of data for backtesting. I used them, and they were of good use and very novel at the time.

    However, today a trader needs to think ahead of the game, not react to it, and that requires actual patterns, which are not random. Rather, each highly liquid market has a personality of its own that can be relied upon to exploit as an edge.

    If you are reacting to price, you're far too late.

    The chart below uses 10-point range bars, and the sub-graph is a study like yours where the close dictates the center-line and all the data revolves around it. See how consistent the size of the moves is, and note it is actual price data, not a synthetic indicator.
     
    Last edited: Jun 20, 2024
    SjoerdAlgo, birdman and PennySnatch like this.
  6. SunTrader

    Spendable or research profits?
     
    SjoerdAlgo likes this.
  7. newbunch

    Well, is it?
     
    SjoerdAlgo likes this.
    Oddly enough, it is. I use the TimeGAN package and made my own metrics of “similarity” based on a variety of characteristics, and it does seem similar. That said, I only use it (obviously) for stuff that’s heavily path-dependent and not for alpha research: things like thresholds for delta hedging, hysteresis bands, stop losses, and take profits. Still, it’s a worthwhile investment.
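    For concreteness, here is one way such a "similarity" check could look: a hedged sketch comparing a few distributional and path-dependent characteristics of a real vs. synthetic return series (assumes numpy; these particular metrics are illustrative, not the poster's actual TimeGAN metrics).

```python
import numpy as np

def _kurtosis(x):
    # Sample kurtosis (non-excess): 3.0 for a Gaussian.
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean())

def _acf(x, lag):
    # Sample autocorrelation at the given lag.
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def similarity_report(real, synth, max_lag=5):
    """Compare a few traits of a real vs. synthetic return series.
    Smaller *_diff values and a std_ratio near 1 mean more similar."""
    report = {
        "mean_diff": float(abs(real.mean() - synth.mean())),
        "std_ratio": float(synth.std() / real.std()),
        "kurtosis_diff": abs(_kurtosis(real) - _kurtosis(synth)),
    }
    for lag in range(1, max_lag + 1):
        report[f"acf{lag}_diff"] = abs(_acf(real, lag) - _acf(synth, lag))
    return report
```

    In practice you would aggregate these into a single score or visualize them; the point is just to make "truly similar" quantitative instead of eyeballed.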
     
    SjoerdAlgo, Zwaen and lariatier like this.
  9. lariatier

    I always start with synthetic data, but just a dataframe with 100 columns and 250 rows of uniform random values from -1 to 1. That's just so I don't have to be bothered with handling real data.

    TimeGAN looks interesting. There are also these transformer time-series models that I haven't looked much into yet; Salesforce has Uni2TS, a universal time-series transformer. My intuition, though, is that there are many underlying assumptions that don't hold for what I'm trying to do. I think I'm mostly trying to figure out the presence or absence of path-dependent change points. My intuition may be totally wrong, but I need to level up my deep-NN skills to find out.
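    The throwaway starting dataframe described above could be sketched like this (assuming numpy and pandas; the column names are made up):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# ~250 trading days x 100 dummy features, uniform in [-1, 1):
# a placeholder so the research pipeline can be built and tested
# before any real data handling is wired up.
df = pd.DataFrame(
    rng.uniform(-1.0, 1.0, size=(250, 100)),
    columns=[f"f{i}" for i in range(100)],
)
```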
     
  10. Real Money

    In real market data, the price variance (vol) is always changing. In statistics, a random variable is generated using probability density functions and the probability integral transform. The problem with this approach to data generation is that, with these models, variance is a simple constant.

    If you allow the density function to use functional or stochastic variances, things very quickly become too complex for basic statistics to deal with. Even simple and natural assumptions about how real market price variance evolves can have wildly complex outcomes in terms of the data being generated.

    You can experiment with it in Excel: put a uniform variable into the inverse cumulative distribution function (the quantile function) of a given pdf, and then let the variance be any combination of things. The statistics become unwieldy, but the data looks much more like market data.

    https://en.wikipedia.org/wiki/Doubly_stochastic_model
    https://en.wikipedia.org/wiki/Compound_probability_distribution
     
    #10     Jun 22, 2024
    SjoerdAlgo likes this.