Exploring the Potential of Synthetic Data in Trading Strategies

Discussion in 'Automated Trading' started by SjoerdAlgo, Jun 20, 2024.

SjoerdAlgo
I recently completed a research project focused on creating synthetic data using Generative Adversarial Networks (GANs). This innovative approach has shown significant promise in enhancing trading strategy development by addressing overfitting challenges and expanding datasets.

Why Synthetic Data Works: Synthetic data mimics real-world data patterns and can be used in the same way as your regular OHLC data. By using GANs, we can generate high-quality, diverse datasets that improve the robustness and accuracy of trading models.

Why It’s Useful: You can generate a practically unlimited amount of additional data for any market and any timeframe. In addition, by reducing overfitting, synthetic data can improve model performance and help create more reliable and profitable trading strategies.

I’ve seen impressive results, including a significant increase in trading profits and Sharpe ratio during my research.

I’m now looking to connect with others interested in exploring the benefits of synthetic data further, so please feel free to respond.
- ExampleSyntheticData.jpeg (200.3 KB)

#1 Jun 20, 2024

2rosy
I just use real historical time series returns and shuffle them. You get the same mean and variance.
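
A minimal sketch of the shuffling idea above, using only the Python standard library. The `history` list is made-up data standing in for real returns, and `shuffled_path` is a hypothetical helper name, not something from the thread.

```python
import random
import statistics

def shuffled_path(returns, start_price=100.0, seed=None):
    """Shuffle historical returns and rebuild a price path from them."""
    rng = random.Random(seed)
    shuffled = list(returns)      # copy so the input stays untouched
    rng.shuffle(shuffled)
    prices = [start_price]
    for r in shuffled:
        prices.append(prices[-1] * (1.0 + r))
    return shuffled, prices

# Hypothetical daily simple returns standing in for real history.
history = [0.012, -0.008, 0.003, -0.015, 0.007, 0.021, -0.004, 0.001]
shuffled, prices = shuffled_path(history, seed=42)

# Same set of returns in a different order, so mean and variance match.
print(statistics.fmean(history), statistics.fmean(shuffled))
print(statistics.pvariance(history), statistics.pvariance(shuffled))
```

The reordering changes the path the price takes between start and end, while every order-independent statistic of the returns stays the same.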

#2 Jun 20, 2024

SjoerdAlgo likes this.
Slow Learning Elf
I found synthetic data very helpful in dealing with path-dependent aspects of a strategy, for example placing stop-loss and take-profit levels. My main gripe is that it’s hard to convince yourself that synthetic data is truly similar to real life.
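
To illustrate why stop-loss and take-profit placement is path dependent, here is a rough sketch that checks which level each simulated path touches first. The levels, the made-up `history`, and the reshuffling used to produce paths are all assumptions for illustration; any generator (a GAN included) could supply `paths` instead.

```python
import random

def first_exit(returns, stop_loss=-0.02, take_profit=0.03):
    """Walk one path and report which level is touched first.

    Levels are cumulative returns from entry (hypothetical defaults).
    Returns "stop", "target", or "none" if neither is hit.
    """
    equity = 1.0
    for r in returns:
        equity *= 1.0 + r
        pnl = equity - 1.0
        if pnl <= stop_loss:
            return "stop"
        if pnl >= take_profit:
            return "target"
    return "none"

def exit_frequencies(paths, **levels):
    """Share of paths ending in each outcome."""
    counts = {"stop": 0, "target": 0, "none": 0}
    for path in paths:
        counts[first_exit(path, **levels)] += 1
    return {k: v / len(paths) for k, v in counts.items()}

# Stand-in generator: reshuffle a made-up return history many times.
rng = random.Random(0)
history = [rng.gauss(0.0005, 0.01) for _ in range(60)]
paths = [rng.sample(history, len(history)) for _ in range(1000)]

print(exit_frequencies(paths, stop_loss=-0.02, take_profit=0.03))
```

Every path here has the same total return, yet the outcomes differ, which is the reason a single historical path says little about where to put the levels.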

#3 Jun 20, 2024

EdgeHunter and SjoerdAlgo like this.
Slow Learning Elf
2rosy said:
I just use real historical time series returns and shuffle them. You get the same mean and variance.

Supposedly, using something like a GAN allows you to keep all kinds of latent features, like autocorrelation and response to discontinuity.
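
A small sketch of what plain shuffling loses. It builds toy returns with volatility clustering (the block lengths and volatilities are made up, not fitted to any market) and compares the lag-1 autocorrelation of absolute returns before and after shuffling.

```python
import random
import statistics

def autocorr(xs, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    mean = statistics.fmean(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[lag:]))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

rng = random.Random(1)

# Toy returns with volatility clustering: calm and turbulent blocks
# of 50 bars each, alternating.
returns = []
for block in range(40):
    vol = 0.005 if block % 2 == 0 else 0.03
    returns.extend(rng.gauss(0.0, vol) for _ in range(50))

shuffled = returns[:]
rng.shuffle(shuffled)

abs_real = [abs(r) for r in returns]
abs_shuf = [abs(r) for r in shuffled]
print("lag-1 autocorr of |r|, original:", round(autocorr(abs_real), 3))
print("lag-1 autocorr of |r|, shuffled:", round(autocorr(abs_shuf), 3))
```

The original series should show clearly positive autocorrelation in absolute returns, while the shuffled one should sit near zero even though its mean and variance are unchanged.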

#4 Jun 20, 2024

SjoerdAlgo likes this.
MarkBrown
30 years ago there were several freeware programs that created big sets of data for back testing. i used them and they were of good use and very novel at the time.

however today a trader needs to think ahead of the game not react to it and that requires actual patterns which are not random. rather each highly liquid market has a personality of its own that can be relied upon to exploit as an edge.

if you are reacting to price you're far too late.

chart below is using 10 point range bars and sub-graph is a study like yours where the close dictates the center-line and all the data revolves around it. see how consistent the size of the moves is, and note they are actual price data not a synthetic indicator.

Last edited: Jun 20, 2024

#5 Jun 20, 2024

SjoerdAlgo, birdman and PennySnatch like this.
SunTrader
SjoerdAlgo said:
...

I’ve seen impressive results, including a significant increase in trading profits and Sharpe ratio during my research.

Spendable or research profits?

#6 Jun 20, 2024

SjoerdAlgo likes this.
newbunch
Slow Learning Elf said:
My main gripe is that it’s hard to convince yourself that synthetic data is truly similar to real life.

Well, is it?

#7 Jun 21, 2024

SjoerdAlgo likes this.
Slow Learning Elf
newbunch said:
Well, is it?

Oddly enough, it is. I use the TimeGAN package and made my own metrics of “similarity” based on a variety of characteristics, and by those metrics it seems to be. That said, I only use it (obviously) for stuff that is heavily path-dependent and not for alpha research: things like thresholds for delta hedging, hysteresis bands, stop losses and take profits. Still, it’s a worthwhile investment.
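
The post does not say which similarity metrics were used. As one possible sketch, the checks below compare a real and a synthetic return series on volatility, tail heaviness, volatility clustering, and overall distribution. All function names are hypothetical, both series are random placeholders, and none of this is TimeGAN's API.

```python
import bisect
import random
import statistics

def kurtosis(xs):
    """Excess kurtosis (0 for a normal distribution)."""
    m = statistics.fmean(xs)
    var = statistics.pvariance(xs, mu=m)
    return statistics.fmean([(x - m) ** 4 for x in xs]) / var ** 2 - 3.0

def abs_autocorr(xs, lag=1):
    """Autocorrelation of absolute values, a rough volatility-clustering gauge."""
    a = [abs(x) for x in xs]
    m = statistics.fmean(a)
    num = sum((u - m) * (v - m) for u, v in zip(a, a[lag:]))
    return num / sum((u - m) ** 2 for u in a)

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between ECDFs."""
    sx, sy = sorted(xs), sorted(ys)
    gap = 0.0
    for v in sx + sy:
        fx = bisect.bisect_right(sx, v) / len(sx)
        fy = bisect.bisect_right(sy, v) / len(sy)
        gap = max(gap, abs(fx - fy))
    return gap

def similarity_report(real, synthetic):
    """Collect the gaps; values near 1 (ratio) or 0 (gaps) mean 'similar'."""
    return {
        "std_ratio": statistics.pstdev(synthetic) / statistics.pstdev(real),
        "kurtosis_gap": kurtosis(synthetic) - kurtosis(real),
        "abs_autocorr_gap": abs_autocorr(synthetic) - abs_autocorr(real),
        "ks_statistic": ks_statistic(real, synthetic),
    }

rng = random.Random(7)
real = [rng.gauss(0.0, 0.01) for _ in range(500)]       # placeholder "real"
synthetic = [rng.gauss(0.0, 0.01) for _ in range(500)]  # placeholder "generated"
for name, value in similarity_report(real, synthetic).items():
    print(f"{name}: {value:.4f}")
```

For path-dependent uses, statistics on drawdowns or barrier-hit frequencies would be natural additions to a report like this.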

#8 Jun 21, 2024

SjoerdAlgo, Zwaen and lariatier like this.
lariatier
I always start with synthetic data, but just a dataframe with 100 columns and 250 rows of uniform random values from -1 to 1. That is just so I don't have to bother with handling real data.
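
A placeholder table like the one described can be built with the standard library alone. This sketch produces a plain list of rows; `uniform_table` is a hypothetical helper name, and if pandas is available, passing the result to `pandas.DataFrame(table)` would give the dataframe form.

```python
import random

def uniform_table(rows=250, cols=100, low=-1.0, high=1.0, seed=None):
    """Rows x cols table of independent uniform draws in [low, high]."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(cols)] for _ in range(rows)]

table = uniform_table(seed=0)
print(len(table), len(table[0]))   # prints: 250 100
```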

TimeGAN looks interesting. There are also these transformer time series models that I haven't looked much into yet. Salesforce has Uni2TS, a universal time series transformer. My intuition, though, is that there are many underlying assumptions that don't hold for what I am trying to do. I think I am mostly trying to figure out the presence or absence of path-dependent change points. My intuition may be totally wrong, though, and I need to level up my deep NN skills to find out.

#9 Jun 22, 2024

Real Money
In real market data, the price variance (vol) is always changing. In statistics, a random variable is generated using probability density functions and the probability integral transform. The problem with this approach to data generation is that, with these models, variance is a simple constant.

If you allow the density function to use functional or stochastic variances, things very quickly become too complex for basic statistics to deal with. Even simple and natural assumptions about how real market price variance evolves can have wildly complex outcomes in terms of the data being generated.

You can experiment with it in Excel. You simply put a uniform variable into the inverse cumulative distribution function of a given pdf, and then let the variance be any combination of things. The statistics become unwieldy, but the data looks much more like market data.

https://en.wikipedia.org/wiki/Doubly_stochastic_model
https://en.wikipedia.org/wiki/Compound_probability_distribution
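
The inverse-CDF recipe from this post can also be tried in Python instead of Excel. The sketch below feeds uniform draws through the normal inverse CDF, once with a constant standard deviation and once with a standard deviation that follows a mean-reverting log process. The volatility process and all its parameters are made-up assumptions, chosen only to show the effect on tail heaviness.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def inverse_transform_returns(n, vol_path, rng):
    """Draw n returns by feeding uniforms through the normal inverse CDF.

    vol_path[t] is the standard deviation used at step t.
    """
    out = []
    for t in range(n):
        u = rng.random()
        while u == 0.0:          # inv_cdf is undefined at exactly 0
            u = rng.random()
        out.append(vol_path[t] * STD_NORMAL.inv_cdf(u))
    return out

def stochastic_vol_path(n, rng, base=0.01, persistence=0.98, vol_of_vol=0.15):
    """Log-volatility as a mean-reverting AR(1); parameters are made up."""
    log_dev, path = 0.0, []
    for _ in range(n):
        log_dev = persistence * log_dev + rng.gauss(0.0, vol_of_vol)
        path.append(base * math.exp(log_dev))
    return path

def excess_kurtosis(xs):
    """Excess kurtosis (0 for a normal distribution)."""
    m = fmean(xs)
    return fmean([(x - m) ** 4 for x in xs]) / pvariance(xs, mu=m) ** 2 - 3.0

rng = random.Random(3)
n = 5000
constant = inverse_transform_returns(n, [0.01] * n, rng)
stochastic = inverse_transform_returns(n, stochastic_vol_path(n, rng), rng)

print("excess kurtosis, constant variance:  ", round(excess_kurtosis(constant), 2))
print("excess kurtosis, stochastic variance:", round(excess_kurtosis(stochastic), 2))
```

With constant variance the draws stay close to normal, while letting the variance wander produces fatter tails and volatility clusters, which is the "looks much more like market data" effect described above.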

#10 Jun 22, 2024

SjoerdAlgo likes this.
