Data Science approaches and curve fitting

Bad_Badness · Dec 8, 2021

Hi All,

Wondering about what people think about a DS approach to automation WRT back test fitting.

System background:
Using ES
System can be shown a "pattern" on an intraday chart, out of back test sample range.
Pattern is then processed into the "find it" sub system
When it is back tested, this "pattern" can be "found" somewhere between 2-4 times over a two year period.
With very basic trade management, the win rate can be 2 for 2, 2 for 3, or 3 for 4.

So what would be the problem with finding 20-30 patterns and having the system use that to trade?

Patterns can be added and removed. Pattern sets can be selected based on criteria, of course. So far the only defined sets are Long and Short.

Is this really over fitting?

In some sense ML and DS are all about "fitting" and not necessarily "over fitting"

Obvious points are:
Collisions in setups: So far very few.
Can't find 20-30 patterns: So far 0-5 patterns per day with mode being 2, mean 2.2.
Heavy computing load: Not really an issue. Code is fairly compact and efficient.
Need more data for back testing: Working on it. Target is 4-6 years.
What is the Win:Loss ratio: About 2.5-3.0:1

Some people have obviously gone down this path before!

Comments?

longandshort · Dec 8, 2021

you should pick up Ernie Chan's books

2rosy · Dec 8, 2021

first, define data science.
What are you doing that is unlike regular back testing?

Zwaen · Dec 8, 2021

I have a background in DS (job). I think it is close to impossible to find something using your laptop or cloud computing and your Python and R scripts, given the AI competition. Do you use only price as input?

I think a simple momentum system, e.g. ‘If it moves it moves’, will have better results (I don't use this, btw).

Snuskpelle · Dec 8, 2021

I personally haven't found something like this to work (i.e. prediction better than random), although the thought of it is appealing. Probably my approaches were too unsophisticated (either my amateurish statistical method and/or useless price-only input, aka garbage in, garbage out).

Not saying that I know for a fact it couldn't work, but my hunch is this is way too simple and hence millions of people have already tried it. Not to mention people with an actual mathematics/statistics background probably find this laughably simple anyway.

These days I'm more into thinking you either need unique inputs for a general alpha, or a specific context where a very simple algo will work solely within that regime based on the assumptions (e.g. bull market). Trying to be "smart" about the algo itself as a hobbyist is IMO a pointless road. Again, not claiming I'm some kind of authority or that someone else couldn't do it, just trying to set up a red warning marker "here your time might burn".

Bad_Badness · Dec 8, 2021

2rosy said:
first, define data science.
What are you doing that is unlike regular back testing?

Probably used the wrong word DS. Essentially I am providing 250 data points from a chart. From that a set of "Turing Tests" are applied to determine how to look at the data. It is not a "Here is the data, go find it again". The tests are not really "grind it out" type tests.

So really the only DS about the whole thing is the starting point, data. I assume a lot of methods pay close attention to the inputs, then massage them, versus how the inputs are interpreted. I am doing the latter.

So from there nothing really different from regular back testing, I assume. So maybe the question is better posed as:
If I can find high-win-rate, high-profit trades that are infrequent, and then repeat the process until the total trade frequency gets to a worthwhile point, what are the issues?

PS: Don't really want to get theoretical in this forum-thread.
PPS: Please understand, although I understand a lot about logic, strong AI and Epistemology, all this algorithmic trading is new to me.

Bad_Badness · Dec 8, 2021

Snuskpelle said:
I personally haven't found something like this work (i.e. prediction better than random) ... "here your time might burn".

Very much appreciate the thoughts. Hence the posting. The methods are probably beyond 95% of people's understanding. I spent 4 years studying how to create and define formal languages for Strong AI. But of course that did not directly work out so well.

2rosy · Dec 8, 2021

Bad_Badness said:
Probably used the wrong word DS. Essentially I am providing 250 data points from a chart. From that a set of "Turing Tests" are applied to determine how to look at the data. It is not a "Here is the data, go find it again". The tests are not really "grind it out" type tests.

So you want the algorithm to take data sets and find something that is ideally profitable? Deep learning? Maybe see what these people do, if it helps.
https://www.youtube.com/c/DeepMind/videos

ph1l · Dec 8, 2021

I developed a k-nearest neighbor strategy to match current price patterns to similar past price patterns and wrote about it here and here. It didn't work as well as I had hoped it would.
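
For readers unfamiliar with the idea, here is a minimal sketch of k-nearest-neighbor pattern matching on a price series. It is not ph1l's implementation; the function names, the z-score normalization, the Euclidean distance, and the default `pattern_len`, `horizon`, and `k` values are all assumptions chosen for illustration.

```python
import numpy as np

def znorm(window):
    """Rescale a window to zero mean and unit variance so shape, not price level, is compared."""
    window = np.asarray(window, dtype=float)
    centered = window - window.mean()
    std = window.std()
    return centered / std if std > 0 else centered

def knn_forecast(prices, pattern_len=20, horizon=5, k=5):
    """Mean forward return of the k past windows closest in shape to the latest window.

    Returns (mean_forward_return, start_indices_of_nearest_windows).
    """
    prices = np.asarray(prices, dtype=float)
    query = znorm(prices[-pattern_len:])
    # A candidate must end before the query window starts (no overlap) and
    # early enough that its forward return over `horizon` bars is already known.
    n_candidates = len(prices) - pattern_len - max(pattern_len, horizon) + 1
    if n_candidates < k:
        raise ValueError("not enough history for the requested k")
    dists = np.empty(n_candidates)
    fwd = np.empty(n_candidates)
    for start in range(n_candidates):
        end = start + pattern_len
        dists[start] = np.linalg.norm(znorm(prices[start:end]) - query)
        # Return from the last bar of the window to `horizon` bars later.
        fwd[start] = prices[end + horizon - 1] / prices[end - 1] - 1.0
    nearest = np.argsort(dists)[:k]
    return float(fwd[nearest].mean()), nearest
```

Because neighbouring start indices overlap heavily, the k "nearest" windows are often near-copies of one another, so the effective number of independent matches can be much smaller than k.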

userque · Dec 8, 2021

Bad_Badness said:
So what would be the problem with finding 20-30 patterns and having the system use that to trade?

Nothing, as long as the patterns are indeed sufficiently profitable.

Bad_Badness said:
Patterns can be added and removed. Pattern sets can be selected based on criteria, of course. So far the only defined sets are Long and Short.

Is this really over fitting?

If the out of sample results are sufficiently and convincingly worse than the in-sample, then overfitting is likely.
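
As a rough illustration of that comparison, the sketch below splits a trade log chronologically and reports the win rate on each side. The trade list, the split date, and the function names are hypothetical; only the in-sample versus out-of-sample idea comes from the post above.

```python
from datetime import date

def win_rate(outcomes):
    """Fraction of winning trades; `outcomes` is a sequence of booleans (True = win)."""
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes) if outcomes else float("nan")

def in_vs_out_of_sample(trades, split_date):
    """Split (trade_date, won) pairs at split_date; return (win rate, trade count) per side."""
    in_sample = [won for d, won in trades if d < split_date]
    out_sample = [won for d, won in trades if d >= split_date]
    return {
        "in_sample": (win_rate(in_sample), len(in_sample)),
        "out_of_sample": (win_rate(out_sample), len(out_sample)),
    }

# Hypothetical trade log for one pattern: 2 for 2 before the split, 1 for 2 after it.
trades = [
    (date(2020, 3, 2), True),
    (date(2020, 9, 14), True),
    (date(2021, 2, 1), False),
    (date(2021, 6, 7), True),
]
print(in_vs_out_of_sample(trades, date(2021, 1, 1)))
# {'in_sample': (1.0, 2), 'out_of_sample': (0.5, 2)}
```

With only a handful of trades on each side, a gap like this is well within noise, which is why the trade counts are returned alongside the rates.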

Bad_Badness said:
In some sense ML and DS are all about "fitting" and not necessarily "over fitting"

Or they are all about determining and separating fitted algos from unfitted ones.

Bad_Badness said:
Comments?

Among other things, ask and answer for yourself:

How often will a pattern "work" 2 of 2 times, or 4 of 4 times ... randomly?

Which should be more reliable, a pattern that appears once? Or a pattern that appears scores of times?
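
Both questions can be put into numbers with a binomial calculation. The sketch below assumes each trade is an independent coin flip with win probability `p = 0.5`; that baseline is an assumption for illustration, since the real chance win rate depends on how stops and targets are placed. The function names and the candidate count of 100 are also hypothetical.

```python
from math import comb

def prob_at_least(wins, trades, p=0.5):
    """P(X >= wins) for X ~ Binomial(trades, p): the chance of doing at least this well by luck."""
    return sum(
        comb(trades, k) * p**k * (1 - p) ** (trades - k)
        for k in range(wins, trades + 1)
    )

def expected_lucky_patterns(candidates, wins, trades, p=0.5):
    """Expected number of worthless patterns that still hit `wins` of `trades` by chance."""
    return candidates * prob_at_least(wins, trades, p)

print(prob_at_least(2, 2))   # 0.25
print(prob_at_least(2, 3))   # 0.5
print(prob_at_least(3, 4))   # 0.3125
print(expected_lucky_patterns(100, 2, 2))  # 25.0
```

Under this assumption, a record of 2 for 2 or 3 for 4 is something a coin flip produces a quarter to a third of the time, so screening many candidate patterns will turn up plenty of them. The same function shows why a pattern with scores of occurrences is more informative: `prob_at_least(40, 60)` is below 1%.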