Searching for micro alpha

garachen · Feb 25, 2016

Hi all,

I've been debating whether to post here, mostly because it would take quite a bit of backfill to explain why I'm taking this particular approach. Let's just assume I have good reason to do what I'm doing and that I'm looking for a fresh, outsider perspective.

Over the last few months I've been looking at microstructure features in order to predict price moves 1 and 5 minutes into the future. I've found that I can be really good at predicting the size of the move but really bad at predicting its direction. By bad I mean I'm right 53% of the time.

It seems I can successfully monetize knowing the size of a move and I'm still refining and deploying strategies that take advantage of that but I REALLY would like to get a handle on direction. I'm pretty sure 55% would work. 60% would be amazing. If I can't find anything better, I'll see what I can do with 53%.

Requirements are that the signal can be traded at least 10 times a day. The lower the signal strength, the more I would need to trade. What I'm getting now - 53% - would probably need 300 trades a day in order to be worth running.
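
To make the trade-off between hit rate and trade count concrete, here is a rough sketch. It assumes independent trades with equal-sized wins and losses and no costs, none of which is stated in the post; `daily_edge_stats` and its parameters are illustrative names, not anything from an actual system.

```python
import math


def daily_edge_stats(hit_rate, trades_per_day, win=1.0, loss=1.0, cost=0.0):
    """Return (mean, std dev, mean/std) of one day's P&L.

    Assumes independent trades that each gain `win` with probability
    `hit_rate` or lose `loss` otherwise, minus a fixed `cost` per trade.
    """
    win_pnl = win - cost
    loss_pnl = -loss - cost
    mean_trade = hit_rate * win_pnl + (1.0 - hit_rate) * loss_pnl
    var_trade = (hit_rate * win_pnl ** 2
                 + (1.0 - hit_rate) * loss_pnl ** 2
                 - mean_trade ** 2)
    mean_day = trades_per_day * mean_trade
    std_day = math.sqrt(trades_per_day * var_trade)
    return mean_day, std_day, mean_day / std_day


# Compare the hit rates and trade counts mentioned in the post.
for hit_rate, trades in [(0.53, 10), (0.53, 300), (0.60, 10)]:
    mean_day, std_day, ratio = daily_edge_stats(hit_rate, trades)
    print(f"hit={hit_rate:.2f} trades={trades:4d} "
          f"mean={mean_day:6.2f} std={std_day:6.2f} mean/std={ratio:.2f}")
```

Under these assumptions the mean-to-noise ratio of a day's P&L grows with the square root of the trade count, which is why a 53% signal needs far more trades per day than a 60% one to be comparably reliable.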

I have tried a lot of things which I'm happy to share. I have direct access to all the exchanges (futures) I trade and several years of direct access data to test with. I don't need to test over several years though. If it's not profitable on a weekly basis I wouldn't run it.

Anyway, if anyone wants a challenge, if you throw me an idea that makes sense and I code it up and find it to work I'd be willing to fly you out here and spend a few days helping you with your trading.

Of course, if you have a large Alpha edge you should trade it yourself, but if you have an idea that can only work with large volume and low fees, giving it away is better than doing nothing.

Surfeur · Feb 25, 2016

Hi garachen,

What is your predictor? The order book?

Regards.

garachen · Feb 25, 2016

I have a mixture of 93 different signals.

I'll give a few examples:
1) Take each trade and measure its distance to some short-term moving average of the weighted mid market. Subject that distance to a cap. Weight it by the size of the trade and then add it to the previous signal, which decays on trade size.

2) Measure market jumps: places in the order book where there is a blow-through, meaning all trades happen on one side of the market and nothing happens on the other side for a period of time and a number of increments. Count the frequency of these on both the bid and ask side.

3) Measure price distance from the extremes of a 1 min, 5 min, 10 min, etc. range.

4) Carefully count trades that happen on both the bid and the offer for 5 increments above and below the current price within the last X minutes. Take several weighted averages of these.
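
As an illustration of example 1, here is a minimal sketch of a capped, size-weighted trade-distance signal that decays with traded volume. The details are assumptions: the post does not say how the weighted mid is computed, what the cap is, or how the decay works. This version uses a size-weighted mid and an exponential decay with a volume half-life, and takes the moving average of the weighted mid as an input. All names are hypothetical.

```python
from dataclasses import dataclass


def weighted_mid(bid, ask, bid_size, ask_size):
    """Size-weighted mid (assumed definition): leans toward the ask
    when more size is resting on the bid, and vice versa."""
    return (bid * ask_size + ask * bid_size) / (bid_size + ask_size)


@dataclass
class TradeDistanceSignal:
    cap: float               # max absolute distance counted per trade
    half_life_volume: float  # traded volume over which the old signal halves
    value: float = 0.0

    def update(self, trade_price, trade_size, avg_weighted_mid):
        """Fold one trade into the signal and return the new value."""
        distance = trade_price - avg_weighted_mid
        capped = max(-self.cap, min(self.cap, distance))
        # Decay the previous signal by the volume of the incoming trade.
        decay = 0.5 ** (trade_size / self.half_life_volume)
        self.value = self.value * decay + capped * trade_size
        return self.value


signal = TradeDistanceSignal(cap=0.5, half_life_volume=100.0)
print(signal.update(trade_price=101.0, trade_size=10, avg_weighted_mid=100.0))
print(signal.update(trade_price=100.0, trade_size=100, avg_weighted_mid=100.0))
```

Decaying on volume rather than on clock time means the signal fades quickly in busy markets and lingers in quiet ones, which is one reading of "decays on trade size".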

The list goes on and on. I'll list some of the features of some direct market feeds which aren't the same as the retail feed.

--When a 100 lot crosses the book you can see both the 100 lot and the number of passive orders it took out to fill that 100 lot. On ICE you also see the size of all those passive orders.

--In addition to order size per level you can also see how many orders make up that total size. With ICE you can also see the size of all the constituent orders as well as the time each one was submitted and where it is in the queue.

--On ICE, everybody's order modification is individually viewable. If size changes/cancels I can identify the exact order that is active.

--I can, by order, count all the modifications to determine if it is manual or automated.

Sig · Feb 26, 2016

Why not trade spreads on either side of the ATM strike, or Nadex binaries, which are basically the same thing? For example, say the security is currently at $100 and you can predict it will move 2% but not in which direction. You buy a 98/99 and a 101/102 spread, so you make money if it falls in either one. On Nadex you'd just buy a 101 and sell a 99 strike. Of course, that only works if you have signals on indices or commodities. But they have 5-minute binaries; with regular options you can only do this once a week for most.
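
The payoff of that structure at expiry can be sketched as follows. This reads "a 98/99 and a 101/102 spread" as a long 99 / short 98 put spread plus a long 101 / short 102 call spread, which is an interpretation; the net debit is a made-up number for illustration.

```python
def spread_pair_pnl(price_at_expiry, net_debit,
                    put_strikes=(98.0, 99.0), call_strikes=(101.0, 102.0)):
    """P&L at expiry of a put debit spread plus a call debit spread.

    put_strikes  = (short put strike, long put strike)
    call_strikes = (long call strike, short call strike)
    net_debit    = total premium paid for both spreads (hypothetical)
    """
    short_put, long_put = put_strikes
    long_call, short_call = call_strikes
    put_spread = (max(long_put - price_at_expiry, 0.0)
                  - max(short_put - price_at_expiry, 0.0))
    call_spread = (max(price_at_expiry - long_call, 0.0)
                   - max(price_at_expiry - short_call, 0.0))
    return put_spread + call_spread - net_debit


# Hypothetical total debit of 0.40 for both spreads.
for price in (97.0, 98.5, 100.0, 101.5, 103.0):
    print(price, round(spread_pair_pnl(price, net_debit=0.40), 2))
```

The position loses the full debit if the price stays between 99 and 101 and earns at most one spread width minus the debit on a move in either direction, so it pays for knowing the size of the move without knowing the sign.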

profitlocker · Feb 26, 2016

garachen said:
I have a mixture of 93 different signals.

Sounds like some pretty savvy shit. Fascinating really. I'm probably too dumb to implement something like this but if you want to hire my stupid ass send me a PM

dartmus · Feb 26, 2016

profitlocker said:
Sounds like some pretty savvy shit. Fascinating really. I'm probably too dumb to implement something like this but if you want to hire my stupid ass send me a PM

Sounds boring and tedious. Tedium is a joy in the proper realm, like developing the strong directional logic G's looking for, but don't PM me. I'm not interested.

garachen · Feb 26, 2016

Sig said:
Why not trade spreads on either side of the ATM strike, or Nadex binaries, which are basically the same thing? For example, say the security is currently at $100 and you can predict it will move 2% but not in which direction. You buy a 98/99 and a 101/102 spread, so you make money if it falls in either one. On Nadex you'd just buy a 101 and sell a 99 strike. Of course, that only works if you have signals on indices or commodities. But they have 5-minute binaries; with regular options you can only do this once a week for most.

I checked them out for a while today. Their main market maker (which is either themselves or Group One) doesn't pay any fees because at times they quote 25c - which you'd never do if you paid the advertised min of 50c. The bid/ask on the binaries I'd do is usually $8 wide. With commission I think I'd barely eke out a profit doing market taking. But as soon as I start doing so they'd start widening their quotes. They aren't going to let me steal from them forever. There doesn't seem to be enough volume to do market making. They are very coy with their numbers but I barely saw anything go through and the only number I found was 5,000 per day.
Traditional options wouldn't be sensitive enough on my time frames.
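
A back-of-the-envelope way to see why an $8-wide market leaves little room for a taker: a binary that settles at $0 or $100 breaks even when the win probability equals the all-in entry cost divided by the payout. The sketch below assumes a fair value of $50 in the middle of the quote and a hypothetical $1 of total fees per contract; neither figure comes from the post.

```python
def break_even_win_probability(entry_price, fees, payout=100.0):
    """Win probability at which buying a binary breaks even.

    The contract is assumed to pay `payout` if it finishes in the money
    and nothing otherwise. `fees` is the total fee per contract
    (a hypothetical figure, not an exchange's actual schedule).
    """
    return (entry_price + fees) / payout


mid = 50.0     # assumed fair value of the binary
spread = 8.0   # quoted width mentioned in the post
ask = mid + spread / 2.0
print(break_even_win_probability(ask, fees=1.0))
```

Under those assumptions a taker has to be right well over half the time just to get back to zero, before any widening of the quotes.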

Johanmul · Feb 27, 2016

I think your problem is underspecified. 5 minutes during US market hours vs. the overnight session are completely different things. Even during the day there is significant variation depending on the time of day and the instrument you're trading. As you might have discovered already, you might get better prediction accuracy during certain time slots (wink wink). In addition, when you are predicting 5 minutes out, there might be other features that are more relevant than the market microstructure 5 minutes ago. Have you thought about this?

53% is pretty much a coin toss. Divide your data set into five chunks, do you consistently get 53% in all five? Also, I don't buy into the fact that you have 93 signals. Do you know which signals have stronger predictive power? Check out the correlation matrix of these signals. I bet you don't have more than a handful of signals after removing the ones that do not contain any "additional" information.
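
Both checks suggested here are easy to sketch. The code below splits a time-ordered list of hit/miss outcomes into chunks and reports each chunk's hit rate with a z-score against a coin toss, then greedily drops signals that are highly correlated with one already kept. The 0.9 threshold, the greedy order, and all names are illustrative assumptions, and the correlation helper assumes no signal is constant.

```python
import math


def chunk_hit_rates(hits, n_chunks=5):
    """hits: time-ordered 0/1 outcomes. Returns [(hit_rate, z_vs_50pct), ...]."""
    size = len(hits) // n_chunks
    results = []
    for i in range(n_chunks):
        end = (i + 1) * size if i < n_chunks - 1 else len(hits)
        chunk = hits[i * size:end]
        rate = sum(chunk) / len(chunk)
        # Standard error of a fair coin's hit rate over len(chunk) trials.
        z = (rate - 0.5) / math.sqrt(0.25 / len(chunk))
        results.append((rate, z))
    return results


def pearson(x, y):
    """Plain Pearson correlation of two equal-length, non-constant series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def prune_correlated(signals, threshold=0.9):
    """Keep a signal only if |corr| with every kept signal is below threshold."""
    kept = []
    for name, series in signals.items():
        if all(abs(pearson(series, signals[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: 53% hits in each of five chunks of 100 predictions.
hits = ([1] * 53 + [0] * 47) * 5
print(chunk_hit_rates(hits))

toy_signals = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.5],   # nearly a multiple of "a"
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(prune_correlated(toy_signals))
```

A 53% hit rate over 100 predictions sits only 0.6 standard errors above a coin toss, so whether it holds up chunk by chunk depends heavily on how many predictions each chunk contains.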

I like the way you think though, but if I were you I'd post my signal data set for people to play around with. Actually I have done this in a private forum and one of the guys who I shared our data with was a world renowned statistician. His findings didn't really help with the problem at hand at that time but turned out to be quite useful later on.

Buena suerte

PS: just FYI, we have put in many years of research in similar problems. It is possible to predict the direction of the market with 65-70% accuracy in certain instruments under certain conditions in the time frames you're talking about, though probably not in the way you imagine.

garachen · Feb 27, 2016

You are right. I have divided the data into 4 different time slots based on volume and market depth. My best accuracy for predicting size of the move is from 3pm to midnight and 7:35 to noon. The 7:35+ is what I've been running. (PST)

I've played a lot with the starting number of features, from using the 12 best all the way up to 600. This is just the starting number. When I retune the model it performs feature selection to get to something smaller. But I find the selected features have - at least recently - tended to vary significantly. I spent about a month with a more constrained feature set and it was always slightly underperforming my regular feature set. Through discussion with others, I know some who start with over 1,000 features - which seemed crazy to me at the time, but they also perform similar feature selection techniques.

I'm pretty sure if I went super low frequency like 10 trades a day per product I could probably get 65-70% accuracy on direction. But if I'm that low frequency I'd have to push pretty massive size to get good money. Maybe a larger position than I'm comfortable with. And I'd have to rework my feature extraction code as I'd be looking for very specific sequences of events that I currently abstract away. But that's the next rabbit hole I'm currently contemplating. Ug...

I don't think I can legally post raw exchange data to a public forum, and there's not much point in posting my current features. I wouldn't expect anyone here to put in the time to really work with them or try to understand them, and there's enough exchange data in there to make me cautious about distributing them.

Thanks for your response. It's reminded me of other conversations I've had about this several years ago that I'd forgotten about.

Johanmul · Feb 27, 2016

The prediction problem you are talking about has been studied in other domains for many, many years. I'd dig up some power systems papers and see how utilities like PG&E predict the load on the electric grid for the next day or the next month. The wholesale electricity market is an interesting one: you place virtual bids against the spread between the Day-Ahead (DA) and Real-Time (RT) price (LMP) for each pricing node (there are thousands of these nodes) and hour. There are also Financial Transmission Rights (FTRs), where you bid on electricity prices 1 month, 3 months or even 1 year out.

The weather is a big unknown in this problem and, as anyone could guess, the accuracy of your predictions degrades as the time window grows. For example, you can predict the weather tomorrow with much higher accuracy than the weather 3 months out. That said, there are heuristics you can use, like historical weather data for zip codes. Other factors that play a significant role are power plant/transmission line outages, the physical nature of the transmission lines and fuel (mainly natural gas and coal) prices.

I could write more but I guess you can see the parallels to other markets. Going back to the weather forecast example, if you can get 60% accuracy for time frame t, you should be able to do a better job with t/2 and a much better job with t/4. First find a signal that's reliable in time frame t and then work your way backwards to improve the accuracy.