Live data feed options and processing

Discussion in 'Data Sets and Feeds' started by 931, Jul 3, 2020.

  1. 931

    I'm wondering what it takes to gather and process tick data for 1,000-3,000 stocks,
    in terms of computing hardware, software, networking, and data feed.

    The S&P 500 list, for example, is mandatory because of its low-spread stocks.
    Basically, as many low-spread stocks as possible would be welcome, up to the point where the system bottlenecks.

    What data rate could I expect to see when collecting tick data for all S&P 500 stocks while the market is very active?
    It would also help if you mention your data provider, so the format overhead is known.
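
    As a rough back-of-envelope (every number below is an assumption to replace with real figures from your provider, not a quote):

        #include <cstdio>

        // Feed-rate estimate. All inputs are assumptions; adjust them
        // for your own provider and symbol list.
        int main() {
            const double symbols       = 500;   // S&P 500 universe
            const double updates_per_s = 50;    // assumed avg quote+trade updates per symbol at a busy open
            const double bytes_per_msg = 150;   // assumed size of a JSON-style text message
            const double msgs = symbols * updates_per_s;           // messages/sec
            const double mbps = msgs * bytes_per_msg * 8 / 1e6;    // megabits/sec
            std::printf("%.0f msgs/s, %.1f Mbit/s\n", msgs, mbps); // 25000 msgs/s, 30.0 Mbit/s
            return 0;
        }

    Bursts at the open can run well above the average, so buffers would need to be sized for the spikes, not the mean.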

    Today I ran some simulations, using historical tick data as the incoming stream, to test how much the platform can handle.
    It was done over a LAN, and I think the software is optimized enough to handle 1,000+ stocks on a regular workstation PC, or else the tick data did not contain every tick...

    Packet serialization overhead was minimal compared to data feed providers' formats that need JSON parsers etc.; that part was not considered during the test.
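
    For illustration, a minimal sketch of the kind of fixed-layout binary record I mean (the field set is hypothetical); copying this is far cheaper than parsing the equivalent JSON text:

        #include <cstddef>
        #include <cstdint>
        #include <cstring>

        // Hypothetical fixed-layout tick record: serializing and
        // deserializing is one memcpy each, versus a JSON parse per message.
        #pragma pack(push, 1)
        struct Tick {
            std::uint64_t ts_ns;      // exchange timestamp, ns since epoch
            std::uint32_t symbol_id;  // interned symbol index instead of a string
            std::int32_t  bid_px;     // price in 1/10000 units to avoid floats
            std::int32_t  ask_px;
            std::uint32_t bid_sz;
            std::uint32_t ask_sz;
        };
        #pragma pack(pop)

        inline std::size_t encode(const Tick& t, char* buf) {
            std::memcpy(buf, &t, sizeof(Tick));  // 28 bytes on the wire
            return sizeof(Tick);
        }
        inline Tick decode(const char* buf) {
            Tick t;
            std::memcpy(&t, buf, sizeof(Tick));
            return t;
        }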

    But I have the API connections and incoming-data parsers running on a separate computer from the other parts of the software.
    This spreads the workload around and supports a hardware firewall with advanced rules, without needing to create new rules for each new API that gets tested.

    I still suspect that at this stage I don't have the networking, computing, and software infrastructure to process this amount of real incoming tick data with low delays.

    The biggest bottleneck at the moment with 1,000+ streams is that all incoming data gets evaluated on the same PC and at the same intervals,
    causing 100% spikes in CPU usage and delays for orders.
    Not impossible to fix by collecting with the data timings shifted and processing at different times, but at some point I will still need to separate things further.
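
    A minimal sketch of that staggering idea (the phase count is an arbitrary assumption): hash each symbol into one of N phase buckets and evaluate only one bucket per timer tick, so the load is spread instead of spiking.

        #include <cstddef>
        #include <cstdint>
        #include <functional>
        #include <string>
        #include <vector>

        // Spread per-symbol evaluation over kPhases timer ticks so the
        // whole 1000+ universe is never re-evaluated in one burst.
        constexpr std::size_t kPhases = 10;  // tune to CPU budget vs. latency

        struct PhasedScheduler {
            std::vector<std::vector<std::string>> buckets{kPhases};

            void add(const std::string& symbol) {
                buckets[std::hash<std::string>{}(symbol) % kPhases].push_back(symbol);
            }

            // Called once per timer tick; touches ~1/kPhases of the symbols.
            template <class EvalFn>
            void on_timer(std::uint64_t tick_no, EvalFn eval) {
                for (const auto& sym : buckets[tick_no % kPhases]) eval(sym);
            }
        };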

    I'm also wondering: does any data provider send out ~1-second-interval bid and ask bar data as a live feed, instead of tick data, and cover 1,000+ stocks?
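
    If no provider offers that, the conflation is cheap to do locally; a rough sketch of building 1-second bid/ask high-low bars from the raw quote stream (struct layout is illustrative):

        #include <algorithm>
        #include <cstdint>

        // Conflate raw quotes into 1-second bid/ask bars; keep one Bar
        // per symbol and call on_quote() for every update.
        struct Bar {
            std::uint64_t sec = 0;   // bar start, whole seconds since epoch
            double bid_hi = 0, bid_lo = 0, ask_hi = 0, ask_lo = 0;
            bool open = false;
        };

        template <class EmitFn>
        void on_quote(Bar& b, std::uint64_t ts_ns, double bid, double ask, EmitFn emit) {
            const std::uint64_t sec = ts_ns / 1000000000ULL;
            if (b.open && sec != b.sec) { emit(b); b.open = false; }  // second rolled over
            if (!b.open) {
                b = Bar{sec, bid, bid, ask, ask, true};
            } else {
                b.bid_hi = std::max(b.bid_hi, bid);
                b.bid_lo = std::min(b.bid_lo, bid);
                b.ask_hi = std::max(b.ask_hi, ask);
                b.ask_lo = std::min(b.ask_lo, ask);
            }
        }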

    High-quality historical bid-ask data is also very important.
    Ideally it goes back 10+ years, but it could be kept as a separate feed from the live one.

    What provider could be useful in this scenario?
     
  2. ZBZB

    IQFeed, Polygon, Nanex.
     
  3. 931

    I checked out IQFeed and Polygon before.
    Does Nanex also offer historical bid/ask data?
    If so, how far back?
     
  4. 2rosy

    You're processing historical tick data, correct? Then the rate at which messages arrived in the real world doesn't matter; all you need to do is keep everything in sync. You can slow everything down and use Excel, or do something debugger-style and step through.
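
    For example, a bare-bones replay loop of that idea (the tick layout and speed knob are placeholders): feed ticks in timestamp order and sleep the scaled inter-arrival gap, so speed=1.0 reproduces real pacing, 0.1 runs ten times slower, and a breakpoint in the handler gives the debugger-style stepping.

        #include <chrono>
        #include <cstddef>
        #include <cstdint>
        #include <thread>
        #include <vector>

        struct Tick { std::uint64_t ts_ns; /* ...other fields... */ };

        // Replay historical ticks at a chosen speed, preserving their
        // relative spacing so multi-symbol streams stay in sync.
        template <class HandleFn>
        void replay(const std::vector<Tick>& ticks, double speed, HandleFn handle) {
            for (std::size_t i = 0; i < ticks.size(); ++i) {
                if (i > 0) {
                    const auto gap_ns = ticks[i].ts_ns - ticks[i - 1].ts_ns;
                    std::this_thread::sleep_for(std::chrono::nanoseconds(
                        static_cast<std::uint64_t>(gap_ns / speed)));
                }
                handle(ticks[i]);  // breakpoint here to single-step
            }
        }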
     
  5. algoseek

    Nanex currently offers historical market data from January 2004 to the present day.
     
  6. 931

    Is algoseek's historical data usable as it comes, or does it still need further processing?
    How much do they charge?
     
  7. dholliday

    If I understand you correctly, any old machine can do this. The amount of data that comes across the wire is trivial.
    I use IQFeed.
    Watching every tick and every update (bid/ask change, etc.) for over 1,000 of the most liquid stocks (that's my filter) on an old i7 desktop (3.4GHz, 16GB RAM), I run about 3% CPU usage after the first 30 seconds at the open. Approximately 99% of that is parsing the incoming data (I ran a profiler). The rest (1%) is running those 1,000 systems. I have very few graphics (it is possible to graph anything internally, but I have not done so for many years).
    The next 2,000 most liquid symbols will take much less CPU time than the first 1,000. Many won't even trade some days.
    Just make sure you write your code in C++, C#, Java, etc., or anything else that runs on the CLR or JVM.
    I don't have experience with scripting languages, so maybe someone else can let you know whether those would work.
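
    Since nearly all the CPU goes to parsing, it helps to split fields without allocating; a minimal sketch for a comma-delimited feed line (the IQFeed-style "Q,SYMBOL,..." layout here is an assumption, check the actual message spec):

        #include <cstddef>
        #include <string_view>

        // Allocation-free field splitter for comma-delimited feed lines.
        // Returns the next field and advances `line` past the delimiter.
        inline std::string_view next_field(std::string_view& line) {
            const std::size_t pos = line.find(',');
            const std::string_view field = line.substr(0, pos);
            line.remove_prefix(pos == std::string_view::npos ? line.size() : pos + 1);
            return field;
        }

        // Usage sketch: peel fields off a message without copying.
        // std::string_view msg = "Q,AAPL,364.11,364.15";
        // auto type = next_field(msg);  // "Q"
        // auto sym  = next_field(msg);  // "AAPL"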

    It's rare, these days, for me to run 1,000 systems. I usually run a setup program the night before to narrow down my symbol list. This might be something to consider.
     
  8. 931

    You clearly have more efficient algos. ~3% usage is great; I'd assume you wrote it in C/C++, since you recommend that first.

    It seems plausible that parsing the text-based incoming data stream is most of the pre-processing bottleneck.
    The next steps I take after parsing are: send over TCP in a more efficient format -> build hi-lo bars -> data validity checks -> some math before feeding into the evaluations.
    Those are quite cheap as well.

    In my situation the biggest bottleneck is evaluating the stocks, caused by constant cache misses due to relatively random memory access patterns.
    Running every tick through is not possible at the moment due to the lag it would create.
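
    One standard mitigation I'm looking at for those cache misses (a sketch with assumed state fields, not my actual code): keep per-symbol state in structure-of-arrays form and sweep it linearly, instead of chasing per-symbol objects scattered around the heap.

        #include <cstddef>
        #include <vector>

        // Structure-of-arrays layout: the evaluation pass walks contiguous
        // arrays, so the hardware prefetcher helps instead of missing on
        // every symbol lookup.
        struct Universe {
            std::vector<double> last_px;  // all indexed by symbol id
            std::vector<double> bar_hi;
            std::vector<double> bar_lo;
            std::vector<double> signal;

            explicit Universe(std::size_t n)
                : last_px(n), bar_hi(n), bar_lo(n), signal(n) {}

            // Linear sweep over the whole universe; no pointer chasing.
            void evaluate() {
                for (std::size_t i = 0; i < last_px.size(); ++i) {
                    const double range = bar_hi[i] - bar_lo[i];
                    signal[i] = range > 0 ? (last_px[i] - bar_lo[i]) / range : 0.0;
                }
            }
        };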

    What do you look for when filtering the symbol list?
    Past volume?
     
  9. dholliday
