Your backtest system must be coupled with a fast, compact database... Ideally, the backtest program (say C++) queries the database and you run the backtest on small chunks at a time... So how do you do it? What database do you use? Thank you! (I am thinking KDB together with Matlab)...
If you have the budget for KDB, you won't regret it. Alternatively, flat files. Most relational databases are too inefficient at extracting long linear sequences of data.
Plain binary files. Can't beat them. With today's fast computers, any superimposed database layer will create unnecessary overhead (speed and code). Data fields: byte key (trade, DOM bid/ask event, volume, other), single price, long volume, long timestamp (milliseconds from 00:00 the same day). One file per symbol per day. Very simple to access, search, and analyze.
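For concreteness, a record layout like the one above can be packed as fixed-width fields; here is a minimal Python sketch (the exact field widths and the file-naming scheme are my assumptions, mirroring the byte key / single price / long volume / long timestamp fields):

```python
import struct

# Fixed-width tick record: 1-byte event key, single-precision price,
# 64-bit volume, 64-bit millisecond timestamp (ms since 00:00 of the
# file's day). Exact widths are assumptions for illustration.
RECORD = struct.Struct("<Bfqq")  # key, price, volume, ms_since_midnight

def write_ticks(path, ticks):
    """ticks: iterable of (key, price, volume, ms) tuples."""
    with open(path, "wb") as f:
        for t in ticks:
            f.write(RECORD.pack(*t))

def read_ticks(path):
    """Read every fixed-width record back as a list of tuples."""
    with open(path, "rb") as f:
        data = f.read()
    return [RECORD.unpack_from(data, i) for i in range(0, len(data), RECORD.size)]

# One file per symbol per day, e.g. "ESZ3_20131105.bin" (hypothetical naming).
write_ticks("ESZ3_20131105.bin", [(1, 1762.25, 5, 34_200_000)])
print(read_ticks("ESZ3_20131105.bin"))
```

Because every record is the same size, record N lives at byte offset N * RECORD.size, so random access needs no index at all.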
CSV? And if you want to query some part of your data, say one year's worth of tick data, what do you do?
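With a one-file-per-symbol-per-day layout, a year-long query is just a walk over the day files in the date range; a sketch (the SYMBOL_YYYYMMDD.bin naming scheme is hypothetical):

```python
from datetime import date, timedelta

def day_files(symbol, start, end):
    """Yield the per-day file names for symbol from start to end inclusive.
    SYMBOL_YYYYMMDD.bin is a hypothetical naming scheme for illustration."""
    d = start
    while d <= end:
        yield f"{symbol}_{d.strftime('%Y%m%d')}.bin"
        d += timedelta(days=1)

# A year of tick data is then: for each file, read (or stream) its records.
files = list(day_files("ESZ3", date(2013, 1, 1), date(2013, 1, 3)))
print(files)
```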
Yeah, or HDF5, which is essentially a flat file but handles the structure and compression for you, so it might save some effort.
Yeah, I have access to KDB. So you think KDB + Matlab is a viable approach? The reason I ask is that I really hate the Q language and would like to avoid it as much as possible. So if KDB can store all the data, I just need an interface to query the KDB tick database from Matlab, and the majority of the analytical work can be done in Matlab (KDB doesn't have analytics...). Any thoughts on this approach?
Indexing and querying a small chunk from one day's worth of tick data is still a hassle and slow, am I right? Slower than KDB, I think.
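Not necessarily a hassle: if each day file is sorted by timestamp (a day of ticks naturally is) and records are fixed-width, a chunk query is just a binary search on the file followed by a sequential read, with no index needed. A sketch under those assumptions (byte key, single price, long volume, long millisecond timestamp, 21 bytes per record):

```python
import struct

# Assumed fixed-width record: byte key, single-precision price, 64-bit
# volume, 64-bit ms-since-midnight timestamp; timestamp sits at offset 13.
RECORD = struct.Struct("<Bfqq")
TS_OFFSET = 13

def ms_at(f, idx):
    # Read only the 8-byte timestamp of record idx.
    f.seek(idx * RECORD.size + TS_OFFSET)
    return struct.unpack("<q", f.read(8))[0]

def read_chunk(path, start_ms, end_ms):
    """All records with start_ms <= timestamp < end_ms, assuming the
    file is sorted by timestamp."""
    with open(path, "rb") as f:
        f.seek(0, 2)
        n = f.tell() // RECORD.size
        lo, hi = 0, n  # binary search for the first record >= start_ms
        while lo < hi:
            mid = (lo + hi) // 2
            if ms_at(f, mid) < start_ms:
                lo = mid + 1
            else:
                hi = mid
        out = []
        f.seek(lo * RECORD.size)
        for _ in range(lo, n):
            rec = RECORD.unpack(f.read(RECORD.size))
            if rec[3] >= end_ms:
                break
            out.append(rec)
    return out

# Demo: four ticks one second apart, then pull the middle two.
with open("demo.bin", "wb") as f:
    for ms in (1000, 2000, 3000, 4000):
        f.write(RECORD.pack(1, 100.0, 1, ms))
print(read_chunk("demo.bin", 2000, 4000))
```

Finding the chunk costs O(log n) seeks, so even millions of ticks per day resolve in a handful of disk reads; whether that beats KDB in practice I can't say.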