C++ concurrent data structure for fast retrieval

kmiklas · Sep 2, 2020

I have a requirement to create and maintain a ledger of orders. Thousands and of them, with many updates per second. It will operate in a high-speed environment: currently in single-digit mcs range; I'd like to see it in high nanos.

- Must be thread-safe
- Multiple threads and processes will be reading and writing concurrently
- Said ledger will be updated with algos (no GUI order input)

Noodling on how to best implement this... what data structure makes sense here?

1. A vector of pointers or references to orders?
2. A hash table?
3. Caching? I fear race conditions.
4. Database of some kind? I fear perf penalties.

I was thinking a separate thread or process, dedicated to holding the structure in memory, with accessor and mutator methods? Make every effort to stick with atomic operations, to perf penalties associated with locking?

Feedback and/or advice welcome. Thanks.

thecoder · Sep 2, 2020

As an experienced professional senior C++ dev and SW-arch, I can design and develop for you the fastest possible solution.
You just need to give me the project, and of course pay me

kmiklas · Sep 2, 2020

thecoder said:
As an experienced professional senior C++ dev and SW-arch, I can design and develop for you the fastest possible solution.
You just need to give me the project, and of course pay me
More...

Oh yeah? How do you reverse a string?

gaussian · Sep 2, 2020

std::atomic is probably what you want to look at. You will have to insure atomicity.

In terms of architecture you will at least need some way to persist orders so that your history can be rebuilt in the event of a crash. I would suggest first looking at the simplest possible solution: a local cache of orders in the current "window" that gets offloaded to a database at some interval. This database could be something as simple as redis, which then itself can replicate over to cold storage system later. A 3 layer caching strategy would guarantee reliability.

Establishing atomic writes is not terribly difficult using a modern database and good engineering practices. I would suggest you first implement the simplest solution (above), benchmark it, and then make modifications to bring the execution time down to where you need it. Atomicity on the local system's ledger cache will need to be carefully engineered and tested. Consider using the Ravenscar profile of ADA/GNAT as a good baseline. You need a baseline first and reaching sub-microsecond execution times gets into a mixture of:

1. How you programmed it (C++ is likely still too high level and you'll be looking into ASICs eventually)
2. How you architected it (do you have fiber pipe going to databases?)
3. The hardware it is running on (are your structures storeable in the L2 cache of a CPU? What hardware level caching do you need to fit your structures in?)

This problem is not simple. You will want to tailor it to specific hardware and optimize your structures in-flight to fit inside of the fastest available CPU caches to insure top-tier performance. You will also want to look into the lag caused by branch prediction and several other things and write your make file/compiler flags to optimize for this. Good luck.

thecoder · Sep 2, 2020

kmiklas said:
Oh yeah? How do you reverse a string?
More...

Just use this built in function from the std namespace:
Code:
  reverse(str.begin(), str.end());
Why do you ask such a childish question? Do we already have a contract? I don't think so

kmiklas · Sep 2, 2020

gaussian said:
std::atomic is probably what you want to look at. You will have to insure atomicity.
More...

Fully agree. Locking would crush my performance.

gaussian said:
In terms of architecture you will at least need some way to persist orders so that your history can be rebuilt in the event of a crash. I would suggest first looking at the simplest possible solution: a local cache of orders in the current "window" that gets offloaded to a database at some interval. This database could be something as simple as redis, which then itself can replicate over to cold storage system later. A 3 layer caching strategy would guarantee reliability.
More...

Yes, I was thinking of offloading this to a separate process/core with an absolute minimum of data sent across a socket.

gaussian said:
Establishing atomic writes is not terribly difficult using a modern database and good engineering practices. I would suggest you first implement the simplest solution (above), benchmark it, and then make modifications to bring the execution time down to where you need it. Atomicity on the local system's ledger cache will need to be carefully engineered and tested. Consider using the Ravenscar profile of ADA/GNAT as a good baseline. You need a baseline first and reaching sub-microsecond execution times gets into a mixture of:
More...

gaussian said:
1. How you programmed it (C++ is likely still too high level and you'll be looking into ASICs eventually)
More...

Yeah no ASIC or FPGA just yet. For now it's C++, C, and perhaps a little assembler in hotspots.[/QUOTE]

gaussian said:
2. How you architected it (do you have fiber pipe going to databases?)
More...

What can't I offload this to another process/core on the same rig?

gaussian said:
3. The hardware it is running on (are your structures storeable in the L2 cache of a CPU? What hardware level caching do you need to fit your structures in?)
More...

This I did not think of. Thank you.

gaussian said:
This problem is not simple. You will want to tailor it to specific hardware and optimize your structures in-flight to fit inside of the fastest available CPU caches to insure top-tier performance. You will also want to look into the lag caused by branch prediction and several other things and write your make file/compiler flags to optimize for this. Good luck.
More...

Thank you. I appreciate the comments.

kmiklas · Sep 2, 2020

thecoder said:
Just use this built in function from the std namespace:
Code:
  reverse(str.begin(), str.end());
Why do you ask such a childish question? Do we already have a contract? I don't think so
More...
NEXT!

thecoder · Sep 2, 2020

kmiklas said:
NEXT!
More...

And why? Do you have a faster algorithm?
In case of multithreading you of course need to exec that piece of code in a locked state. That's so obvious.

gaussian · Sep 2, 2020

kmiklas said:
What can't I offload this to another process/core on the same rig?
More...

You can. I would suggest not doing that to free up resources for your ledger but if you have one of those fancy multi-CPU server boxes perhaps this is possible. I was just considering fault tolerance.

thecoder said:
And why? Do you have a faster algorithm?
More...

If a prospective employee gave me your answer I would think it was sarcastic. Just because you can use a string reverse function doesn't mean you know how to reverse a string.

You could use a stack which is the canonical solution. You could also do it in-place with pointers. Which is faster asymptotically?

thecoder · Sep 2, 2020

@gaussian, the spec was then insufficient. He should clearly say that he wants to see an implementation, ie. the design, of such a function, not it's mere use.