- virtual memory, where we swap values on disk as needed (The Redis Virtual Memory way)
- storing data on disk, in a complex form so that operations can be implemented directly in the on-disk representation, and using the OS cache as a cache layer for the working set (let’s call it the Mongo DB way)
- storing data on disk, but not for direct manipulation, and use memory as a cache of objects that are active, flushing writes on disks when this objects change.
His choice for further experimentation is 3) above.
Currently Salvatore is working on a B-tree based implementation:
Unfortunately I was not able to find a BSD implementation of a btree that satisfied my requirements and that was self-contained enough, so I’m writing one from scratch. The good news is that the new implementation is written as a stand alone library and completely decoupled from Redis itself, so it will be possible to use it for other projects as well.
In terms of requirements for the implementation, Salvatore list the following goals:
- not the fastest but decently fast disk KV store
- good resistance to corruption
- ability to update N keys transactionally so that either all the new values or neither will get in
- no need for compaction
- no need for parallel accesses
In his research for a B-tree implementation, he investigated possible options with two Xapian developers that have worked on Xapian B-tree implementation. You can find their conversation archived here. (nb: look for the dialog between antirez, richarb, and owjb).
Adam: @antirez any plans on adding a log-structured merge tree to make writes less painful?
Salvatore: @marcua I’m not sure what the final tree implementation will be, I’m starting simple, but LSM trees have bad lookup behavior apparently?
Justin: @antirez typically an LSM tree will have much better random write perf than a simple btree, but worst-case lookup time is slower.
Salvatore: @justinsheehy probably a btree can work decently from this point of view?
- Justin: @antirez I agree in principle that a btree as the 2ndary level for Redis may work well. As I see more details I may have more opinions.
Justin: @antirez @marcua One typically uses btrees (or something similar) as a component in an LSM-tree, so you’re going in the right order anyhow.
- Salvatore: @justinsheehy thanks. I think I’ve to consider that for redis the disk is just a backend, and I’ve the in memory cache. So while I hope the implementation will be decent for now the most important thing is to have it working well and soon :)
Adam: @justinsheehy @antirez I was just wondering about write-heavy workloads, which shouldn’t be priority #1
Salvatore: @marcua @justinsheehy more or less our recipe for write-heavy workload is, use Redis as in-memory database! ;)
- Adam: @antirez point taken: will be even better when you can parition!:)