I think that some form of scalability would be valuable, but that could easily be achieved with some sort of single threaded worker for each db 'server', and then have multiple instances running to provide the scalability. In order to make the single threaded semantics work well even in a multi-threaded application, I already have a short library for erlang-style pattern matched message passing.
Given the data volumes I've been planning on working with, I mostly want to use the permanent storage for history and fault tolerance, as opposed to live access. That could probably be handled in-memory for the most part. So maybe some form of flat file system would work without causing too many problems.
I originally started using git to effectively achieve that design without having to manage the trees, history, and diff calculation myself, but I've discovered that storing thousands of tiny objects in the git index may not be very efficient. I still think something similar is a good idea, but I would want to separate 'local' version states for each object from the 'global' version, so that it doesn't take forever to save the state of a single object. Maybe storing each object in a git 'branch' with the guid of the object as the branch name would work, since only one object would be in each index. The overhead for saving each object would be slightly higher, but it should be constant, rather than linear with the total number of objects.
Any obvious flaws with that idea that I'm missing? Have any better ideas or foundations to build off of?
That code is pretty rudimentary, but allows low level access to git commands from arc, plus storage and retrieval of arc objects. After my previous comment though, I'll probably change it so that each object gets a separate branch, with 'meta branches' listing which branches to load if necessary.