If arc had support for something like DBA or SQLAlchemy built in, I might just have used it with either Postgres or SQLite. However, neither of those databases really fits the arc data model very well, imo, because arc is very hash table and list oriented. Objects have very little in the way of a set schema, and hash tables map pretty well to... hash tables.
Anyway, I mostly want to leave all the objects in memory and use direct references between them; my data relations aren't that complicated, and explicit relations where necessary are actually fairly efficient. In fact, that's what most ORMs à la SQLAlchemy seem to do: when an object is loaded, you can specify related objects that get loaded into memory along with it, so you don't have to explicitly query the database each time.
Memory is cheap these days, and I was hoping for something that allowed versioning and perhaps graph-db features.
I think that some form of scalability would be valuable, but that could easily be achieved with some sort of single-threaded worker for each db 'server', with multiple instances running to provide the scalability. In order to make the single-threaded semantics work well even in a multi-threaded application, I already have a short library for Erlang-style pattern-matched message passing.
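To make the single-threaded worker idea concrete, here is a rough sketch in Python rather than arc. It is not the message-passing library mentioned above; `StoreWorker`, `call`, and the `put`/`get`/`stop` message names are all made up for illustration. One thread owns the object table, and every other thread talks to it only by sending messages.

```python
import queue
import threading


class StoreWorker:
    """Owns a dict of objects; only the worker thread ever touches it."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._objects = {}
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Handle one message at a time, so no locking is needed on _objects.
        while True:
            kind, args, reply = self._inbox.get()
            # Dispatch on the shape of the message, loosely like an Erlang receive.
            if kind == "put" and len(args) == 2:
                key, value = args
                self._objects[key] = value
                reply.put(("ok", key))
            elif kind == "get" and len(args) == 1:
                reply.put(("ok", self._objects.get(args[0])))
            elif kind == "stop":
                reply.put(("ok", None))
                return
            else:
                reply.put(("error", kind))

    def call(self, kind, *args):
        """Send a message from any thread and block until the worker answers."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((kind, args, reply))
        return reply.get()
```

Running several of these, one per db 'server', would give the multiple-instance setup described above; how objects get assigned to workers is left open here.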
Given the data volumes I've been planning on working with, I mostly want to use the permanent storage for history and fault tolerance, as opposed to live access. Live access could probably be handled in memory for the most part. So maybe some form of flat file system would work without causing too many problems.
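One way the flat file option could look, as a minimal sketch in Python: an append-only log with one JSON line per saved version, replayed at startup to rebuild the in-memory state. The JSON format, the file layout, and the names `append_version` and `replay` are assumptions for the example, not anything that exists in arc.

```python
import json
import os


def append_version(path, guid, obj):
    """Append one version of an object to the log as a single JSON line."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"id": guid, "obj": obj}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # ask the OS to push the write to disk


def replay(path):
    """Rebuild the latest state and the per-object history from the log."""
    latest, history = {}, {}
    if not os.path.exists(path):
        return latest, history
    with open(path, encoding="utf-8") as log:
        for line in log:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            latest[record["id"]] = record["obj"]
            history.setdefault(record["id"], []).append(record["obj"])
    return latest, history
```

Saving is a single append regardless of how many objects exist, and the disk is only read on startup. The log grows without bound, though, and there is no diffing or branching, which is part of what makes git attractive.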
I originally started using git to effectively achieve that design without having to manage the trees, history, and diff calculation myself, but I've discovered that storing thousands of tiny objects in the git index may not be very efficient. I still think something similar is a good idea, but I would want to separate 'local' version states for each object from the 'global' version, so that it doesn't take forever to save the state of a single object. Maybe storing each object in a git 'branch' with the guid of the object as the branch name would work, since only one object would be in each index. The overhead for saving each object would be slightly higher, but it should be constant, rather than linear with the total number of objects.
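Here is a sketch of what branch-per-object could look like, again in Python rather than arc. It uses git's plumbing commands (`hash-object`, `mktree`, `commit-tree`, `update-ref`), so the index is bypassed entirely. The file name `object.json`, the JSON serialization, the fixed commit identity, and the helper names are assumptions for the example.

```python
import json
import os
import subprocess

# Assumption: a fixed identity, so commit-tree works without any user git config.
_ENV = dict(os.environ,
            GIT_AUTHOR_NAME="store", GIT_AUTHOR_EMAIL="store@example.invalid",
            GIT_COMMITTER_NAME="store", GIT_COMMITTER_EMAIL="store@example.invalid")


def git(repo, *args, stdin=None):
    """Run one git command inside repo and return its trimmed stdout."""
    result = subprocess.run(["git", "-C", repo, *args], input=stdin, env=_ENV,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def branch_tip(repo, guid):
    """Return the commit id at the tip of the object's branch, or None."""
    result = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--verify", "--quiet",
         f"refs/heads/{guid}"],
        env=_ENV, capture_output=True, text=True)
    return result.stdout.strip() or None


def save_object(repo, guid, obj):
    """Commit one object onto its own branch without touching the index."""
    blob = git(repo, "hash-object", "-w", "--stdin",
               stdin=json.dumps(obj, sort_keys=True))
    # A tree holding exactly one file (hypothetical name: object.json).
    tree = git(repo, "mktree", stdin=f"100644 blob {blob}\tobject.json\n")
    parent = branch_tip(repo, guid)
    parent_args = ["-p", parent] if parent else []
    commit = git(repo, "commit-tree", tree, *parent_args, "-m", f"save {guid}")
    git(repo, "update-ref", f"refs/heads/{guid}", commit)
    return commit


def load_object(repo, guid):
    """Read the latest version of an object from the tip of its branch."""
    return json.loads(
        git(repo, "cat-file", "blob", f"refs/heads/{guid}:object.json"))
```

Each save writes one blob, one tree, and one commit, then moves one ref, so the work per save doesn't depend on how many other objects exist. What this doesn't measure is how well git copes with thousands of branches, which seems like the obvious thing to test before committing to the design.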
Any obvious flaws with that idea that I'm missing? Have any better ideas or foundations to build off of?
That code is pretty rudimentary, but it allows low-level access to git commands from arc, plus storage and retrieval of arc objects. After my previous comment, though, I'll probably change it so that each object gets a separate branch, with 'meta branches' listing which branches to load if necessary.