Because it's the simplest solution for Arc. Arc does not have any kind of SQL library, and it doesn't even have an officially supported way to drop down to Racket libraries, nor does it have any kind of FFI whatsoever. So flat files are indeed the fastest way to get up and running.
Perhaps in the long run it might be better to have some sort of way to connect to an SQL DB, but keep in mind that Arc is still a work in progress, and unfortunately has not been updated in a long time. For a while, Arc didn't even have Unicode support, because pg was working on more important things.
If you think of Arc as being a prototype, then it all makes sense. pg probably intended to eventually flesh it out into a full language, but I hear he hasn't had the time.
If arc had support for something like DBA or SQLAlchemy built in, I might just have used it with either postgres or sqlite. However, neither of those databases really fit the arc data model very well, imo, because arc is very hash table and list oriented. Objects have very little in the way of a set schema, and hash tables map pretty well to... hash tables.
Anyway, I mostly want to leave all the objects in memory and use direct references between them; my data relations aren't that complicated, and explicit relations where necessary are actually fairly efficient. In fact, that's what most orm's a la SQLAlchemy seem to do; whenever an object is loaded, you can specify desired relations that also get loaded in memory, so you don't have to explicitly query the database each time.
Memory is cheap these days, and I was hoping for something that allowed versioning and perhaps graph-db features.
I think that some form of scalability would be valuable, but that could easily be achieved with some sort of single threaded worker for each db 'server', and then have multiple instances running to provide the scalability. In order to make the single threaded semantics work well even in a multi-threaded application, I already have a short library for erlang-style pattern matched message passing.
Given the data volumes I've been planning on working with, I mostly want to use the permanent storage for history and fault tolerance, as opposed to live access. That could probably be handled in-memory for the most part. So maybe some form of flat file system would work without causing too many problems.
I originally started using git to effectively achieve that design without having to manage the trees, history, and diff calculation myself, but I've discovered that storing thousands of tiny objects in the git index may not be very efficient. I still think something similar is a good idea, but I would want to separate 'local' version states for each object from the 'global' version, so that it doesn't take forever to save the state of a single object. Maybe storing each object in a git 'branch' with the guid of the object as the branch name would work, since only one object would be in each index. The overhead for saving each object would be slightly higher, but it should be constant, rather than linear with the total number of objects.
Any obvious flaws with that idea that I'm missing? Have any better ideas or foundations to build off of?
That code is pretty rudimentary, but allows low level access to git commands from arc, plus storage and retrieval of arc objects. After my previous comment though, I'll probably change it so that each object gets a separate branch, with 'meta branches' listing which branches to load if necessary.
So I guess that makes a consistent foreign function interface something very important for Arcueid to start having then. I think I've built up a foreign function API (sorta) and am now working out the details for dynamic loading of shared libraries so you can do something like (load "mysql.so") and have it dynamically link into Arcueid, as well as a way to compile C sources into such an extension shared object.