|I switched rainbow from a continuation-passing-style implementation to a stack-oriented VM. This alone sped things up about 2x. Most of this effort was due to frustration with welder, an arc IDE, being excruciatingly slow. It used to take anything between 4 and 5 seconds to refresh (re-parse) arc.arc - that number is now down to under 400ms.|
The stack-oriented VM runs faster than the CPS-based approach, partly because it allocates fewer objects per function call, but also because it does a lot more work compiling code in advance. The entire history of rainbow, in fact, is about getting more work done in the compilation phase. Rainbow is the first time I've implemented a language, and I think I'm beginning to understand the basics now.
The stack machine affords more opportunities for optimisation, as it's easier to spot inefficiencies when your code is pushing and popping the same values over and over again. Here are some examples:
The expansion of
ends with something like
(or a b)
which is exactly the same as
(if gs24 gs24 nil)
So when rainbow generates stack-instructions for conditionals, it checks for this case, and emits the raw symbol where possible. Of course, we could also modify 'or not to do this, but I think it should be a goal of an interpreter to optimise away this kind of inefficiency if it permits a more concise and elegant expression of an idea.
(do a b c)
, an invocation of a closure with no arguments, no parameters, that is not assigned to any variable. Simply inlining this (instead of creating a closure then invoking it as rainbow did formerly), knocked 20ms off my then-380ms test case. It was a big win.
((fn () a b c))
Another example: the invocation handler by default requested N parameters off the top of the argument stack, where N varied depending on the expression under evaluation. It turns out that the java equivalent of
for a 3-arg invocation is way faster than requesting the list of the last N args each time, mostly, I suspect, because the conditional loop of that request is eliminated.
(cons (pop args) (cons (pop args) (cons (pop args) nil)))
It bothered me too that the VM would push args onto the stack only to have them popped off again directly, when these args were either literals or symbol lookups, ie their values were immediately accessible. Passing these via the stack seemed like unnecessary overhead, but on the other hand, there is an explosive combination of possible invocation types, as the java code for retrieving values is different for bound symbols, free variables, and literals.
I hand-wrote handlers for what I presumed were the most common cases, and the results were good. So yesterday, I dumped the hand-coded versions, and hacked out an arc script to generate handlers for all combinations of invocations involving free variables, bound variables, and literals, for invocations up to 3 args. That makes about 170 different cases, and combined these optimisations knocked another 20ms off my test case. Note that this complicates the build somewhat: you need to build rainbow in order to run the arc script to generate the optimisation code. So to get the optimisations, run the build twice. I'll describe this more in the readme file.
(self) ; invoke a bound variable with zero args
(map car xs) ; (free-var free-var bound-var)
(is x #\space) ; (free-var bound-var literal)
... several hundred more
So, with everything combined, my test-case
returned 303ms on the most recent run, compared to 313ms for arc3 on mzscheme. (It looks like rainbow is a little faster in this case, but I haven't upgraded to arc3.1 yet so I can't say for sure)
(bm 100 (index-source a)) ; a is the text content of arc.arc
The other good news is that there are many more unexploited optimisations waiting. The expansion of 'aif, for example, could be treated with a special-case conditional instruction that doesn't pop its argument. There could be many more invocation handlers for cases where some arguments are the results of other invocations.
Useful to know: the two most significant performance gains were the following
1. run "java -server" instead of default client-VM java (at least on macosx 10.4 or 10.5, java5 or java6). They could have renamed this switch - "java -fast"
2. Quit firefox. Seriously. This speeds things up about 10% for me.
Just to recap, here's what rainbow does:
* a fully compatible arc3 interpreter that runs arc code unmodified
* still supports tail-call optimisation and first-class continuations
* for certain cases, appears to run at least as fast as arc3
* offers an easy interface to java libraries
* the whole stack-oriented VM was committed a day ago, so it's still very fresh, it hasn't had a thorough testing. All 393 automated tests pass, and stuff generally appears to work, but I haven't tried running news.arc on it yet.
And thanks to a suggestion from palsecam, the rainbow repl quits on ctrl-D (on macosx). Some people sent patches, thank you all!