There was an earlier patch (first comment under http://arclanguage.org/item?id=155) that used mzscheme's date operators and avoided the platform-dependent system insanity. I advocate stamping out use of "system" from the Arc code, as it's almost guaranteed to break on different platforms. If mzscheme has taken care of platform dependence, Arc might as well take advantage of it.
As an aside, I notice that both the OpenID code and the "Arc at work" code use system to do the heavy lifting. I propose a law that any sufficiently complicated Arc application requires use of "system" to get things done.
This is entirely orthogonal to benchmarking, but I thought I'd point out how memoization is a huge win for the recursive Fibonacci:
arc> (defmemo fib (n) (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2)))))
#<procedure>
arc> (time (fib 10000))
time: 957 msec.
...many digits of output omitted...
For those unfamiliar with memoization, it makes the function magically remember the results of previous calls. So once you've computed (fib 100), for instance, subsequent calls to (fib 100) return the memoized value immediately. Obviously this only makes sense for functions that depend only on their arguments and don't have side effects. (You pay a space cost, of course, to hold all these results.)
(Even with memoization, this is a silly way to compute Fibonacci numbers, but I think it's an interesting demonstration.)
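The trick is language-agnostic; here is a minimal sketch of what a defmemo-style wrapper does, in Python for illustration (memoize and the cache layout are our own names, not Arc's actual implementation):

```python
# What a defmemo-style wrapper does: cache results keyed on the
# argument tuple, so each distinct call is computed only once.
def memoize(f):
    cache = {}
    def wrapper(*args):
        if args not in cache:
            cache[args] = f(*args)   # space cost: one entry per call
        return cache[args]
    return wrapper

@memoize
def fib(n):
    return 1 if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))  # 1346269, computed with only 31 distinct calls
```

Without the cache the naive recursion does an exponential number of calls; with it, each fib(k) is computed exactly once.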
Is ac-niltree/ac-denil a constant overhead, or does it make some O(1) operations into O(n) operations? Hopefully the nil/denil traverses the whole list only when you're already traversing the whole list. But I haven't been able to convince myself that's always the case. This is on my list of things to investigate, but has anyone already figured this out?
I'm not completely sure, but I think this is mostly a compile-time overhead. So when you type directly into the toplevel, it's an O(n) overhead per expression, but when you execute a function defined in Arc, running the function itself should add little extra overhead, because it was compiled to a Scheme function by the original def.
; translate fn directly into a lambda if it has ordinary
; parameters, otherwise use a rest parameter and parse it.
(define (ac-fn args body env)
; ...
A conditional such as if can only be implemented as a macro if it expands to another conditional such as cond, which must be implemented as a special form.
I think the logic is to make if a "primitive macro," the way + is a primitive function. Just as you can't define + in terms of Arc functions, you can't define if in terms of Arc macros. Nevertheless, (isa + 'fn), and so the thinking is that (isa if 'mac) makes sense.
There are valid reasons not to do things this way, of course. For instance, what's so powerful about macros is that they literally expand into a code tree, which (as you observed) if can't do. For that reason, why not have (isa if 'form) (or 'prim, or 'special, or something similarly descriptive)? Then you have first-class special forms without literally making them macros. This would be a big improvement over the current state of affairs:
arc> (type if)
Error: "reference to undefined identifier: _if"
Given the fn operator, if can be implemented as a primitive function instead of a special form. It would take three arguments: a condition, a "then" function, and an "else" function, and would call the appropriate one. An example implementation is:
(def if2 (cond then else)
  (if cond (then) (else)))
If2 would be provided as a primitive function, and its implementation would be hidden just as that of cons is. Given access to if2, a basic if construct is:
(mac basic-if (cond then (o else))
  `(if2 ,cond (fn () ,then) (fn () ,else)))
This would make code walkers easier because they wouldn't have to know about as many special forms.
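The thunk trick isn't Arc-specific; here is the same idea sketched in Python (if2 and fact are just illustrative names, not part of any real implementation):

```python
# if as an ordinary function: both branches are wrapped in thunks,
# so only the chosen one is ever evaluated.
def if2(cond, then, els):
    return then() if cond else els()

# Without the thunks, the recursive branch would be evaluated
# eagerly and fact would never terminate.
def fact(n):
    return if2(n == 0, lambda: 1, lambda: n * fact(n - 1))

print(fact(5))  # 120
```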
A convention such as this (or Common Lisp's trailing p for predicates) seems like a good thing to me. Does Arc avoid it because it "costs" an extra character?
Well, I don't think it's ugly; but ugliness aside, it's very descriptive in one char. I'd personally like to see this convention carried over. Ruby uses the same convention, so it's not just Scheme.
If this was intentionally left out, my guess would be that pg was trying to save those chars for something meaningful to the compiler (syntax), not for aesthetics.
It should be possible to detect, at define-time, whether a function/macro has side-effects (see the ideas for a conservative purity analyzer in http://arclanguage.org/item?id=558).
So if the function/macro has side-effects, def/mac could e.g. also assign it to a second symbol with "!" appended.
'?' could also be done similarly, but would be more difficult. Here is a start:
def/mac should also assign its function/macro to a second symbol with "?" appended if:
1) The function/macro has no side-effects (no "!")
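As a toy illustration of how such a conservative analyzer might start, here is a sketch in Python that scans a function's source for obviously side-effecting constructs (looks_pure and the mutator list are hypothetical; a real analyzer would also have to handle calls to unknown functions, assignment through data structures, and much more):

```python
import ast

# Method names whose call we treat as an obvious side effect.
MUTATORS = {"append", "extend", "insert", "pop", "remove", "sort", "clear"}

def looks_pure(src):
    """Conservatively guess whether a function's source is free of
    side effects: reject global/nonlocal declarations and calls to
    known mutating methods."""
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, (ast.Global, ast.Nonlocal)):
            return False
        if isinstance(node, ast.Call):
            f = node.func
            if isinstance(f, ast.Attribute) and f.attr in MUTATORS:
                return False
    return True

print(looks_pure("def add1(x):\n    return x + 1"))    # True
print(looks_pure("def push(l, x):\n    l.append(x)"))  # False
```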
Leaving aside the issues of detecting a query, I think it's a really bad idea to have the compiler force (badly guessed) semantic knowledge on me.
It's my code, I wrote it, and I don't want any compiler telling me that a query that caches previous results must be imperative! and not end in a ? .
I also think needlessly polluting a namespace with additional symbols is a bad idea. You only get one namespace in Arc; try to preserve it for the future.
That makes sense. '!' is already being used (though the convention doesn't interfere with the syntax at the moment).
Bear with me here, but '!' means the function isn't pure, right? If so, who cares? It seems like an ivory-tower thing. '?' is fine, though maybe prefix notation could be considered.
Just because something is academic doesn't mean it's not worthwhile. For instance, map, keep, reduce, etc. behave differently if passed a function with side effects. What happens if you give them a function which modifies the list as it's being traversed?
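A pathological case is easy to reproduce in any language; here is a sketch in Python (rather than Arc) of mapping a function that mutates the very list being traversed:

```python
# Mapping a function that mutates the list being traversed:
# elements silently disappear, and the result depends on
# evaluation order.
xs = [1, 2, 3, 4]

def evil(x):
    xs.pop()       # side effect: shrink the list mid-traversal
    return x * 10

result = [evil(x) for x in xs]
print(result)  # [10, 20] -- the last two elements were never seen
print(xs)      # [1, 2]
```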
However, if map is passed a pure function, you can e.g. run one copy of the function on each core and run it in parallel.
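For instance (a Python sketch, not anything in Arc): because a pure function's calls can't observe one another's effects, the runtime is free to schedule them in any order, on any worker, and the result is unchanged:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):   # pure: the result depends only on the argument
    return x * x

# No call can observe another call's effects, so the scheduler is
# free to run them concurrently; the output matches a serial map.
with ThreadPoolExecutor(max_workers=4) as pool:
    result = list(pool.map(square, range(8)))
print(result)  # [0, 1, 4, 9, 16, 25, 36, 49]
```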
And "dangerSet" is both non-Lispy and non-Arcy. It's non-Arcy because it's long. And it's non-Lispy because of that capital letter: danger-set would be marginally better. But nevertheless, pair? is shorter than is-pair, and (in my opinion) reads much more nicely.
I gave up (temporarily) on running Arc's web server under Windows, due to various problems (http://arclanguage.org/item?id=1531). My impression is that Windows support is "immature", based on the (system "some Unix command") scattered through the code.
I find it hard to believe that reading the code can be a replacement for documentation. For example, I would find a one-line description much more useful than trying to figure out:
In case anyone was wondering... this is Arc's enqueue operation, which atomically adds obj to the queue q. Arc's queue data structure provides constant-time enqueue, dequeue, length, and conversion-to-list operations. My point is that documentation is a good thing, since understanding raw code can be difficult for mere mortals such as myself.
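For the curious, the representation behind those guarantees is simple; here is a rough Python sketch of such a queue (a singly linked list plus a tail pointer and a length counter, with a lock for atomicity; all names here are ours, not Arc's):

```python
import threading

# Enqueue, dequeue, and len are all O(1); the lock makes each
# operation atomic.
class Queue:
    def __init__(self):
        self.head = self.tail = None   # cells are [value, next]
        self.len = 0
        self.lock = threading.Lock()

    def enq(self, obj):
        with self.lock:
            cell = [obj, None]
            if self.tail:
                self.tail[1] = cell    # link after current tail
            else:
                self.head = cell       # queue was empty
            self.tail = cell
            self.len += 1
            return obj

    def deq(self):
        with self.lock:
            cell = self.head
            self.head = cell[1]
            if self.head is None:
                self.tail = None
            self.len -= 1
            return cell[0]

q = Queue()
for x in "abc":
    q.enq(x)
print(q.deq(), q.len)  # a 2
```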
Cool. Note that commands can also produce output on stderr, especially shell commands executed through "system". So if you want to get "everything" from a command, you'll need to grab stderr too.
Is annotate a general mechanism, or specifically for defining macros? Is ellipsize supposed to limit its output to 80 characters or display at most 80 characters of the input? Is median of an even-length list supposed to return the lower of the middle two? Is cdar supposed to be missing? Is (type 3.0) supposed to be an int? Is support for complex numbers supposed to be in Arc? Is client-ip supposed to be there, or is it left over from something? Does afn stand for "anonymous fn"? What does "rfn" stand for?
These are all real questions that I've faced while trying to write documentation. Let me make it clear that these are not rhetorical questions; I'm more interested in getting actual answers than arguing about using the code as the spec.
Wasn't the point of keeping the names car and cdr that you can compose them? (I remember reading as much in one of pg's essays.) Then it seems to me that to take full advantage of that, you need to actually provide the composed names.
I don't think it is unreasonable to do the following, but it is currently not provided in Arc:
I didn't mean Arc will never have cdar. But to avoid having a language designed according to arbitrary rules rather than the actual demands of the task, I've been trying to be disciplined about not adding operators till I need them.
I noticed this contradiction, too... :)
If we're not going to use c[ad]r composability, why not just use unique, easily distinguishable names for all of these that don't compose:
car --> hd
cdr --> tl
caar --> inner
cddr --> skip2
cadr --> second
...or something like that. Unique names would reduce errors.
I thought from looking at the internals before that the ar-funcall overhead would be the main factor. However, it turns out that of the 40 seconds, about 25 are spent in _<; _+ takes about 9 seconds, the ar-funcalls about 2.5 seconds, and ar-false? and _- about 1 second each.
So it looks like arc< is the main time sink, although the others all contribute.
I think checking the type of all the arguments is costly.
In case anyone is interested, the Scheme code assigned to _fib is:
(lambda (n)
  (quote nil)
  (if (not (ar-false? (ar-funcall2 _< n 2)))
      1
      (ar-funcall2 _+ (ar-funcall1 _fib (ar-funcall2 _- n 1))
                   (ar-funcall1 _fib (ar-funcall2 _- n 2)))))
I don't know much about MzScheme internals. But from my experience implementing Gauche Scheme, inlining basic operators such as <, +, and - is a big performance boost, and I suspect MzScheme does something similar as long as these ops are not redefined. Calling them many times via 'wrapper' functions, as ac.scm does, loses that benefit.
That said, one possible overhead is that, although the Arc compiler tries to use ar-funcall2 to avoid consing the arguments, they get consed after all, since _+ and _< are defined to take a list of arguments.
The original (time (fib 34)) took 79.8s on my machine.
Replacing arc< in ac.scm with this:
(define arc<
  (case-lambda
    [(a b)
     (cond [(and (number? a) (number? b)) (< a b)]
           [(and (string? a) (string? b)) (string<? a b)]
           [(and (symbol? a) (symbol? b)) (string<? (symbol->string a)
                                                    (symbol->string b))]
           [(and (char? a) (char? b)) (char<? a b)]
           [else (< a b)])]
    [args
     (cond [(all number? args) (apply < args)]
           [(all string? args) (pairwise string<? args #f)]
           [(all symbol? args) (pairwise (lambda (x y)
                                           (string<? (symbol->string x)
                                                     (symbol->string y)))
                                         args
                                         #f)]
           [(all char? args) (pairwise char<? args #f)]
           [else (apply < args)])]))
brought (time (fib 34)) down to 72.8s, and further replacing _+ with this:
(xdef '+
      (case-lambda
        [() 0]
        [(a b)
         (cond [(and (string? a) (string? b)) (string-append a b)]
               [(and (arc-list? a) (arc-list? b))
                (ac-niltree (append (ar-nil-terminate a)
                                    (ar-nil-terminate b)))]
               [else (+ a b)])]
        [args
         (cond [(all string? args)
                (apply string-append args)]
               [(all arc-list? args)
                (ac-niltree (apply append (map ar-nil-terminate args)))]
               [else (apply + args)])]))
brought (time (fib 34)) down to 49.5s.
But generally, this kind of tuning for microbenchmarks has only a small effect in real applications, since a microbenchmark magnifies the inefficiency of a few very specific spots.
(Afterthought: it wouldn't be too difficult to write a Scheme macro that expands variable-arity type-dispatching functions like _+ or _< into case-lambda form. Then this kind of optimization could be applied without cluttering the source too much.)
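A rough analogue of that idea in Python (all names here are hypothetical, and Python's *args still allocates a tuple, so the analogy is loose): wrap the general variadic path with an arity-dispatching front end that special-cases the common two-argument call, roughly what a case-lambda-generating macro would produce:

```python
# Build a dispatcher with a fast path for the common two-argument
# call, falling back to the general variadic version otherwise.
def fast_variadic(two_arg, general):
    def dispatch(*args):
        if len(args) == 2:
            return two_arg(args[0], args[1])
        return general(*args)
    return dispatch

def add2(a, b):      # fast path: fixed arity, no list traversal
    return a + b

def add_n(*args):    # general path: handles 0, 1, or many arguments
    total = args[0] if args else 0
    for x in args[1:]:
        total = total + x
    return total

arc_add = fast_variadic(add2, add_n)
print(arc_add(1, 2), arc_add(1, 2, 3, 4), arc_add())  # 3 10 0
```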