Arc Forum | Bullet: Adventures in a minimally-delimited lisp

Arc Forum

	Bullet: Adventures in a minimally-delimited lisp (seertaak.posterous.com)
	5 points by seertaak 4968 days ago \| 36 comments

3 points by rocketnia 4968 days ago | link

Awesome work! Some of these ideas look mighty familiar, but it's good to see them in this combination, lol. Bullet reminds me of a cross between Jarc and Wart.

Why does bullet use (var xs: new ArrayList) instead of (let xs new.ArrayList ...)? If it's mainly to support optional typing, what about something like (let-typed ArrayList xs new.ArrayList ...) instead?

Speaking of types, what do you hope to get out of them? A lint-like system that discourages the developer from using code that doesn't satisfy it? Optimizations? Letting programmers seal up the abstraction leaks in libraries they write (at the cost of library users' ability to take advantage of those leaks)? Inferring implementation code from its type (e.g. resolving certain cases of polymorphic dispatch at compile time)? Inferring autocompletion results from a type?

You do mention "gradual typing," which to me implies answers to lots of these questions. :) (Basically "yes" for everything but implementation inference.) But I'm interested in your take on it.

-----

2 points by seertaak 4968 days ago | link

Hi, sorry for the late reply (it's now morning in Berlin :) , and thanks for the kind comments.

I opted for the "var" notation because I wanted to emphasize that the bindings are to _variables_; not to _values_ as is the case in (some) other lisps. I wanted to leave "let" for the case when you really want a constant. Another reason is purely aesthetic; the first programming language I learned was Pascal (ah, the fond memories :) which I always thought was a very _clean_ language. I wanted to evoke that (that's also why I use "begin" rather than "do" -- interestingly, once you use the 2/4-indent rules, you actually get pushed to _not_ using two-letter names, since 2/4-indent is ambiguous under 2-letter-name).

Regarding gradual typing, it would be nice because:

1. when programming in the large, knowing which types are expected and returned by a procedure is extremely helpful;

2. typing allows IDE autocompletion as well as refactoring (as you mention!);

3. allows for type hints for unboxed arithmetic, i.e. for performance.

1 & 2 are important, 3 much less so. My thinking is that if you're going for speed, you're probably going to want to do it in Java anyway. (Indeed, that I believe is one of the fallacies of Clojure: very few people are going to do performance-critical code in lisp. Having worked in e.g. a stat-arb fund, I can assert that no manager in their right mind would use anything other than tried-and-tested Java/C++/C# in that scenario.)

That's also why I eschew the idea of numerical tower in favour of Java's (inferior) numeric types. While the numerical tower is nicer, the standard statistical packages all use unboxed numbers -- what's the point of a beautiful tower edifice if you're going to chuck it out as soon as you do hard work?

-----

2 points by rocketnia 4967 days ago | link

Sorry in advance for getting carried away....

---

"I opted for the "var" notation because I wanted to emphasize that the bindings are to _variables_; not to _values_ as is the case in (some) other lisps. I wanted to leave "let" for the case when you really want a constant."

In that case, why not have let but call it "var"? :-p

It's not that there aren't good reasons to go your way, it's that I don't know them! If it helps, Racket's another lisp where programmers are encouraged to use internal (define ...) forms when possible rather than let forms.

My understanding fails in the same way when it comes to your treatment of looping. In the article you mentioned having a for loop as a primitive, and I'm fine with that up to the point that I understand defining it in terms of lambda would raise issues regarding the JVM's lack of tail calls. But why is 'for a primitive, when 'while can be the primitive and 'for can be a macro that expands to it?

You also said you wanted to give 'for support for break and continue. If you do, I kinda recommend supporting labeled break and continue (like Java does) so that they don't arbitrarily go out of scope in inner loops. But then, with labeled break and continue... why not just have 'goto as the primitive?

Er, I'll table that suggestion for now. ^^; Maybe the primitive could be an expression that supports nonlocal return from within its body. Something like Arc's 'point:

  (fn ()
    (+ "I "
       (point my-return
         (while foo
           (when pop.foo
             (my-return "found it!")))
         "didn't find it!")))
  
  
  ; Translates to JavaScript something like this:
  ; (The Java output would be slightly more verbose.)
  
  (function () {
      var temp_0 = "I ";
      var temp_1;
      point_1: while ( true ) {
          while ( truthy( v_foo ) ) {
              if ( truthy( v_pop( v_foo ) ) ) {
                  temp_1 = "found it!";
                  break point_1;
              }
         }
         temp_1 = "didn't find it!";
         break;
     }
     return plus( temp_0, temp_1 );
  })

In Arc, 'my-return is a continuation. For bullet purposes, I'm thinking of it as a locally scoped special form or macro. If that's too out-of-this-world for anyone, the break syntax could just be (goto my-return "found it!") instead, with 'goto being an additional primitive dedicated to breaking. That's right, I've actually been talking about a form of 'goto this whole time.

With 'while and 'point in the language, 'for could be implemented like so:

  (mac for (init condition step . body)
    `(do ,init
         (point break         ; anaphoric
           (while ,condition
             (point continue  ; anaphoric
               ,@body
               ,step)))))

But 'while doesn't need to be a primitive either. The full 'while can be defined in terms of (while t ...), which I call 'forever:

  (mac while (condition body)
    (w/uniq break
      `(point ,break
         (forever
           (unless ,condition
             (,break nil))     ; or (goto ,break nil)
           ,@body))))

In fact, both 'point and 'forever can be defined in terms of a single axiom, if it comes right down to it:

  (mac point (break . body)
    `(pointforever ,break
       (,break (do ,@body))))  ; or (goto ,break (do ,@body))
  
  (mac forever body
    (w/uniq break
      `(pointforever ,break
         (do ,@body))))

By the way, everyone feel free to use any of the code in this post for yourself, lol.

-----

1 point by seertaak 4967 days ago | link

> But why is 'for a primitive, when 'while can be the primitive and 'for can be a macro that expands to it?

For no better reason than I wrote the 'for primitive before I added macros!

I agree about labeled break and continue -- I specifically mention them in the article BTW. And to be honest, it has also occurred to me that this is just a local goto. So I have contemplating adding them.

The "point" approach is very cool. I like that it can translates into JVM features directly while remaining quite general (e.g. it's trivial to implement "return" using point). I think it's pretty easy to implement in the interpreter also, using exceptions (it's not going to win any speed contests), though you'd use labels and gotos when compiling.

Thanks for offering use of the code, I'm going to try it out. When I've got something worth reporting on I'll do so :)

-----

4 points by rocketnia 4966 days ago | link

You haven't responded to the "why not just have let but call it 'var'" part, so I'd like to get another idea out of the way. It's actually a reason I might like having 'var around.

For some background... in JavaScript, every variable declaration's scope is the full range of the function () { ... } block surrounding it. For instance, you can use variables "before" you declare them (in which case they're a default value, undefined). In certain cases this is pretty annoying, since it means a var inside a loop actually declares one variable that's in scope for all iterations, rather than a variable per iteration, which matters for closures that capture the variable. I just mention this to clarify what's going on.

Suppose there were a special form (varscope <var> <...body...>) which scoped (<var> <name> <val>) as a special form in the body, which both defines and assigns to a variable and returns the new value. One can say (var foo foo) to define a variable without modifying it.

(As with 'point, I'm using a locally scoped label since I have no intention for nesting to put the var form out of scope. Just as with 'point, instead of having a locally scoped special form, we could have two special forms (varscope <env-name> <...body...>) and (var <env-name> <name> <val>).)

With this kind of var form, we have the ability to define a variable closer to where it actually matters to our code, even when we want its scope to be farther away:

  (varscope var
    (var gui (window "Alert!"
               (var cries (label "I'm a dialog. Log! Dialog-log!"))
               (var ok (button "Okay..."))
    (on click ok
      (if okay-already
        dismiss.gui
        (do (var okay-already t)
            (set-text cries "Dialog? Dial? Diiial!"))))
    show.gui)
  
  ==>
  
  (with (gui nil cries nil ok nil okay-already nil)
    (= gui (window "Alert!"
             (= cries (label "I'm a dialog. Log! Dialog-log!"))
             (= ok (button "Okay..."))
    (on click ok
      (if okay-already
        dismiss.gui
        (do (= okay-already t)
            (set-text cries "Dialog? Dial? Diiial!"))))
    show.gui)

In a language with unobtrusive table lookup syntax, like Arc or JavaScript, programmers can resort to tables to accomplish the same kind of expressiveness, probably at the cost of static analyzability:

  (let o (obj)
    (= o!gui (window "Alert!"
               (= o!cries (label "I'm a dialog. Log! Dialog-log!"))
               (= o!ok (button "Okay..."))
    (on click o!ok
      (if o!okay-already
        (dismiss o!gui)
        (do (= o!okay-already t)
            (set-text o!cries "Dialog? Dial? Diiial!"))))
    (show o!gui))

-----

2 points by seertaak 4966 days ago | link

When I first read your earlier question, I thought the question was about the names rather than how you specify the scoping. In any case, I opted for a binding declaration ("var") which consists of a single or series of bindings, rather than the "let" alternative where the bindings are over an explicitly specified body. To my mind, the former approach is simpler, and avoids extra nesting merely to define a new set of variables. It's the approach of e.g. arc and python. I don't see the gain of the varscope concept over just having "var" and "set" operate on the scope that surround the declarations and child scopes. Maybe I'm not understanding something? For example, in the situation that you describe, where you want to define a variable over a "large" scope, but don't actually know what to bind it to until later: what's wrong with just setting the variable to null until you know what its true value should be?

Here's how bullet handles things:

* A variable comes into existence either through a "var" or a "set" declaration. It is an error to have consecutive "var" calls in the same scope. Assignments are much more permissive: obviously you can use any number of them in any scope, and you don't need a "var" to preceed the first instance of a "set".

* If you write "var x ..." in a scope, and again in a child scope, then there is a new binding for x in the child scope that shadows the binding in the parent scope. On the other hand, "set x ..." not create a new binding if one already exists, so it will set x in the parent scope.

* All of this is handled by Environment class, using a rather pedestrian HashMap<Symbol, Variable>, where Variable is nothing more than a box. Obviously we capture variables that escape from functions when we analyze lambda definitions.

* The Environment functionality is reused to provide python-like modules. There is a special evaluation syntax (basically, it's the same syntax as for array/map access in clojure), which coexists quite nicely with the "." syntax:

    set y 5
    var x 42

    module m
      var x 1
      func dblx ()
        set x: + y: * x 2
      func printX ()
        print "x = " x

    print "x = " x
    print "m.dblx = ": m.dblx
    m.printX
    set m.x 50
    print "m.x = " m.x

    // prints:
    // x = 42
    // m.dblx = 7
    // x = 7
    // m.x = 50

An example of its use is in the bullet.bt, where I define a mini module for string functions:

    print: str.joinOn " " "modules" "are" "cool"

By the way, I toyed with removing "var" altogether, but I think I want to keep it; it'll come in handy when I build in an object system. Basically, within an object definition, variables will be member variables and you'll be able to refer to them within member functions ("mfunc" or "meth", not sure yet) either through "this.varname" or just "varname".

Also, in bullet "with" resembles pascal's "with" except that it returns the (effected upon) object:

    print: with (new ArrayList)
      add 1
      add "foo"
      ...

I'm not sure if I've answered your question. Thanks again for your input, it's awesome getting feedback on design decisions (there are so many decisions to make!).

EDIT: Added an example of "nested" set.

-----

2 points by rocketnia 4966 days ago | link

"and avoids extra nesting merely to define a new set of variables. It's the approach of e.g. arc and python."

Arc uses an extra level of nesting, just like most Schemes and MLs do. Arc's let is different from Scheme's let, but only because it supports destructuring, does only one binding, and uses fewer parentheses.

As for Python's approach where all plain variable assignments define locals except when declared otherwise[1], it rubs me the wrong way. I'm not sure why. (Side note: Now that I've looked up Python scope, it sounds an awful lot like Kernel's mutation policy, where nonlocal variables can't be rebound without using a captured environment. I'm hopeful that this kind of scoping can make fexpr code more efficient, so... it might have similar ramifications for interpreted Python...? Fexprs in Python, scary.)

[1] http://stackoverflow.com/questions/7935966/python-overwritin...

---

"I don't see the gain of the varscope concept over just having "var" and "set" operate on the scope that surround the declarations and child scopes. [...] what's wrong with just setting the variable to null [in the outer scope] until you know what its true value should be?"

My one reason for liking var is that you don't have to trek up to the top of a variable's scope to define it. If you do still have to trek outside a child varscope, that's not a full expression of the feature.

In practice we might use the same name for every varscope, having decided that shadowing is an acceptable compromise for being able to copy a variable declaration from one part of the code to another. Given that we're choosing the same names everywhere, we might even have macros that choose them for us.

---

"It is an error to have consecutive "var" calls in the same scope."

I assume you mean multiple vars of the same name in the same scope. Otherwise I'd expect your "module m" example to fall apart. (If func doesn't expand into var, I'd rather it did.)

Whether or not that's what you mean, I don't like that error. In JavaScript, while multiple declarations of a single variable are allowed and commonly discouraged, I occasionally find myself preferring to have two "constant" variables that just so happen to have the same name. It can help emphasize that the code that uses them is almost exactly the same. (What? JS has no macros. :-p )

One consistent example of my rebellion is with loops, where I have no problem using "for ( var i = ..." for multiple loops in a single function.

---

"The Environment functionality is reused to provide python-like modules."

I like it, but I have more extreme recommendations.

I believe it's possible to implement (varscope ...) in a Scheme that has manual access to the compiler. In that language, (varscope foo ...) would compile its body in a local environment where 'foo is a macro that expands to '= and pushes the variable name to a list, and then it would embed the compiled body in a (let ...) form that bound all the listed variables to nil. A similar technique would work to implement your 'module form. (Manual access to the compiler might not be necessary if the Scheme has a suitable concept of locally scoped macros instead.)

While this in itself is nice, it would be cleaner to be able to do this without using mutation. This could be easily accomplished if there were a special compiler utility that took a tuple (varscope-label, how-to-expand-the-varscope-label, code) and returned a tuple (compiled-code, set-of-variable-names).

Keep in mind that by "compiler" I mean whatever handles the phase that expands macros. If you don't expand macros until execution time (in your "interpreter" perhaps), whoops, you've got fexprs. :-p

-----

1 point by seertaak 4966 days ago | link

> My one reason for liking var is that you don't have to trek up to the top of a variable's scope to define it. If you do still have to trek outside a child varscope, that's not a full expression of the feature.

I'm afraid I don't understand what you mean by this. Maybe you could give a short example? As I see it, one way or another you need to mark out the "outer" scope, where you declare the lifetime of the variable (even though you don't know its value yet). Then somewhere in two or more branches of that scope the variable is set and used. IIUC you propose to write "varscope" at the top, and "var" further down. Doesn't that mean that whenever you define a variable, you need to write "varscope" before it? I somehow can't believe that, which makes me think I'm still not grokking!

> I assume you mean multiple vars of the same name in the same scope.

Yes, otherwise you quickly run in trouble with parent-scope scheme :)

> One consistent example of my rebellion is with loops, where I have no problem using "for ( var i = ..." for multiple loops in a single function.

The way that's handled in bullet is that when we encounter the for, we push a new environment, run the initialization, then push (and subsequently pop -- the interpreter holds a stack of environments representing runtime frames) an environment for each iteration of the loop. So it's ok to use the same variable name in subsequent for loops: the previous instance is not alive by the time current is reached.

> If func doesn't expand into var, I'd rather it did.

Here's the definition of "func" in bullet:

    macro func (name args :rest exprs)
      qquote
        set ,name
          fn ,args ,@exprs

So, yes, it's just a var binding. Note that this allows definition of "module" functions.

    func m.foo (): print "m.foo" 
    ==>
    set m.foo: fn (): print "m.foo"

By the way, even macros are defined as a macro:

    var macro
      tfm (name args :rest body)
        qquote: set ,name: tfm ,args ,@body

Macro's don't have any functionality analogous to Lambdas to capture variables from enclosing scopes.

As it should be clear by now, my implementation is a sort of illegitimate child of fexprs and macros. Basically, I've introduced all the weaknesses of fexprs in return for only some of the gains :)

In bullet, macros are values represented by the Transform class. Their definition is almost identical to Lambdas: they hold their formal parameters and their body as an AST. The only difference to Lambdas is how they're treated for evaluation purposes by the interpreter. Transforms (like Primitives which are basically fsubrs) receive their operands unevaluated. They are expanded at runtime, and the result of the expansion is then evaluated in the lexical environment of the call site. That means you can use macros in higher-order functions; they truly are first class.

Until now, this was an artifact of my interpreter design (emphasizing getting something up and running quickly). My intention had been to double back and fix the discrepancy with "real" lisps by doing the standard initial macroexpand traversal of the AST before evaluating. Either that or ditch the interpreter and write a compiler.

However, the material you've presented me with regarding fexprs is truly fascinating (no, I didn't know what they were before I read your post). I've just got to try these fexpr style macros; the idea of just controlling evaluation of operands, but otherwise being just like a regular function is very appealing.

In conclusion, I would appreciate if you could explain the varscope concept further. Again, what I don't get is whether need to write "varscope.." before you can bind a variable using "var..". I bang on about that because I'm loathe to introduce a feature that introduces such a high overhead for a single variable use. Or do you also have "lets" that work as in scheme?

Also, you kind of lose me in the last two paragraphs. It would be help if you could in some sense "sell" your concept it to me (please!): what extra bit of power is now available, that I can't express in my implementation? Maybe by the extra-cool use, I will understand the tradeoff in terms of extra typing for a variable declaration.

-----

3 points by rocketnia 4965 days ago | link

Turns out 'varscope has an inconsistent corner case, the way I was originally thinking about it.

  (mac foo () "macro")
  (mac id-mac (x) x)
  (varscope v
    (id-mac (v foo (fn () "function")))
    (foo))

Should this result in "macro" or "function"? What if we change it up like this?

  (mac foo () "macro")
  (mac id-mac (x) x)
  (varscope v
    (id-mac (v id-mac (fn (x) nil)))
    (id-mac (v foo (fn () "function")))
    (foo))

I'd rather have 'varscope work in a compilation phase (no dependence on fexprs), and I'd rather not make the order of compilation matter, so I'm going to make a very hackish decision: The body of a (varscope ...) form should be compiled as though it put no variables in scope. Macros from the surrounding scope will work even if the local scope shadows them at run time.

By no coincidence, this design compromise is compatible with the hypothetical implementation below. I actually only realized this flaw once I was documenting that implementation. :-p

(By complete coincidence(?), this is similar to Arc 3.1's bug where local variables don't shadow macros. In the hypothetical language(s) I'm talking about, function parameters and let-bound variables would hopefully still shadow macros, so it wouldn't be quite the same.)

---

"IIUC you propose to write "varscope" at the top, and "var" further down. Doesn't that mean that whenever you define a variable, you need to write "varscope" before it?"

Close. You can have more than one variable per varscope. But I'm guessing you knew that. :-p

That means when you define a variable, you don't necessarily need to define a varscope if a suitable one already exists. But I don't expect even a single 'varscope to appear very often in code; instead I expect convenience macros to take care of it.

---

"The way that's handled in bullet ... it's ok to use the same variable name in subsequent for loops: the previous instance is not alive by the time current is reached."

That's a good example of when a macro could take care of establishing a varscope.

  (for <init> <condition> <step>
    <...body...>)
  ==>
  (varscope var
    <init>
    (while <condition>
      <...body...>
      <step>))

For the analogous case in JS (or rather a hypothetical JS-like language whose semantics are based on 'varscope), the only thing that establishes a varscope is the "function () {}" syntax. Sibling loops of the form "for ( var i = ..." use the same i because they don't establish a new scope for themselves.

---

"Or do you also have "lets" that work as in scheme?"

Yes, I would have them both. It's hard not to have 'let since it can just be defined as a macro over 'fn. Whether one would be emphasized over the other, I'm not sure.

---

"Also, you kind of lose me in the last two paragraphs."

I was talking about a compiler for varscope bodies. It'll help to back up a bit....

In a language where macros return compiled code rather than code to compile, traditional macros are simple to implement as sugar:

  (mac when (condition . body)
    `(if ,condition ,@body))
  ==>
  (def-syntax when (condition . body) gensym123_static-env
    (compile `(if ,condition (do ,@body)) gensym123_static-env))

If you find this shockingly similar to Kernel-style fexprs, hey, me too. :-p

  (mac when (condition . body)
    `(if ,condition ,@body))
  ==>
  (def-fexpr when (condition . body) gensym123_dynamic-env
    (eval `(if ,condition (do ,@body)) gensym123_dynamic-env))

IMO, the compile phase is just an fexpr eval phase whose result is used as code. Arc macros, which don't have access to the environment, are limited in the same way as pre-Kernel fexprs.

So what I'm suggesting is that in addition to 'compile, we have a second compilation function that lets us compile the body of a (varscope ...) or (module ...) form.

In fact, here's exactly how I'd use it. I'll call it 'compile-w/vars.

  ; I'm assuming 'compile-w/vars, 'def-syntax, and 'compile exist in
  ; Arc.
  ;
  ; I'm also assuming 'mc exists in Arc as an anonymous macro syntax (so
  ; that 'mc is to 'mac as 'fn is to 'def).
  ;
  ; Last but not least, I'm assuming (nocompile <code>) exists in Arc as
  ; a way to embed compiled code inside uncompiled code. When
  ; (nocompile <code>) is compiled in a static environment <env>, it
  ; should associate any free variables in <code> with variables bound
  ; in <env>. To make this happen, both 'compile-w/vars and 'compile
  ; should accept code even if it has free variables, and compiled code
  ; should be internally managed in a format that allows for this kind
  ; of augmentation.
  
  (def-syntax varscope (label . body) env
    ; In case you're not familiar, this is a destructuring use of 'let.
    (let (new-body vars)
           ; NOTE: We compile the body in the *outer* environment, not
           ; the local environment the varscope establishes.
           (compile-w/vars
             label (mc (var val)
                     `(= ,var ,val))
             `(do ,@body) env)
      (make-compiled-let (map [list _ nil] vars)

      (compile `(with ,(mappend [do `(,_ nil)] vars)
                  (nocompile ,new-body))
               env)))
  
  (def-syntax anon-module-w/var (label . body) env
    (w/uniq g-table
      (let (new-body vars)
             ; NOTE: We compile the body in the *outer* environment, not
             ; the local environment the module establishes.
             (compile-w/vars
               label (mc (var val)
                       ; Set both a variable and a table entry.
                       `(= (,g-table ',var) (= ,var ,val)))
               `(do ,@body) env)
        (compile `(with (,g-table (obj) ,@(mappend [do `(,_ nil)] vars))
                    (nocompile ,new-body)
                    ,g-table)
                 env))))
  
  ; This anaphorically binds 'var as the module's variable declaration
  ; form.
  (mac module (name . body)
    `(= ,name (anon-module-w/var var ,@body)))

As before, I release this code for anyone to use--or rather to derive actual working code from. :-p No need for credit.

---

"So, yes, it's just a var binding."

That's a var binding even though it uses 'set? I'm confused.

---

"no, I didn't know what [fexprs] were before I read your post"

What post is that?

I had a half-written reply that started with "Come to think of it, you probably do have fexprs," and went on to explain why I suspected it, what they were, and what you might get if you embraced it or rejected it. Should I still post it? It sounds like you understand it already, but it wouldn't do to have a time paradox. :-p

Anyway, since you're an fexpr fan now, I would like to emphasize the other side: The translation of my (point ...) example into imperative code is straightforward to do during a compilation phase, and fexprs get in the way of compilation phases. :)

It may be possible to force one's way through fexprs during a compilation phase too, but I expect that algorithm to look like a cross between a) constant-folding and b) static type inference with dependent types (since eval's return type depends on the input values). Rather than simply using recursion to compile subexpressions, the algorithm would do something more like using recursion together with concurrency, so that some subgoals could wait for information from other subgoals. To complicate things further, if the program uses a lot of mutable variables, the algorithm might not be able to treat them as constants, and it might not get very far unless you run it at run time as a kind of JIT.

I find this pretty intimidating myself. I've made steps toward at least the constant-folding part of this (which I expect will be sufficient for almost all fexpr programs written in a reasonable style), but I've gotten bogged down in not only the difficulty but also my own apathy about fexprs.

-----

1 point by Pauan 4965 days ago | link

"The translation of my (point ...) example into imperative code is straightforward to do during a compilation phase, and fexprs get in the way of compilation phases. :)"

Why not have both? As in, have a way of saying "this should all be done at compile-time" that doesn't involve fexprs at all, or involves a special variant of fexprs. Reminds me of a discussion somewhere about micros (as opposed to macros)...

-----

3 points by rocketnia 4965 days ago | link

"Reminds me of a discussion somewhere about micros (as opposed to macros)..."

This probably isn't what you mean, but...

One of my oldest toy languages (Jisp) had syntactic abstractions I called "micros," and I discuss them here: http://arclanguage.org/item?id=10719

tl;dr: My micros are fexprs that not only leave their arguments unevaluated but also leave them unparsed. The input to a micro is a string.

Much like how I just said macroexpansion in the compilation phase was like fexpr evaluation, what I've pursued with Penknife and Chops is like a compilation phase based on micro evaluation.

---

"Why not have both? As in, have a way of saying "this should all be done at compile-time" that doesn't involve fexprs at all, or involves a special variant of fexprs."

One way to have both is to have two fexpr evaluation phases, one of which we call the compile phase. This is staged programming straightforwardly applied to an fexpr language... and it's as easy as wrapping every top-level expression in (eval ... (current-environment)).

However, that means explicitly building all the code. If you want to call foo at the repl, you can't just say (foo a b c), you have to say (list foo a b c).

With quasiquote it's much easier for the code that builds the code to look readable. So suppose the REPL automatically wraps all your code in (eval `... (current-environment)). Entering (foo a b c) will do the expected thing, and we can say (foo a ,(bar q) c) if we want (bar q) to evaluate at compile time.

Now let's fall into an abyss. Say the REPL automatically detects the number of unquote levels we use in a command, and for each level, it wraps our code in (eval `... (current-environment)) to balance it. Now (foo a b c) will do the expected thing because it's wrapped 0 times, (foo a ,(bar q) c) will do the expected thing because it's wrapped once, and so on. We have as many compile phases as we need.

The price is one reserved word: unquote. This would be the one "special variant of fexprs."

-----

1 point by rocketnia 4963 days ago | link

"The price is one reserved word: unquote. This would be the one "special variant of fexprs.""

Possible correction: If any kind of 'quasiquote is ever going to be in the language, it should probably have special treatment so that its own unquotes nest properly with the REPL's meaning of unquote. An alternative is to use a different syntax for non-REPL unquotes (e.g. ~ instead of ,).

Also note that ,foo could be a built-in syntax that doesn't desugar to anything at all (not even using the "unquote" name), instead just causing phase separation in a way that's easy to explain to people who understand 'quasiquote.

-----

1 point by Pauan 4963 days ago | link

"Also note that ,foo could be a built-in syntax that doesn't desugar to anything at all (not even using the "unquote" name), instead just causing phase separation in a way that's easy to explain to people who understand 'quasiquote."

Yes, I currently think that all syntax should be at the reader level, rather than trying to use macros to define syntax. As an example of what I'm talking about, in Nu, [a b c] expands into (square-brackets a b c) letting you easily change the meaning of [...] by redefining the square-brackets macro.

Or the fact that 'foo expands into (quote foo) letting you change the meaning of the quote operator... or the fact that `(foo ,bar) expands into (quasiquote (foo (unquote bar))), etc.

I used to think that was great: hey look I can easily change the meaning of the square bracket syntax! But now I think it's bad. I have both conceptual and practical reasons for thinking this.

---

I'll start with the conceptual problems. In Lisp, there's essentially three major "phases": read-time, compile-time, and run-time. At read-time Lisp will take a stream of characters and convert it into a data structure (often a cons cell or symbol), compile-time is where macros live, and run-time is where eval happens.

Okay, so, when people try to treat macros as the same as functions, it causes problems because they operate at different phase levels, and I think the same exact thing happens when you try to mix read-time and compile-time phases.

---

To discuss those problems, let's talk about practicality. quasiquote in particular is egregiously bad, so I'll be focusing primarily on it, though quasisyntax also suffers from the exact same problems. Consider this:

  `(,foo . ,bar)

You would expect that to be the same as (cons foo bar), but instead it's equivalent to (list foo 'unquote 'bar). And here's why. The above expression is changed into the following at read-time:

  (quasiquote ((unquote foo) . (unquote bar)))

And as you should know, the . indicates a cons cell, which means that the above is equivalent to this:

  (quasiquote ((unquote foo) unquote bar))

Oops. This has caused practical problems for me when writing macros in Arc.

---

Another problem with this approach is that you're hardcoding symbols, which is inherently unhygienic and creates inconsistent situations that can trip up programmers. Consider this:

  `(,foo (unquote ,bar))

You might expect that to result in the list (list foo (list 'unquote bar)) but instead it results in the list (list foo bar), because the symbol unquote is hardcoded.

---

Yet another problem is that it requires you to memorize all the hard-coded names for all the syntax. You have to remember to never define a function/macro called quote. To never define a function/macro called unquote, to never define a function/macro called square-brackets, etc... which means this will break:

  ; oops, redefined the meaning of the quote syntax
  (let quote ...
    'foo)

When the number of syntax is small, that's not really a high price to pay, but it is still a price.

---

Also, this whole "read syntax expands into macros" thing is also inconsistent with other syntax. For instance, (1 2 3) isn't expanded by the reader into (list 1 2 3). That is, if you redefine the list function, the meaning of the syntax (1 2 3) doesn't change. But if you redefine the quote macro, then suddenly the syntax 'foo is different.

The same goes for strings. Arc doesn't expand "foo" into (string #\f #\o #\o) either. So redefining the string function doesn't change the meaning of the string syntax. So why are we doing this for only some syntax but not others?

---

All of the above problems go away completely when you just realize that read-time is a separate phase from compile-time. So if you want to change the meaning of the syntax 'foo the solution isn't to redefine the quote macro. The solution is to use a facility designed for dealing with syntax (such as reader macros).

This is just like how we separate compile-time from run-time: you use functions to define run-time stuff, macros to define compile-time stuff, and reader macros to define read-time stuff.

This also means that because the only way to change the syntax is via reader macros (or similar), the language designer is encouraged to provide a really slick, simple, easy-to-use system for extending the syntax, rather than awful kludgy reader macros.

-----

1 point by Pauan 4965 days ago | link

"then push (and subsequently pop -- the interpreter holds a stack of environments representing runtime frames)"

Uh oh, my warning bells went off. If I were you, I'd put some unit tests that verify that closures work properly. In particular, this might very well break in bullet (though I won't know without testing it):

  (def foo (x)
    (fn () x))

  ((foo 4)) -> 4

---

"They are expanded at runtime, and the result of the expansion is then evaluated in the lexical environment of the call site. That means you can use macros in higher-order functions; they truly are first class."

Ewww, runtime macros. I do not like. They combine all the awfulness of macros[1] without any of the benefits of fexprs[1], while also giving up the only benefit macros have[1]. The worst of all worlds, in my opinion.

---

"My intention had been to double back and fix the discrepancy with "real" lisps by doing the standard initial macroexpand traversal of the AST before evaluating."

Good. I think Lisps should either embrace macros (warts and non-first-classness included), or embrace fexprs and dump macros since they're not needed and just get in the way. Naturally, I'm in favor of fexprs unless speed is critical, and even then I'd prefer to just make the interpreter faster rather than dump the elegance of fexprs.

---

"I've just got to try these fexpr style macros; the idea of just controlling evaluation of operands, but otherwise being just like a regular function is very appealing."

It sure is! An example of a very beautiful Lisp that uses fexprs at its very core is Kernel (though it calls them operatives and uses the $vau form to create them):

http://web.cs.wpi.edu/~jshutt/kernel.html

http://www.wpi.edu/Pubs/ETD/Available/etd-090110-124904/unre...

ftp://ftp.cs.wpi.edu/pub/techreports/pdf/05-07.pdf

There are other Lisps that use fexprs (or at least things similar to fexprs) as well, such as Picolisp and newLISP (which erroneously calls them macros), but I'm especially fond of Kernel (for many reasons), but in part due to its static (lexical) scope.

---

* [1]: I'm only slightly exaggerating... but in all seriousness, first-classness is only one of the (multiple) benefits of fexprs, and even with first-class macros, you still need to worry about hygiene, which is basically a non-issue in Kernel (that is to say, in Kernel, hygiene is so incredibly easy to achieve that it naturally happens, because the language is so incredibly well designed, so I consider this a mostly "solved problem" in Kernel).

Plus, I suspect if you're basically macro-expanding macros at runtime, you'd actually get slightly faster speed with fexprs (not that speed is a huge issue, but it can be an issue, depending on what you want to do, so I mention it for completeness and because I have a personal interest in making powerful things go fast).

As far as I can tell, the only real benefit of macros is that they're always preprocessed, so they only need to macro-expand once. That is also why they're non-first-class.

I suppose a minor benefit is it allows you to treat macros as basically a template facility, but I find that benefit to be dubious at best, especially since it's so easy to use templating facilities in fexpr (or define your own).

Another minor benefit is that you can macro-expand a macro to do things like code walkers, but... I feel that should be part of a debugger/inspection suite or something.

---

Just to make sure you don't feel like I'm railing on you: it seems to me that you were unaware of fexprs when you designed bullet, hence why bullet has macros rather than fexprs. That's totally fine, I understand. I'm mentioning all these things not only for your benefit, but also anybody else who might stumble along and read this post.

-----

2 points by seertaak 4965 days ago | link

> In particular, this might very well break in bullet

It works:

    func foo (x): fn () x
    print ((foo 4)) // prints 4

I explained incorrectly: the interpreter env stack is basically a stack of bindings representing both "true" locals (i.e. locals on JVM) and environments representing lexical scopes. The latter are held for instance by functions, macros, modules explicitly, and also get implicitly created as required in e.g. looping primitives.

I'll reply to your other points tomorrow morning! (basically, I agree :))

-----

1 point by Pauan 4965 days ago | link

Nice! So lexical environments do form a proper tree and persist even after the outer function has returned? If so, then that shouldn't be a problem.

-----

1 point by Pauan 4966 days ago | link

"whoops, you've got fexprs. :-p"

Not if said "fexprs" don't have access to the dynamic environment, though!

-----

1 point by rocketnia 4966 days ago | link

"I agree about labeled break and continue -- I specifically mention them in the article BTW."

Oh, so you do! XD

I'm surprised and glad you understood the 'point stuff the first time I described it, lol.

-----

2 points by akkartik 4968 days ago | link

  mklist +
         -
         /

I don't see why this needs the 4-indent. Wouldn't bullet simply translate '(-)' to '-' like it does for '(1)'?

Perhaps this is a better example of the problem?

  mkList +
      - /

which would expand to (mkList + (- /)) without the 4-indent.

I have a slightly different rule at http://github.com/akkartik/wart#readme (search for 'one exception').

Again, I think my approach is better :) (I hope you find me saying that more useful than the cantankerous lisp veteran who says "noobs! Stop messing with my parens!")

-----

1 point by seertaak 4968 days ago | link

Well spotted; you've just waded into the dark corners of _bullet_ :)

The reason the 2-indent would evaluate (i.e. result in an invocation) is that '+', '-', etc. are _primitives_, which behave semantically like _functions_. For all other objects, the (1) rule applies.

Do you think this is too convoluted?

Alternatives I see:

1. The obvious is not to allow (obj-thing) ==> obj-thing. But then we'd need to define (func val (x) x) so that (e.g.):

     print: if (is-saviour name)
       val "go frodo"
       val "don't know you"

Otherwise, we'd have an error.

2. Make (fn-or-prim) ==> fn-or-prim (i.e. unifiy treatment of functions and objects, and in addition. To force evaluation of a 0-arity function, allow:

    save-middle-earth ()  // <-- () causes invocation.

What do you think?

-----

3 points by rocketnia 4967 days ago | link

(Psst, to get italics, use asterisks, not underscores. http://arclanguage.org/formatdoc)

Option 2 looks particularly troublesome to me. A veteran lisp user seeing "save-middle-earth ()" will probably guess it means "(save-middle-earth ())", with () being the empty list. Do you have () in bullet?

It might be less confusing to use dot syntax instead, like this...

  save-middle-earth.

...because that looks a bit like "foo.bar" syntax with no argument. Alternatitvely, you could go with a syntax that stands on its own rather than borrowing existing connotations:

  save-middle-earth!

However, I'm not sure any of these nullary call syntaxes gives you a clear translation to s-expressions. If "save-middle-earth." just becomes "((save-middle-earth))" and does nothing twice, that doesn't help. ^_^; (On the other hand, maybe you could make this work by treating ((x)) as a special case in the s-expression semantics.)

Here's a third (fourth? fifth?) option: Define (func call (f :rest args) (apply f args)).

  call save-middle-earth

-----

1 point by seertaak 4967 days ago | link

Thanks for the tip about the formatting!

No, currently attempting to evaluate () is an error (and a bad crash with no error message, I'm embarrassed to say! [but now fixed and pushed]).

Your argument regarding () is pretty convincing. It's a bad idea. But I find the full-stop notation a bit jarring, not sure why. I'll try it and see if I get used to it... And I didn't want to tread on the "!" real-estate, given that's it's commonly used in identifiers.

I have to say I still lean slightly towards preferring a default of invocation for 0-arity functions, and using the "val" (again, id) function for passing the function itself.

Maybe the best approach is to offer a define-syntax functionality which somehow corrects the anomality (by visiting the AST and reducing without evaluating) before applying the macro transform. But that's quite possibly another can of worms...

-----

1 point by seertaak 4967 days ago | link

Having slept on this, I think I'm going to remove the (obj) ==> obj reduction. I think the gain is not enough to compensate the loss of transparency.

It's not even that big of a loss, since you can write:

    if (condition)
        true-clause   // 4-indent.
        false-clause

-----

1 point by akkartik 4968 days ago | link

"One of the beauties of lisp is that the prefix notation is simple, making the grammar almost trivial to parse. Bullet's reader has to work a little harder, needing one token of look-ahead to be able to handle its infix operators.."

I chose to make the tokenizer oblivious to infix syntax. The only delimiters it recognizes are quotes, unquote (comma), parens and whitespace. The infix operators[1] are implemented in a later phase; it can transform each token without needing any context around it.

http://github.com/akkartik/wart/blob/e616c86cf0/009ssyntax.c...

Hmm, when I look at it now I realize that I could do this earlier in the pipeline. I could do this transformation before I construct the cons cells for expressions, and it'd probably be more efficient.

I'd love to chat more about the implementation details since it seems we've dealt with a lot of the same problems. Talking to you about it has already given me two ideas for improvements, thanks.

[1] Arc calls them ssyntax; I think the name is intended to line up with s-expressions.

-----

4 points by seertaak 4968 days ago | link

The pipeline in bullet is:

1. line processor: text lines are surrounded by brackets (textually!) where appropriated (i.e. 2-indents subject to nesting rules). Infix operators aren't touched. Result is stored in an array of CodeFragment, which is basically text + source code location.

2. reader: the forms are "read" and turned into B-expressions. You could call this the parsing stage, but lisp is a bit strange in some sense: there are two levels of parsing: the reader is the first, and the second is in the...

3. interpreter/compiler: every lisp has a starting phase which takes the "low-level" S-expressions comprising the code and parses it into the true semantic expression of the language: e.g. IfExpr, LambdaExpr, FnApplyExpr, etc.

Indeed, I think this is what makes lisp unique: all other languages have parsers which take you directly to step 3. The AST for semantic analysis purposes trees is basically just the parse tree. In lisp, you have to do work to get the AST -- but that's also why you get flexibility in macro writing.

I'd be happy to exchange implementation details. As I mention in the article, my code resides at

http://code.google.com/p/bullet-lang

My email is

martin at percossi dot com

feel free to email me if you have questions. It's nice to know other people are working on/thinking about the same problems! :)

-----

1 point by akkartik 4968 days ago | link

  var name "Martin"
  cond
    (name equals "Martin") (print "You are a fool.")
    (name equals "Frodo")  (print "You like saved everyone.")
    (name equals "Alex")   (print "You rock.")

"..somewhat less ugly from what they would be in regular lisp owing to one set of parentheses being removed in bullet w.r.t lisp, but parentheses still abound.."

There's more to aesthetics than sheer number of parens. I would argue that lisp's cond is ugly primarily because it puts conditions and actions on a single line.

Here's how I write it in wart:

  (if
    (iso name "Martin")
      prn "fool"
    (iso name "Frodo")
      prn "savior"
    (iso name "Alex")
      prn "rocks"
    :else
      prn "who knows?")

I think this looks better even though it has more parens.

-----

1 point by rocketnia 4967 days ago | link

"I would argue that lisp's cond is ugly primarily because it puts conditions and actions on a single line."

I like single-line cases. I might argue 'cond is ugly because any language with 'cond is probably verbose enough to force the case and consequence onto separate lines. :-p (Clearly bullet could be an exception to that.)

What's especially ugly about 'cond is that, once the condition and consequence are on separate lines, there's an indentation issue. I don't like indenting by just one space or indenting past an already matched paren, so my code would look like this, probably unlike anyone else's code:

  (cond
    ( (...case...)
      (...consequence...))
    ( (...case...)
      (...consequence...)))

(Actually, my code would probably define a macro (ifs ...) that worked like Arc's 'if, but that's cheating.)

-----

1 point by akkartik 4967 days ago | link

"I like single-line cases. 'cond is ugly because.. the case and consequence onto separate lines."

You know, you're right. I change my mind. The major problem isn't merging lines, it is how to keep things clear when both the case and consequence are long. Just using indent to distinguish them is a poor solution because we use indent to group things together everywhere else.

To see how bad things can get, this definition of markdown is an eyesore: http://github.com/nex3/arc/blob/fe2c9eb2d6/lib/app.arc. Wart doesn't really improve matters except to add the :else to distinguish the two consequences right at the end.

-----

1 point by seertaak 4968 days ago | link

That quote was meant to be a bit flippant :)

You can break the line accross by using "begin" in _bullet_:

    cond
      name equals "Martin" ; begin
        print "what a"
        print "fool"
      name equals "Frodo" ; begin
        print "what a"
        print "saviour"
      name equals "Alex" ; begin
        print "err..."
        print "who knows?"

-----

1 point by akkartik 4968 days ago | link

Very cool indeed.

"A form containing a single object evaluates to the object, which means that (1) ==> 1. Even without this added alteration to standard lisp evaluation of forms, we simply write val 1 instead of 1 above, val being the identity function"

Here's how I describe the rule at http://github.com/akkartik/wart#readme:

"Lines with a single token are never wrapped in parens."

Hmm, though it's not quite accurate. Parens, quotes and unquotes don't count as tokens for the purposes of this rule, so as to do right by lines like this:

  34)))

I should just stop using the jargon-y 'tokens' and say 'words' to clarify this.

-----

1 point by seertaak 4968 days ago | link

Funny; we really are thinking about the same problems. bullet will currently puke if it encounters a lonely ')'. It's definitely sub-optimal. I think your solution of looking for words is the right way to go -- I didn't get around to implementing it though, because at some point I waded neck-deep into semantic analysis, code generation, ASM, invoke dynamic. Compilers really are fascinating creatures :)

-----

1 point by akkartik 4968 days ago | link

When I built my lisp-without-parens I made the conscious design choice that traditional s-expressions should just work. This way I can make s-expressions more accessible to newcomers without alienating veterans. It's not clear from the article: does bullet have this property?

-----

2 points by seertaak 4968 days ago | link

Yes, and it's not very difficult to get the behaviour:

    begin
         // 4-indent, so no parens: back in lisp world.
         (if (prefers-s-expresssions user)
           (print "bullet sucks")
           (print "why are you doing this to yourself??"))

-----

1 point by akkartik 4968 days ago | link

How do you call functions with zero arguments if (foo) is always equivalent to foo?

-----

1 point by seertaak 4968 days ago | link

Foo will get invoked because it's a function. In _bullet_ functions and primitives are invoked when "solo" on a line. Objects, on the other hand, aren't invoked.

-----

1 point by seertaak 4968 days ago | link

And just to be clear: in bullet, everything that's not a primitive or a function is an object :)

-----