Arc Forum | More on compiling Arc to JavaScript and an Arc-like DSL for JavaScript

5 points by evanrmurphy 5434 days ago | 21 comments

I've written in the past about compiling Arc to JavaScript and shared a compiler I'd been working on to do the job [1, 2, 3, 4]. Soon afterward I stopped working on the project, stumped by some issues I couldn't resolve. I've been thinking about this project again a lot recently and have some new ideas I'd like to share.

The biggest issues stumping me when I last looked at this project had to do with the gap between idiomatic Arc and idiomatic JavaScript. For example, we love to nest function calls in Arc because they're tail-call optimized and the syntax makes it so pleasant. In JavaScript, on the other hand, we need to go a little easy on the nested function calls (since there is no TCO), but object method chaining is huge.

fallintothis observed that my project "[wasn't] just an Arc-to-Javascript compiler, but also a DSL for Javascript in Arc." [5] I wrestled with that duality for some time, but recently realized that making a good JavaScript DSL was my top priority, with the compiler just serving as the means to that end. So, I'm now in the mindset of designing a little language that compiles to JavaScript and that, while inspired by pg's Arc, makes no attempt to be faithful to it. What I'd like to end up with is something not unlike CoffeeScript [6], only s-expression based and with macros.

I've started by stripping Arc of all its ssyntax, with those characters in mind for a different ssyntax layer better suited to writing idiomatic JavaScript. I think that in the long run, a well-thought-out new ssyntax will be the answer. For the moment though, it's kinda neat what you can do just with the superset of Arc's valid symbols that comes from eliminating the ssyntax (and doing a bit of reader hacking, hence my recent question about this [7]):

  ; Arc-like expressions      ; Compiled JavaScript output

  ({ a 1  b 2  c 3)           {a: 1, b: 2, c: 3}

  ([ 1 2 3)                   [1, 2, 3]

  (. ($ "body")               $("body")
     (addClass "ohmy")          .addClass("ohmy")
     (show "slow"))             .show("slow");

  ; not sure about this one
  (=> anArray 'someKey)       anArray['someKey']

Here are the ssyntax possibilities I'm most attracted to. They correspond to the four examples above, respectively:

  {a 1  b 2  c 3}  

  [1 2 3]          

  ($ "body")
    .(addClass "ohmy")
    .(show "slow")

  anArray['someKey]
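
As a rough illustration of the operator forms (`{`, `[` and `.` in functional position), here is a minimal sketch of how they might compile. It assumes the reader has already produced nested JavaScript arrays, with symbols represented as `{sym: "name"}` objects. The names `sym` and `compile` are hypothetical and are not taken from the actual compiler.

```javascript
// Hypothetical reader output: symbols are {sym: "name"}, lists are arrays,
// strings and numbers are plain JavaScript values.
function sym(name) { return { sym: name }; }

function compile(expr) {
  if (typeof expr === "number") return String(expr);
  if (typeof expr === "string") return JSON.stringify(expr);
  if (!Array.isArray(expr)) return expr.sym;            // bare symbol

  var head = expr[0], args = expr.slice(1);
  switch (head && head.sym) {
    case "{": {                                          // ({ a 1 b 2) => {a: 1, b: 2}
      var pairs = [];
      for (var i = 0; i < args.length; i += 2) {
        pairs.push(compile(args[i]) + ": " + compile(args[i + 1]));
      }
      return "{" + pairs.join(", ") + "}";
    }
    case "[":                                            // ([ 1 2 3) => [1, 2, 3]
      return "[" + args.map(compile).join(", ") + "]";
    case ".":                                            // (. obj (m a)) => obj.m(a)
      return compile(args[0]) + args.slice(1).map(function (call) {
        return "." + compile(call);
      }).join("");
    default:                                             // (f a b) => f(a, b)
      return compile(head) + "(" + args.map(compile).join(", ") + ")";
  }
}

console.log(compile([sym("."), [sym("$"), "body"],
                     [sym("addClass"), "ohmy"],
                     [sym("show"), "slow"]]));
```

The sketch treats every unknown head as an ordinary function call, which is why the chained method calls can reuse the default case.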

Of course, lots of the language can remain like Arc, because Arc has some really great operators that work fine in JavaScript:

  (fn () foo)                     function() { return foo; }

  (def bar ()                     var bar = function() {
    (= pet "dog")                   var pet = "dog";
    (alert                          return alert("my pet is a " + pet);
      (+ "my pet is a " pet)))    };

  (if a b                         (function() {
        c d                         if (a) {
          e)                          return b;
                                    } else if (c) {
                                      return d;
                                    } else {
                                      return e;
                                    }
                                  }).call(this);

  (do x y z)                      (function() {
                                     x;
                                     y;
                                     return z;
                                  }).call(this);

  (let x 5                        (function() {
    (alert                          var x = 5;
      (+ "x is " x)))               return alert("x is " + x); 
                                  }).call(this);

This one is pretty insignificant but for some reason I like it. The ternary operator macro could be a synonym for if, functionally speaking (since we probably want to break down JavaScript's statement/expression dichotomy), except that it has the different flavor of JS output:

  (?: a b                         (a ? b
        c d                          : (c ? d
          e)                              : e))
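
A minimal sketch of that expansion, assuming the clauses have already been compiled to JavaScript expression strings. The name `compileTernary` is hypothetical, and falling back to `undefined` when there is no else branch is an assumption, not a settled design choice.

```javascript
// clauses: [test1, then1, test2, then2, ..., optionalElse], all already
// compiled to JavaScript expression strings (an assumption for brevity).
function compileTernary(clauses) {
  if (clauses.length === 0) return "undefined";          // no else branch given
  if (clauses.length === 1) return clauses[0];           // lone else branch
  var test = clauses[0], then = clauses[1];
  return "(" + test + " ? " + then + " : " +
         compileTernary(clauses.slice(2)) + ")";
}

console.log(compileTernary(["a", "b", "c", "d", "e"]));
```

Wrapping every level in parentheses keeps the output unambiguous when the result is embedded in a larger expression.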

One big break from the previous compiler implementation is that we focus on JavaScript's data types, not Arc's. That means having arrays, objects, true, false and null (and undefined?) instead of conses, hashes, t and nil.

Ah, I need to wrap this up. I know I've left out a lot of details here, but I'd love to get feedback on what there is so far.

---

References

[1] "Try Arc->Javascript": http://arclanguage.org/item?id=12156

[2] "JavaScript Compiler, js.arc (w/ arc.js)": http://arclanguage.org/item?id=11918

[3] "Poll: Best representation for JS dot notation": http://arclanguage.org/item?id=12079

[4] "Ask HN: Designing a Parenscript alternative": http://news.ycombinator.com/item?id=1949275

[5] Comment on [1]: http://arclanguage.org/item?id=12165

[6] CoffeeScript: http://jashkenas.github.com/coffee-script/

[7] "Extend Arc's ssyntax?": http://arclanguage.org/item?id=12866

---

Relevant but not mentioned:

* "Ask: html and javascript in arc": http://arclanguage.org/item?id=12421

* Parenscript: http://common-lisp.net/project/parenscript/

* SibilantJS: http://sibilantjs.info/

* lisp.js: http://lisp-js.posterous.com/lispjs-a-lisp-for-nodejs-0

3 points by d0m 5430 days ago | link

Evan, I find that project really interesting as it opens Arc to new perspectives usable in the "real world".

When you give Javascript examples, you seem to care a lot about jQuery (with good reasons, of course). Wouldn't it be possible to separate the Arc-that-compile-in-js and jQuery?

For instance:

  (def foo ()
    (alert "test"))

  $('.click_me').click(foo);

Just to be clear, you could use Arc functions as callbacks with jQuery, but jQuery would keep its JavaScript syntax.

Ideally, that would be also possible:

  $('.click_me').click((fn () (alert 'test')));

And

  $('.click_me').click([alert "test"]);

However, we have some problems such as:

  ; How to differentiate between javascript syntax and arc/js syntax?
  $.each([1,2,3], [alert _]);

To fix that, it might be possible to separate Arc expressions with a delimiter or something when it is used inside jQuery:

  $.each([1,2,3], ~[alert _]);

  $('.click_me').click( ~(fn () (alert "test")));

  $('.click_me').click( ~[alert 'test]);

  $('.click_me').click( function(e) {
    ~(alert "test in js/arc")
    alert("test in pure js");
  });

  $('.click_me').keydown( ~[here, _ would be bound to the event object]);

And just as a matter of preference, I find:

  ({} a b c d) -> js: {a: 'b', c: 'd'}

Better than the single { equivalent. (Not to mention that it would screw up all editors :p)

  ([] 1 2 3) -> [1,2,3]

So you could do:

  (([] 10 12 14) 1) -> 12

Even if you disagree with everything I said here, I really wish you could go forward with that project :)

-----

1 point by evanrmurphy 5430 days ago | link

> Wouldn't it be possible to separate the Arc-that-compile-in-js and jQuery?

I'd never thought of this before. I especially like your examples with the delimiter for switching between syntaxes. CoffeeScript provides one of those (http://jashkenas.github.com/coffee-script/#embedded), and I think you're right that it'd be nice to have.

> I find: ({} a b c d) -> js: {a: 'b', c: 'd'} Better than the single { equivalent.

([ 1 2 3) and ({ a 1 b 2) are kind of an abuse of s-expressions. I think the fact that I want them suggests I should be going all the way to [1 2 3] and {a 1 b 2}, without the parens at all. If you have to delimit them with parens, I agree [] and {} have advantages.

> (Not to mention that it would screw up all editors :p)

Haha, yes it would. I finally had to turn paredit off in emacs when I was writing these expressions.

Thanks for your input, d0m. :)

-----

1 point by evanrmurphy 5433 days ago | link

There's a thread from over a year ago about table literal syntax for Arc that has helped me with some of my thinking here [1]. These are some examples of how object literal ssyntax might work in the JavaScript DSL, especially in the context of quotation:

  (= a 'foo)            a = 'foo';
  {a a}                 {a: a}  // => {a: 'foo'}
  {a 'a}                {a: 'a'}
  {'a 'a}               {'a': 'a'}
  '{a a}                '{a: a}'

  {a (- 5 4)}           {a: 5 - 4}
  {a '(- 5 4)}          {a: '5 - 4'}
  '{a (- 5 4)}          '{a: 5 - 4}'
  '{a '(- 5 4)}         '{a: \'5 - 4\'}'

And here are some array examples:

  [1 2 3]               [1, 2, 3]

  (= a 'foo)            a = 'foo';
  [a 2 3]               [a, 2, 3] // => ['foo', 2, 3]
  ['a 2 3]              ['a', 2, 3]
  '[a 2 3]              '[a, 2, 3]'
  '['a 2 3]             '[\'a\', 2, 3]'

I guess you could support quasi-quotation like this (though I don't know if you'd want to):

  (= a 'foo)            a = 'foo';
  `[,a 2 3]             '[' + a + ', 2, 3]'

But it will definitely work in the context of macros, where the quasi-quotation is transformed away before it hits JavaScript compilation:

  (mac array (x y z)
    `[,x ,y ,z])

  (array 1 2 3)

  ; expands to...
  
  [1 2 3]                [1, 2, 3]

Here's a much more flexible array macro:

  (mac array elts
    `[,@elts])

  (array 'foo 'bar 
         'baz 'quux)

  ; expands to...
  
  ['foo 'bar             ['foo', 'bar',
   'baz 'quux]            'baz', 'quux']

---

[1] "hmm, how about '#table(a 1 b 2)' for a literal table syntax?": http://arclanguage.org/item?id=10678

-----

1 point by evanrmurphy 5430 days ago | link

I've been experimenting with this project in a fork of SibilantJS:

https://github.com/evanrmurphy/sibilant

I think this group would appreciate my commit log: "s/defun/def/g", "s/progn/do/g", etc.

-----

1 point by hasenj 5434 days ago | link

Nice, I like the dot notation.

But why does it have to depart from arc's syntax? What's wrong with:

  (chain  ($ "body") (addClass "wow") (show "slow"))

Also, does = inside a def create a local variable?

-----

1 point by evanrmurphy 5433 days ago | link

> But why does it have to depart from arc's syntax?

The problem arises when you try to make use of Arc's dot ssyntax. Here are a couple jQuery examples using the old ArcScript, which is more ssyntactically faithful to Arc:

  ($.document.`ready (fn ()              $(document).ready(function() {
    ($!a.`click (fn ()                     $("a").click(function() {
      (alert "Hello world!")))))             alert("Hello world!");
                                           });
                                         });


  ($.document.`ready (fn ()              $(document).ready(function() {
    ((($ "#orderedlist")                   $("#orderedlist")
        `addClass) "red")))                  .addClass("red");
                                         });

While you can tell that the left side is supposed to be jQuery, it's completely non-obvious how the syntax transformation works. It takes too much effort every time you see the dot to remember that it has nothing to do with JavaScript's dot operator, but rather means:

  $.document => ($ document) => $(document)

You could just resolve not to use Arc's dot ssyntax because of the confusion with JavaScript's dot operator, but now you have this great unused character you can decide what to do with. Since this is a DSL for JavaScript, and JS's dot operator plays such a key role in idiomatic JavaScript, why not let Arc's dot compile to it?

> Also, does = inside a def create a local variable?

This depends on whether you want a DSL that's really close to the metal of JavaScript, or one that makes some nice abstractions for you, like always guaranteeing lexical scope or making all statements valid expressions. I can see the utility of each. Maybe they could both be available as different modes you can toggle:

  ; Metal mode

  (def bar ()                     
    (var= pet "dog")                       
    (return                                 
      (alert                                ; Both compile to this JavaScript        
        (+ "my pet is a " pet)))            
                                            var bar = function() {
  ; Abstraction mode                          var pet = "dog";
                                              return alert("my pet is a " + pet);
  (def bar ()                               };
    (= pet "dog")                 
    (alert                        
      (+ "my pet is a " pet)))

-----

1 point by rocketnia 5433 days ago | link

With "var bar = function() { ... };", I bet you run into self-recursion issues. Unless I'm mistaken, the variable "bar" isn't in the scope of the closure.

At the very least, (def bar ...) and (def foo ...) won't be corecursive, since foo isn't in bar's scope. That in particular is what the "function bar() { ... }" syntax is for. ^_^ I believe that kind of statement assigns to a variable that's implicitly declared at the top of the current scope, meaning it's visible to itself and all the other functions declared that way, without the need for a "var bar, foo; bar = ...; foo = ...;" idiom.

> makes some nice abstractions for you, like always guaranteeing lexical scope

For me, that's one of those "why would I ever want the language to do this for me" things. ^_- Maybe this means I'm a Blub programmer, but I prefer explicitly declaring the new scope's variables rather than explicitly declaring which variables I take from the outer scope.

Declaring only globals is the worst, 'cause it implies not being able to shadow a local with another local of the same name. Ever since 'it, '_, and 'self, I've been accustomed to using the same variable names over and over, even in nesting scopes. Shame on me, but still. :-p

You can probably disregard most of this rant, 'cause it's probably just as annoying either way. XD Maybe someone else has an opinion though.

-----

1 point by evanrmurphy 5433 days ago | link

> With "var bar = function() { ... };", I bet you run into self-recursion issues. Unless I'm mistaken, the variable "bar" isn't in the scope of the closure.

You're right, thanks for the correction. There is an alternative to the "function bar() { ... }" syntax that allows for recursion, though:

  var bar;
  bar = function() {
    ...
  };

This is how CoffeeScript compiles function definitions, and it's what I meant to write. I'm not really sure yet about all the advantages and disadvantages when compared to the other format.

> You can probably disregard most of this rant, 'cause it's probably just as annoying either way.

Your rants are always welcome here! ^_^

---

Update: Sorry, rocketnia. Reading your comment more carefully I see you already talked about this:

> I believe that kind of statement assigns to a variable that's implicitly declared at the top of the current scope, meaning it's visible to itself and all the other functions declared that way, without the need for a "var bar, foo; bar = ...; foo = ...;" idiom.

I guess whether you use the explicit var declaration is largely a matter of taste. Declaring all variables explicitly at the top of the scope is more transparent, but less concise. I suppose since CoffeeScript tends to declare other variables at the top of the scope, including function variables in that group allows for a certain consistency.

-----

1 point by evanrmurphy 5423 days ago | link

> With "var bar = function() { ... };", I bet you run into self-recursion issues. Unless I'm mistaken, the variable "bar" isn't in the scope of the closure. At the very least, (def bar ...) and (def foo ...) won't be corecursive, since foo isn't in bar's scope.

Actually, both of these cases are working fine for me with the ``var bar = function() {...};`` format. This is testing in Chrome 5 on Linux:

  // recursion

  var foo = function(x) { 
    if (x == 0) 
      return "done"; 
    else 
      return foo(x - 1); 
  };

  foo(5)
  // => "done" 

  // corecursion

  var foo = function(x) { 
    if (x == 0) 
      return "done"; 
    else 
      return bar(x - 1); 
  };

  var bar = function(x) { 
    if (x == 0) 
      return "done"; 
    else 
      return foo(x - 1); 
  };

  foo(5)
  // => "done"
  bar(5)
  // => "done"

Am I missing something here?

-----

1 point by rocketnia 5423 days ago | link

Actually, you're right. ^^; In JavaScript, all variables declared in a function are visible throughout the function, even before the declaration statement.

  var foo = 4;
  
  (function(){
    alert( foo );  // "undefined"
    var foo = 2;
  })();

My mistake. ^_^;
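
For comparison, a small sketch of how the two definition styles behave before their definitions are reached. The function and variable names are made up for the example.

```javascript
function hoistingDemo() {
  var log = [];
  log.push(typeof declared);   // the declaration is hoisted together with its body
  log.push(typeof assigned);   // only the var is hoisted, not the assigned value
  var assigned = function () { return declared(); };  // name resolved at call time
  function declared() { return "ok"; }
  log.push(assigned());
  return log;
}

console.log(hoistingDemo());
```

So the "var bar = function() {...};" format is safe for recursion and corecursion as long as nothing calls the function before the assignment has run.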

-----

1 point by evanrmurphy 5433 days ago | link

> why would I ever want the language to do this for me

Hmm... so would you also prefer that return statements never be implicit?

I'd like to design the bottom layer operators of this compiler to have a one-to-one correspondence with JavaScript and zero automagic. This means explicit var declarations, explicit returns and an embrace of the expression/statement dichotomy. Then, we should be able to layer on top all desired forms of automagic using macros so that programmers can opt-in or opt-out as they like.

Or is it really naive to think it could work out this way? :P

-----

2 points by rocketnia 5433 days ago | link

Nah, that comment was only about function scope. I'm saying I really prefer the kind of scoping Arc and JavaScript both have, where variables shadow each other and closed-over variables don't need to be declared (but local ones do). I can't put my finger on why... but wait, pg can. :D

http://www.paulgraham.com/arclessons.html

Skip to "Implicit local variables conflict with macros."

As for implicit return statements, I don't know what you expect me to be, but I'm for them. ^_^ I think you can always say "return undefined;" in JavaScript to accomplish the same thing as leaving out the return statement, so in a way, leaving the return statement out is just syntactic sugar.

I'm for explicit return statements too, for the purposes of short-circuiting. However, they probably won't play well with macros, for the same reason as pg encountered with implicit local variables: Lexical scope boundaries might pop up in unexpected places and change the meaning of "return;".
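
A sketch of that hazard, assuming a (do ...)-style macro that wraps its body in an anonymous function as in the original post. The function names are made up.

```javascript
// Hand-written JavaScript: the return exits the enclosing function.
function direct(x) {
  if (x < 0) return "negative";
  return "non-negative";
}

// The same body after a wrapper function has been introduced around it:
// the return now only exits the anonymous function, and its value is dropped.
function wrapped(x) {
  (function () {
    if (x < 0) return "negative";
  }).call(this);
  return "non-negative";
}

console.log(direct(-1), wrapped(-1));
```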

> Or is it really naive to think it could work out this way? :P

Well, I do have lots of suggestions. They go pretty deep into the fundamental design too. That said, they sort of lead to not programming in Arc. You'll see what I mean in a second (unless I go scatterbrained on ya again).

I think it would be best to treat this as a combinator library. You can have utilities that build not only JavaScript expressions and statements,* but also other intermediate forms, like lists of statements. I wouldn't be surprised to see http://en.wikipedia.org/wiki/Control_flow_graph pop up at some point. As the combinator library goes on, it can introduce more convenient construction syntaxes, including full DSLs.

If this is beginning to sound similar to the topic I just posted, "A Language Should Target Moving Platforms"[1], that's because it is. We're talking about generating JavaScript code. It's very closely related to the topic of generating code in general. :-p

I was developing Penknife not too long ago. It's a language much like Arc, but what's important here is that its expressions are compiled and macro-expanded to an intermediate format first--i.e. parsed into an AST--and it's easy to make closures whose ASTs can be inspected. My plan is for the Penknife AST itself to be extensible using user-defined types, and my goal is for non-Penknife ASTs to be made using the same syntax technology used elsewhere in Penknife, right down to using the same compiler.

So yeah, this is my approach to a JavaScript DSL. ^_^ I don't expect you to go adopt the whole approach, but if you do want to pursue it, or if there's some aspect of it that would fit into your design too, be my guest.

* Fine-grained control is fine by me. :-p So is automagic. I want it all!

[1] http://arclanguage.org/item?id=12939

-----

1 point by akkartik 5433 days ago | link

"Lexical scope boundaries might pop up in unexpected places and change the meaning of "return;"."

I couldn't believe this, but I looked around and yes, it's true, "..the scope container in javascript is a function." (http://mark-story.com/posts/view/picking-up-javascript-closu...)

That seems bad. Don't create a new function in js every time I do a let in arc. But I don't know if you can avoid doing so.

Perhaps one way to do it would be to play a little fast and loose with semantics. Assume let translates to the scope of the containing function.

-----

1 point by evanrmurphy 5433 days ago | link

> I couldn't believe this, but I looked around and yes, it's true, "..the scope container in javascript is a function."

Yep! Function scope + no tail-call optimization = a formidable challenge. ^_^

> Perhaps one way to do it would be to play a little fast and loose with semantics. Assume let translates to the scope of the containing function.

Interesting approach. I'll play around with this. There might also be weird hacks that could simulate let without costing a whole function on the stack. For example, you might be able to get away with something like this:

  (let x 5         var _temp = true;
    ... )          while (_temp) {
                     _temp = false;
                     var x = 5;
                     ...
                   }

Or this:

  (let x 5         var _temp = true;
    ... )          for (var x = 5; _temp; _temp = false) {
                     ...
                   }

But I'm not sure yet if these work, or if they're less expensive than the straightforward let translation:

  (let x 5         (function() {
    ... )            var x = 5;
                     ...
                   }).call(this);

-----

2 points by rocketnia 5433 days ago | link

I tried these out, and they don't seem to help, at least in Chrome:

  var x = 1;
  {
      var x = 2;
      var y = "but why!?";
      alert( x );           // 2
  }
  alert( x );               // 2
  alert( y );               // but why!?
  
  var x = 1;
  do
  {
      var x = 2;
      var y = "but why!?";
      alert( x );           // 2
  }
  while( false );
  alert( x );               // 2
  alert( y );               // but why!?
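
The same comparison as a sketch that returns values instead of alerting, with the function-wrapper translation next to the loop hack. The function names are made up.

```javascript
// Loop hack: the inner var is the same variable as the outer one.
function whileHack() {
  var x = 1;
  var _temp = true;
  while (_temp) {
    _temp = false;
    var x = 5;
  }
  return x;
}

// Function wrapper: the inner var shadows the outer one only inside the wrapper.
function functionWrapper() {
  var x = 1;
  (function () {
    var x = 5;
  }).call(this);
  return x;
}

console.log(whileHack(), functionWrapper());
```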

The next thing you might try is mangling variable names. If you expect the JavaScript code to never use threads (which it doesn't, I think), you could temporarily replace the variable values using assignment.

For a high-level (and non-idiomatic) approach, it's possible to implement trampolining using exceptions. I've done it in about a page of JavaScript (right next to my Fermat's Last Theorem proof, of course of course :-p ), and I think the API can be as simple as this:

  - Use bounce( foo, foo.bar, 1, 2, 3 ) instead of
      foo.bar( 1, 2, 3 ) in tail positions. This throws an
      exception containing information about the next call
      to make.
  - To begin a trampolining loop, call the function with
      trampoline( foo, foo.bar, 1, 2, 3 ). This should be
      done when calling functions at the top level and when
      calling functions from within functions that are to be
      exposed to other JavaScript utilities and DOM event
      handlers (which won't call trampoline themselves).

Okay, so determining when a closure could "escape" into the outside world might be hard. O_o To avoid that trouble, uh, keep a global variable saying that a trampolining loop is already in progress, and use trampoline( ... ) for almost every single call? Well, that'll complicate things a little. ^_^;
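
For illustration only, a minimal sketch of the bounce/trampoline API described above. This is not the page of JavaScript mentioned. The `Bounce` constructor and the `isEven`/`isOdd` test functions are hypothetical, and the sketch skips the "loop already in progress" global.

```javascript
// Carries the next call to make: receiver, function and arguments.
function Bounce(self, fn, args) {
  this.self = self;
  this.fn = fn;
  this.args = args;
}

// Use in tail position instead of calling fn directly.
function bounce(self, fn /*, ...args */) {
  throw new Bounce(self, fn, Array.prototype.slice.call(arguments, 2));
}

// Runs fn, then keeps running whatever each call bounces to,
// so the stack never grows past one frame per step.
function trampoline(self, fn /*, ...args */) {
  var next = new Bounce(self, fn, Array.prototype.slice.call(arguments, 2));
  while (true) {
    try {
      return next.fn.apply(next.self, next.args);
    } catch (e) {
      if (!(e instanceof Bounce)) throw e;   // real errors still propagate
      next = e;
    }
  }
}

// Mutually recursive functions that would overflow the stack if they
// called each other directly with a large n.
function isEven(n) { return n === 0 ? true : bounce(null, isOdd, n - 1); }
function isOdd(n) { return n === 0 ? false : bounce(null, isEven, n - 1); }

console.log(trampoline(null, isEven, 100000));
```

Calling isEven directly, outside trampoline, would let the Bounce exception escape, which is the "escaping closure" problem in a nutshell.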

-----

1 point by evanrmurphy 5432 days ago | link

> The next thing you might try is mangling variable names.

I think this could work pretty nicely, actually:

  (let x 5         var _x;
    ... )          _x = 5;
                   ...

Here's an example with a global and local variable:

  (= x 6)            var x, _x;
  (let x 5           x = 6;
    (alert x))       
  (alert x)          _x = 5;
                     alert(_x); // => 5

                     alert(x); // => 6

If you use multiple lets with the same variable name in one scope, you would need to differentiate with additional underscores or something:

  (let x 5           var _x, __x;
    (alert x))         
  (let x 6           _x = 5;
    (alert x))       alert(_x); // => 5

                     __x = 6;
                     alert(__x); // => 6

Of course, there is still some risk of variable name collision, and the output isn't super readable. Maybe it's good to have two operators, `let` and `let!`. The former uses function scope, so it's safer and compiles to more readable JavaScript, but it increases the stack size. The latter (let!) is the optimized version that uses one of the schemes we've been talking about.

-----

1 point by rocketnia 5432 days ago | link

What's your policy regarding the ability for people to say (let _x 5 ...) to spoof the mangler? ;) Also, will the name "let!" conflict with ssyntax?

These questions shouldn't keep anyone up nights, but I thought I'd point them out.

-----

1 point by evanrmurphy 5432 days ago | link

> What's your policy regarding the ability for people to say (let _x 5 ...) to spoof the mangler? ;)

The underscore prefix system would be nice because it's pretty readable, but it's not necessary. In the worst case, we can use gensyms. I thought using underscores for internal variables might work because CoffeeScript does it [1], but maybe it won't. So you raise a good point. :)
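
A minimal gensym sketch, assuming the compiler can collect every name the program uses before it starts mangling. `makeGensym` and its naming scheme (underscore, base name, counter) are hypothetical.

```javascript
// Returns a generator of names that collide neither with the program's own
// names nor with names generated earlier.
function makeGensym(userNames) {
  var taken = Object.create(null);
  userNames.forEach(function (n) { taken[n] = true; });
  var counter = 0;
  return function gensym(base) {
    var name;
    do {
      counter += 1;
      name = "_" + base + counter;
    } while (taken[name] === true);
    taken[name] = true;
    return name;
  };
}

// The program already uses _x1, so the mangler skips it.
var gensym = makeGensym(["x", "_x1"]);
console.log(gensym("x"), gensym("x"));
```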

> Also, will the name "let!" conflict with ssyntax?

Yes. I'm starting from scratch with the ssyntax. `!` is a regular character now. It can be used in names like `let!`, and when by itself in functional position it compiles to the JavaScript negation operator:

  (! x)   =>   !x

`.`, `:`, `~`, `[]`, `&` and friends are out, too. This may displease my Arc buddies, but I just found it too difficult to think in JavaScript when the gap in meaning between Arc and JS for these characters was so large. `.` compiles to the JS dot operator now, `{}` to the JS object literal and `[]` to the array literal (see [2] for some details). Note that quote ('), quasiquote (`), unquote (,), unquote-splicing (,@) and string quoting (") should all work as expected.

---

[1] http://jashkenas.github.com/coffee-script/#comprehensions

[2] http://arclanguage.org/item?id=12948

-----

1 point by akkartik 5433 days ago | link

Probably just different aesthetics. I sense that you're thinking in javascript and rocketnia is thinking in arc.

-----

1 point by aw 5434 days ago | link

  ({ a 1  b 2  c 3)

did you mean to have a closing curly bracket here, or is "{" acting like an operator?

-----

1 point by evanrmurphy 5434 days ago | link

I meant to leave it unclosed. Yes, it's acting as an operator - a convenience function for something like obj. (Using the modified reader from http://arclanguage.org/item?id=12866, there's no need to do |{| or anything like that to treat the curly brace as a symbol.)

-----