Arc Forum | (Edit: Oops, I replied out of order and didn't read shawn's comment with the eli...

Arc Forum

3 points by rocketnia 2354 days ago | link | parent

(Edit: Oops, I replied out of order and didn't read shawn's comment with the elisp examples before writing this.)

I suspect what pg means by it is primarily that it's tricky to do in Racket (though I'm not sure if it'd be because there are too few options or too many).

Essentially, I think it's easy to display a closure by displaying its source code and all the captured values of its source code's free variables. (Note that this might be cyclic since functions are often recursive.)

But there is something tricky about it, which is: What language is the source code in? In my opinion, a language design is at its most customizable when even its built-in syntaxes are indistinguishable from user-defined ones. So the ideal displayed format of a function would ideally involve some particular set of user-definable syntaxes. Since a language designer can't anticipate what user-defined syntaxes will exist, clearly this decision should ultimately be up to a user. But what mechanism does a user use to express this decision?

As a baseline, there's at least one straightforward choice the user can make: The one that expresses the code in terms of Arc special forms (`fn`, `assign`, `if`, `quote`, etc.) and Arc functions that aren't implemented in Arc (`car`, `apply`, etc.). In a language that isn't aiming for flawless customizability, this is enough.

Now suppose we try to generalize this so a user can choose any set of syntaxes to express things in terms of -- say, "I want the usual language, but with `=` treated as an additional built-in." If the code contains `(= x (rfn ...))`, then the macroexpander at some point needs to expand the `rfn` without expanding the `=`. That's not really viable in terms of Arc's macroexpander since we don't even know `(rfn ...)` is an expression in this context until we process `=`. So this isn't quite the right generalization; the right generalization is something trickier.

I suppose we can have every function printed in terms of its pre-macroexpansion source code along with printing all the macros and other macroexpansion-time state it happened to rely on as it macroexpanded the first time. That would narrow down the problem to how to print the built-in functions. And we could solve that problem in the language design by making it so nothing ever captures a built-in function as a first-class value, only as a late-bound reference to the global variable it's accessible from.

Or we could have the user specify their own macroexpander and make it so that whenever a function is printed, if the current macroexpander hasn't expanded that function yet, it does so now (just to determine how the function is printed, not how it behaves). This would let the user specify, for instance, that `assign` expands into `=` and `=` expands into itself, rather than the other way around.

These ideas are incomplete, and I think making them complete would be pretty tricky.

In Cene, I have a different take on this: If a function is printable (and not all are), then it's printable because it's a callable struct with a tag the printer knows about. It would be printed as a struct. The function implementation wouldn't be printed. (The user could look up the source code information based on the struct tag, but that's usually not printable.) There may be some exceptions at the REPL where information is printed that usually isn't available, because the REPL is essentially a debugging context and the debugger sees all. (Racket's struct inspectors express a similar debugger-sees-all principle, but I haven't seen the REPL take advantage of it.)

2 points by shawn 2354 days ago | link

You're hitting on a problem I've been thinking about for years. There are a few reasons this is tricky, notably related to detecting whether something is a variable reference or a variable declaration.

  (%language arc
    (let in (instring "  foo")
      (%language scm
        (let-values (((a b c) (port-next-location in)))
          (%language el
            (with-current-buffer (generate-new-buffer "bar")
              (insert (prin1-to-string c))
              (current-buffer)))))))

To handle this example, you'll need to know whether each form is a function call, a variable definition, a list of definitions (let-values), a function call, and which target the function is being called for.

For example, an arc function call needs to expand into `(ar-apply foo ...)`

And due to syntax, you can't just handle all the cases by writing some hypothetical very-smart `ar-apply` function. If your arc compiler targets elisp, it's tempting to try something like this:

  (ar-apply let (ar-apply (ar-apply a (list 1))) ...

which can collapse nicely back down to

  (let ((a 1)) ...)

in other words, it's tempting to try to defer the "syntax concern" until after you've walked the code and expanded all the macros. Then you'd collapse the resulting expressions back down to the language-specific syntax.

But it quickly becomes apparent that this is a bad idea.

Another alternative is to have a "standard language" which all the nested languages must transpile tO:

  (%let in (%call instring "  foo")
    (%let (a b c) (%call port-next-location in)
      (|with-current-buffer| (%call generate-new-buffer "bar")
        (%call insert (prin1-to-string c)
          (%call current-buffer)))))

Now, that seems much better! You can take those expressions and easily spit out code for scheme, elisp, arc, or any other target. And from there it's just a matter of adding shims on each runtime.

The tricky case to note in the above example is with-current-buffer. It's an elisp macro, meaning it has to end up in functional position like (with-current-buffer ...) rather than something like (funcall #'with-current-buffer ...)

There are two ways to deal with this case. One is to hook into elisp's macroexpand function and expand the macros as you go. Emacs calls this eager macroexpansion, and there are some cases related to autoloading (I think?) that make this not necessarily a good idea.

The other way is to punt, and have the user indicate "this is an elisp form; do not mess with it."

The idea is that if the symbol in functional position is surrounded by pipe chars, then the compiler should leave its position alone but compile the arguments. So

   (|with-current-buffer| foo
     (prn (|buffer-string|)))

That works quite nicely, untll you try this:

  (|let| ((|a| 1) (|b| 2))
    (+ |a| |b|))

Then you'll be in for a nasty surprise: not only does it look visually awful and annoying to write, but it won't work at all, because it'll compile to something like this:

  (let (ar-funcall2 (a 1) (b 2))
    (ar-funcall2 _+ a b))

I am not sure it's possible to escape the "syntax concern". Emacs itself had to deal with it for user macros. And the solution is unfortunately to specify the syntax of every form explicitly:

https://www.gnu.org/software/emacs/manual/html_node/elisp/In...

  (def-edebug-spec let
       ((&rest
         &or symbolp (gate symbolp &optional form))
        body))

Ugh, augh, grawr. You can see how bad it would be to curse the user with having to do this for every macro they write.

Yet I am not sure it's possible to escape this fate. And it seems to work well in emacs.

Hopefully some of that might be helpful on your quest. The goal is worth pursuing.

-----

3 points by rocketnia 2354 days ago | link

I can't speak to elisp, but the way macro systems work in Arc and Racket, the code inside a macro call could mean something completely different depending on the macro. Some macros could quote it, compile it in their own way, etc. So any code occurring in a macro call generally can't be transformed without changing the meaning of the program. Trying to detect and process other macro calls inside there is unreliable.

I have ideas in mind for how macro systems can express "Okay, this macro call is over; everything beyond this point in the s-expression is an expression." But that doesn't help with Arc or Racket, whose macro systems aren't designed for that.

So something like your situation, where you need to walk the code before knowing which macroexpander to subject each part of it to, can't reliably treat the code as code. It's better to treat the code as a meaningless soup of symbols and parentheses (or even as a string). You can walk through the data and find things like `(%language ...)` and treat those as escape sequences.

(What elisp is doing there looks like custom escape sequences, which I think is ultimately a more concise way of doing things if new macro definitions are rare. It gets into a middle ground between having s-expression soup and having a macro system that's designed for letting code be walked like this.)

Processing the scope of variables is a little difficult, so my escape sequences would be a bit more verbose than your example. It's not like we can't take a Racket expression and infer its free variables, but we can only do that if we're ready to call the Racket macroexpander, which isn't part of the approach I'm describing.

(I heard elisp is lexically scoped these days. Is that right?)

This is how I'd modify the escape sequences. This way it's clear what variables are passing between languages:

  (%language arc ()
    (let in (instring "  foo")
      (%language scm ((in in))
        (let-values (((a b c) (port-next-location in)))
          (%language el ((c c))
            (with-current-buffer (generate-new-buffer "bar")
              (insert (prin1-to-string c))
              (current-buffer)))))))

Actually, instead of just (in in), I might also specify a named strategy for how to convert that value from an Arc value to a Racket value.

Anyhow, once we walk over this and process the expressions, we can wind up with generated code like this:

  ; block 1, Arc code
  (fn (__block2 __block3)
    (let in (instring "  foo")
      (__block2 __block3 in)))
  
  ; block 2, Scheme code
  (lambda (__block3 in)
    (let-values (((a b c) (port-next-location in)))
      (__block3 c)))
  
  ; block 3, elisp code
  (lambda (c)
    (with-current-buffer (generate-new-buffer "bar")
      (insert (prin1-to-string c))
      (current-buffer)))

We also collect enough metadata in the process that we can write harnesses to call these blocks at the right times with the right values.

This is a general-purpose technique that should help with any combination of languages. It doesn't matter if they run in the same address space or anything; that kind of detail only changes what options you have for value marshalling strategies.

I think there's a somewhat more convenient approach that might be possible between Arc and Racket, since their macroexpanders both run in the same process and can trade off with each other: We can have an Arc macro that expands its body as Racket code (essentially Anarki's `$`) and a Racket macro that expands its body as Arc code. But there are some difficulties designing the latter, particularly in terms of Racket's approach to hygiene and its local macros, two things the Arc macroexpander has zero concept of. When we switch from Racket to Arc and back to Racket, the hygiene information and local macro scopes will probably be obliterated.

In your arcmacs project, I guess you might also be able to have an Arc macro that expands its body as elisp code, an elisp macro that expands its body as Racket code, etc. :-p So maybe that's the approach you really want to take with `%language` and I'm off on a tangent with this "escape sequence" interpretation.

-----