Arc Forum | I've fooled myself before, but I may have settled on the right data structure to...

Arc Forum

2 points by rocketnia 3333 days ago | link | parent

I've fooled myself before, but I may have settled on the right data structure to represent higher quasiquotations.

I define the data structure three times:

- Once as a sequence of types that represent progressively higher-degree quasiquotations, which use maps to hold orphaned nodes. (https://github.com/rocketnia/lathe/blob/5375d95cb7b748972a65...)

- Once as a single type that can represent arbitrarily high degrees of quasiquotation, but in a very weakly typed way. (https://github.com/rocketnia/lathe/blob/5375d95cb7b748972a65...)

- Once as a sequence of types that represent progressively higher-degree quasiquotations, which use type families to hold orphaned nodes. This is the most strongly typed of all these representations. (https://github.com/rocketnia/lathe/blob/5375d95cb7b748972a65...)

Here's the simplest one, the one in the middle:

  data HDExpr s m
    = HDExprMedia (m (HDExprNonMedia s m))
  data HDExprNonMedia s m
    = HDExprHole s [Map s (HDExpr s m)]
    | HDExprLayer (HDExpr s m) [Map s (HDExpr s m)]

It's not very self-documenting, so to start describing it, here's a Lispier pseudocode:

  <expr> ::= <s-expression where some leaves are <paren>>
  
  <paren> ::= (close <identifier> <list of <environment of <expr>>>)
  <paren> ::= (open-and-close <expr> <list of <environment of <expr>>>)

I say "s-expression" there, but we could have any monad there for our syntax. If we work in the s-expression monad, the usual quasiquote operations ` , are parens of degree 1. When the monad we're working in is Haskell's (Writer String), our syntax is effectively reader syntax, where the ( ) parens are degree 1 and ` , are degree 2.

(There are also parens of degree 0, but they're a bit weird: Once they open, they only close again by reaching the end of the syntax. So when we're in the s-expression monad, perhaps ' is a paren of degree 0. When we're in the (Writer String) monad, perhaps pressing enter at the end of a REPL command is a paren of degree 0.)

The degree of the paren is reflected in the length of the list of environments it contains. Degree-0 parens have no environments, degree-1 parens have exactly one, and so on. If we want to represent a degree-N expression, then the parens we use are limited in a specific way:

  <expr> ::= <s-expression where some leaves are <paren>>
  
  <paren> ::=
    (close <identifier>
      <for some (M < N), an M-element list of
        <environment of <expr>>)
  <paren> ::=
    (open-and-close <expr>
      <for some (M <= N), an M-element list of
        <environment of <expr>>)>

The (open-and-close ...) syntax begins a nested expression. The environments serve to fill the holes in the <expr>, resuming the previous level of nesting. Each element of the list fills holes of a different degree. Holes of degree higher than the number of environments in the list are not filled; they remain holes. For instance, in `(a b (c d ,e) f), the "(" before "c" has two holes in its expression (",e" and ") f"), but it only fills the hole for ") f". The hole for ",e" is a higher degree than the "(" paren, so it's not filled at such a local place. It's only filled by the "`" paren on the outside.

The (close ...) syntax begins a hole of degree equal to the number of elements of the list. The list of environments will be used when the hole is replaced with an expression. It'll fill in the (low-degree) holes of that expression.

Note that the list lengths correspond with the degrees of holes, but not with the degrees of expressions. In fact, a degree-N expression does not contain any expressions of degrees other than N. If we consider ourselves to be working with higher quasiquotations of an infinite degree N, but every (close ...) actually occurring in our data has finite degree, then we can represent our data using finite lists. We never need to select a finite value for N!

---

In Haskell, the strongly typed variations I wrote do require a finite value for N to be decided beforehand (and baked into the name of the type), because I'm pretty sure that vastly simplifies the type system features I would need to use to get it to work. :-p Roughly, the difficulty is that I need 0 to give me something of kind ( * ), 1 to give me something of kind ( * -> * ), 2 to give me something of kind (( * -> * ) -> ( * -> * )), and so on, so I would need kinds that depend on values. Agda or Idris might already be up for this task, but I doubt that's going to be the kind of effort I want to spend since I intended to build my macroexpander in (untyped) Racket to begin with.

There might just be a little more I want to tinker with in Haskell, because on my way to defining this data structure I started to develop some code for "higher monads," and it would be fun if this data structure turned out to be an instance of that concept. Still, at this point I'm probably ready to go back to Racket.

I'm even hopeful again that this macro system might play nicely with Racket's. Racket has some hygiene features[1] that walk recursively over the inputs and outputs of macros, expecting them to be s-expression-shaped with no holes. I wasn't sure that Racket's technique could be reconciled with higher quasiquotation. With this data structure, the exotic nesting of higher quasiquotations is converted to the traditional kind of nesting that Racket expects.

So, this might be a usable self-contained Racket library in the end -- not even a framework but a seamless library -- which is what I want, because that would make it easy to clarify the meaning of higher quasiquotation in terms of an existing ecosystem before I use it in Cene. :)

Since this thread and my code comments contain some walls of text, I'll see if I can convert them to a blog post soon.

---

[1] In particular, Racket uses a hygiene technique where it attaches a fresh scope label to a piece of syntax before macroexpanding and then flips the presence of that scope after macroexpanding so that the scope winds up attached to only the parts of the macro result that don't come from the input (making that region more local than the rest). Racket also has a "syntax taints" feature which lets macros attach "dye packs" to their results so that the identifiers occurring in those results can't be used by a client to access private module bindings.