Arc Forum | A rough description of Penknife's textual syntax

3 points by rocketnia 5411 days ago | 3 comments

(Apologies for making this a really rough draft. ^^; I'm finding the need to talk about Penknife, but I won't have time to actually introduce Penknife as comprehensively as I want to in the foreseeable future, so this is a hurried introduction to get things started. Feel free to ask for dozens of clarifications. :-p )

Penknife's a language I'm making, and it's intended to be customizable. It has a very uniform syntax that embraces the sequence-of-characters nature of code, so that most imaginable custom syntaxes naturally fall within its domain.

This syntax...

  foo[text text text]

...compiles by compiling "foo" first to get a "parse fork", then forking it as an "op" using the body "text text text", which hasn't been parsed yet. The parse fork of foo will take care of the parsing itself.

So, parse forks. Parse forks are intermediate values that represent all the different meanings of an expression in various contexts. The op context is one example, and others include the top-level command context, the standalone expression context, and the settable-place context (where Arc uses 'setforms).

When a parse fork is forked in the op context, it's provided with the body as a sequence of characters, like "text text text" here. The parse fork parses that body manually according to its own behavior. Then the result is another parse fork, so that the operator form can be meaningful in various surrounding contexts. For instance, parsing foo[bar][baz] relies on the parse fork of foo[bar], which relies on the parse fork of foo... which is obtained by magic for our purposes. ^_^
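To make the chaining concrete, here is a rough Python sketch of the idea (not Penknife's actual implementation). The names ParseFork, make_call_fork, op, and as_expression are all hypothetical, and only two of the contexts are modeled. The point it shows is that op-forking hands the fork a raw, unparsed body and gets another fork back.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ParseFork:
    """Hypothetical stand-in for a parse fork: one expression, several meanings."""
    name: str
    # The "op" meaning: given a raw, unparsed body, produce another fork.
    op: Callable[[str], "ParseFork"]

    def as_expression(self) -> str:
        # The "standalone expression" meaning; here just a printable description.
        return self.name


def make_call_fork(name: str) -> ParseFork:
    """Build a fork whose op meaning records the raw body and returns a new fork."""
    def op(body: str) -> ParseFork:
        # A real fork would parse the body however it likes; this one keeps it as text.
        return make_call_fork(f"{name}[{body}]")
    return ParseFork(name=name, op=op)


foo = make_call_fork("foo")        # in the post, this fork is obtained "by magic"
result = foo.op("bar").op("baz")   # models foo[bar][baz]
print(result.as_expression())      # prints foo[bar][baz]
```

Because every op-fork returns another fork, the result of foo[bar] can itself be used as an operator, as an expression, or in any other context the fork chooses to support.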

This parsing method effectively allows several things all at once. It allows us to have operators that implement custom string syntaxes. (In fact, the string syntax I currently use in Penknife is q[This is a string.], and I'm not disappointed.) It also allows us to have macros.

But more interestingly than that, it allows us to have metafns; if a custom parse fork is "op"-forked with one body, it can return another custom parse fork that consumes yet another body, and so on until it's quite bodiful enough for anybody. In particular, Penknife already copies Arc's 'compose metafn, which is the only one I really care about anyway. :-p

== But... all text? ==

There's a caveat to foo[text text text] syntax. The body must have matched square brackets.

Penknife automatically strips out comments before doing any parsing, so there's that too. I'm using semicolons now, like Arc, but it turns out I use semicolons when writing in English, which is important for the contents of strings, so I've been reconsidering that decision.
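The matched-bracket requirement is what lets a reader find the end of a body without understanding its contents. A minimal Python sketch of that scan, assuming nothing beyond bracket counting (read_bracket_body is a hypothetical name, and comment stripping is not modeled):

```python
from typing import Tuple


def read_bracket_body(text: str, open_index: int) -> Tuple[str, int]:
    """Return (body, index just past the closing bracket) for the '[' at open_index."""
    if text[open_index] != "[":
        raise ValueError("expected '[' at open_index")
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "[":
            depth += 1
        elif text[i] == "]":
            depth -= 1
            if depth == 0:
                # Everything between the matching brackets is the raw body.
                return text[open_index + 1:i], i + 1
    raise ValueError("unmatched '['")


body, end = read_bracket_body("q[a [nested] string] rest", 1)
print(body)  # prints a [nested] string
```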

== Brackets on the outside ==

The foo[bar][baz] syntax isn't that friendly when trying to indent big blocks of code, IMO. I'm used to Arc, where the parentheses are on the outside: ((foo bar) baz). So Penknife allows [[foo bar] baz], which is equivalent to foo[ bar][ baz].

Note the spaces in foo[ bar][ baz]: the bodies keep their leading whitespace, so the other syntax is still necessary for the q[This is a string.] syntax and so forth.
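The rewrite from brackets-on-the-outside to brackets-on-the-inside can be sketched in Python as below. This is a guess generalized from the single example above, assuming the operator is everything up to the first whitespace outside brackets; the function names are hypothetical and this isn't Penknife's code.

```python
def wraps_whole(expr: str) -> bool:
    """True if the '[' at the start of expr is closed by the ']' at its very end."""
    if len(expr) < 2 or expr[0] != "[" or expr[-1] != "]":
        return False
    depth = 0
    for i, ch in enumerate(expr):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i == len(expr) - 1
    return False


def split_operator(content: str):
    """Split at the first whitespace outside brackets: (operator, rest incl. whitespace)."""
    depth = 0
    for i, ch in enumerate(content):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch.isspace() and depth == 0:
            return content[:i], content[i:]
    return content, ""


def desugar(expr: str) -> str:
    """Rewrite [op body] as op[body], recursively rewriting the operator part."""
    if not wraps_whole(expr):
        return expr
    op, body = split_operator(expr[1:-1])
    return f"{desugar(op)}[{body}]"


print(desugar("[[foo bar] baz]"))  # prints foo[ bar][ baz]
```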

== Infix syntax ==

Also, Penknife supports a variant of ssyntax, which I've hinted at here and there on the forum.

To implement that, there are two categories of identifiers in Penknife: identifiers made up only of letters, digits, and the characters in "+-/<=>", and identifiers made up only of other characters, like . ! & : ' " , etc., excluding square brackets and whitespace. These are called "alpha" identifiers and "infix" identifiers respectively.

The Penknife expression a!!!!b is parsed the same way as !!!![a][b]. In general, if the Penknife expression (not including parts inside brackets) contains an infix identifier it doesn't start with, the last such infix identifier is taken as the operator for the parts before and after it.

The "doesn't start with" condition is important. It's just fine to have infix operators at the end: a! just becomes ![a][], taking the parts before and after like always. But if we try to have !a parse like ![][a], then that will parse like ![][[][a]], that will parse like ![][[][[][a]]], and we'll be caught in a loop.

This automatically takes care of a commonly encountered shortcoming of Arc's ssyntax. You can finally say [foo a b c].d!e.[bar f] in a fully consistent way.
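Here is one step of that infix rule as a Python sketch, under a few assumptions: the input is a single whitespace-free word, "letters and digits" is approximated with str.isalnum, and the parts before and after the operator are left unparsed rather than being processed recursively. The names is_infix_char and infix_step are hypothetical.

```python
def is_infix_char(ch: str) -> bool:
    """Infix characters: anything that is not alpha, a square bracket, or whitespace."""
    is_alpha = ch.isalnum() or ch in "+-/<=>"
    return not (is_alpha or ch in "[]" or ch.isspace())


def infix_step(expr: str) -> str:
    """Rewrite a<op>b as <op>[a][b], using the last top-level infix identifier
    that does not sit at the very start of the expression."""
    depth = 0
    last = None  # (start, end) of the chosen infix identifier
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif depth == 0 and is_infix_char(ch):
            start = i
            while i < len(expr) and is_infix_char(expr[i]):
                i += 1
            if start > 0:  # the "doesn't start with" condition
                last = (start, i)
            continue
        i += 1
    if last is None:
        return expr
    start, end = last
    return f"{expr[start:end]}[{expr[:start]}][{expr[end:]}]"


print(infix_step("a!!!!b"))  # prints !!!![a][b]
print(infix_step("a!"))      # prints ![a][]
print(infix_step("!a"))      # prints !a
```

Applied to [foo a b c].d!e.[bar f], this step picks the final . as the operator and leaves [foo a b c].d!e as its first body, to be handled by whatever the . operator does with it.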

Also, nothing else distinguishes infix identifiers from alpha identifiers. In Penknife, you can locally bind . and use it right away for a.b syntax.

In conclusion, Penknife's syntax supports the equivalents of Arc's metafns and ssyntax in a totally consistent way, without hacking the language. It also requires no inherent literal syntaxes whatsoever, just operators that provide them.

== Not very iconic ==

One criticism of Penknife's syntax is that it isn't homoiconic. A Penknife syntax operator accepts its body as a list of characters, which isn't very semantic.

However, I believe with the right tools, dealing with lists of characters won't be too bad. The default function call syntax just breaks its characters up into words and has the words parsed individually, and I've already set up a macro system that invisibly uses that parser so that you don't have to worry about it as much. I'm even satisfied with quasiquoting. Here's my current definition of let:

  [mac* let [var val] body
    qq.[[tf [\,var] \,body] \,val]]

(The tf operator is the most bare-bones kind of lambda.)

In general, I think the conveniences homoiconicity provides can be regained through the accumulation of enough parsing utilities.

1 point by evanrmurphy 5411 days ago | link

> It has a very uniform syntax that embraces the sequence-of-characters nature of code

How would you say these Penknife utilities that focus on character streams and a read macro system? Is there a big difference?

> In Penknife, you can locally bind . and use it right away for a.b syntax.

I'm certainly attracted to the idea of locally bound syntax. This makes me want to try and write a locally binding version of aw's extend-readtable [1].

> In general, I think the conveniences homoiconicity provides can be regained through the accumulation of enough parsing utilities.

Interesting conclusion. Maybe a focus on character streams is actually homoiconic in its own right, only with a finer granularity than lisp symbols? (Or am I butchering the concept of homoiconicity to suggest this?)

> (The tf operator is the most bare-bones kind of lambda.)

Why did you name it `tf`, and in which ways is it more bare-bones than other lambdas?

You made this remark about Penknife in a different thread [2]:

> It's a language much like Arc, but what's important here is that its expressions are compiled and macro-expanded to an intermediate format first--i.e. parsed into an AST--and it's easy to make closures whose ASTs can be inspected. My plan is for the Penknife AST itself to be extensible using user-defined types, and my goal is for non-Penknife ASTs to be made using the same syntax technology used elsewhere in Penknife, right down to using the same compiler.

I thought that one of the special things about lisp-family languages was that they essentially were ASTs. That is, unlike in most languages where the syntax is so complex that you're quite far removed from the parsing layer of compilation, in lisp using s-expressions you're essentially programming in parse trees. Can you help me understand the difference between this idea and what you're describing?

Last question: do you have an in-progress implementation of Penknife, or have you been designing it on paper so far?

Thanks for sharing!

---

[1] http://awwx.ws/extend-readtable0

[2] http://arclanguage.org/item?id=12947

-----

2 points by rocketnia 5411 days ago | link

> How would you say these Penknife utilities that focus on character streams and a read macro system [compare]? Is there a big difference?

Read macros are probably a more capable system in general, if only 'cause you can make a read macro that turns your language into Penknife. :-p In fact, Racket has Scribble, which is very similar to Penknife's syntax. (http://docs.racket-lang.org/scribble/reader.html#(part._.The...)

I don't dislike read macros. I'm just optimistic about having things like #hash(...), #rx"...", and `... be unnecessary, thanks to putting operators like hash[...], rx[...], and `[...] in the global namespace where they're treated consistently with other custom syntaxes. There's no room left for read macros in Penknife's syntax, but that's just how optimistic I am. :-p

I eventually intend for certain Penknife commands to be able to replace the reader/parser, though. That's not the same as a read macro since it spans multiple subsequent commands, but it's in a similar spirit, letting syntaxes interpret the code as a stream of characters rather than a stream of self-contained commands or expressions.

> Maybe a focus on character streams is actually homoiconic in its own right, only with a finer granularity than lisp symbols?

I don't know. I don't think so. Penknife generally treats syntax (textual, abstract, or whatnot) as a domain with its own type needs. I'm not making any conscious effort to have its syntax double as a convenient way to input common data types.

Indeed, there's no notion of an "external representation" for a Penknife value either, and I'm not sure how to approach that topic. That being said, once there's even one text-serialized form for Penknife values, it's trivial to make a Penknife syntax like "literal[...]" that deserializes its body. I don't know if that counts as homoiconic either.

> Why did you name it `tf`, and in which ways is it more bare-bones than other lambdas?

There are currently two kinds of closures in Penknife: thin-fns and hefty-fns.

A thin-fn (tf) is for when all you care to do with the value is call it. Their implementation doesn't bother doing more than it has to for that purpose; right now thin-fns are just represented by Arc 'fn values.

A hefty-fn (hf) is for when you might want to reflect on the contents of the closure, including its code and the variables it captures. I'm considering having most Penknife code use hefty-fns, just in case someone finds a use for that reflection, like rewriting a library to remove bugs or compiling certain Penknife functions to JavaScript. (The latter probably won't be an entirely faithful translation, 'cause [hf ...] itself doesn't have a good JavaScript equivalent.)
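As a loose analogy in Python (not how Penknife represents either kind of closure), a thin-fn is like a plain lambda, while a hefty-fn is like a callable object that also carries its code and captured variables. HeftyFn and both make_adder helpers are hypothetical names, and the source field is just placeholder text.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class HeftyFn:
    """Hypothetical reflective closure: callable, but also inspectable."""
    source: str               # placeholder for the closure's code
    captured: Dict[str, Any]  # the variables the closure closes over
    impl: Callable[..., Any]  # what actually runs when it is called

    def __call__(self, *args: Any) -> Any:
        return self.impl(*args)


def make_adder_thin(n: int) -> Callable[[int], int]:
    # All you can do with the result is call it.
    return lambda x: x + n


def make_adder_hefty(n: int) -> HeftyFn:
    # The result can be called, and a tool could also read its code and captures.
    return HeftyFn(source="x + n", captured={"n": n}, impl=lambda x: x + n)


thin = make_adder_thin(2)
hefty = make_adder_hefty(2)
print(thin(3), hefty(3))  # prints 5 5
print(hefty.captured)     # prints {'n': 2}
```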

> I thought that one of the special things about lisp-family languages was that they essentially were ASTs.

They're like ASTs, but they're a little bit hackish. You typically only know an s-expression is a function call once you've determined it isn't a special form. If instead every list is a special form, then basically the car of the list tells you its type, and it's equivalent to what I'm doing. (Macro forms have no equivalent in Penknife ASTs, so I'm not comparing those.)

Still, rather than just using lists, I do expect AST nodes to have completely distinct Penknife types. This is so that extending Penknife functions for different AST nodes is exactly the same experience as extending them for custom types.
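A small Python sketch of what "distinct types per AST node" buys, using functools.singledispatch as a stand-in for Penknife's extensible functions. The node types Var and Call and the function describe are made up for illustration.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import List


@dataclass
class Var:
    """Hypothetical AST node: a variable reference."""
    name: str


@dataclass
class Call:
    """Hypothetical AST node: a function call."""
    op: object
    args: List[object]


@singledispatch
def describe(node) -> str:
    raise TypeError(f"no describe rule for {type(node).__name__}")


@describe.register
def _(node: Var) -> str:
    return f"var {node.name}"


@describe.register
def _(node: Call) -> str:
    # Extending for an AST node looks the same as extending for any other type.
    return f"call {describe(node.op)} with {len(node.args)} arg(s)"


print(describe(Call(Var("f"), [Var("x")])))  # prints call var f with 1 arg(s)
```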

> Last question: do you have an in-progress implementation of Penknife, or have you been designing it on paper so far?

Whatever I've been talking about in the future tense is still on paper, but the present tense stuff is all here: https://github.com/rocketnia/penknife

First you have to load Lathe. Follow the instructions here: https://github.com/rocketnia/lathe

Also, if you can, I recommend using Rainbow for your Arc implementation, since it gives a noticeable speed boost, but I occasionally run Penknife on Arc 3.1 and Anarki too. Jarc is almost supported, but I broke Jarc compatibility in a recent change because the speed was worst on Jarc anyway.

Penknife's broken up into multiple files, but they're not Lathe modules, and I don't have an all-in-one loader file for them yet either, so you sort of have to manage a dependency hell right now:

  ; pk-hefty-fn.arc
  ...
  ; This is a plugin for Penknife. To use it, load it just after you
  ; load penknife.arc and pk-thin-fn.arc.

  ; pk-thin-fn.arc
  ...
  ; This is a plugin for Penknife. To use it, load it just after you
  ; load penknife.arc and pk-util.arc.

Altogether, I think the load order should be Lathe first of all, then penknife.arc, pk-util.arc, pk-thin-fn.arc, pk-hefty-fn.arc, then pk-qq.arc. Then run (pkload pk-replenv* "your/path/to/pk-util.pk") to load some utilities written in Penknife--the slowest part--and run (pkrepl) to get a REPL. You'll have to look at the code to see what utilities are available at the REPL, but if you type "drop." or "[drop]", that'll at least get you back to Arc.

I haven't actually tried Penknife on the latest versions of Rainbow and Lathe, or Anarki for that matter. If it's buggy right now, or if you hack on it and introduce bugs, then entering "[drop]" may itself cause errors. Fortunately, you may be able to recover to Arc anyway by entering an EOF, using whatever control sequence your terminal has for that. If even that doesn't work, you're stuck. ^^

-----

1 point by evanrmurphy 5411 days ago | link

> How would you say these Penknife utilities that focus on character streams and a read macro system?

That wasn't a sentence. I meant to write: How would you say these Penknife utilities that focus on character streams and a read macro system compare?

-----