The symptom was different (a copy/paste issue) but the phenomenon was the same (files with .arc extension not behaving normally). The cause is indeed that Emacs thinks .arc means an archive file. The solution is here:
The variable you need to change is auto-coding-alist, like this:
(push '("\\.arc$" . utf-8) auto-coding-alist)
The reason this problem was so confusing is that the settings in auto-coding-alist take precedence over some of the other things that seem like they ought to work but don't, like
modify-coding-system-alist (described above) or putting a character-encoding directive like:
-*- coding: utf-8 -*-
at the top of the file. Since it was surprisingly hard to track this down and I only found it on my dozen-th or so Google search, I'm making this comment verbose in the hope that the current thread will surface more easily in the future.
Cool. I look forward to reading some new stuff about it. I'll just comment on some of the things here.
> Version 1.2 (today) introduces the shortcut "shift+b" to build program, which will show you the compiled Js output for Fire programs. You can also use "command+shift+b" to build and save to file.
But for a run button, I'd think you want this to be as obvious as possible. So don't even bury it in a menu -- make a header with a giant "run" button. Or maybe not "giant", but somewhere that's displayed when you open the page.
> Another person emailed me and said it should start with "hello world", and not Fib. That's coming soon.
I feel like you want to have both. But yes, it's useful to have Hello World.
I do want to play around with this, so hopefully I can understand what it is soon.
I agree with every point you made and am working on fixing them.
> mainly seems to take as an axiom that tree-structured^1 languages are better.
Correct! And also correct that I haven't proven this to be the case yet. I'm heading in that direction and think by the end of week 1 of the announcement there should be more evidence in that regard. I just launched version 1.2 just now. In a Flow program, you can now add a ">3d" block, which can have a "content" property where you can add some ETN code and see it somewhat visualized in 3d. Rough version, but starting to hint at what's to come. Basically imagine a 100,000 line program, where you can visualize and manipulate the source (and AST!) in 3D, and it all runs blazing fast. That's where we are headed with this. I believe 2D/3D languages may be better because 1) it seems the constraints imposed by the criteria that source must map to physical dimensions helps avoid anti-patterns (but i don't have a proof yet on why) 2) our brains are wired to work in 3D. Although those are theoretical guesses. For me, 95% of why I believe they are better is because of my experience working with them the past few months (which I'm trying to bring that experience to others asap).
But yes. Agreed that I haven't proved this yet.
> I open http://ohayo.computer/, and the first thing presented to me is the Apple "The Crazy Ones" video. This is an extremely offputting message.
Haha, thanks for that feedback. Last week was a week with little sleep, and didn't have time to make a proper intro screencast, so put that up as a placeholder because I thought the "ones who see things differently" was apropos. That won't be up much longer. Thanks for sharing your thoughts.
> I'm seeing far more self-promotion than argument
Agreed. I just do believe in the potential impact of ETNs, and truly believe if anything I'm underselling the impact they could have, and if I got hit by a bus before I could provide more evidence I wanted to make sure people took notice and continued where I left off. Low bus factor last week--but now that is much higher! Maybe once more evidence is out there I can revisit all those posts and tone them down.
Another person emailed me and said it should start with "hello world", and not Fib. That's coming soon. Thanks for this additional feedback. Also, the Fire editing is brand new. It was very theoretical until last week. So the UX has a long way to go. I made some improvements this morning but still it's quite shitty. Working on it.
> How do I actually run any code?
Version 1.2 (today) introduces the shortcut "shift+b" to build program, which will show you the compiled Js output for Fire programs. You can also use "command+shift+b" to build and save to file.
> The File menu has no "run" option.
Great suggestion! Will likely add next version.
Also, another suggestion I got is to have a "Quick tips" that shows the top 5 things to do (like double click to add a node, ? for help, et cetera). Coming soon.
> source code that corresponds to what's shown
Shift+u. Again, great feedback. I'll add a File toggle and also a quick tip.
> I really just want to see how these things work. I don't want to see unsupported claims. I want to be able to write some code, or modify existing code. Please don't make it so hard for me to understand what this is.
Agreed! Thanks! I also made a lot of speed and test improvements this morning and will have project editing very soon.
>  I'm not trying to pick on the presentation for the sake of picking on it, but the pictures are extremely difficult to read, making it difficult to understand the argument. At least rotate the pictures so they're right-side-up. Scan them if you can, and do your best to write them with your best handwriting -- I have similarly bad handwriting, so I don't present people with handwritten documents if I can avoid it.
Haha, great points!
>  There appear to be multiple terms for this. It's somewhat confusing, because the picture proofs talk about "pure tree languages", but other things talk about ETNs (Extends Tree Notation), or "geometric trees". I don't think "pure tree language" in the picture means anything different from "tree" in the text, but I'm not sure of that. It would be useful to pick one name for each concept and stick to it.
Correct. I haven't found the correct mathematical term for it (I've searched graph, braid/knot, set, and some other theories). I bet there is one. If not, maybe I'll standardize on "Geometric Tree".
I'm having a pretty hard time understanding what this is, and I think a large part of this is that claims are stated instead of proven or explained. I see that the author thinks this is the future, but "what this is is self-evidently better" seems to be the argument most put forward, without being backed up by examples.
> In Lisp, which uses parantheses for structure and allows arbitrary whitespace, there are many ways to write your code, and not all of those ways arrange your source code into “geometric trees”. That is, if you connect the nodes of your program with lines, sometimes those lines will intersect or be coincident.
The presentation below^0 mainly seems to take as an axiom that tree-structured^1 languages are better. If you already believe tree-structured languages are better, then yes, Lisp is not as good.
But if you're not (yet) convinced? Ok, then that's just a property of languages that Lisp doesn't have. It also doesn't have the property that it requires semicolons at the end of the line, but by itself, that proof doesn't mean Lisp is worse than Lisp-with-semicolons-at-EOL. What about ETNs are better? Yes, if you draw lines from each node to its parent, they intersect -- but why do I care? Drawing lines like that is something I've never done. I assume your argument is something like "it's easier to understand code when it's clear", but I don't see you even state that argument, nevermind make a case for it.
So, being the non-academic that I am, I want to go to see how the code works. I open http://ohayo.computer/, and the first thing presented to me is the Apple "The Crazy Ones" video. This is an extremely offputting message. Between that and the article submitted here (Programming is Now Two-Dimensional), I'm seeing far more self-promotion than argument. It's like listening to a professional wrestler cut a promo: "Yeaaaaah brother. What 'chu gonna do, when tree languages run wild on you", but when it comes to that wrestler getting into the ring, the wrestler is nowhere to be found.
And that aside, I can't even seem to run any code! After reading the README, I learn that there are two languages -- Flow and Fire. I'm not a data science guy, so I click on `fib.fire`. I see this: http://imgur.com/a/3sIxf. There's no connection of anything at all, just a dozen random blocks of noise. How do I actually run any code? I don't see any "run" buttons. The File menu has no "run" option. How do I see this work? Also, I have no idea how this is connected to tree-structured languages! There are no lines between things, as seems to be the motivating factor of this paradigm!
Looking at the "sneak peek" video (http://breckyunits.com/ohayo-sneak-peak.html), apparently there is some sort of source code that corresponds to what's shown. But unlike in the video, I can't click to drag around things. I can't see the source code. Why not?
I really just want to see how these things work. I don't want to see unsupported claims. I want to be able to write some code, or modify existing code. Please don't make it so hard for me to understand what this is.
 I'm not trying to pick on the presentation for the sake of picking on it, but the pictures are extremely difficult to read, making it difficult to understand the argument. At least rotate the pictures so they're right-side-up. Scan them if you can, and do your best to write them with your best handwriting -- I have similarly bad handwriting, so I don't present people with handwritten documents if I can avoid it.
 There appear to be multiple terms for this. It's somewhat confusing, because the picture proofs talk about "pure tree languages", but other things talk about ETNs (Extends Tree Notation), or "geometric trees". I don't think "pure tree language" in the picture means anything different from "tree" in the text, but I'm not sure of that. It would be useful to pick one name for each concept and stick to it.
Wow, what a quote! That's exactly how I feel about Ohayo. That seems like a better approach--not to be in a big hurry to get people to use it today. Thank you akkartik. Really appreciate that advice!
My only concern pre-launch and announcement, was that I was going to get hit by a bus and the world would have to wait longer for someone else to stumble upon (and popularize) TN and ETNs. Now that it's out there and a few thousand people have seen it, I can take this more sensible approach.
Btw, just pushed version 1.1.0 if anyone's interested.
UX still needs work, but I rushed adding a "3D block" to the flow language, (using the vis.js library), so you can start to see what "3D" code looks like.
That motivation makes sense. Bear in mind, though, that the "less modest" approach has a limited amount of gas. It will stop working at some point.
I can relate with having these questions and considering the different strategies as well. If you really think that this is going to be your life's work, it's reasonable to burn some 'reputation' to get the word out. However, I've often been wrong before. Now I tend to err on the side of playing a long game.
Over time I've gained respect for the essential wisdom of this quote:
"We knew that Google was going to get better every single day as we worked on it, and we knew that sooner or later everyone was going to try it. So our feeling was that the later you tried it, the better it was for us because we’d make a better impression with better technology. So we were never in a big hurry to get you to use it today. Tomorrow would be better." -- Sergey Brin, as retold by Seth Godin in "The Dip"
Applicable to ideas like here just as much as products.
I totally agree. Although perhaps I wouldn't have gotten as much feedback had I taken a more modest approach? Hard to say. I still haven't done a good job communicating the benefits of ETNs yet, stemming from their 2D/geometric nature. Almost got version 1.1 of Ohayo done which makes another step toward that.
Thanks for the comments! The thing that z, readable, and wisp all lack, is that with ETNs every node has a point (x,y,z). So in addition to your source code mapping directly to your AST, it also maps to physical, geometric space (The Z axis is your program/document, the y axis is your line number, and the x axis is the column #/indent level).
So what you now have with ETNs, is the power of Lisp, now in a regular 2D/3D "structure", that you can inspect, visualize, and manipulate in new ways.
> 1. ETN uses fewer nodes. Why?
That's not compared to Lisps. Sorry, that was not clear. The paper was targeted toward a broader programming audience.
> 2. No parse errors. I have to say that I was writing in c++ recently, and the parsing wasn't an issue. And as you say, it doesn't make nonsense programs correct.
Ohayo briefly shows the power of no parse errors. Your program is parsed by little "micro-parsers", so there's no monolithic parse failure and programs can recover/autocorrect gracefully. More demonstrations here will do a better job of explaining. More to come.
> 3. No semantic diffs.
It's actually only "Semantic diffs" (not "not semantic diffs"). As opposed to syntax diffs. So "(+ 1 2)" and "(+ 1 2)" mean the same thing semantically in Clojure, for example, but git will give you a 1 line diff because of the whitespace syntax diff. In an ETN, "+ 1 2" and "+ 1 2" generally would be different, and could cause an ETN error unless the ETN allowed blank words.
4. Nothing collides. After years of going through every possible edge case, you cannot break TN. There is nothing that collides with the syntax. See for yourself. Download the library and create a TN with it's line/children set to something you think will collide. The indentation and ability to do "getTailWithChildren" takes care of all possible edge cases. (Sorry, I realize I still need to explain this better as this is a common concern and it is very surprising to people (myself included), that there are no edge cases that don't work.
Sorry, the lines in the geometric mapping of the source code are coincident. In drawing B) in the visual proof, the edges which connect the child nodes to their parents intersect and/or are coincident. So when you put Lisp source code onto graph paper, and draw boxes around the nodes, and line segments for edges, it shows why Lisp source is not a geometric language (I define a geometric language as one where there are no intersecting or coincident line segments).
Now, figure A) shows the same Lisp code, formatted differently, in a way that is a geometric language. But as you can see, that code is standard TN/ETN. Or perhaps another way to put it is ETNs are just Lisps with a whitespace syntax and no parentheses. Another reader on HN pointed me to I expressions (https://srfi.schemers.org/srfi-49/srfi-49.html), which I hadn't seen before and is 90% of the way there to TN and ETNs. The creator of I-Expressions have communicated briefly over email now and are going to be talking soon.
Anyway, perhaps another term for Tree Notation/ETNs is "Geometric Lisp", or "2-Dimensional Lisp". I'm not wedded to the terms TN/ETNs, although I do think it's better to have new terms, because I think these will come to dominate the usage of Lisp.
Still working on more updates and evidence on why I think these will be so big.
Though it's hard once again to understand. When are two nodes coincident? By definition you can only have one character in one place on the screen. Is he talking about indentation, that the same level of indentation can mean different things? That seems true of ETN as well, from what I can tell.
The pencil scratchings on the screenshots don't help either.
I thought all the emphasis of how great it was made it harder for people to understand _what_ "it" was. Probably a good idea to keep an initial post like this matter-of-fact. Motivate it with just one simple strength that's easiest to communicate. Describe the broader context and implications in a separate post, later, after people understand what it is.
I've read the paper. From what I understood, the core idea is that source code maps to ast directly. Which is what lisp is already doing.
You say everything is a tree, but isn't this already so? Json, html, lisp, just to name a few languages. You call it 2D: an advancement in the y axis makes a "sibling" and in the x makes a "child". But children and siblings as a concept already exist, even though I haven't heard it called 2D programming yet.
Let's look at the example in the paper:
title Jack and Ada at BCHS
This is basically a whitespace based syntax for lisp.
((title Jack and Ada at BCHS) (visitors (mozilla 802)))
So, I'm sorry, but I see no innovation in this new approach. Of course I could be wrong.
I'll comment your points in the paper:
1. ETN uses fewer nodes. Why? It maps to the same tree as the languages I already mentioned.
2. No parse errors. I have to say that I was writing in c++ recently, and the parsing wasn't an issue. And as you say, it doesn't make nonsense programs correct.
3. No semantic diffs. I didn't really understand, to be honest, but "just one way to encode a program" is wrong, imho. There are always many ways to the same thing. For example "a + b" is the same ad "b + a", and "a <= n" is the same as "a < n + 1" (when "a" and "n" are ints).
4. Easy composition. Python uses indentation, which collides with the syntax of TN. As long as you have reserved characters(and not having is probably impossible) you'll have to do something with languages that use them.
By the way, the whole idea seems to be that you can have nicer syntax, while I think the biggest advantage of writing ast directly is that you can create syntax. Obviously I could be wrong, I could have missed the point, it's perfectly possible.
I had high hopes for Arc over a decade ago, then it languished, then pg stopped participating.... then I looked at Racket (then PLT-Scheme) since that's what Arc was built on and realized Racket was the language I was looking for :)
data HDExpr s m
= HDExprMedia (m (HDExprNonMedia s m))
data HDExprNonMedia s m
= HDExprHole s [Map s (HDExpr s m)]
| HDExprLayer (HDExpr s m) [Map s (HDExpr s m)]
It's not very self-documenting, so to start describing it, here's a Lispier pseudocode:
<expr> ::= <s-expression where some leaves are <paren>>
<paren> ::= (close <identifier> <list of <environment of <expr>>>)
<paren> ::= (open-and-close <expr> <list of <environment of <expr>>>)
I say "s-expression" there, but we could have any monad there for our syntax. If we work in the s-expression monad, the usual quasiquote operations ` , are parens of degree 1. When the monad we're working in is Haskell's (Writer String), our syntax is effectively reader syntax, where the ( ) parens are degree 1 and ` , are degree 2.
(There are also parens of degree 0, but they're a bit weird: Once they open, they only close again by reaching the end of the syntax. So when we're in the s-expression monad, perhaps ' is a paren of degree 0. When we're in the (Writer String) monad, perhaps pressing enter at the end of a REPL command is a paren of degree 0.)
The degree of the paren is reflected in the length of the list of environments it contains. Degree-0 parens have no environments, degree-1 parens have exactly one, and so on. If we want to represent a degree-N expression, then the parens we use are limited in a specific way:
<expr> ::= <s-expression where some leaves are <paren>>
<for some (M < N), an M-element list of
<environment of <expr>>)
<for some (M <= N), an M-element list of
<environment of <expr>>)>
The (open-and-close ...) syntax begins a nested expression. The environments serve to fill the holes in the <expr>, resuming the previous level of nesting. Each element of the list fills holes of a different degree. Holes of degree higher than the number of environments in the list are not filled; they remain holes. For instance, in `(a b (c d ,e) f), the "(" before "c" has two holes in its expression (",e" and ") f"), but it only fills the hole for ") f". The hole for ",e" is a higher degree than the "(" paren, so it's not filled at such a local place. It's only filled by the "`" paren on the outside.
The (close ...) syntax begins a hole of degree equal to the number of elements of the list. The list of environments will be used when the hole is replaced with an expression. It'll fill in the (low-degree) holes of that expression.
Note that the list lengths correspond with the degrees of holes, but not with the degrees of expressions. In fact, a degree-N expression does not contain any expressions of degrees other than N. If we consider ourselves to be working with higher quasiquotations of an infinite degree N, but every (close ...) actually occurring in our data has finite degree, then we can represent our data using finite lists. We never need to select a finite value for N!
In Haskell, the strongly typed variations I wrote do require a finite value for N to be decided beforehand (and baked into the name of the type), because I'm pretty sure that vastly simplifies the type system features I would need to use to get it to work. :-p Roughly, the difficulty is that I need 0 to give me something of kind ( * ), 1 to give me something of kind ( * -> * ), 2 to give me something of kind (( * -> * ) -> ( * -> * )), and so on, so I would need kinds that depend on values. Agda or Idris might already be up for this task, but I doubt that's going to be the kind of effort I want to spend since I intended to build my macroexpander in (untyped) Racket to begin with.
There might just be a little more I want to tinker with in Haskell, because on my way to defining this data structure I started to develop some code for "higher monads," and it would be fun if this data structure turned out to be an instance of that concept. Still, at this point I'm probably ready to go back to Racket.
I'm even hopeful again that this macro system might play nicely with Racket's. Racket has some hygiene features that walk recursively over the inputs and outputs of macros, expecting them to be s-expression-shaped with no holes. I wasn't sure that Racket's technique could be reconciled with higher quasiquotation. With this data structure, the exotic nesting of higher quasiquotations is converted to the traditional kind of nesting that Racket expects.
So, this might be a usable self-contained Racket library in the end -- not even a framework but a seamless library -- which is what I want, because that would make it easy to clarify the meaning of higher quasiquotation in terms of an existing ecosystem before I use it in Cene. :)
Since this thread and my code comments contain some walls of text, I'll see if I can convert them to a blog post soon.
 In particular, Racket uses a hygiene technique where it attaches a fresh scope label to a piece of syntax before macroexpanding and then flips the presence of that scope after macroexpanding so that the scope winds up attached to only the parts of the macro result that don't come from the input (making that region more local than the rest). Racket also has a "syntax taints" feature which lets macros attach "dye packs" to their results so that the identifiers occurring in those results can't be used by a client to access private module bindings.
Maybe I could actually use something like that ^ and $ syntax.
I'll need to generalize it to an infinite number of degrees:
Since Scheme's ,',',', technique doesn't generalize to other DSLs, I'll want to have at least one way to unquote from more than one level of nested quasiquotation at once:
Cene has more than one notation for doing that. (I can write a label on an outer layer, and then I can unquote all the way to a particular label.) I'd like to support various unquoting styles as macros, but maybe that macro system can expand to a notation like this, so getting this to work first seems best.
Finally, representing data structures with orphaned parts might be easy enough once I actually try using key-value tables to hold orphans like I've planned.
I think I like this approach even better than the one I reached in the blog post. It means that I actually can process an infinite number of degrees of quasiquotation "in the reader" rather than letting the top level of macroexpansion begin with an s-expression.
But I suppose this and the blog post tackle two different parts of the problem. The blog post's approach/challenges still apply to the process of parsing this syntax into an infinite-degree quasiquotation.
No, just like you can nest parentheses without spontaneously having quasiquotation, you can nest quasiquotations like you're talking about without reaching a higher degree of quasiquotation.
Suppose we have notations like so:
( ) parentheses
` , quasiquotation
^ $ the next higher degree of quasiquotation
(I would go on, but I'll run out of punctuation.)
S-expressions are bounded on both sides by single characters, and they nest within each other like this (labeling the layers as "a" and "b" and anything outside their lexical extent as "__"):
(a (b) a) --
If we want to look at just the "b" s-expression, it's simple to write that down on its own:
Quasiquotations are bounded on both sides by s-expressions, and they nest like this:
`(a `(b ,(a ,(--) a) (b) b) a ,(--) a (a) a) --
The meaning of "--" in the inner expressions hasn't changed: Every "--" is outside the lexical boundary of the `(a ...) quasiquotation, just as every "a" is outside the lexical extent of the `(b ...) quasiquotation. Occurrences of "--" can appear when the lexical extent closes on an s-expression (")") or a quasiquotation (","), and both of these are shown in the example.
We can isolate the "b" part pretty easily again, but we need to allow for holes in our data structure:
`(b ,(--) (b) b)
Quasiquotations of the next higher degree are bounded on both sides by quasiquotations, and they nest like this:
a ,(b (b) b) a ,(b) a `(a) a (a) a)
b ,(a) b `(b) b (b) b)
a ,(--) a `(a) a (a) a)
Occurrences of "--" can appear when the lexical extent closes on an s-expression (")"), a quasiquotation (","), or a quasiquotation of the next higher degree ("$"), and all of these are shown in the example.
This time, writing down just the "b" part gets tricky using s-expression-shaped syntax because one of the holes has orphaned sections inside (holes in the hole):
$`(-- ,(b (b) b) ,(b))
b ,(--) b `(b) b (b) b)
When we have more than one orphaned section in the same hole like we do here, we may need to use labels (or positions, or extra hole structure) to tell them apart so we can insert the "b" section into the "a" section deterministically. So far I haven't figured out how to represent this data in a way that works, let alone an elegant way.
For a concrete use case, look at it this way: When do we use ( and )? When our DSLs don't last all the way to the end of the file. When do we use ` and ,? When our DSLs have holes in them (although it might be unusual to hear this, because most Lisps couple these syntaxes together with a particular nested-list-generating DSL). When do we use ^ and $? When our DSLs have holes with holes in them. And so on, where higher degrees of quasiquotation have more and more intricate holes.
Let's say I'm writing a macro that implements a DSL where Common Lisp code and Arc code can be combined in the same function. (I'm going with Common Lisp so that we can't simply compile it to inline Racket code.) I may have Common Lisp s-expressions occurring under my Arc s-expressions and Arc s-expressions occurring under my Common Lisp s-expressions, but I want Arc variables to be visible in all the Arc parts and Common Lisp variables to be visible in all the Common Lisp parts. When we take a look at the lexical scope of any one local Common Lisp (or Arc) variable in that code, it's not simply shaped like an s-expression like traditional lexical scopes are; it has holes-with-holes wherever the Arc (respectively Common Lisp) expressions occur. So ^ and $ are a natural fit for the DSL:
To summarize this post, I'm trying to generalize macros and quasiquotation in several ways all at once, but most of the hard problems I'm encountering center on an idea I call "higher quasiquotation":
- A quasiquotation is data shaped like an s-expression with s-expression-shaped holes.
- A quasiquotation of higher degree is data shaped like a quasiquotation with quasiquotation-shaped holes.
I already want to define macros whose bodies are not s-expressions but quasiquotations, so I already want to move up one degree on this progression. To make sure this design is correct, I want it to work even if I move up an arbitrary number of degrees. If it works, I expect one macro system to work for quasiquotation macros, regular macros, and reader macros, all in the same generalized way.
Quite a while ago I silently made the decision to keep my opinions on language design to myself most of the time unless I had a language to show for it. :) Cene has come further along than any language I've made before. Part of that is because I learned some skills at work, and part of that is because I didn't want to bring it here until it was in a very polished state.
Cene has currently stalled a bit as I try to generalize the macro system to support quasiquotation macros, reader macros, and better error messages. This post touches on that topic a bit, particularly the error message part. In the 16 days since this post, I've tried to work on an implementation of this new macro system. My initial attempts were full of errors, and as I keep trying to organize my thoughts, I've learned a lot about Racket macros and category theory, but I still haven't found a way to implement the macro system. In a couple of days I'm planning to write another post about where I've gotten with that and what the difficulties are.