I'm sorry it took me so long to reply to this. Basically, I don't think I've successfully explained to anyone what a hypertee is, ever. There are some places in the comments in Punctaffy where I explain them, but I haven't put in the work to make it a very well-illustrated introduction.
Lately I realized I shouldn't be representing my higher quasiquotation/hypersnippet macro system's syntax with plain old hypertees anyway, but with something I'm calling "hypernests." So I implemented hypernests... in a flawed way that didn't actually serve the purpose I expected, so now I'm in the middle of some refactoring to fix them. It's hard for me to justify talking about the things I've built in Punctaffy when I know I can explain and motivate the topic a lot better once I have a working macro system to show for it.
It never seems like it should be that much work to just take out a piece of paper and draw up some diagrams for a blog post... but whenever I get started trying to explain like that, I usually realize I've been doing certain things wrong and need to refactor.
I hope to have something soon. I've written up a lot more unit tests, and the latest refactoring of the hypernest implementation is becoming as simple as I always hoped this kind of thing could be; the primary risk I anticipate is that it'll fall into infinite loops. I technically already have a working macro system for extensible `quasiquote` which could serve as a demo, but I'm pretty sure it breaks for operators of higher dimension than `quasiquote`, and that's what my refactoring is going to fix.
If it helps to write a very short explanation:
In quasiquotation syntax, the unquote operation is like a 1-dimensional closing bracket, just as the closing parenthesis is a 0-dimensional closing bracket. See, the unquoted part of the code starts at one (0-dimensional) location in the text and stops at another, so it's like a line segment. We can imagine 2-dimensional closing brackets which are shaped like quasiquotations, and so on.
The 1-dimensional closing bracket actually begins with a 0-dimensional bracket that opens a 1-dimensional region that must be closed by another 0-dimensional closing bracket.
The whole thing is the 1-dimensional closing bracket.
The parenthesis at the end is the 0-dimensional bracket that closes it.
If we write a single opening bracket, including all the closing brackets it needs, and all the closing brackets those closing brackets need, etc., I'm pretty sure we have an opetopic shape as used in higher category theory: The closing brackets are the various-dimensional source cells of the opetope.
If we label each of the closing brackets (of every dimension) of the single opening bracket with a data value -- or from another point of view, put an "unquoted expression" into every hole of our higher-quasiquotation-shaped syntax, then that's what I call a hypertee.
If we have a syntax with closing brackets and (nestable) opening brackets, and we put labels on all the closing brackets of the outermost opening bracket (labels which we can think of as "unquoted expressions") and labels on all the nested opening brackets (labels which we can think of as "operators" or "macro names" which apply to those opening brackets' contents), then that's what I call a hypernest.
Closing brackets have to be of a dimension strictly lower than the bracket they're closing. That's different from opening brackets; we can nest a high-dimensional opening bracket inside of a low-dimensional one.
Nesting opening brackets are pretty exotic if you consider them from the geometric standpoint of opetopes -- how does it make sense to have a low-dimensional shape with high-dimensional faces on it? -- but it's necessary for Punctaffy's syntax purposes. That's because we need to be able to write a quasiquotation operator of some specific dimension N that can quote any operator in the language, including those of dimension N or greater. This is why I've needed to move to hypernests for syntax lately, even though I spent a lot of time thinking I could get by with hypertees.
For separate compilation, it does seem clear that what gets serialized will be references like "the object [probably a function] named 'foo in module bar", and structures (s-expressions or otherwise) containing such references. Given that compilation implies macroexpansion, you do have to assume (or verify) that the macros from other modules are what they used to be—and that non-macros (used in functional position at least) are still non-macros. If you have a full-blown Makefile kind of build system, then by default I suppose every file's output depends on the contents of every other file that it uses; or, as an optimization, depends merely on the exact set and definitions of macros exposed from those files. (In the C++ system I encounter at work, code is separated into .cpp and .h files, and editing a .h file causes the recompilation of every .cpp file that recursively depends on it, but editing a .cpp file only causes its own recompilation. If you wanted to imitate that, I guess you'd put macros into a distinctively named set of files, and forbid exportable macros anywhere else.)
Thanks! I've sold out and have been working for a medium-sized company doing mostly C++ and bash (the latter is unbelievably useful) for the past 3.5 years. I make intermittent progress on the side doing other things.
> To handle special forms, a solution is to come up with a globally unique random prefix long enough that no one could generate it accidentally (e.g. "xVrP8JItk2Ot"), and then rename the special forms `assign`, `fn`, etc. to `xVrP8JItk2Ot-assign`, `xVrP8JItk2Ot-fn` etc;
Would this prefix be present in the source files of the language implementation? Or generated at runtime, or something else? If the latter, it seems like gensyms are the right approach. If the former, what's the use case—someone writing programs that assume someone might have redefined "assign" and want to work with the builtin? (If it's just hacking arc3.1 to do what we want, I think you can create more or less arbitrary unique objects (vectors, cons cells, a new kind of struct), give them bindings in the base Arc environment, and replace e.g. '(eq? (xcar s) 'quote) with '(eq? (xcar s) the-quote-object).)
Speaking of gensyms, I had a plan recently. I like PG's approach of just sequentially named variables, but (a) my ideal language would probably expose its interned-symbol table, and it'd therefore be more orthogonal and simpler for it to directly expose the "create a symbol without interning it" function, so uninterned symbols should be available for free; (b) it is possible that people will name a variable "gs1", so that's another reason to use true gensyms; (c) many macros use gensyms, but I might like to bootstrap a system that likes macros the expansion of which incurs zero side effects, and incrementing a gensym counter is a side effect. For (c) there might be another approach, but I came up with this one: Have the printer assign sequential numbers to each uninterned symbol it encounters (with a weak hash table). This has the nice effect that, when you do print macroexpansions, the numbers you see will begin at 1, rather than in the hundreds or thousands and varying depending on how many libraries have been loaded.
> (Making `fn` a macro also has the advantage that features such as default arguments can be implemented in Arc).
Yes, that is nice. I actually did this in my Lisp in Summer Projects submission many years ago.
> is there a reason to use different prefixes for macros and functions? We could have a read syntax which evaluated to the value of the symbol (whatever it is), and that would work for both functions and macros.
Eh, no strong reason. I'd thought about doing a reverse variable lookup for every value and printing the variable name if it existed, but it makes less sense if e.g. the value 10 gets printed as #V:<some global variable that happened to be bound to 10>. Also, I figured an IDE-type setup (such as DrRacket) might use different colors or fonts to distinguish macros/functions/special forms/other. Last, you do need a way to print lambdas that don't have a global name (and might not even have a local name), and I'm guessing you'd want that to appear as #f:<some attempt at indicating line number / REPL input>, looking analogous to but different from the global case.
 https://github.com/waterhouse/emiya : "[O]ptional arguments are implemented by redefining "fn" (Arc's equivalent of "lambda") as a macro that expands to a call to "underlying-fn", which is bound to the fn special object--again, by the user. Then everything is recompiled, so that this redefinition does not cost anything at runtime; some care is needed here, to avoid an infinite loop, if a function used to recompile "fn" itself uses "fn"."
Thanks for the pointer. He has an entire chapter on hygiene... And then there is this:
"There is no need to provide quotation here because, having failed to enforce the prohibition against embedding combiners in a macro expansion, we don’t need to embed their unevaluated names in the expansion."
It's nice that his primitive $vau grabs the current lexenv, as this enables another kind of macro-like behavior I've thought might be useful: taking subexpressions, fully macroexpanding them, and doing something with the result (e.g. determining whether a variable is ever used, and omitting a computation if not). I don't know how that would mesh with the interpreter's semantics, though...
I'll probably have to read and brood on this further.
That is, the function takes "something that's not quite itself" as an argument, and returns a function which does one step of computation and may do a "recursive" call using the thing that was passed into it. The implementation would therefore like to be:
(def fix (f) ;aka Y
((f (fix f)) i)))
But, if we're doing the entire thing with anonymous recursion, we can (laboriously) implement fix like this:
Every recursion step involves creating multiple lambdas. Eek. (It's even worse if you use the general, n-argument Y combinator, in which case you must use "apply" and create lists.) Whereas with hjek's non-curried approach, only a constant number of lambdas have to be created at runtime. (Optimizing compilers might be able to cut it down to 0.)
If you want to create a macro like afn or rfn, and want the user to be able to act like the function is named F and accepts just the parameter i, you can put a wrapper into the macroexpansion, like this:
(rfn F (i)
(+ 1 (F (item it)))
(f f i))
(fn (f i)
(let F (fn (i) (f f i))
(+ 1 (F (item it)))
And in this case, while the code does call for creating an F-lambda on every recursive call, I think it's easier for the compiler to eliminate it—I don't remember whether I'd gotten Racket to do it. (I think it probably did eliminate it when working with Racket code, but Arc, which generates all the ar-funcall expressions, might not have allowed that.)
The actual code for rfn will create a variable and then modify it, creating a lexical environment with a cycle in it. That's certainly a more straightforward approach. I figure the above is useful only if you're working in a context where you really want to avoid mutation or true cycles. (For example, I am considering a system that detects macros whose expansion is completely side-effect-free. It might be easier to use the above approach to defining iteration than to teach the system that rfn is "close enough" to being side-effect-free.)
Especially for modules, I do like using function and macro values in macro expansions.
Consider a module which contains a macro such as `for`. I can import the macro into my module simply by copying the macro value into my module, e.g.: `(= for other-module!for)`. By using function and macro values in the macro expansion (as you do in `for2`), the functions and macros that the imported macro uses don't also need to be copied into my module.
I haven't worried myself about how most terms end up getting a comma prefix, but I agree there could be an alternative syntax where injecting the value of a symbol (instead of the symbol itself) could be the default.
To handle special forms, a solution is to come up with a globally unique random prefix long enough that no one could generate it accidentally (e.g. "xVrP8JItk2Ot"), and then rename the special forms `assign`, `fn`, etc. to `xVrP8JItk2Ot-assign`, `xVrP8JItk2Ot-fn` etc; and then add `assign` and `fn` macros etc. that expand into the prefixed special form. Now `assign`, `fn`, etc. are macros that can be used as values in a macro expansion.
(Making `fn` a macro also has the advantage that features such as default arguments can be implemented in Arc).
I was at first concerned that using such a random prefix was ugly, however eventually I realized that no one ever needs to see the prefixed versions :-)
> (aka for them to not be "hackable")
I was indeed curious about maximizing hackability as a general principle... but with modules even if a macro such as `for` is imported into my default namespace and thus expansions such as `with` wouldn't use my version, in practice it's easy enough to create a new module with my version `with` etc. and simply load the `for` source into the new module. (Or simply reload the `for` source into my default module).
> (eval (list + 1 2))
This is unusual in Lisp implementations, and I was quite pleased when I discovered that Racket supported this.
(#M:with (i nil gs3342 1 gs3343 (#F:+ 10 1))
This is a nifty idea... hmm, is there a reason to use different prefixes for macros and functions? We could have a read syntax which evaluated to the value of the symbol (whatever it is), and that would work for both functions and macros.
That's because it is a hack (as mentioned in my original comment edit#1).
My comments are only intended provide whatever help I can towards the original posting context which suggested a strict CSP criteria.
None of these things have to be done. It's up to you to decide, so really the question becomes what are you doing it for? Are you building a news site for a community of a few thousand people in a niche group? or are you making a news app that others can buy into for their own product/uses? The latter would make me want to ensure it's CSP capable, while the former - not so much.
> It will be difficult to deal with some styling functions from Arc, like 'grayrange'...
I would just create 10 or 20 or whatever number of css entries that act as a segmented gradient (call them .color-reduct1 to .color-reduct10) then create a server side function that takes the output value of grayrange and picks one the css entries. Then add that class to the html element and you're good to go. It's not a perfect gradient but it would be enough that I doubt it would make any noticeable difference.
Js is also an option, but then you have to store and pass the score into the js calculation which requires much more work then the above solution. Plus it forces you to expose the score (which HN no longer does)
> I wonder in which file the CSP would need to be implemented in Arc, or whether it's easier to set them in an Nginx config.
If you want to make code that's generic and useable by others then it needs to be in arc (not everyone will use Nginx). I suggested using arc templates  already and I still think this is the right way go. Establish the base template definition in srv.arc and then each app can modify that base template from their app file. Additionally allowing defop to optionally pass in over-rides will make it dynamic if you need that variance.
I'm sure there are dozen ways to do it, but that's my suggestion anyway.
This came up during ar development, although I don't remember who brought it up. I think aw was about to implement it but was concerned that it would be annoying at the REPL to have all the program's bindings be looked up eagerly (aka for them to not be "hackable"). It could be annoying for mutually recursive macros, too, although those are already tricky enough that it probably wasn't a big concern at the time.
I remember recommending an upgrade: Instead of generating:
`(,my-func ,a ,b)
Generate something that preserves the late binding of mutable global variables, like this:
`((',(fn () my-func)) ,a ,b)
I seem to remember aw wasn't convinced this cruft was going to be worth whatever hygiene it gained, even after I recommended a syntactic upgrade:
`(,late.my-func ,a ,b)
Since aw values concision a lot, perhaps the issue was that most programmers would surely just write ,my-func in the hope that it would be rare (or even incorrect) to ever want to rebind it, and then other programmers would suffer from that decision.
But Pauan followed a design path like this in Nulan and/or Arc/Nu. In Pauan's languages... actually, I think I remember a couple of approaches, and I don't remember which ones were real.
One approach I think I remember is that `(my-func a b (another-func c) d) inserted mutable boxes for my-func and another-func into the result, but it would just implicitly unquote a, b, c, and d, because variables at the beginning of a list are likely to be mutable globals and variables in other positions are likely to refer to gensyms.
There might have even been an auto-gensym system in that quasiquote operator at some point.
I liked this approach a lot at the time. That was the same time I was working on Penknife. I was trying to avoid name collision with first-class namespaces in Penknife, but I still wanted macros to work, and I was trying to take a very similar approach. (I couldn't take exactly the same approach because my macroexpansion results were strings; instead, I think I allowed regions of the macroexpansion result to be annotated with a first-class namespace to use for name lookup.)
When Penknife's compile times were abysmally long, I started to realize that even if I found an optimization, this was going to be a problem with macros in general. Anyone can write an inefficient macro, and anyone who can put up with it can build a lot of stuff on top of it that other users won't be able to appreciate. So I started to require separate compilation in my language designs.
With separate compilation in mind as a factor for the language design, it no longer made sense to put unserializable values into the compiled code. Instead, in Penknife's case, I devised a system of namespace paths to replace them. The namespaces were still first-class values, but one thing you could do with a Penknife macro was get hold of its first-class definition-site namespace, so the macroexpanded code could refer to variables in distant namespaces by specifying a chain of names of macros to look them up from. And this kept things hackable, too, since you could mutate a macro's definition-time namespace explicitly (not that many programs or REPL sessions would bother to do that).
Not long after, I set a particular goal to make a language (Era) where the built-in functionality was indistinguishable from libraries. Builtins aren't hackable from within the language, so the module system needs to make it possible (and ideally easy) for people to write non-hackable libraries.
(Technically they only need to be non-hackable to people who don't know the source code, because once you know the source code, you know you're not dealing with builtins. I intend to take advantage of this to make the language hackable after all, but it's going to take essentially a theorem prover in the module system before it's useful to tell the module system you have the source code of a module, as opposed to just using that source code to compile a new module of your own behind the module system's back.)
Anyhow, this means I haven't put hackability at the forefront for a long time.
I think the embedding-first-class-values approach will work, and I think late binding is workable (by using late.my-func) and there's a workable variation of that late binding approach to enable separate compilation too (by using namespace paths made out of chains of macro names). So I like it, but I just have this stuff to recommend for it to make it really tick. :)
By the way, it's good to see you! I wondered how you were doing.
Check out John Shutt's thesis on Kernel: https://web.cs.wpi.edu/~jshutt/kernel.html. He takes your observation that quasiquotation is pointless to the limit, dropping it entirely and constructing his macros out of cons and friends.
I don't believe so, but it depends on what relationships and queries you have in mind.
I think you just need to maintain an index of story ids for each tag then alter the page list code to check each story against chosen tag indexes in order to apply custom filters. And you will also need to bypass the page caching operations for these filtered pages (just given the number of combinations that are possible).
Obviously db capabilities would make it (and everything else) much better, but it's not a showstopper.
> What's needed is the ability to pass custom headers from the application to srv.arc
There is the possibility of just putting the CSP into a meta tag within the page header, but I didn't suggest that because not all CSP options are available when using the meta tag.
I think you're right in that being able to dynamically add headers is the right way to go. When I moved from arc to clojure I did this by implementing something like arc templates  and used them to pass attributes through to the server ops. I ended up with a 'defop' like call that took an options hash-map argument (i.e. a template instance) which then generated the headers dynamically (with built in sane defaults).
Strict CSP settings are a form of whitelisting what js, css etc, is valid thus protecting from injection. Inline code for both js and css can't be whitelisted like header items can be so they will fail (unless you use the hash code hack mentioned for js).
"By controlling a little bit of text in the victim domain, the attacker can inject what appears to be a valid CSS string. It does not matter what proceeds this CSS string: HTML, binary data, JSON, XML. The CSS parser will ruthlessly hunt down any CSS constructs within whatever blob is pulled from the victim's domain...."
"A policy needs to include a default-src or script-src directive to prevent inline scripts from running, as well as blocking the use of eval() . A policy needs to include a default-src or style-src directive to restrict inline styles from being applied from a <style> element or a style attribute."
So it's just the 'style' attribute people worry about and strict CSP manages.