Arc Forum | Python-like modules

6 points by rntz 6358 days ago | 30 comments

Although I like most of Arc/Anarki, I think the language is going nowhere without a decent module system. So I've made a simple one, with semantics based on Python's. It's not an exact replica, and it has its flaws; I'm not saying this is the best way, and certainly my implementation could be improved, but at least it's something. The file is in the Anarki git, as lib/module/python.arc.

Modules, once obtained, are essentially 'eval equivalents: you give them an expression, and they evaluate it within the module. The most common usage will probably be to get the value of a symbol, (mod 'symbol), for which we can use the ssyntax mod!symbol, just as you would with a table.

The most important underlying function is 'module-require, which takes a filename. If the file has not been module-required before, it creates a new module, loads the file into that module, and returns it; otherwise it returns the module already created.

The 'use macro is the primary entry point for using modules. It takes any number of "clauses": (use <clause>...). Clauses currently come in 4 flavors:

<name>, eg (use test). This is the simplest form. It module-requires "test.arc" as a module and binds it to 'test. This is "import <name>" in Python.

(as <name> <alias>), eg (use (as long/name foo)). This module-requires the file "long/name.arc" as a module and binds it to 'foo. This is "import <name> as <alias>" in Python.

(from <name> <syms>...), eg (use (from test myfun mymac)). This requires the file "test.arc" as a module and imports the symbols "myfun" and "mymac" from it into the current module (the toplevel is implicitly a module). This is "from <name> import <syms>, ..." in Python.

(all <name>), eg (use (all test)). This loads the file "test.arc" as a module and imports all global symbols in it into the current module. This is "from <name> import *" in Python.
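For readers who think in Python, here is a rough sketch of the semantics described above. It is not the Arc implementation: `module_require` and `use` are hypothetical Python stand-ins, clauses are written as tuples, and `importlib.import_module` plays the role of "load once, then return the cached module" (it returns the entry already in `sys.modules` on later calls). Skipping underscore-prefixed names in the `all` clause is an assumption borrowed from Python's star-import, not something the Arc code is known to do.

```python
import importlib


def module_require(name):
    """Load `name` once; later calls return the same module object."""
    return importlib.import_module(name)


def use(namespace, *clauses):
    """Apply `use`-style clauses to `namespace` (a plain dict)."""
    for clause in clauses:
        if isinstance(clause, str):        # (use test)
            namespace[clause] = module_require(clause)
        elif clause[0] == "as":            # (use (as long/name foo))
            _, name, alias = clause
            namespace[alias] = module_require(name)
        elif clause[0] == "from":          # (use (from test myfun mymac))
            _, name, *syms = clause
            module = module_require(name)
            for sym in syms:
                namespace[sym] = getattr(module, sym)
        elif clause[0] == "all":           # (use (all test))
            module = module_require(clause[1])
            for sym, value in vars(module).items():
                if not sym.startswith("_"):  # assumption: skip private names
                    namespace[sym] = value
        else:
            raise ValueError(f"unknown clause: {clause!r}")


# Standard-library modules stand in for "test.arc" and friends.
ns = {}
use(ns, "json", ("as", "os.path", "osp"),
    ("from", "math", "sqrt", "pi"), ("all", "string"))
```

After this runs, `ns` holds the module under "json", the aliased module under "osp", the two picked names from math, and a copy of every public name in string.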

This system has several major problems. The biggest is obviously macros. You cannot use macros from a module using the mod!symbol syntax, or the (mod 'symbol) form it translates to. This is a general problem with Arc, IMO: macros are not first-class. This can't be fixed in the general case without changing ac.scm to interpret, rather than translate, Arc code.

Another problem is that (use (all <name>)) doesn't make the current module "inherit" from <name>, it just binds all symbols currently bound in <name>. So redefinition of a value in <name> won't affect a module that's done (use (all <name>)), nor will definition of a new variable be visible. This is due to the underlying "namespaces" from mzscheme I used to implement the module system.
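Python's own "from <name> import *" has the same snapshot behaviour, which makes the problem easy to sketch. Below, `test` is a hypothetical module built in memory with `types.ModuleType`, and the dict `current` stands in for the importing module's namespace; neither name comes from lib/module/python.arc.

```python
import types

# Hypothetical module standing in for "test.arc".
test = types.ModuleType("test")
test.x = 1

# (use (all test)) acts like a one-time copy of the current bindings.
current = {}
current.update(
    {name: value for name, value in vars(test).items()
     if not name.startswith("_")}
)

test.x = 2   # redefinition inside the module after the import
test.y = 3   # a brand-new definition inside the module

snapshot_x = current["x"]   # still the value copied at import time
has_y = "y" in current      # the new name never reaches the importer
```

An "inheriting" import would instead have to keep a reference to the module and look names up through it on every access.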

3 points by almkglor 6358 days ago | link

CL somehow manages to make macros-in-packages work without resorting to first-class macros (which would make true compilers very, very, very hard to implement).

Perhaps the CL way would be a better style?

-----

2 points by rntz 6357 days ago | link

Supporting first-class macros is like adding 'eval to a language: it means there are always corner-cases which will be essentially uncompilable (you will always need the interpreter), but it doesn't mean you can't deal with the vast majority of cases efficiently. Type inference, already useful for compiling a latently-typed language like Arc or Scheme, could be extended to cover whether an expression in functional position might produce a macro. If it could, you'd have to fall back on the interpreter, otherwise not. In most cases, objects in functional position are globally-bound symbols or (fn...) expressions, which are trivial cases. The main exception is higher-order functions. Inlining or specializing those can fix this, and is often a good way to increase efficiency anyway.

Certainly a problem, but not an insurmountable one.

-----

1 point by stefano 6356 days ago | link

If you have a compiler available at runtime, you don't need an interpreter to support eval: you can just compile the code at runtime (once you have the code to eval) and then load the compiled code into the system.
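Python is a handy illustration of this point, since its built-in compile() turns source text into a code object at runtime and exec() then loads the result into a namespace. The source string and the name `double` are made up for the example.

```python
# Source text that only becomes available at runtime.
source = "def double(n):\n    return 2 * n\n"

# Step 1: compile the text to a code object (no interpretation of the AST).
code = compile(source, "<runtime>", "exec")

# Step 2: load the compiled code into a fresh namespace.
namespace = {}
exec(code, namespace)

result = namespace["double"](21)
```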

-----

1 point by almkglor 6356 days ago | link

Hmm. I suppose we would need some sort of caching system to handle cases where a macro is modified.

Basically we don't compile sub-expressions until they're actually executed, and if they are, we check for macros in the expected places and then compile.

Then if a sub-expression is entered again we check if the macro code has changed and re-perform the macroexpansion if it has.

-----

1 point by stefano 6355 days ago | link

Caching seems necessary. A simple memoization could work, but hashing long pieces of code could be pretty slow. Maybe a flag bit that records whether the macro has changed since its last expansion would be faster.
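A minimal sketch of that idea, in Python and under stated assumptions: instead of a single bit it uses a version counter on the macro, because several call sites may share one macro and each needs to notice the change separately. `Macro`, `CallSite`, and the sample `unless` expander are all hypothetical names, and expressions are plain nested lists.

```python
class Macro:
    """A redefinable macro; `version` is bumped on every redefinition."""

    def __init__(self, expander):
        self.expander = expander
        self.version = 0

    def redefine(self, expander):
        self.expander = expander
        self.version += 1


class CallSite:
    """One macro call in the source, with its cached expansion."""

    def __init__(self, macro, args):
        self.macro = macro
        self.args = args
        self._cached = None
        self._cached_version = None
        self.expansions = 0  # how many times we really expanded

    def expand(self):
        # Re-expand only if the macro changed since the cached expansion.
        if self._cached_version != self.macro.version:
            self._cached = self.macro.expander(*self.args)
            self._cached_version = self.macro.version
            self.expansions += 1
        return self._cached


unless = Macro(lambda test, body: ["if", ["no", test], body])
site = CallSite(unless, ["done", ["work"]])

first = site.expand()
second = site.expand()   # cache hit: no re-expansion
unless.redefine(lambda test, body: ["if", test, None, body])
third = site.expand()    # macro changed, so the site expands again
```

The check on entry is one integer comparison, so no hashing of the macro body is needed.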

-----

1 point by almkglor 6355 days ago | link

It could be done. Still, this feels like bending over to support something that I don't actually think will be really, really necessary. The pasted http://arclanguage.com/item?id=7451 seems good enough for making macros-in-modules, if you don't want to hack off symbols (i.e. the CL packages way).

As an aside, I think we do want to hack off symbols the CL packages way. Why? Well, suppose module1 wants to define an internally-used type called my-type, and module2 wants to define an internally-used type called my-type, too. If module1's symbols are the same as module2's symbols, then their entries in the call* table will conflict. But if module1's my-type is really 'my-type@module1 and module2's my-type is really 'my-type@module2, then they can both have their own types.
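The clash can be sketched with a toy table in Python. Here `call_table` is only a stand-in for call*, and `defcall` and `qualify` are hypothetical helpers; the "name@module" spelling follows the convention used in this comment.

```python
call_table = {}  # stand-in for Arc's call* table, keyed by type name


def defcall(type_name, handler):
    """Register the function used when an object of `type_name` is called."""
    call_table[type_name] = handler


def qualify(symbol, module):
    """Mangle a symbol with its module, e.g. my-type@module1."""
    return f"{symbol}@{module}"


# Unqualified names: the second module silently replaces the first entry.
defcall("my-type", lambda rep, x: ("module1", rep, x))
defcall("my-type", lambda rep, x: ("module2", rep, x))
clobbered = call_table["my-type"]("rep", 1)

# Qualified names: both modules keep their own entry.
defcall(qualify("my-type", "module1"), lambda rep, x: ("module1", rep, x))
defcall(qualify("my-type", "module2"), lambda rep, x: ("module2", rep, x))
kept_1 = call_table["my-type@module1"]("rep", 1)
kept_2 = call_table["my-type@module2"]("rep", 1)
```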

-----

1 point by rntz 6354 days ago | link

One simple and elegant solution to the latter problem is to use uniq'ed symbols for internal types. Or use an actual data structure, if you want the type to carry information.

Moreover, what happens if module1 depends on module2? Then, unless you have some way of hiding the symbols a module defines, they'll end up being the same symbol anyways. Of course, this (defining exports explicitly) is how CL handles things, but it's not very much in the style of exploratory programming. In fact, that's why I dislike CL's package system in a nutshell: it requires too much attention, attention I should be giving to code.

-----

1 point by almkglor 6354 days ago | link

> One simple and elegant solution to the latter problem is to use uniq'ed symbols for internal types.

But this would have to be an idiom, and the Lisp way is, there is no idiom: something has to encapsulate this away automatically.

If it's an idiom, it needs a macro.

> Of course, this (defining exports explicitly) is how CL handles things,

I would have thought this was safer. Basically, I'd define my code first outside of any package/module, then when I've got it mostly working and want to release then I'd put it in a module, hiding away my internal functions (which I might want to change later, and which I define as being "implementation-specific").

-----

1 point by rntz 6354 days ago | link

> If it's an idiom, it needs a macro.

So make one.

    (mac uniqtype (name) `(= ,name ',(uniq)))
    (uniqtype my-internal-type)

And then, instead of (annotate 'my-internal-type foo), you do (annotate my-internal-type foo).

> I would have thought this was safer. Basically, I'd define my code first outside of any package/module, then when I've got it mostly working and want to release then I'd put it in a module, hiding away my internal functions (which I might want to change later, and which I define as being "implementation-specific").

Naming conventions (eg. preceding dash or underscore for "private") are simpler than export lists, and don't restrict a power user from using "implementation-specific" functions if s/he has unanticipated needs - as someone inevitably will. They might cause name clashes, but unless you're using a symbol-mangling module system, this doesn't cause problems.

I also have various other issues with the CL package system, most of which are CL-specific and unrelated to the "mangling symbols" method of implementing modules in Arc. We could go back and forth with this argument for a long time, but it seems rather fruitless to do so, especially compared to actually implementing the ideas.

-----

2 points by almkglor 6353 days ago | link

> And then, instead of (annotate 'my-internal-type foo), you do (annotate my-internal-type foo).

The problem here is that the new type name is completely unreadable. What if the new module needs several new types? What if the module would very much like to use 'defcall on its types? 'defm?

If something is being done in a language in a particular way, there's probably a reason why it's done that way. It might not be a good reason, but first we need to analyze it. As far as I can grok there are very good reasons for CL packages to hack on symbols, and they all have to do with the fact that symbols can be used for lots of things: function name identifiers, macro name identifiers, global variables, named enumerations.

-----

1 point by rntz 6352 days ago | link

> The problem here is that the new type name is completely unreadable. What if the new module needs several new types? What if the module would very much like to use 'defcall on its types? 'defm?

Unreadability could be solved by making a 'uniq function that takes a prefix instead of "gs", which 'uniqtype would use. If the module needs several new types, then it creates them; I don't see the problem here. The fact that 'defcall and 'defm assume static type names can be hacked around via 'eval, if necessary; but it's inelegant to have to do so. The reason why all this is inelegant is because everyone has assumed hardcoded symbols are the way to go for types. Only by layering something on top of that, something that makes hardcoded symbols not what they appear to be, such as a CL-like package system, can you get around this without changing existing conventions.

My conventions are different, probably because I'm partial to languages like Ruby, Python, and Smalltalk, where types are structured objects and not simple names. For example, my instinct on making tagged objects callable would be to use ((tag <object>) 'call) as the caller function. Then tables, functions, or even tagged objects could be used as types. This would eliminate the need for 'call*.
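Roughly, in Python terms, and only as a sketch of the convention described here: `Tagged`, `type_lookup`, and `call` are hypothetical names, a dict plays the part of an Arc table, and a plain function plays the part of a function used as a type.

```python
class Tagged:
    """An object carrying a type (`tag`) and a representation (`rep`)."""

    def __init__(self, tag, rep):
        self.tag = tag
        self.rep = rep


def type_lookup(tag, key):
    """Mimic (tag key): index a table, or call a function with the key."""
    if isinstance(tag, dict):
        return tag[key]
    return tag(key)


def call(obj, *args):
    # ((tag <object>) 'call) yields the caller; no global call* table needed.
    return type_lookup(obj.tag, "call")(obj.rep, *args)


# A table used as a type.
adder_type = {"call": lambda rep, n: rep + n}
# A function used as a type.
def greeter_type(key):
    if key == "call":
        return lambda rep, name: f"{rep}, {name}"
    raise KeyError(key)


sum_result = call(Tagged(adder_type, 10), 5)
greet_result = call(Tagged(greeter_type, "hello"), "arc")
```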

Of course, we have 'defcall rather than the above, so obviously this is not the one and only true way to do things. I honestly don't know what the best way of solving these problems is. My initial guesses seem to differ from yours, but that's all they are - initial guesses. I'm not bound to them. But it's hard to do a comparison when all you have are hypotheticals, and no implementation to play around with. So, are you gonna make a symbol-mangling system, or what?

-----

1 point by almkglor 6351 days ago | link

The main reason I'm more partial to names is SNAP, which is shared-nothing except for really, really static objects, such as code and symbols. Global variables have an overhead in assigning to them, and global structures are simply not mutable (the process effectively gets its own copy of the structure in the global variable, so any mutations occur in its own copy).

In theory it would be possible to add a type object that is dynamically created but is immutable once created, but attaching everything to such a type object would make dynamism a little slower. My plan had been to define polymorphic functions on (probably symbol) types, and replace 'call* with overloading of apply:

  (defcall foo (v x)
    (do-something-on-foo v x))
  =>
  (defm <base>apply ((t gs42 foo) x)
    (let v (rep gs42)
       (do-something-on-foo v x)))

> So, are you gonna make a symbol-mangling system, or what?

Been thinking of that somewhat; basically the most straightforward would be a symbol-conversion macro upon which any symbol mangling system can be built:

  (resymbol (foo foo@bar
             nitz nitz@bar)
    (+ (foo this) (nitz that)))
  =>
  (do (+ (foo@bar this) (nitz@bar that)))
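As a rough model of what such a macro would do, here is the rewrite expressed in Python over nested lists. This `resymbol` is a hypothetical function, not existing Arc code, and it is deliberately naive: it also renames symbols inside quoted data and ignores local bindings that shadow a mapped name.

```python
def resymbol(mapping, expr):
    """Return `expr` with every symbol found in `mapping` replaced."""
    if isinstance(expr, list):
        return [resymbol(mapping, sub) for sub in expr]
    if isinstance(expr, str):          # symbols are modelled as strings
        return mapping.get(expr, expr)
    return expr                        # numbers and other atoms pass through


mapping = {"foo": "foo@bar", "nitz": "nitz@bar"}
body = ["+", ["foo", "this"], ["nitz", "that"]]
rewritten = ["do", resymbol(mapping, body)]
```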

-----

1 point by absz 6352 days ago | link

Speaking of "real gensyms," is there a reason uniq isn't simply defined (in ac.scm) to be

  (xdef 'uniq gensym)

? mzscheme's gensym has all the right properties:

  > (gensym)
  g26
  > (define g (gensym))
  > g
  g27
  > (equal? g 'g27)
  #f
  > (gensym 'prefix)
  prefix28
  > (gensym "string")
  string29

Does something break because they aren't "real" symbols? Or can we just (on Anarki) go ahead and make this change?

-----

2 points by conanite 6351 days ago | link

Speaking of gensyms, is there a reason uniq isn't defined in Arc? One less axiom? (On a related note, I don't know why sig is defined in ac.scm; it doesn't seem to be otherwise referenced there.)

-----

1 point by absz 6351 days ago | link

Because if/when we have real gensyms, where if we have

  arc> (= gs (uniq))
  gs2003

then (is gs 'gs2003) returns nil, instead of t (which it currently does), we won't be able to define uniq in Arc. It's another datatype (the "uninterned symbol"), and thus it needs an axiom. The axiom could be something besides uniq (string->uninterned-symbol, for instance, or usym) if (isnt (usym "foo") (usym "foo")).
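The distinction can be modelled in a few lines of Python. Everything here is a toy: `Symbol`, `intern`, `uniq_interned`, and `usym` are hypothetical names (usym follows the suggestion above), and identity comparison with `is` stands in for Arc's 'is on symbols.

```python
import itertools


class Symbol:
    """A symbol compared by identity."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name


_interned = {}               # the global symbol table
_counter = itertools.count(1)


def intern(name):
    """Return the one shared Symbol for `name`, creating it if needed."""
    if name not in _interned:
        _interned[name] = Symbol(name)
    return _interned[name]


def uniq_interned():
    """Model of the current uniq: a fresh name, but an interned symbol."""
    return intern(f"gs{next(_counter)}")


def usym(name):
    """Model of a real gensym: never entered in the symbol table."""
    return Symbol(name)


g = uniq_interned()
interned_clash = intern(g.name) is g   # typing the same name finds it
fresh = usym(g.name)                   # same spelling, different symbol
```

With `uniq_interned`, anyone who writes the printed name gets the same symbol; with `usym`, no reader-produced symbol can ever be identical to the result.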

-----

1 point by conanite 6351 days ago | link

Aha. So uniq in its current form is really 'kinda-uniq. How important is it that gensyms aren't equal to each other? I mean, if "everyone knows" /gs[0-9]+/ is "reserved" for gensyms, then all we need do is not make symbols matching that pattern. Thus is the language minimaler.

I'm just being lazy. It can't be that difficult to implement ...

-----

2 points by absz 6351 days ago | link

For now, it's not: see http://arclanguage.org/item?id=7529 . Thanks to mzscheme, it's a one-line change :)

And does that really make the language more minimal? If we leave uniq as it is, we could move it to arc.arc, but we have the axiom that symbols of the form /gs\d+/ are forbidden; if we change uniq, uniq is an axiom, but we don't have to worry about formats.

-----

2 points by stefano 6350 days ago | link

Even if Arc were not based on mzscheme the change would be minimal: it takes exactly the same operations as creating a normal symbol, with the difference that it doesn't have to be interned.

-----

2 points by conanite 6350 days ago | link

"doesn't have to be", or must not be interned? I'm thinking the latter, if the point is to guarantee that no other symbol shall be equal to a gensym ...

-----

3 points by stefano 6349 days ago | link

You're right: it must not be interned.

-----

2 points by almkglor 6351 days ago | link

I don't think anything would break; care to try?

-----

1 point by absz 6351 days ago | link

Alright, done and on the git.

-----

2 points by stefano 6357 days ago | link

The CL package system separates the symbols, not the values associated with them. Once a symbol is seen by the reader it is interned in the package currently in use. It works at read time, before compilation (and thus before macro expansion) and before evaluation. This is quite unusual and confuses people used to more common module systems. I also had some problems with the CL package system before I understood that it works at read time. I've read a few articles saying that the CL package system is flawed, but after using it for a while I've had no more problems with it.
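To make "it works at read time" concrete, here is a toy reader in Python. It is a sketch under heavy assumptions: `read` is a hypothetical function, every token is treated as a symbol, and the package is simply glued onto the name with "::". Real CL also handles exports, inherited symbols, and explicit qualifiers, none of which appear here.

```python
def read(text, package):
    """Parse one s-expression, qualifying every symbol with `package`."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        token = tokens[pos]
        if token == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = parse(pos)
                items.append(item)
            return items, pos + 1      # skip the closing paren
        # The symbol is resolved here, before any macro or evaluator runs.
        return f"{package}::{token}", pos + 1

    expr, _ = parse(0)
    return expr


in_p1 = read("(foo (bar baz))", "p1")
in_p2 = read("(foo (bar baz))", "p2")
```

The same source text yields different symbols depending on the current package, so by the time macros expand, foo in p1 and foo in p2 are already unrelated.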

-----

1 point by almkglor 6357 days ago | link

Yes, and it appears to be the best solution if we want to have compile-time macro support, because the alternative would be a very slow AST-traversal interpreter.

Speed is of course never an issue, except when it is. ^^

-----

1 point by rntz 6357 days ago | link

That's not the only alternative.

A lightweight but hackish alternative would be to special-case modules in the translation process. This could be done by checking, whenever you have an sexpr of the form "((<name1> <name2>) ...)", whether <name1> is globally bound to a module and <name2> to a macro within the module.

A more general way to do this would be, when you have an sexpr of the form "((...) ...)", before doing anything else, to macroexpand the sexpr in functional position. Then you could write modules as macros, expanding to the evaluation of their arguments in that module (the macro equivalent of what mine do via 'defcall now).

I think, but am not sure (so feel free to prove me wrong), that the real dichotomy is not between CL packages and AST interpretation, but between supporting first-class macros or the equivalent, and making the module system part of the language itself (hacking ac.scm) rather than creating it within the language.

A CL-like package system, for example, breaks the symbol-string isomorphism: what symbol a string becomes now depends upon what module you're in, which means you can't just mimic namespaces by 'uniq'ing all the symbols in a module that aren't imported from some other module: 'read, 'sym, etc. need to be changed so that they know from what module they're being invoked. Again, I'm not sure of this, but I think the abstraction will inevitably leak unless you change the translator itself to support your module system.

-----

1 point by almkglor 6357 days ago | link

> A more general way to do this would be, when you have an sexpr of the form "((...) ...)", before doing anything else, to macroexpand the sexpr in functional position. Then you could write modules as macros, expanding to the evaluation of their arguments in that module (the macro equivalent of what mine do via 'defcall now).

Hmm. Care to try implementing this?

-----

3 points by rntz 6357 days ago | link

http://pastebin.osuosl.org/9215

Done. The above links to the result of "git diff" after the changes. While I think it works (it loads arc.arc fine), I don't want to actually push it to anarki in case I've made a mistake and there's some existing code it breaks. With it, and my module system, you can do the following:

In "test.arc":

    (prn "test.arc evaluated")
    (mac mquote (x) `',x)

At the arc repl:

    Use (quit) to quit, (tl) to return here after an interrupt.
    arc> (load "lib/module/python.arc")
    nil
    arc> (use test)
    test.arc evaluated
    #3(tagged module #<namespace>)
    arc> (mac test* (x) (test x))
    #3(tagged mac #<procedure>)
    arc> (test*.mquote foo)
    foo

If this gets pushed to anarki, it would be trivial to rewrite lib/module/python.arc so the "(mac test* ...)" line is unnecessary.

-----

1 point by almkglor 6357 days ago | link

Hmm. Looks good, although I'm dubious about the second diff block (problem is that I don't have access to an Arc right now, so I can't quite see where that modification is).

As an aside - could we possibly do this without depending on mzscheme namespaces? It should be possible to have the macro instead be something of the form:

  (mac module-name (x)
    (case x
      member  gs42 ; where gs42 is a (uniq)-ed symbol
      member2 gs43
              (err:string "module does not contain member - " x)))

-----

1 point by rntz 6354 days ago | link

The second diff block is just a continuation of the change to 'ac-macro?. The modified function looks like this:

    (define (ac-macro? fn)
      (let ((fn (if (pair? fn) (ac-macex fn) fn)))
        (cond
         ((symbol? fn)
          (let ((v (namespace-variable-value (ac-global-name fn) 
                                             #t 
                                             (lambda () #f))))
            (if (and v
                     (ar-tagged? v)
                     (eq? (ar-type v) 'mac))
                (ar-rep v)
                #f)))
         ((and fn
               (ar-tagged? fn)
               (eq? (ar-type fn) 'mac))
          (ar-rep fn))
         (#t #f))))

As for doing it without mzscheme namespaces, yes, that could be done. That would be the CL way, more or less.
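For anyone who finds the Scheme hard to follow, the control flow of 'ac-macro? above can be transliterated into a toy Python model. This is only a sketch: `Macro`, `ac_macro`, and `macex` are hypothetical Python names, symbols are strings, the global environment is a dict, and the module is reduced to a dict of members. It replays the (test*.mquote foo) example from the earlier comment, written out as ((test* mquote) foo).

```python
class Macro:
    """Stand-in for a value tagged 'mac; `expander` plays the role of its rep."""

    def __init__(self, expander):
        self.expander = expander


def ac_macro(fn, env):
    """Return the Macro that `fn` denotes, or None."""
    if isinstance(fn, list):          # pair in functional position:
        fn = macex(fn, env)           # macroexpand it first
    if isinstance(fn, str):           # a symbol: look up its global value
        value = env.get(fn)
        return value if isinstance(value, Macro) else None
    if isinstance(fn, Macro):         # expansion produced a macro object
        return fn
    return None


def macex(expr, env):
    """Expand `expr` until its head no longer denotes a macro."""
    while isinstance(expr, list) and expr:
        macro = ac_macro(expr[0], env)
        if macro is None:
            break
        expr = macro.expander(*expr[1:])
    return expr


# The module's members, and a test* macro that looks a member up by name.
members = {"mquote": Macro(lambda x: ["quote", x])}
env = {"test*": Macro(lambda name: members[name])}

expanded = macex([["test*", "mquote"], "foo"], env)
```

The head (test* mquote) expands to the module's mquote macro object, which is then applied to foo, giving (quote foo).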

-----

1 point by tung 6351 days ago | link

Nice. I always wanted to see Python-style packages in Arc: simple, but gets the job done.

-----