Arc Forum
A Possible Suggestion for Loading Arc Libraries
3 points by ignavus 5428 days ago | 14 comments
Hi all -- I'm new around these parts and not exactly sure of the protocol for suggesting additions to arc. I'm hoping that I'm not out of line by posting here.

At any rate, it'd be great if arc had a library load path, something similar to Perl's @INC. To that end, I whomped up a couple of functions, "use" and "reuse", that will search a number of directories:

  - the current directory
  - $arc_home/lib
  - the colon-separated directories in $ARCINC
for a library. Code and documentation can be found here:

  http://homepage.mac.com/ignavusinfo/arc/use.arc.txt
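The search order described above can be sketched roughly as follows. This is a minimal illustration in Python, not Arc, and it is not the code in use.arc.txt; the function names `search_dirs` and `find_library`, and the assumption that a library named "foo" lives in a file "foo.arc", are hypothetical.

```python
import os

def search_dirs(arc_home, arcinc):
    """Return the directories to search, in the order listed above."""
    dirs = [os.curdir, os.path.join(arc_home, "lib")]
    # $ARCINC is colon-separated; skip empty entries such as "a::b".
    dirs.extend(d for d in arcinc.split(":") if d)
    return dirs

def find_library(name, dirs):
    """Return the first existing path to name + '.arc', or None."""
    for d in dirs:
        candidate = os.path.join(d, name + ".arc")  # ".arc" suffix is an assumption
        if os.path.isfile(candidate):
            return candidate
    return None
```

A caller would pass in the values of $arc_home and $ARCINC from the environment and load whatever path `find_library` returns.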
Along these lines, is there a CPAN analog for arc? I ask because one of the great strengths of Perl (in my opinion; I'm not looking to provoke any sort of argument) is CPAN, both as a distribution network and as a packaging methodology. (Perl modules are packaged so that the same litany of commands -- perl Makefile.PL; make; make test; make install -- will build and install almost any CPAN library.) If there's an effort underway to develop a similar standard for arc, can someone point me to it? If not, is there any interest in such a thing?

Note that I do use the env function from the git version of arc. Is that an amiable fork?



3 points by shader 5427 days ago | link

One thing we could do for a "cpan" style system would be, instead of having one designated library server, use more of a debian-package style system, where anyone can host a server if they like, and then the tool can just browse all of the locations in your server list. Then we can have people like CatDancer just add their server to the list on Anarki, so when anyone else pulls the latest version they get the newest server locations.

A server would effectively be an http folder with a list of arc files and folders in it. There could be a file in the directory with a standard name that the tool would look at first to see what libraries were available, and where they were located.

Then the files wouldn't even have to be located on the same server, they could be anywhere else on the internet, as long as some server has a reference to them.

What do you think? How would you design the file format for determining what libraries were available and where? And what if more than one server tried to host the same library? How would you know which was newest, or which one you meant?

-----

3 points by CatDancer 5427 days ago | link

While my attempt to use git as a way of sharing libraries and managing library dependencies (as opposed to, or in addition to, using git in the normal way as a version control system) didn't work out, I learned a number of useful things from it. So while I don't have all the details, I do have some important design principles:

Every release of a library should have a globally unique name: the publisher (we can use our Arc forum username), the name of the library, and the release number. An example of the full name of my toerr library would be "catdancer.toerr.3"

Every release of a library has to be immutable. Once I publish "catdancer.toerr.3", I must never change it. The reason for this is meta-data about libraries, conflicts, and dependencies; I might say Fred's foo library release 19 doesn't work with my toerr release 3, but there's no way to know if this meta-data is valid if Fred or I are changing our released versions.

I can unpublish a release if I need to; if "catdancer.toerr.3" has a horrible bug I can remove it from my server and publish a fixed version "catdancer.toerr.4". But I must never reuse "catdancer.toerr.3" for a changed version.

The release number is used internally by the tools. A new release number doesn't mean that the version is better. Compared to "catdancer.toerr.2", "catdancer.toerr.3" may be an alpha version or a release candidate or broken in some way.

If you want to have a meaningful version number for your library, make it part of the library name: "foo0", "foo1", "foo2". That way you can release a fixed version of "foo2" if you need to.

I should also be able to publish meta-data about my libraries. For example, I should be able to publish that "catdancer.toerr.3" is, in my opinion, the best available version of the catdancer.toerr library. Unlike a particular library release, which can be deleted but must never change, meta-data can be updated. I can later say that "catdancer.toerr.4" is now better.

The tools should naturally let me abbreviate library names. If I say (load "toerr.arc") or the equivalent, and there's only a "catdancer" toerr library available (or maybe I have "catdancer" first in my search list), and my meta-data says that "4" is the best one, then it should load "catdancer.toerr.4" for me. But if I want to explicitly load "fred.toerr.19", I should be able to do that.
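One way the abbreviation rule could work is sketched below in Python (not Arc). The function name `resolve`, the `search_list` of publishers, and the `best` table standing in for the updatable meta-data are all hypothetical, and the sketch assumes that publisher and library names contain no dots.

```python
def resolve(name, search_list, best):
    """Expand an abbreviated library name to a full release name.

    search_list: publishers in order of preference (hypothetical).
    best: mutable meta-data mapping (publisher, library) -> best release.
    """
    parts = name.split(".")
    if len(parts) == 3:          # already explicit, e.g. "fred.toerr.19"
        return name
    if len(parts) == 2:          # publisher given, release omitted
        publishers, library = [parts[0]], parts[1]
    else:                        # bare library name, e.g. "toerr"
        publishers, library = search_list, parts[0]
    for publisher in publishers:
        if (publisher, library) in best:
            return f"{publisher}.{library}.{best[(publisher, library)]}"
    raise LookupError(f"no known release for {name!r}")
```

With "catdancer" first in the search list and release 4 marked as best, the bare name "toerr" expands to "catdancer.toerr.4", while an explicit "fred.toerr.19" is returned unchanged.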

Every library should be available at an http location by username. For example, if you're using git and github, you can publish your repository using the github Project Page feature (http://pages.github.com/).

Now we'll have a list associating usernames with http locations:

  (("catdancer" "http://hacks.catdancer.ws/")
   ("fred"      "http://fred.github.com/arcstuff/"))
This list needs to be kept at some central location, but nothing else does.

Now the tool can download any library it needs to. Given "catdancer.toerr.2", it can look up "catdancer" in the list of locations, and download "http://hacks.catdancer.ws/catdancer.toerr.2.arc".
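The lookup step is small enough to sketch directly. The following is Python rather than Arc, and the name `release_url` is hypothetical; the two locations are the ones from the list above.

```python
LOCATIONS = {
    "catdancer": "http://hacks.catdancer.ws/",
    "fred": "http://fred.github.com/arcstuff/",
}

def release_url(full_name, locations=LOCATIONS):
    """Map a full release name such as 'catdancer.toerr.2' to its URL."""
    publisher = full_name.split(".", 1)[0]
    base = locations[publisher]          # KeyError if the publisher is unknown
    if not base.endswith("/"):
        base += "/"
    return base + full_name + ".arc"
```

Because all releases sit in one flat HTTP directory per publisher, the URL is just the base location plus the full release name plus ".arc".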

-----

1 point by shader 5427 days ago | link

I agree with several of your pronouncements, such as the fact that libraries need to be immutable, etc.

I think that the naming system should use "user/libname/version" so that it can be reflected in the directory structure of the lib folder of arc where they will be downloaded, and can be easily turned into a path. I'd prefer to make it a 'sym instead of a 'string for less typing ;)

Thus the library loading code would be effectively:

  (use libname)
where libname is just a symbol representing a path to which "$arc_home/lib/" and ".arc" are attached. This makes it easy to load files which are already in your lib folder that you wrote, or downloaded. It works with downloaded libraries, and can hopefully detect if you've already loaded it or not, and avoid reloading it. In theory since it is a library it won't be changing and thus only needs to be loaded once.
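A rough sketch of that behaviour, in Python rather than Arc: the class name `Loader`, its methods, and the injected `load_file` callable are hypothetical stand-ins for whatever 'use would do internally.

```python
import os

class Loader:
    def __init__(self, arc_home, load_file):
        self.lib_dir = os.path.join(arc_home, "lib")
        self.load_file = load_file      # callable that actually loads a file
        self.loaded = set()             # library names already loaded

    def path_for(self, libname):
        # "catdancer/erp/0" -> <arc_home>/lib/catdancer/erp/0.arc
        return os.path.join(self.lib_dir, *libname.split("/")) + ".arc"

    def use(self, libname):
        """Load libname once; return True if this call loaded it."""
        if libname in self.loaded:
            return False
        self.load_file(self.path_for(libname))
        self.loaded.add(libname)
        return True
```

Keeping the set of loaded names is what makes a second (use libname) a no-op, which is only safe under the assumption that a published library never changes.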

We can also have other functions that fetch the libraries, or make 'use fetch them automatically. In that case it will need to know how to find them. To that end I propose we add a file to the lib folder called something like "server-list". The file is merely a list of http locations, such as hacks.catdancer.ws and fred.github.com/arcstuff. Each location will be a folder that contains a file named "libs.arc" or similar, which contains all of the metadata necessary to find the libraries belonging to that server.

Possible meta-data: Name of library, file name and location, date updated, author, dependencies etc.

In theory the package system can determine by reading these files which version of the library is newest (by version number and date) or, if given more particular criteria, find an older version.
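As a sketch of the "newest" rule, suppose the entries from every server's libs.arc have already been read and merged into one list. The code below is Python, not Arc, and the entry fields ("name", "version", "updated", "url") are assumptions based on the possible meta-data listed above, not a defined format.

```python
from datetime import date

def newest(entries, name):
    """Pick the entry for `name` with the highest (version, updated) pair."""
    candidates = [e for e in entries if e["name"] == name]
    if not candidates:
        return None
    # Versions are tuples of ints, so (1, 5) sorts above (1, 4); the date
    # only breaks ties between entries with the same version.
    return max(candidates, key=lambda e: (e["version"], e["updated"]))
```

This also gives one possible answer to two servers listing the same library: whichever entry has the higher version wins, regardless of which server it came from.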

I'm basing my ideas off of the debian package system, which I have found very useful. The nice thing about it is that anyone can host a package server, and a package can be located on any server. They aren't tied to a particular server by their name, for instance. It also automatically downloads prereqs, which can be very handy.

Summary:

1) Library is named by symbol which can be easily converted to a path: catdancer/erp/0

2) Servers which have meta-data on finding packages, a la debian packages/yum/gems, etc. for ease in hosting and finding packages.

3) The 'use function, whatever it's called, should be able to load local libraries as well as online ones. It should probably also be capable of taking in a direct web address like CatDancer's lib function.

-----

1 point by CatDancer 5427 days ago | link

"user/libname/version"

I'd prefer that slashes not be included in the library name, so that it doesn't have to be stored in different directories.

I'll also be able to more easily upload a library to an HTTP directory if I don't need to create subdirectories.

Instead, have a name that local tools can easily parse, and then they can store libraries in directories user/libname/version if they wish.

-----

1 point by shader 5427 days ago | link

Ok. Why not support both? That way someone can create a large and structured library with multiple sub-folders if they need to.

I was also hoping to make a version of your 'lib hack to manage the libs already in the lib folder, instead of just those on the web.

  (lib binary)
seems like much less typing than

  (load "lib/binary.arc")
and also has the advantage of not reloading it if it's already been loaded.

On a side note, how hard would it be to selectively reload individual functions?

  (reload example-fn)
Since the anarki help system keeps track of what file the function was declared in, it could presumably be used to automatically read in the file and eval the proper form. I'm just somewhat tired of working in a large library file and having to reload all of the functions, even if I don't need to. (I'm using a very impotent linux box for development, so it can get rather slow)
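The "read in the file and eval the proper form" step could start from something like the sketch below, which pulls a single top-level definition out of a file's source text. It is Python rather than Arc, the name `find_definition` is hypothetical, and it deliberately ignores the complication of parentheses inside strings, character literals, and comments.

```python
import re

def find_definition(source, name):
    """Return the text of the top-level form that defines `name`, or None."""
    depth = 0
    start = None
    for i, ch in enumerate(source):
        if ch == "(":
            if depth == 0:
                start = i                # beginning of a top-level form
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                form = source[start:i + 1]
                # Match forms such as (def name ...) or (mac name ...).
                m = re.match(r"\((?:def|mac)\s+([^\s()]+)", form)
                if m and m.group(1) == name:
                    return form
    return None
```

The returned text would then be handed to the reader and evaluator, so only that one definition is re-evaluated instead of the whole file.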

-----

1 point by CatDancer 5427 days ago | link

a large and structured library with multiple sub-folders

I don't think we need to design this system to do everything that somebody might someday need. There's already plenty of solutions for distributing large collections of files such as zip or tar files; we don't need to invent something to solve that problem.

-----

1 point by Adlai 5427 days ago | link

EDIT: I didn't explicitly say this here, but I will now: I think that the version numbers should make a definite statement that "catdancer.toerr.3" is an inferior predecessor to "catdancer.toerr.4". I think that in the interest of avoiding "dependency hell", people should be expected to use the latest stable version of a library. However, to ease backwards compatibility, if a library developer sees that the next stable version of their library will break compatibility with other libraries which depend upon their library, they should inform the developers of the other libraries about this, so that once the new library is released, other libraries can be quickly updated to work with the new version.

I think that a good system for version numbers (and who doesn't love copying good systems?) is to have two "latest" versions always available -- a stable version, and an alpha/beta version. The stable versions are the odd version numbers, and the alpha/beta are the evens. Obviously that could be switched, but everybody should use the same convention, and I think this one makes sense because a project starts at version 0, and then the first stable version would be 1.

If this system were used, then only the odd-numbered versions would be required to remain constant. The even-numbered versions could vary as the dev(s) fixed bugs or added features. Odd-numbered versions which depended on other libraries would have to depend on odd versions of those libraries. An even version could depend on any library.
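The odd/even convention and its dependency rule amount to a couple of small checks. The following is a Python sketch, not Arc, with hypothetical function names.

```python
def is_stable(release):
    """Odd release numbers are stable; even ones are alpha/beta."""
    return release % 2 == 1

def dependencies_allowed(release, dependency_releases):
    """A stable release may depend only on stable releases."""
    if not is_stable(release):
        return True                      # an even release may depend on anything
    return all(is_stable(d) for d in dependency_releases)
```

Under this rule release 3 depending on releases 1 and 5 is fine, release 3 depending on release 2 is not, and release 2 can depend on anything.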

I like the idea of using forum nicknames, because they're unique. URLs are (not entirely, thanks to Internet Explorer...) case-insensitive, so maybe pg should change the forum so that two usernames can't be case-insensitively equal? (If that's the case already, scratch what I just said...)

Meta-data can come in a separate file, named the same as the library. It should probably be some form of alist:

  File arc/keystones/foo.xyzlib/3/meta.arc:
  ((devs   ("John Foo"
            "Bob Baz"
            "Oscar Frozzbozz"))
   (stable T)
   (note   "This library does xyz.")
   (needs  ("catdancer.toerr.3"
            "rntz.pass-to-compiler.1")))
This last part could work because libraries would be uniquely identified by a string, as CatDancer explained above.

Also, some form of standard directory structure could be good. Each person would be able to customize where their lib/ directory would be, and what it would be called (in the example, my directory is arc/keystones/). However, within that directory, I think there should be some convention of how libraries would be organized. I think one that makes sense is that each library would have a directory, within which each version would have a separate directory. If this is nested too deeply, it could instead be a wide nesting -- arc/keystones/foo.xyzlib.3/

Within the library directory, the file which gets loaded should have some standard name too -- the most obvious one would be the name of the library. The directory could contain other files containing more code, and those files would be loaded (or required) by foo.xyzlib/3/xyzlib.arc. Meta-data would be in the file meta.arc.

-----

1 point by shader 5427 days ago | link

Customizability of the lib folder is probably a good idea, but it will probably be done via hacking the code for the lib functions ;) It shouldn't be that hard to do anyway. With my naming scheme, you'd just change the string that was prepended to the library name.

I'm not sure that odd/even version numbers is such a good idea. It could be very confusing that way. I think that CatDancer's requirement of libraries to be static is a much more reliable concept. Otherwise like he said you'd need to check periodically for updates.

Libraries should also be able to depend on whatever they want. That's the author's decision. If they need the beta version, but they've tested it and know that what they have written is stable, then they should be allowed to publish it that way. They can always make a new version if they need a bug fix.

Also, since the version is just part of the lib name, you can have as many layers of minor version as you want, e.g. 1.5.200906015.

-----

1 point by CatDancer 5427 days ago | link

Libraries should also be able to depend on whatever they want

Dependencies should actually be managed outside of the libraries themselves. For example, I have a library foo that depends on bar. Later a new version of Arc comes out that implements what bar did. Now foo doesn't need bar any more. But foo itself hasn't changed, so I shouldn't have to release a new version of foo just to say that it doesn't need bar with this new version of Arc.

Instead we publish dependency information about libraries. For example, I can say that foo needs bar 0 and arc 3, or just arc 4... something like:

  (needs (foo 0) (or ((bar 5) (arc3 0))
                     ((arc4 0))))
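To show how a tool might evaluate such a `needs` expression, here is a Python sketch (not Arc). It represents the `or` as a list of alternatives, each a list of (library, release) pairs; the names `FOO_0_NEEDS` and `satisfied` are hypothetical.

```python
# Dependency data for release 0 of foo, kept outside the library itself:
# either (bar 5) together with (arc3 0), or just (arc4 0).
FOO_0_NEEDS = [
    [("bar", 5), ("arc3", 0)],
    [("arc4", 0)],
]

def satisfied(alternatives, installed):
    """True if every requirement in at least one alternative is installed."""
    return any(all(req in installed for req in alt) for alt in alternatives)
```

Because this table lives apart from foo, adding the `arc4` alternative later is a meta-data update and does not require a new release of foo.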

-----

1 point by Adlai 5427 days ago | link

The odd/even numbering doesn't have to be exactly that way. It could also be something like foo/xyzlib/1b for the beta, and foo/xyzlib/1 for the "stable" version.

-----

1 point by CatDancer 5427 days ago | link

only the odd-numbered versions would be required to remain constant

the version numbers should make a definite statement that "catdancer.toerr.3" is an inferior predecessor to "catdancer.toerr.4"

There's a difference between a release number and a version number. A version number, as you say, can be used to indicate that a later release is better, or indicate the stable vs. alpha/beta status of a release, etc. The release number merely identifies releases.

For example, pg had several releases of arc3. Under my naming system, they would have been named "pg.arc3.0", "pg.arc3.1", "pg.arc3.2", etc.

Regardless of the alpha/beta/stable status of a release, two releases of a library should never be released with the same name and release number for several reasons:

- If I'm telling you about a bug in your library, then I can tell you which release I saw the bug in. If you change your library without giving it a new release number, then we won't know if I'm talking about your old release or your new release.

- It's clear when a tool such as my "lib" library which downloads a library from a URL needs to download a new release. If there's a new release number, and I want that new release, then "lib" knows it needs to download the new release. If the release can change at the same URL, then "lib" has to periodically check to see if the file at the URL has changed.

- Just because I think that a release of mine is an "alpha/beta" version doesn't mean that you might not want to keep using it.
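The download point above can be made concrete with a small caching sketch. This is Python, not Arc, and it is not how the "lib" library is implemented; `fetch_release` and the injected `download` callable (URL in, bytes out) are hypothetical.

```python
import os

def fetch_release(full_name, url, cache_dir, download):
    """Return the local path of a release, downloading it at most once."""
    path = os.path.join(cache_dir, full_name + ".arc")
    if not os.path.exists(path):
        # Releases are immutable, so a cached copy never needs re-checking.
        os.makedirs(cache_dir, exist_ok=True)
        with open(path, "wb") as f:
            f.write(download(url))
    return path
```

If a release could change at the same URL, the `os.path.exists` test would no longer be enough and the tool would have to poll the server for changes.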

-----

1 point by Adlai 5427 days ago | link

I see what you mean -- I was a bit confused about version vs release.

However, I still think that the name of the library should be "arc". Maybe the releases would be named "pg.arc.3.0", "pg.arc.3.1", etc. I just think that the library name should be distinct from the version and release numbers.

-----

2 points by shader 5428 days ago | link

Feel free to post whatever you feel like relating to arc on this forum. That's what it's for!

Anarki, as the github fork of arc is called, is a very popular version, as it comes with several libraries and tools which can be very helpful. Basically, when anyone creates an addition to arc and can't wait for pg to adopt it, they put it in Anarki. Currently it's a little bit out of date, because it hasn't caught up to the new arc3, but it's getting there.

The closest thing to the CPAN equivalent would be github, using something like CatDancer's lib function. Unfortunately, while that makes it easy to share libraries, it doesn't make it so easy to find them. That's why many of the libraries are just pushed into the lib folder in Anarki.

Welcome to the community!

-----

1 point by conanite 5427 days ago | link

I had a similar load-path concept in rainbow for a while (using "ARC_PATH" from the environment), but recently realised I could delete a lot of code by assuming everything was under the current working directory. But I'm not entirely sure that this assumption will scale as arc proceeds to world domination.

I wonder if the kind of packaging/distribution infrastructure you mention is a prerequisite for a great language, or is it the other way around? Afaik the rubygems concept was developed quite a bit after ruby became popular. There are some amazing libraries in anarki, and some of them are solutions looking for a problem. What application do you have in mind that would be simpler/shorter with CPAN-style support in place?

There was a time in Ireland (where I'm from) when the government built lots of roads because neighbouring England had a thriving economy, and lots of roads. It turned out the dependency was the other way around.

btw thanks for the new vocabulary: I'm going to "whomp" up some functions now :))

-----