Arc Forum

3 points by akkartik 3574 days ago | link | parent

In principle I don't have a fully general solution yet :/ It's something I'm working on.

What I would like to happen is that my application only contains dependencies it really needs, and that each dependency includes no superfluous/dead interfaces or code. Under these circumstances I would like to live in a world where I can go in and modify the libraries to have different names, with the difference in names making sense in the context of the application. Then I would bundle the application with all its libraries included.

Of course this doesn't scale to large libraries, because managing a fork today involves an amount of work that ranges from non-trivial to intractable. But this would be my ideal.

Past writings on this subject: http://akkartik.name/post/libraries; http://akkartik.name/post/libraries2. My current project which tries to make fork-management tractable: https://github.com/akkartik/mu

Bear in mind that it's only a hard problem for collisions in the interface of the two libraries. Functions that are used only internally can be wrapped inside closures so they're only accessible to the library that cares about them.
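
To illustrate the closure point, here is a minimal Python sketch (not Arc). The two libraries and all their function names are made up for the example. Both define an internal helper called normalize, but only the public functions escape each closure, so the helpers cannot collide.

```python
# Hypothetical libraries: both need an internal helper named `normalize`,
# but neither exposes it, so the two definitions never meet.

def make_text_lib():
    def normalize(s):              # internal: visible only inside this closure
        return s.strip().lower()

    def same_word(a, b):           # public interface
        return normalize(a) == normalize(b)

    return {"same_word": same_word}


def make_vector_lib():
    def normalize(v):              # same name, unrelated meaning
        total = sum(v)
        return [x / total for x in v]

    def shares(v):                 # public interface
        return normalize(v)

    return {"shares": shares}


text_lib = make_text_lib()
vector_lib = make_vector_lib()

print(text_lib["same_word"](" Arc", "arc "))   # True
print(vector_lib["shares"]([1, 1, 2]))         # [0.25, 0.25, 0.5]
```

Only names in the returned dictionaries can clash between libraries, which is the interface-only collision problem described above.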

3 points by rocketnia 3574 days ago | link

I've been noticing continuities between social code distribution, modularity, and variable scope. A guiding example is code verification:

  Unrecorded reasoning, existing mainly in our minds.
  -->
  Codebases dedicated to proofs or tests.
  -->
  Proofs or tests located in the codebase they apply to.
  -->
  A type/contract declaring a module interface.
  -->
  A type/contract annotation for a function definition.
  -->
  A type/contract annotation for an individual expression.
  -->
  A type/contract annotation for an individual built-in operator, but at
  this point it becomes implicit in the operator itself, and we just
  have structured programming, enjoying properties by construction.

Verification is a simplified version of a build process; it's just a build with a yes or no answer. So the design of a build system has similar continuity:

  Unrecorded how-to knowledge, existing mainly in our minds.
  -->
  Codebases or how-to guides dedicated to curated builds (e.g. distros).
  -->
  Build scripts and docs located in the codebase they apply to.
  -->
  Macroexpansion-time glue code, importing compiler extensions by name.
  -->
  Load-time glue code, importing runtime extensions by name.
  -->
  Service-startup-time glue code, obtaining dependency-injected fields
  by name.
  -->
  An expression, taking free variables from its lexical scope by name.
  (This is a build at "evaluation of this particular expression" time.)

There might be some rough parts in here. I might be taking things for granted that I don't want to, like taking for granted that we want unambiguous named references from one module to another. My point with this continuity is to note that if I don't want named imports, then maybe I don't want named local variables either; maybe tweaks to one design should apply to the other.

And this means that even local syntactic concerns extrapolate to social decisions about how we expect to deal with our unrecorded knowledge. Every design decision has a lot to go by. :)

---

Another exciting part is that I think nested quasiquotation shows us a more general theory of lexical locality. If we're dealing with syntax as text, then locations in that text have an order, and we can isolate code snippets at intervals along that order (and mark them with parentheses). Intervals are partially ordered by containment, so we can isolate code snippets at meta-intervals between an outer interval and multiple nonoverlapping inner intervals (and mark them with parentheses with nonoverlapping parentheses-shaped holes: quasiquotations).

That "nonoverlapping" part seems awkward, but I think there's a simple concept somewhere in here.

With this concept of intervals, I'm considering higher degrees of lexical structure past quasiquotation, and I'm considering what kind of parentheses or quasiquotations would exist for non-textual syntaxes.

A module system deals with a non-textual syntax: The syntax of a bundle of modules. If the modules have no order to them, then we don't even have parentheses to work with, let alone quasiquotation. But they can have an order to them. We can impose one from outside:

  Module A precedes module B.

And anything we can impose from outside, we might want to add as a module:

  Module A says, "..."
  Module B says, "..."
  Module C says, "Module A precedes module B."

This is prone to contradictions and ambiguities. If we can say how to resolve these ambiguities from the outside, we should be able to do so as a module:

  Module A says, "..."
  Module B says, "..."
  Module C says, "Module A precedes module B."
  Module D says, "Module B precedes module A."
  Module E says, "If module C and module D disagree, listen to module C."
  Module F says, "If module C and module D disagree, listen to module D."
  Module G says, "If module E and module F disagree, listen to module E."

This should lead to a very complete system of closed-system extensibility: For any given set of modules, if the set's self-proclaimed ordering between A and B is currently unambiguous, then we might as well listen to it! If we don't like it, we can add more contradictions and disambiguations until we do, right up to and including "Ignore all those other modules and do it like this." :)
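
A rough Python sketch of how the C/D/E/F/G example might be resolved. The data layout and the settle function are assumptions made for illustration, not a worked-out design. It only handles the simple cases: judges that all agree, or exactly two judges whose own disagreement is settled one level up.

```python
# claims: module name -> the ordering it asserts (first precedes second)
claims = {"C": ("A", "B"), "D": ("B", "A")}

# tie_breakers: module name -> (pair of modules it rules on, which one to listen to)
tie_breakers = {
    "E": (frozenset({"C", "D"}), "C"),
    "F": (frozenset({"C", "D"}), "D"),
    "G": (frozenset({"E", "F"}), "E"),
}


def settle(x, y):
    """Return whichever of modules x and y should be listened to, or None if ambiguous."""
    pair = frozenset({x, y})
    judges = [m for m, (ruled_pair, _) in tie_breakers.items() if ruled_pair == pair]
    verdicts = {tie_breakers[m][1] for m in judges}
    if len(verdicts) == 1:
        # Every module with an opinion on this pair agrees.
        return verdicts.pop()
    if len(judges) == 2 and len(verdicts) == 2:
        # Two judges disagree: ask which judge to listen to, then follow it.
        chosen = settle(*judges)
        return tie_breakers[chosen][1] if chosen else None
    return None


winner = settle("C", "D")
print(winner, claims[winner])   # C ('A', 'B')
```

Here G picks E, E picks C, and C says A precedes B. Removing G makes settle("C", "D") return None, matching the "currently unambiguous" condition above. The sketch assumes the tie-breakers contain no cycles.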

With this ability to disambiguate when things go wrong, we can model lexical scope:

  Module A says, "Export foo = (import bar from system {B, C})."
  Module B says, "Export foo = 2."
  Module C says, "Export bar = foo + foo."
  
  Result: foo = 4.

While both A and B have an export named "foo," this conflict is disambiguated by the fact that module A is treating {B, C} as a local scope. I intend this to mean that bar isn't at the top level either.
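
One way to make this concrete is a toy Python model. It assumes a "system" is just a dictionary of modules and each export is a function that receives a lookup function for its own system. The evaluate function and this representation are invented for the sketch.

```python
def evaluate(system, name):
    """Look up `name` among the exports of the modules in `system`."""
    providers = [exports[name] for exports in system.values() if name in exports]
    if len(providers) != 1:
        raise LookupError(f"{name!r} has {len(providers)} providers in this system")
    # Each export sees only the system it was evaluated in.
    return providers[0](lambda n: evaluate(system, n))


B = {"foo": lambda look: 2}
C = {"bar": lambda look: look("foo") + look("foo")}
inner = {"B": B, "C": C}                          # the local scope {B, C}

A = {"foo": lambda look: evaluate(inner, "bar")}  # import bar from system {B, C}
top = {"A": A}

print(evaluate(top, "foo"))   # 4
```

Asking for evaluate(top, "bar") raises LookupError, which corresponds to bar not being visible at the top level.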

If we really want access to bar at the top level, we can refer to it again, and we can even be sloppy about it and make up for our sloppiness with disambiguations:

  Module A says, "Export foo = (import bar from system {B, C})."
  Module B says, "Export foo = 2."
  Module C says, "Export bar = foo + foo."
  
  Module D says, "Export all imports from system {B, C}."
  Module E says, "If A and D export the same variable, listen to A."
  
  Result: foo = 4; bar = 4.

If we want, we can have the top-level bar see the version of foo exported by A, even though the version of bar used by A still uses the foo from B:

  Module A says, "Export foo = (import bar from system {B, C})."
  Module B says, "Export foo = 2."
  Module C says, "Export bar = foo + foo."
  
  Module D says, "Export all imports from system {A, B, C}."
  Module E says, "Export all imports from system {C, F}."
  Module F says, "Export foo = (import foo from system {A, B, C})."
  Module G says, "If D and E export the same variable, listen to D."
  
  Result: foo = 4; bar = 8.

Not easy enough to extend? Define some structure. Write modules that assign folksonomic tags to other modules or themselves, and then refer to the system of all modules with a given tag. Write modules that act as parentheses, and write modules that determine enough of an order to decide which modules those parentheses contain. Here's an example of the latter:

  Module A says, "Export foo = (import bar from interval R1)."
  Module B says, "Export interval R1, and begin it here."
  Module C says, "Export foo = 2."
  Module D says, "Export bar = foo + foo."
  Module E says, "End interval."
  Module F says, "These modules are in order: B, C, D, E."

The flexibility is obviously really open-ended here, and it's going to be a challenge to make this a well-defined idea. :-p
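
As a small step toward pinning the interval example down, here is a hypothetical Python sketch. It computes which modules fall inside R1 given the order that module F declares. The marker representation is an assumption, and nested or overlapping intervals are not handled.

```python
order = ["B", "C", "D", "E"]                          # declared by module F
markers = {"B": ("begin", "R1"), "E": ("end", None)}  # declared by modules B and E


def interval_members(interval, order, markers):
    """Return the modules strictly between the begin and end markers of `interval`."""
    inside = False
    found = []
    for module in order:
        kind, label = markers.get(module, (None, None))
        if kind == "begin" and label == interval:
            inside = True
        elif kind == "end" and inside:
            return found
        elif inside:
            found.append(module)
    raise ValueError(f"interval {interval!r} is never opened and closed in this order")


print(interval_members("R1", order, markers))   # ['C', 'D']
```

Module A's import would then be resolved against the system {C, D}, just like an explicitly listed system.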

-----

2 points by Oscar-Belletti 3574 days ago | link

>What I would like to happen is that my application only contains dependencies it really needs, and that each dependency includes no superfluous/dead interfaces or code.

Do you want to avoid the situation, which happens in C, where when you need the sqrt you have to include the whole math file? I totally agree.

>Under these circumstances I would like to live in a world where I can go in and modify the libraries to have different names, with the difference in names making sense in the context of the application

This looks right to me. Perhaps it could be something like Python's

    from library import function as good_name_for_your_project
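
A runnable instance using the standard math module. The alias side_length_from_area is just an example of a name that might suit one particular application.

```python
import math
from math import sqrt as side_length_from_area   # project-specific name

print(side_length_from_area(49.0))               # 7.0

# The alias is only a second name for the same function object;
# the library's original name is still there.
print(side_length_from_area is math.sqrt)        # True
```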

>Then I would bundle the application with all its libraries included.

I'm not sure making a copy (even a partial one) of a library is a good idea because it would lead the user to have many copies of the same libraries. On my Windows machine I ended up having 4 versions of Python! I think that common parts should be shared.

-----

3 points by akkartik 3574 days ago | link

> Do you want to avoid the situation.. where when you need the sqrt you have to include the whole math file? I totally agree.

Yes, definitely. In the JavaScript world it's called tree-shaking: https://medium.com/@Rich_Harris/tree-shaking-versus-dead-cod...
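
The core idea can be sketched as reachability over a call graph. This is a simplified Python illustration, not how any particular bundler works, and the call graph below is invented. Real tools work from the program's import and export structure, not a hand-written table.

```python
# Hypothetical call graph: function name -> functions it calls.
calls = {
    "main": ["hypot"],
    "hypot": ["sqrt"],
    "sqrt": [],
    "sin": ["reduce_angle"],
    "reduce_angle": [],
}


def reachable(entry, calls):
    """Return every function that can be reached from `entry`."""
    keep, stack = set(), [entry]
    while stack:
        fn = stack.pop()
        if fn not in keep:
            keep.add(fn)
            stack.extend(calls[fn])
    return keep


print(sorted(reachable("main", calls)))   # ['hypot', 'main', 'sqrt']
```

Everything outside the reachable set (here sin and reduce_angle) could be left out of the bundled application.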

> from library import function as good_name_for_your_project

What's happening here is that you're a) adding a feature in Python to support 'from..as', b) including an external library and c) continuing to keep around an old name that you don't really care about. You're essentially preserving the old name just because other people who your application doesn't care about use it.

Imagine a world where maintaining forks was tractable. Would this still be a good idea? Why not just do a search and replace and maintain a private fork, eliminating all this complexity in your private stack? Just delete 'from..as' from your private Python! :o)

> I'm not sure making a copy of a library is a good idea because it would lead the user to have many copies of the same libraries.

Yes, this is a fundamental difference in outlook/ideology. I think that copying isn't always bad. We culturally tend to emphasize the issues with copying a lot more than the costs of avoiding duplication.

A degenerate example is to observe that there are tons of 'e's in the novel I'm reading and try to deduplicate them. That is of course farcical, but it at least serves to illustrate that there's a trade-off, and that always DRY'ing your code isn't obviously a good idea. Another example is to observe that the internet has many copies of the same libraries running at any given time. You can argue that they're on different machines, but then imagine a 'machine' consisting of multiple cores and private caches and non-uniform memory access and RAID-partitioned disks. Changing latency costs can make it reasonable to maintain multiple copies of some immutable data in a single 'machine'. Now consider that development is yet another cost that is open to variation. If (automatically) creating copies of something eases development, it's at least worth considering. For example, optimizing compilers can sometimes specialize a function differently for different callsites. That's duplication, often inside a single binary, and it makes sense in some contexts.

The npm eco-system promiscuously duplicates dependencies inside the node_modules/ directory, so that is at least some evidence that the approach I'm suggesting isn't too insane :)

-----

2 points by Oscar-Belletti 3574 days ago | link

OK, maybe this could be the way to go. Adapting small libraries isn't a problem, and it probably makes your program better. This defeats collisions and useless code, and is OK for autoloading. But this approach will work only if our libraries are small enough. For now this is OK.

Duplicating libraries isn't a problem: trading disk space for ease of development is an exchange that keeps getting more worthwhile.

For autoloading: the interpreter/compiler could load all .arc files in the current directory (or current-directory/lib), or scan them for function definitions (without loading them) and automatically set up an elisp-style autoload for every function. I prefer the first option.
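
The second option (scanning without loading) might look roughly like this Python sketch. The regex, the build_autoload_index name and the sample files are assumptions for illustration. A regex scan is crude: it would also match definitions inside strings or comments, so a real version would want Arc's reader.

```python
import re
import tempfile
from pathlib import Path

# Matches the name in top-level-looking forms such as (def name ...) or (mac name ...).
DEF_PATTERN = re.compile(r"\(\s*(?:def|mac)\s+([^\s()]+)")


def build_autoload_index(directory):
    """Map each defined name to the .arc files that define it, without loading them."""
    index = {}
    for path in sorted(Path(directory).glob("*.arc")):
        for name in DEF_PATTERN.findall(path.read_text()):
            index.setdefault(name, []).append(path.name)
    return index


# Demo with two throwaway files standing in for a lib directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "strings.arc").write_text("(def trim-all (s) s)\n(mac w/upper (s) s)\n")
    Path(tmp, "lists.arc").write_text("(def flatten-once (xs) xs)\n")
    index = build_autoload_index(tmp)

print(index)
# {'flatten-once': ['lists.arc'], 'trim-all': ['strings.arc'], 'w/upper': ['strings.arc']}
```

An interpreter could consult such an index when it meets an unbound symbol and load only the file that defines it.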

-----

2 points by digitalis_ 3574 days ago | link

One possibility for this bundling is that Arc looks first where it would expect a library to be (in an equivalent of node_modules), then looks for it in the usual place (/usr/lib or wherever).
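
A minimal Python sketch of that lookup order. The find_library helper, the directory layout and the library names are all hypothetical. Temporary directories stand in for the bundled and shared locations.

```python
import tempfile
from pathlib import Path


def find_library(name, search_dirs):
    """Return the first matching library file, trying directories in order."""
    for directory in search_dirs:
        candidate = Path(directory) / f"{name}.arc"
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    bundled = Path(tmp, "app", "lib")      # stands in for the per-application bundle
    shared = Path(tmp, "shared")           # stands in for /usr/lib or wherever
    bundled.mkdir(parents=True)
    shared.mkdir()
    (shared / "json.arc").write_text("; shared copy\n")
    (shared / "http.arc").write_text("; shared copy\n")
    (bundled / "json.arc").write_text("; the application's modified copy\n")

    search = [bundled, shared]             # bundled copies win over shared ones
    json_hit = find_library("json", search).parent.name
    http_hit = find_library("http", search).parent.name
    missing = find_library("xml", search)

print(json_hit, http_hit, missing)         # lib shared None
```

Libraries the application never modified simply fall through to the shared location, so nothing needs to be copied for them.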

Or, if it all needs to be bundled, you could have symlinks for the libraries you don't change.

What do you think?

-----

2 points by digitalis_ 3574 days ago | link

Thing is...this is about as verbose as you can get!

If a name's already good, you're not going to change it; if it's bad, you should push that change upstream! (If the name's bad, it's likely that the original author didn't put much time into choosing the name, so I think it would be fairly straightforward to get that merged.)

[As much as I love this idea of implicit importing, I'm sure the explicit side -- which'll let you change whatever names you like -- will need to be there as well. So we can all chill.]

-----

3 points by rocketnia 3574 days ago | link

Quality of a name is relative to a purpose. The more public we go, the more meanings compete for a single name, making us resort to jargon. If a language really only uses homogeneous intensional equality, being able to call it = is a relief. If someone wants to build a side-by-side comparison of several versions of an extension, they might prefer for some of the names to be different in every version while others stay the same.

But it's not just names per se. In that side-by-side comparison, they might also want to merge and branch parts of the code whose assumed invariants have now changed; invariants can act as Schelling points, like invisible names. Modifying code is something we do sometimes, and I think akkartik wants to see how much simplicity we'll get if everyone who wants a simpler system has the tooling support to modify the code and make it simpler themselves.

Personally, I'm fascinated by the question of how to design a language for multiple people to edit the code at the same time, a use case that can singlehandedly justify information hiding, modules, and versioning. But I think existing module systems enforce information hiding even more than they have to, so that in the cases where people do need to invade that hidden information, they face unnecessary difficulties. I think a good module system will support akkartik's way of pursuing simplicity.

But... my module system ideas aren't finished. At a high level:

- You can invade implementation details you already know. You can prove this by having their entire code as a first-class value with the expected hash.

- You can invade implementation details if you can authorize yourself as their author.
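
The first bullet could be sketched like this in Python. SHA-256 and the may_inspect name are assumptions for the example, since the unfinished design above does not say which hash or API would be used.

```python
import hashlib


def may_inspect(claimed_source: bytes, expected_digest: str) -> bool:
    """Grant access only to someone who already holds the module's exact source."""
    return hashlib.sha256(claimed_source).hexdigest() == expected_digest


# Hypothetical module source and the digest its module system would publish.
module_source = b"(def helper (x) (+ x 1))"
published_digest = hashlib.sha256(module_source).hexdigest()

print(may_inspect(module_source, published_digest))           # True
print(may_inspect(b"(def helper (x) x)", published_digest))   # False
```

Presenting source that matches the digest shows you already know the implementation details, so opening them up reveals nothing new.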

-----

2 points by akkartik 3574 days ago | link

"If a name's already good, you're not going to change it; if it's bad, you should push that change upstream! (If the name's bad, it's likely that the original author didn't put much time into choosing the name, so I think it would be fairly straightforward to get that merged.)"

Not necessarily. 'Good' and 'bad' are not absolute, they are extremely contextual. A name that is good for a general-use library might be sub-optimal for your application, or vice versa. Subjective taste is also a thing. So while you should certainly send out a pull request for the change, our model of the world shouldn't rely on the change actually getting pushed.

In general it is amazing to me how often a blindingly obvious Pull Request gets rejected or just sits in the queue, untouched. There are lots of different kinds of people out there. Which is why I tend to think more like a barbarian[1] about collaboration: think of other people as islands with whom you might collaborate if the stars align. But don't rely on the collaboration. Be self-sufficient.

[1] http://www.ribbonfarm.com/2011/03/10/the-return-of-the-barba...

---

"As much as I love this idea of implicit importing, I'm sure the explicit side -- which'll let you change whatever names you like -- will need to be there as well."

I actually interpreted your original post that kicked off this thread as implicit loading since Arc has no notion of modules or import. So the question of changing names did not arise. That seemed like a tangent to the original question.

These seem like separate questions:

1. Should Arc know how to react with implicit symbols?

2. Should Arc provide namespaces?

On the one hand, you can have implicit loading without needing a module/namespace system. On the other hand, I don't see how you can have implicit loading in the presence of namespaces. Without the "from..as" construct how would your system know which library to load a symbol from, if there's a collision?

Summary: even if you have namespaces, you're still going to be doing your own collision-detection if you want implicit loading. What's the point of a module system then?
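
To make the collision problem concrete, here is a hypothetical Python sketch of an implicit loader's lookup step. The index contents, the resolve function and the prefer table are all invented for illustration.

```python
# Hypothetical index: symbol -> library files that define it.
index = {
    "flatten": ["lists.arc", "trees.arc"],   # two libraries define the same symbol
    "trim": ["strings.arc"],
}


def resolve(symbol, index, prefer=None):
    """Pick the library to load for `symbol`, refusing to guess on collisions."""
    candidates = index.get(symbol, [])
    if len(candidates) == 1:
        return candidates[0]
    if prefer and prefer.get(symbol) in candidates:
        return prefer[symbol]                # an explicit choice made by the programmer
    raise LookupError(f"cannot choose a library for {symbol!r}: {candidates}")


print(resolve("trim", index))                                      # strings.arc
print(resolve("flatten", index, prefer={"flatten": "trees.arc"}))  # trees.arc
```

Without the prefer table, resolve("flatten", index) raises LookupError. The loader has to detect the collision and ask for an explicit choice, whether or not namespaces exist.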

-----