Arc Forum | The use of + for non-numbers

The use of + for non-numbers

5 points by waterhouse 5422 days ago | 10 comments

Currently, + can be used to add numbers or concatenate strings or lists.
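
For anyone who hasn't run into it, this is the overloading in question. A short sketch of a REPL session, assuming a stock Arc with an unmodified ac.scm:

  arc> (+ 1 2)
  3
  arc> (+ "foo" "bar")
  "foobar"
  arc> (+ '(1 2) '(3 4))
  (1 2 3 4)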

PG mentioned this in his early essays on Arc. I believe this was his last word on the issue[0]:

"Using + to concatenate sequences is a lose.

This kind of overloading is just a pun. I found that it actually made programs harder to read, not easier, because I kept thinking I was looking at math code when I wasn't.

As several people have pointed out, concatenation isn't addition. It's not commutative, for example. Ok, you were right; we're tossing it."

But, for some reason, this feature is still part of Arc and is used liberally in arc.arc. For an illustration I find amusing:

  $ grep -c '(+ (list' arc.arc
  4

So here's what I did. I replaced the definition of + in ac.scm so that it (like -, /, expt, and so on) was exported verbatim from mzscheme; I redefined join as a replacement, so that it concatenated strings as well as lists (duct taping it together, with liberal use of $ to call Scheme functions); and I haphazardly went through files ending with ".arc", replacing instances of + that looked like they were supposed to concatenate rather than add, until Arc was able to load without errors. I had another Arc process already going from before I made these changes (and I of course backed up the files before I changed them).

I ran a couple of tests. The gist is, addition seems to work about twice as fast as it did before[1].

Additionally, I noticed a while back that calling + on small numbers consumes memory (i.e. produces garbage), while calling * on similar numbers doesn't. I was sure this was due to the type-checking. This experiment of mine seems to confirm this. With my mem-use macro [2]:

  arc> (mem-use (+ 1 2)) ; Original; + concatenates
  32
  arc> (mem-use (+ 1 2)) ; New; + does not concatenate
  0
  arc> (mem-use (* 3 4)) ; Either one
  0

I do a lot of math with Arc, and finding that addition is working half as fast as it could be because of some feature that I plan never to use and that was supposed to be dropped a long time ago... it really grinds my gears. So I am taking the trouble to write this up and announce it to the Arc Forum. I want to convince everyone (especially PG, who will presumably write the next official release of Arc) to agree to drop that feature. So here are some questions answered in advance:

"What should we use instead, to concatenate strings and lists?" I think join; at the moment it only touches lists, so we can extend it to use strings and that won't break any existing code.

"Won't that just make join run slower?" Slightly, yes; now joining the lists (1 2 3) and (4 5 6) ten thousand times takes ~90 msec instead of ~80. That's a smaller difference, and I don't think joining lists is at all likely to be the performance-critical part of your app. If you want a function that only joins lists, that function has historically been called append in other Lisps.
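
To illustrate the kind of dispatch this implies, here is a minimal sketch. It assumes stock Arc, where join handles only lists and string coerces its arguments and concatenates them. The name cat is hypothetical, picked so the sketch doesn't clobber join itself; it is not the actual patch, which is described further down.

  ; cat is a hypothetical name, not part of Arc
  (def cat args
    (if (no args)
        nil
        (isa (car args) 'string)
        (apply string args)   ; first arg is a string: coerce and concatenate
        (apply join args)))   ; otherwise: ordinary list join

  arc> (cat "foo" "bar")
  "foobar"
  arc> (cat '(1 2 3) '(4 5 6))
  (1 2 3 4 5 6)

The type check happens once per call, on the first argument only, which is why the extra cost to list joining stays small.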

"Why are you coding things like that, where performance on numerical computations is apparently important, in Arc?" Because I like Arc. And its design goals (see "The Hundred-Year Language") certainly allow for making it possible to write incredibly fast code in Arc. If you wonder how Lisp code can be made to run fast, take a look at Common Lisp and its 'declare syntax.

"If someone makes an Arc implementation that does type analysis and/or allows type declarations (like in Common Lisp) so the compiler can speed things up, then wouldn't that fix the slowdown problem?" Yeah, well, a type analysis system like that is pretty nontrivial to implement. I don't think anyone's done it for Arc so far. I think most of us will be using some version of the mzscheme implementation for as far ahead as I care to look, and I think it would be pretty hard to tack onto the mzscheme implementation. Even if someone had already done it, it would be an obstacle for future implementers of Arc on other platforms. It is so much easier to simply guarantee that + is supposed to work on numbers only.

"I like that feature and use it." Well, I don't like it and I don't use it. Look again at PG's stated reasons for dropping +-concatenation; I agree with them and could repeat them myself.

I think I've made my case.

For those who want to hack their own Arc to fix + like I did, here is a rough description of the changes (if someone feels like showing me how to use a diff program to display the nice patch changes, please feel free):

  - In ac.scm, replace (xdef + ...) with (xdef + +), and replace
    the body of ar-+2 with (+ x y).
  - In arc.arc, insert two lines so that the first four lines of
    the body of 'join are:
  (if (no args)
      nil
      ($.string? (car args))
      (apply $.string-append (map1 [coerce _ 'string] args))
    Duct tape and liberal use of $.  Feel free to improve it.
  - In arc.arc, strings.arc, html.arc, and srv.arc, look at
    every instance of the character + and decide whether it
    should be a 'join.  It's not hard, just tedious.  The 
    toughest decision involved a call to 'respond in html.arc
    (which it seems should be replaced).
  - This is just enough to make Arc load without errors.  It may
    not be everything.  Ideally PG/RTM would do it themselves, 
    but they seem to take their time between releases of Arc.

[0] http://paulgraham.com/arclessons.html

[1] Here are the tests.

  ; Original; + concatenates
  arc> (repeat 10 (time:do (= i 0) (repeat 1000000 (++ i)) i))
  time: 1367 msec.
  time: 1291 msec.
  time: 1252 msec.
  time: 1276 msec.
  time: 1254 msec.
  time: 1259 msec.
  time: 1267 msec.
  time: 1260 msec.
  time: 1282 msec.
  time: 1285 msec.

  ; New; + does not concatenate
  arc> (repeat 10 (time:do (= i 0) (repeat 1000000 (++ i)) i))
  time: 629 msec.
  time: 622 msec.
  time: 647 msec.
  time: 644 msec.
  time: 704 msec.
  time: 641 msec.
  time: 642 msec.
  time: 623 msec.
  time: 638 msec.
  time: 630 msec.

The difference is more pronounced when we increment lexical variables rather than global; the former is significantly faster. I'm making it put all the millisecond counts into one list:

  ; Original; + concatenates
  arc> (keep [isa _ 'int] (readall:tostring:repeat 30 (time:let
  i 0 (repeat 1000000 (++ i)) i)))
  (673 686 674 668 682 677 667 688 676 669 687 766 671 693 674
  671 685 669 670 698 671 675 685 671 719 685 673 672 685 670)

  ; New; + does not concatenate
  arc> (keep [isa _ 'int] (readall:tostring:repeat 30 (time:let
  i 0 (repeat 1000000 (++ i)) i)))
  (263 259 255 255 258 273 256 262 266 256 254 254 266 270 262
  256 257 342 262 261 275 258 263 264 258 259 256 262 268 262)

Also, amusingly, (-- i -1) runs slightly faster than (++ i) in the original, +-concatenating, version; in the first example, performing this replacement results in an average of around 1130 msec, rather than 1280 (these are eyeballed averages). Worst optimization trick ever. I'm tempted to say that this alone is so bad that even if the concatenation were a nice feature, it should be dropped immediately in horror.

[2] http://arclanguage.org/item?id=12255

3 points by rocketnia 5422 days ago | link

For what it's worth, I practically never add numbers. I actually use subtraction much more than addition! So I don't mind whatever these operations are named--and I mind the speed even less, actually--but I can testify that this change would sorta hinder my own code at this point. :-p

You've almost certainly encountered this comment above the definition of 'join, but I'll paste it here just in case it elucidates anything:

  ; Rtm prefers to overload + to do this

-----

3 points by garply 5422 days ago | link

I make heavy use of + for concatenation throughout my code. I prefer it for a few reasons:

1. I find myself concatenating lists frequently and I prefer that frequently used functions be short. join has 4 chars to +'s 1.

2. I also find ++ convenient for modifying a list variable. What would you use for the equivalent for join? In Racket's style, it would be join!, but I don't see a good analogue in arc for your proposal.

3. I'm constantly doing string concatenation. + is good for that because it's a short function name and also because I expect it to work because it works like that in many popular high-level languages (python, ruby, javascript). I also don't like to use arc's "@" for string-escaping because I find thinking about whether or not I have to escape the "@" character is distracting.

-----

3 points by waterhouse 5419 days ago | link

With regard to (2), to destructively append the list '(1 2 3) to xs, you can:

  (zap join xs '(1 2 3))

"zap join" is several characters longer than ++, but zap has general utility.

I use the string function to concatenate strings. It seems to work identically to +, as long as the first argument is a string. I do give heterogeneous arguments to string usually, and I like seeing that they will clearly be coerced into a string.
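
To make both of those concrete, this is roughly what a session looks like in stock Arc (xs and the sample values are just for illustration):

  arc> (= xs '(a b))
  (a b)
  arc> (zap join xs '(1 2 3))
  (a b 1 2 3)
  arc> xs
  (a b 1 2 3)
  arc> (string "total: " 42)
  "total: 42"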

I have a couple of ideas.

1. It would be little problem for me if Arc provided + as is and provided a "plus" function that worked only on numbers and allowed me to simply go (= + plus) and proceed normally. Unfortunately, that would break all the functions in arc.arc and so forth that use + to concatenate lists (which includes, among other things, the setforms function, which is used in expand=). It would be really nice if one could "freeze" the variable references in those functions, so that changing the global value of + wouldn't change what "+" in the bodies of those functions referred to.

2. If you use concatenation so much, perhaps we could allocate an ssyntax character for concatenation. & is currently used for andf (experiment with ssexpand to see). We could boot out that usage, or perhaps have "&&" become andf. Good/bad idea? (Precedent: my memory tells me the TI-89 uses & for string concatenation.)
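
For reference, the ssexpand experiment I mean looks like this; a sketch assuming stock Arc's ssyntax rules, with f and g as placeholder names:

  arc> (ssexpand 'f&g)
  (andf f g)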

-----

1 point by garply 5419 days ago | link

Regarding your second suggestion, we could also use . instead of &, as that's what Perl and PHP do - feels a little more natural to me. But . might cause me a little mental friction in differentiating between the different uses of . in (. "a" "b") and (do (= a '(1 2)) a.0).

To be honest, I'm still not crazy about the idea simply because I don't need the speed boost and + doesn't seem to cause me to use extra mental cycles when reading my code. I'd be open to it though if the community really wanted it that way.

We could vote and also ask PG what he thinks and then make a decision.

-----

1 point by prestonbriggs 5419 days ago | link

I wouldn't do anything for a speed boost at this stage. Premature optimization and all that.

Preston

-----

2 points by fallintothis 5422 days ago | link

String concatenation is particularly convenient after the + change for arc3.tar: http://arclanguage.org/item?id=9937. That's probably what I use + for most of the time.

"I find thinking about whether or not I have to escape the "@" character is distracting"

I find this is easier with proper syntax highlighting. My arc.vim ftplugin can detect if you have (declare 'atstrings t) and, if so, highlights the escaped parts of strings. That way, you know if @ is escaped just by glancing. But I don't mean to shamelessly plug, haha. I don't use atstrings either, but my reason is far lazier: in the middle of writing code, it's less effort to just use + than it is to declare then go back and start using @s.

-----

2 points by garply 5421 days ago | link

What other goodies does your arc.vim plugin have? Is your editor at all integrated with the arc repl? Lack of a repl that I could easily send arc code to was the reason I switched to emacs after years of using vim. These days, using emacs with viper (vim emulation mode), I don't miss vim at all.

-----

2 points by fallintothis 5413 days ago | link

Sorry, I'm not going to be the one to write SLIME for Vim. :P I'm afraid the only "goodies" I have are highlighting-related. Off the top of my head:

- It can detect macros that take rest parameters named body, then highlight and indent them correctly.

- It uses R5RS's grammar & mz-specific extensions to highlight numbers correctly -- even weird ones, like

  #e1.5
  -nan.0+2.5i
  1/2@+inf.0
  #x8a
  #o1E+2

- It notices paren errors involving [], like

  [(+ a b])

- It highlights escape sequences in strings, like

  "\x61b \t c\u0064\n"

- It does its best to highlight ssyntax characters so they're easy to read.

You can check out more at http://arclanguage.org/item?id=10147 or http://bitbucket.org/fallintothis/arc-vim. It hasn't changed much since I first submitted it, though I've noticed it fails to highlight 0-argument function names like queue and new-fnid. Been meaning to fix that for a while.

-----

3 points by fallintothis 5422 days ago | link

"if someone feels like showing me how to use a diff program to display the nice patch changes, please feel free"

You mean like

  diff -u old-arc.arc new-arc.arc

or are you on Windows or something?

-----

1 point by waterhouse 5421 days ago | link

Aha! That works. Thank you!

-----