Arc Forum
2 points by almkglor 5857 days ago | link | parent

I personally think that strings and symbols should be separate, largely because of their different uses.

That said, I did recently write an app which used symbols as de facto strings; the text file the user edited as a configuration/task description file was just an s-expr format. The app wasn't written in Lisp or anything like it, which was why I settled for using symbols as strings (to make it easier on my reader function - I just didn't have a separate readable string type).

Given that experience, well, I really must insist that having separate string and symbol types is better. In fact, in the app I wrote, the config/taskdesc file was just a glorified association list (where the value is the 'cdr of the list, not the 'cadr)!

As for strings being lists/arrays of characters, yes, that's a good idea. We might hack into the writer and have it scan through each list it finds, checking if all elements are characters, and if so just print it as a string. We might add a 'astring function which does this checking (obviously with circular list protection) instead of [isa _ 'string].
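A minimal sketch of the kind of check such an 'astring function would do, written in Python rather than Arc so it can be run as-is. The `Cons` class, `from_str`, and the choice to reject the empty list are assumptions made for illustration; nothing here is Arc's actual implementation.

```python
class Cons:
    """A minimal cons cell: `car` holds the element, `cdr` the rest of the list."""
    def __init__(self, car, cdr=None):
        self.car = car
        self.cdr = cdr  # None plays the role of nil


def from_str(s):
    """Build a nil-terminated list of one-character strings (hypothetical helper)."""
    head = None
    for ch in reversed(s):
        head = Cons(ch, head)
    return head


def is_char(x):
    # Python has no character type, so a length-1 str stands in for one.
    return isinstance(x, str) and len(x) == 1


def astring(x):
    """True if x is a non-empty, nil-terminated, non-circular list of characters."""
    if not isinstance(x, Cons):
        return False
    seen = set()  # ids of cells already visited: the circular list protection
    while isinstance(x, Cons):
        if id(x) in seen:
            return False  # walked back onto an earlier cell
        seen.add(id(x))
        if not is_char(x.car):
            return False
        x = x.cdr
    return x is None  # a dotted tail such as (#\a . #\b) is rejected
```

Under these assumptions, `astring(from_str("abc"))` is true, while a list holding a non-character, a dotted pair, or a cell whose cdr points back at itself all give false. The check walks the whole list on every call, which is the cost the tagging suggestion further down the thread tries to avoid.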



2 points by bogomipz 5848 days ago | link

I think the strongest reason for separate strings and symbols is that you don't want all strings to be interned - that would just kill performance.

About lists of chars. Rather than analyzing lists every time to see if they are strings, what about tagging them? I've mentioned before that I think Arc needs better support for user defined types built from cons cells. Strings would be one such specialized, typed use of lists.
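A small sketch of the tagging idea, modeled in Python. Arc's own annotate, type, and rep are the closest existing mechanism; the `Tagged` class, `type_of`, `make_string`, and the use of a Python list in place of a chain of cons cells are assumptions made for this example.

```python
class Tagged:
    """A value carrying an explicit type tag alongside its underlying representation."""
    __slots__ = ("tag", "rep")

    def __init__(self, tag, rep):
        self.tag = tag
        self.rep = rep


def annotate(tag, rep):
    """Wrap `rep` with a type tag (modeled loosely on Arc's annotate)."""
    return Tagged(tag, rep)


def type_of(x):
    """Report the tag for tagged values, 'cons' for plain lists, else the Python type name."""
    if isinstance(x, Tagged):
        return x.tag
    if isinstance(x, list):
        return "cons"
    return type(x).__name__


def make_string(chars):
    """Validate the elements once, at construction, then tag the list as a string."""
    chars = list(chars)
    if not all(isinstance(c, str) and len(c) == 1 for c in chars):
        raise TypeError("every element of a string must be a character")
    return annotate("string", chars)


def is_string(x):
    # Constant-time check: no scan of the list is needed.
    return type_of(x) == "string"
```

With this layout the writer only has to look at the tag to decide whether to print a list as a string; an untagged list of characters stays an ordinary list.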

Also, how do you feel about using symbols of length 1 to represent characters? The number one reason I can see not to is if you want chars to be Unicode and symbols to be ASCII only.

-----

2 points by sacado 5848 days ago | link

Symbols, ASCII only? No way. I'm writing my code in French, and I'm now used to calling things the right way, i.e. with accents. "modifié" means "modified" and "modifie" means "modifies"; that's not the same thing, and I want to be able to distinguish between the two. Without accents, you can't.

Furthermore, that would mean coercing symbols into strings would be impossible (or at least the 1:1 mapping would not be guaranteed anymore).

-----

2 points by stefano 5848 days ago | link

From the implementation point of view representing characters as symbols is a real performance issue, because you would have to allocate every character on the heap, and a single character would then take more than 32 bytes of memory.

-----

2 points by sacado 5848 days ago | link

I think that's an implementation detail. You could still keep the character type in the implementation, but write characters as "x" (or 'x) instead of #\x and make (type c) return 'string (or 'sym).

Or, if you take the problem the other way, you could say "length-1 symbols are quite frequent and shouldn't take too much memory -- let's represent them a special way where they would only take 4 bytes".
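One way such a 4-byte representation could work is to pack the character's code point into a tagged 32-bit word instead of allocating a symbol object on the heap. The sketch below models that in Python; the 3-bit tag width, the `CHAR_TAG` value, and the function names are all assumptions chosen for illustration, not any existing Arc runtime's layout.

```python
TAG_BITS = 3                      # assumed number of low bits reserved for a type tag
TAG_MASK = (1 << TAG_BITS) - 1
CHAR_TAG = 0b101                  # hypothetical tag meaning "length-1 symbol"


def encode_char_symbol(name):
    """Pack a one-character symbol name into a single 32-bit word."""
    if len(name) != 1:
        raise ValueError("only length-1 symbols get the immediate representation")
    word = (ord(name) << TAG_BITS) | CHAR_TAG
    # Unicode code points need at most 21 bits, so 21 + 3 tag bits fit in 32.
    assert word < 2 ** 32
    return word


def is_char_symbol(word):
    """Check the tag bits to see whether a word is an immediate length-1 symbol."""
    return (word & TAG_MASK) == CHAR_TAG


def decode_char_symbol(word):
    """Recover the one-character name from an immediate word."""
    if not is_char_symbol(word):
        raise ValueError("word does not carry the length-1 symbol tag")
    return chr(word >> TAG_BITS)
```

Because the full Unicode range fits, accented names such as "é" round-trip through the word unchanged, and the runtime can tell an immediate symbol from a heap pointer by looking at the tag bits alone.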

-----

1 point by stefano 5847 days ago | link

This would require some kind of automatic type conversions (probably at runtime), but characters-as-symbols seems doable without the overhead I thought it would lead to.

-----