Can anybody else reproduce this bug with queues?
5 points by akkartik 5139 days ago | 17 comments

  (= q (queue))

  (def verify(q)
    (prn q)
    (if q.0
      ; The contract for queues: q.1 is reachable from q.0 
      (unless (reclist [is _ q.1] q.0)
        (prn "error"))))

  (repeat 1000000
    (prn "iter")
    (repeat rand.10
      (verify q)
      (enq 0 q))
    (prn "deq")
    (until (is 0 qlen.q)
      (verify q)
      (deq q)))
On EC2+ubuntu jaunty+mzscheme 4.2.4/4.1.3 this dies with an error 4 out of 5 times. Every now and then enq seems to reset the queue's tail pointer.

No error in snow leopard with mzscheme 4.2.2. Could y'all try it out on platform/version combos you have access to, and post the results?

Relevant is this comment above the implementation of enq:

  ; Despite call to atomic, once had some sign this wasn't thread-safe.
  ; Keep an eye on it.
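
A minimal sketch of what that contract means, assuming the stock arc3.1 representation of a queue as a three-element list (items, last pair of items, count). The name demo is made up for this example:

  arc> (= demo (queue))
  (nil nil 0)
  arc> (enq 'a demo)
  (a)
  arc> (enq 'b demo)
  (a b)
  arc> demo
  ((a b) (b) 2)
  arc> (is demo.1 (cdr demo.0))  ; the tail pointer is the last pair of the items list
  t

If enq ever leaves demo.1 pointing at something that is not a pair inside demo.0, the next enq has nothing valid to set-cdr! on, which is the condition verify checks for.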


5 points by palsecam 5139 days ago | link

No problem here. I repeated the 'repeat loop 3 times.

  $ lsb_release -a
  [...]
  Description:	Ubuntu 9.10
  Release:	9.10
  Codename:	karmic
  $ mzscheme -v
  Welcome to MzScheme v4.2.1 [3m], Copyright (c) 2004-2009 PLT Scheme Inc.
But bugs related to atomicity certainly exist anyway. The thread/atomicity stuff is a subtle mess.

Threads are never a solution. Message-passing / shared-nothing threads maybe, event-based maybe, something else maybe. But traditional "à la Java" threads have proved to be a bad idea.

Why are threads in Arc, after all? For srv.arc. Which would do better w/ an event-based architecture.

And the GIL. Gosh. Python is learning the hard way how a GIL is painful and should probably be avoided from the start.

BTW:

  arc> (= var 0  tbl (obj var 0))
  #hash((var . 0))
  arc> (repeat 50000 (thread (++ var) (++ tbl!var)))
  nil
  arc> var
  50000
  arc> tbl!var
  50000
  ; OK, the above is normal and expected
  ; Now, let's sleep for a random time in each thread before the '++
  arc> (= var 0  tbl (obj var 0))
  #hash((var . 0))
  arc> (repeat 50000 (thread (sleep:/ (rand 40) (inc:rand 50)) (++ var) (++ tbl!var)))
  nil
  arc> var
  49817
  arc> tbl!var
  50047
  ; WTF?!
Huh, I'm surprised. I knew 'assign wasn't atomic, but I thought 'sref was OK ('++ expands to '=, which expands to a call to 'sref wrapped in an 'atwith expression). Seems not. Is my test bogus somehow?
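
One way to narrow this down, sketched under the assumption that arc3.1's 'atomic serializes its body against every other 'atomic body: wrap the global increment explicitly, and only read the counter once every thread has had time to wake up. With these arguments the longest possible sleep is 39 seconds, since (rand 40) is at most 39 and (inc:rand 50) is at least 1.

  arc> (= var 0)
  0
  arc> (repeat 50000 (thread (sleep:/ (rand 40) (inc:rand 50)) (atomic (++ var))))
  nil
  arc> (sleep 45)  ; assumption: 45s is a comfortable margin over the 39s maximum sleep
  nil
  arc> var

If var still comes up short after that, the loss is not explained by the unprotected 'assign alone.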

----

Some people, when confronted with a problem, think "I know, I will use threads." Now they have two problems.

-----

1 point by garply 5136 days ago | link

Died for me on Arch Linux too:

  $ mzscheme --version
  Welcome to MzScheme v4.2 [3m], Copyright (c) 2004-2009 PLT Scheme Inc.

Also, I second the notion that shared-memory threads are a bad idea. I really like Termite's message-passing model.

-----

1 point by aw 5136 days ago | link

> Died for me on Arch Linux too

Which test were you running, akkartik's or palsecam's?

-----

1 point by garply 5136 days ago | link

akkartik's code died on me the first time I ran it, but I haven't been able to reproduce it since.

This is what happens with palsecam's code:

  arc> (= var 0 tbl (obj var 0))
  #hash((var . 0))
  arc> (repeat 50000 (thread (++ var) (++ tbl!var)))
  nil
  arc> var
  49999
  arc> tbl!var
  50000

-----

1 point by palsecam 5136 days ago | link

  arc> var
  49999
is strange, but the code you tried is not the one that demonstrated the bug. You should 'sleep in the threads (i.e., the second example in my comment).

-----

1 point by garply 5136 days ago | link

When I do the second example, I get results similar to yours.

-----

1 point by akkartik 5138 days ago | link

Did you really run that over 10 hours?

Update: I tried your test with lower sleep intervals and didn't see the error (sleep:/ (rand 40) (inc:rand 5000))

-----

2 points by palsecam 5137 days ago | link

> Did you really run that over 10 hours?

I never said I did.

  arc> (= var 0  tbl (obj var 0))
  #hash((var . 0))
  arc> (time:repeat 50000 (thread (sleep:/ (rand 40) (inc:rand 50)) (++ var) (++ tbl!var)))
  time: 9173 msec.  ; <-- not 10 hours...
  nil
  arc> var
  48152
  arc> tbl!var
  32039
> I tried your test with lower sleep intervals and didn't see the error

Even w/ lower intervals, it is buggy on my computer:

  arc> (= var 0  tbl (obj var 0))
  #hash((var . 0))
  arc> (time:repeat 50000 (thread (sleep:/ (rand 40) (inc:rand 5000)) (++ var) (++ tbl!var)))
  time: 5391 msec.
  nil
  arc> tbl!var
  49849
  arc> var
  49859
(Running on the Ubuntu/MzScheme combo described in my previous comment, on plain vanilla Arc 3.1 (ycombinator.com/arc/arc3.1.tar). Ran it ~10 times; the results differ each time but are never 50000.)

-----

1 point by akkartik 5137 days ago | link

Yes, reproduced on ubuntu/mz4.2.4 (sorry I was making stupid mistakes last night when I tried it out). On snow leopard/mz4.2.2 the tbl!var is always at 50k, but the global var is always lower. I've seen it as low as 43866.

-----

1 point by akkartik 5139 days ago | link

Interesting. But there's no threads in the queue issue :(

-----

1 point by palsecam 5139 days ago | link

yes I know, my reply was actually more about the comment in arc.arc, sorry :-)

-----

2 points by waterhouse 5138 days ago | link

Ran for a very long time, but finally erred. Mac OS X 10.6, mzscheme 4.2.4.

  iter
  (nil (0) 0)
  ((0) (0) 1)
  ((0 0) (0) 2)
  ((0 0) nil 3)
  error
  Error: "set-cdr!: expected argument of type <pair>; given nil"
Ran it a second time; it erred within three seconds. Ran it a third time; it erred after a minute or two.

-----

2 points by waterhouse 5137 days ago | link

This is curious. I've run it with out-of-the-box arc3.1 and out-of-the-box mzscheme 4.2.4 and 4.2.2, on the same computer (with Mac OS X 10.6.2). It has erred every time with 4.2.4 (I think two or three times very quickly and three or four times after a long time), and it has not erred once with 4.2.2 (after four or five tries). I suppose it must be a problem with mzscheme.

-----

1 point by akkartik 5138 days ago | link

Thank you! I haven't managed to get the error on macos. This is good to know.

-----

2 points by aw 5139 days ago | link

I got the error, mzscheme 4.2.3 on my Gentoo laptop, running plain arc3.1

  ...
  ((0 0) (0) 2)
  ((0 0 0) (0) 3)
  ((0 0 0 0) (0) 4)
  ((0 0 0 0) nil 5)
  error
  Error: "set-cdr!: expected argument of type <pair>; given nil"
  arc>
Update: ran it twice with 4.2.1, no error.

-----

2 points by thaddeus 5138 days ago | link

twice on Mac OSX 10.6.2 MzScheme 4.2.2

twice on Ubuntu Jaunty MzScheme 4.2.2

no errors.

-----

2 points by conanite 5139 days ago | link

Mac 10.5, mzscheme 4.2.1, ended with no error.

-----