
Test hang on Python connection tests on OS X #3954

Closed
gchpaco opened this issue Mar 20, 2015 · 13 comments

@gchpaco
Contributor

gchpaco commented Mar 20, 2015

This came up during work on #2622.

I am experiencing unusual hangs in the testing framework on next around connection.py. Specifically:

(rethinkdb-2.7.9)bishop :: rethinkdb/test/rql_test ‹next› » ./test-runner -v connections/connection.py2.7
Warning: did not find any tests for javascript
Warning: did not find any tests for ruby 2.0
Using rethinkdb binary /Users/graham/wd/rethinkdb/build/debug_clang/rethinkdb
    python 2.7 interpreter: /Users/graham/.pyenv/versions/rethinkdb-2.7.9/bin/python2.7, driver: /Users/graham/wd/rethinkdb/drivers/python
== Starting: connections/connection.py2.7 (T+ 1.2 sec)
Running py connection tests
test_auth_key (__main__.TestNoConnection) ... skipped 'Not testing default port'
test_connect (__main__.TestNoConnection) ... skipped 'Not testing default port'
test_connect_host (__main__.TestNoConnection) ... ok
test_connect_port (__main__.TestNoConnection) ... ok
test_connnect_timeout (__main__.TestNoConnection)
Test that we get a ReQL error if we connect to a non-responsive port ... ok
test_empty_run (__main__.TestNoConnection) ... ok
test_connect_correct_auth (__main__.TestAuthConnection) ... ok
test_connect_long_auth (__main__.TestAuthConnection) ... ok
test_connect_no_auth (__main__.TestAuthConnection) ... ok
test_connect_wrong_auth (__main__.TestAuthConnection) ... ok
test_close_does_not_wait_if_requested (__main__.TestConnection) ... ok
test_close_waits_by_default (__main__.TestConnection) ... ok
test_connect_close_expr (__main__.TestConnection) ... ok
test_connect_close_reconnect (__main__.TestConnection) ... ok
test_db (__main__.TestConnection) ... ok
test_noreply_wait_waits (__main__.TestConnection) ... ok
test_port_conversion (__main__.TestConnection) ... ok
test_reconnect_does_not_wait_if_requested (__main__.TestConnection) ... ok
test_reconnect_waits_by_default (__main__.TestConnection) ... ok
test_repl (__main__.TestConnection) ... ok
test_use_outdated (__main__.TestConnection) ... ok
runTest (__main__.TestPrinting) ... ok
runTest (__main__.TestBatching) ... ok
runTest (__main__.TestGetIntersectingBatching) ... Random seed: 13333679129784110953 >>> Timed out connections/connection.py2.7 after 300 seconds after 300.1 sec (T+ 301.4 sec) <<<

== Failed 1 test (of 1) in 301.63 seconds!
    connections/connection.py2.7

Here I'm using Python 2.7.9, but I've seen it with 2.6 as well.

@danielmewes danielmewes added this to the 2.0 milestone Mar 20, 2015
@Tryneus
Member

Tryneus commented Mar 26, 2015

Just looked into this out of curiosity, and it looks like the server is stalled partway through writing a large response to the client (~400KB). There may be a bug in Tornado's BaseIOStream.read_bytes; I doubt this is a problem with the server. It might be worth trying different methods of reading in TornadoConnection._reader.
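
For reference, a minimal sketch (not the actual TornadoConnection._reader, and assuming the JSON protocol's 8-byte-token + 4-byte-length response framing) of the two reading strategies in question: one large read_bytes call for the body versus several smaller chunked reads.

    # Sketch only; names and framing are assumptions, not the driver's real code.
    import struct
    from tornado import gen

    @gen.coroutine
    def read_response(stream):
        header = yield stream.read_bytes(12)            # 8-byte token + 4-byte length
        token, length = struct.unpack("<qL", header)
        body = yield stream.read_bytes(length)          # one large read; the suspect call
        raise gen.Return((token, body))

    @gen.coroutine
    def read_response_chunked(stream, chunk_size=16384):
        header = yield stream.read_bytes(12)
        token, length = struct.unpack("<qL", header)
        parts, remaining = [], length
        while remaining > 0:                            # several smaller reads instead
            chunk = yield stream.read_bytes(min(chunk_size, remaining))
            parts.append(chunk)
            remaining -= len(chunk)
        raise gen.Return((token, b"".join(parts)))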

@gchpaco
Contributor Author

gchpaco commented Mar 26, 2015

It happens even with the original connection.py, though. I doubt it's a problem with Tornado.

@Tryneus
Member

Tryneus commented Mar 26, 2015

Ah, then, the problem is probably the server. Looking into it now.

@Tryneus
Member

Tryneus commented Mar 26, 2015

It looks like linux_event_watcher_t isn't waking up on the write-available event. 'rdhup' isn't implemented for OS X, so it could be that 'out' is broken as well.

@danielmewes
Member

@gchpaco was also looking into this on the client side. But that's interesting. Maybe I broke it when switching the OS X build to kqueue.

@gchpaco
Contributor Author

gchpaco commented Mar 26, 2015

At @danielmewes's suggestion I tested using the older poll_event_queue_t, which works. So it would appear that the problem is in kqueue_event_queue_t. Digging.

@gchpaco
Contributor Author

gchpaco commented Mar 26, 2015

The hang appears to be on this query:

            reference = t1.filter(r.row['geo'].intersects(query_circle))\
                          .coerce_to("ARRAY").run(c)

at least most of the time. With 3110841139272090394 as a seed it got as far as the cursor iteration.

@gchpaco
Contributor Author

gchpaco commented Mar 26, 2015

Or maybe not; it's nondeterministic, to be sure.

@gchpaco
Contributor Author

gchpaco commented Mar 26, 2015

When I instrumented it, it got further, but it still hangs on a read halfway through the chunking. If I had to guess, we're not using kqueue properly when sending large chunks.

@gchpaco
Contributor Author

gchpaco commented Mar 26, 2015

To be clearer: it hangs in SocketWrapper.recvall when the Python code is expecting a very large read. It gets about ¾ of the way through in two reads and then hangs indefinitely. The sizes are a bit odd (the instrumentation says "saw chunk of size 212044" and then "saw chunk of size 49440", which doesn't add up to any TCP buffer size I know of), but the response clearly requires multiple sends on the server side, and the server simply isn't sending the remaining data.
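
For illustration, a minimal recvall-style loop (a sketch, not the driver's actual SocketWrapper.recvall) showing where the client ends up stuck: once the server stops sending, the loop blocks in recv() forever even though it knows how many bytes are still outstanding.

    import socket

    def recvall(sock, length):
        """Read exactly `length` bytes, or raise if the peer closes early."""
        buf = b""
        while len(buf) < length:
            chunk = sock.recv(length - len(buf))   # blocks here once the server stalls
            if not chunk:
                raise RuntimeError("connection closed mid-response")
            print("saw chunk of size %d" % len(chunk))
            buf += chunk
        return buf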

@danielmewes
Member

Oh man, I had made a stupid bug in the kqueue implementation that would stop it from listening for out/write events if someone was already listening for in/read events on the socket.

Putting the patch up soon.
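
As an illustration of the failure class (using Python's select.kqueue rather than the server's C++ event queue): with kqueue, read and write interest are separate filters that each have to be registered, so code that skips adding EVFILT_WRITE because EVFILT_READ is already registered never gets write-ready wakeups, and a blocked send is never resumed.

    import select, socket

    def watch_read_and_write(kq, sock):
        # Register both filters explicitly; one does not imply the other.
        kq.control([
            select.kevent(sock.fileno(), select.KQ_FILTER_READ, select.KQ_EV_ADD),
            select.kevent(sock.fileno(), select.KQ_FILTER_WRITE, select.KQ_EV_ADD),
        ], 0)

    if __name__ == "__main__":
        a, b = socket.socketpair()
        kq = select.kqueue()
        watch_read_and_write(kq, a)
        b.send(b"x")                                # make `a` readable as well
        for ev in kq.control(None, 4, 1.0):
            kind = "read" if ev.filter == select.KQ_FILTER_READ else "write"
            print("fd %d ready for %s" % (ev.ident, kind))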

@danielmewes danielmewes modified the milestones: 1.16.3, 2.0 Mar 26, 2015
@danielmewes
Member

Up in CR 2747 by @gchpaco

@danielmewes
Member

Fixed in v1.16.x (c8d32fc), v2.0.x (55987a6), and next (734e6f5).
