Web interface times out easily on slow connections #1746

Closed
danielmewes opened this issue Dec 6, 2013 · 17 comments

@danielmewes
Member

I periodically get "You have been disconnected from the server" messages in the web interface on a certain cluster. After a few seconds it reconnects, and remains connected for a while before it happens again. I don't have any load on the server, so I don't think this is actually a server issue.

The RethinkDB cluster is a 3-node cluster running on CloudSigma in Switzerland, and I'm connecting to the web interface from our office. I think this might be due to the relatively high network latency.
ICMP ping shows round-trip times between 250 and almost 500 ms. Could that be enough to cause the timeout?

It seems our timeout might be too short. Or maybe we serialize requests, such that the high latency on the connection has an outsized impact?

@neumino: Probably your domain. Any ideas?

@mglukhovsky
Member

I have noticed this same issue on Digital Ocean, where there is sometimes very high latency.

@neumino
Member

neumino commented Dec 7, 2013

The timeout is currently set to 15 seconds.
If we can't retrieve /ajax in 15 seconds, the interface will say that you are disconnected.

The content in /ajax and in /ajax/stat can be pretty big, so yep it's possible to get a timeout if your connection is not fast enough.
A one-node cluster with 10 tables (1 shard per table) requires about 125 kB/s.

Two things can be done to fix this problem:

  • Send diffs for /ajax
  • Report only the stats the UI actually uses in /ajax/stat (or at least fewer things).
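
To make the failure mode concrete, here is a minimal sketch of the polling behavior described above. It is not the actual web UI code: only the /ajax endpoint and the 15-second budget come from this thread, and the helper itself (including the name pollAjax) is illustrative.

```typescript
// Minimal sketch, not the actual web UI code: /ajax and the 15 s
// budget are from this thread; everything else is illustrative.
const TIMEOUT_MS = 15_000;

async function pollAjax(baseUrl: string): Promise<boolean> {
  const controller = new AbortController();
  // The deadline covers the *entire* transfer, so a large body on a
  // slow link can miss it even when round-trip latency is modest.
  const timer = setTimeout(() => controller.abort(), TIMEOUT_MS);
  try {
    const res = await fetch(`${baseUrl}/ajax`, { signal: controller.signal });
    await res.json(); // the whole payload must arrive within the budget
    return true;      // still connected
  } catch {
    return false;     // rendered as "You have been disconnected from the server"
  } finally {
    clearTimeout(timer);
  }
}
```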

@neumino
Member

neumino commented Dec 7, 2013

Basically, the problem is not so much latency as bandwidth.

@danielmewes
Member Author

I see. Thanks for the explanation.
I had a single table, but with 96 shard copies (32 shards × 3 replicas). I've since reduced the number of shards, and indeed it hasn't happened since.

I wonder if we could gzip the data as a quick work-around? Most browsers can decode gzipped content transparently.

@mlucy
Member

mlucy commented Dec 7, 2013

That would be useful, and would also set the stage for offering gzipped JSON replies to clients, or for gzipping data before writing it to disk.

@mglukhovsky
Member

Yes, gzipping would be huge for the web UI in general.

@jdoliner
Contributor

jdoliner commented Dec 7, 2013

It would actually be pretty easy to make our HTTP server support gzipping.
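
The mechanics are indeed small: check the request's Accept-Encoding header and set Content-Encoding on the response. A minimal sketch follows, written in TypeScript with Node's built-in http and zlib purely for illustration; RethinkDB's actual HTTP server is C++, and the handler, payload, and port here are made up.

```typescript
// Illustrative sketch of gzip content negotiation; RethinkDB's HTTP
// server is C++, but the protocol-level logic is the same in any stack.
import * as http from "http";
import * as zlib from "zlib";

http.createServer((req, res) => {
  const body = JSON.stringify({ stats: "..." }); // placeholder payload
  res.setHeader("Content-Type", "application/json");
  // Browsers advertise gzip in Accept-Encoding and decode the response
  // transparently, as noted above.
  if (/\bgzip\b/.test(req.headers["accept-encoding"] ?? "")) {
    res.setHeader("Content-Encoding", "gzip");
    res.end(zlib.gzipSync(body));
  } else {
    res.end(body);
  }
}).listen(8080); // illustrative port
```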

@coffeemug
Contributor

Note that this is related to #1392.

@coffeemug
Contributor

Assigning to @Tryneus. We'll gzip the HTTP server's responses and see what happens.

@Tryneus
Member

Tryneus commented Jan 15, 2014

Working on gzip in the server now.

@Tryneus
Member

Tryneus commented Jan 15, 2014

This is implemented and up in review 1130.

@coffeemug
Contributor

@Tryneus could you post some bandwidth numbers for the web UI with and without gzip? (The Chrome console should easily be able to tell you.)

@neumino
Member

neumino commented Jan 15, 2014

Right-click on the top bar > Task Manager.
That should give you the traffic per tab.

@Tryneus
Member

Tryneus commented Jan 16, 2014

Oh, I already did that to test, but didn't want to bother typing it up. Here's what I found:
The test was done on a one-node cluster, because cluster overhead would just obscure the numbers we care about. Numbers are total time and latency as seen in the Network graph, averaged. Keep in mind this is over a very fast connection, so these numbers mainly show that gzip isn't adding latency; over a slower connection, the bandwidth savings will help immensely.


<host>:<port>/ajax/stat uncompressed:
Size: 12.8k
Time: 19.4ms
Latency: 14.8ms

<host>:<port>/ajax/stat compressed:
Size: 1.3k
Time: 17.2ms
Latency: 15ms


<host>:<port>/ajax/directory uncompressed:
Size: 987 bytes
Time: 7.6ms
Latency: 5.8ms

<host>:<port>/ajax/directory compressed:
Size: 513 bytes
Time: 8.6ms
Latency: 6.8ms


I also did some tests with the data explorer using rows from Workload X, and got the following results (leaving out time/latency because they were practically indistinguishable):

r.db('test').table('stress').limit(1):
Uncompressed size: 2.3k
Compressed size: 1.5k

r.db('test').table('stress').limit(10):
Uncompressed size: 22.1k
Compressed size: 12.3k

r.db('test').table('stress').limit(100):
Uncompressed size: 220k
Compressed size: 119k

This request hit the stream batch size, and two responses were seen:
r.db('test').table('stress').limit(1000):
Uncompressed size: 284k, 310k
Compressed size: 153k, 167k


So, as you can see, we get a very good compression ratio on our stats, about 90% (12.8 kB → 1.3 kB), while the directory and arbitrary JSON rows shrink by about 50%.
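
Those percentages follow directly from the sizes measured above; a throwaway check, using only the figures in this comment:

```typescript
// Reduction percentages computed from the sizes measured above.
const samples = [
  { name: "/ajax/stat",      rawKB: 12.8,  gzKB: 1.3 },
  { name: "/ajax/directory", rawKB: 0.987, gzKB: 0.513 },
  { name: "limit(100) rows", rawKB: 220,   gzKB: 119 },
];
for (const { name, rawKB, gzKB } of samples) {
  console.log(`${name}: ${((1 - gzKB / rawKB) * 100).toFixed(0)}% smaller`);
}
// Prints: /ajax/stat: 90% smaller
//         /ajax/directory: 48% smaller
//         limit(100) rows: 46% smaller
```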

@coffeemug
Contributor

👍

@Tryneus
Member

Tryneus commented Jan 17, 2014

Approved and merged into next in commits 6a1630e and 3b94850. Will be in release 1.12.

@Tryneus closed this as completed Jan 17, 2014
@danielmewes
Member Author

This is awesome. That should improve things a lot when managing remote clusters.
