
Add a limit to the number of in-flight requests that a server processes. #6207

Merged
1 commit merged into kubernetes:master on Apr 2, 2015

Conversation

brendandburns
Contributor

@smarterclayton
Contributor

What happens when all the inflight requests are consumed by watches? Starvation?

@brendandburns
Contributor Author

yes. I suppose we might want to have a separate counter for watches?

@brendandburns
Contributor Author

Or we could just ignore watches for now. We managed to DOS an apiserver out of file descriptors by spamming it. That's bad :P hence this PR.

@smarterclayton
Contributor

We are already talking about pushing some operations deeper into the apiserver, so if this stays at a high level for now, it will be easier to move it down once those other steps are complete.

@@ -66,6 +66,22 @@ func ReadOnly(handler http.Handler) http.Handler {
})
}

// MaxInFlight limits the number of in flight requests to the number of

Typo?

Contributor Author

fixed.
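
For context, the filter under review can be sketched as a buffered channel used as a counting semaphore. The names, the limit of 20, and the 503 response below are illustrative assumptions, not the PR's exact code:

```go
package main

import (
	"io"
	"net/http"
)

// maxInFlightLimit wraps a handler with a counting semaphore built from a
// buffered channel. Requests that cannot acquire a slot are rejected
// immediately instead of piling up and exhausting file descriptors.
func maxInFlightLimit(limit int, handler http.Handler) http.Handler {
	sem := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}:
			defer func() { <-sem }()
			handler.ServeHTTP(w, r)
		default:
			// Over the limit: fail fast rather than queueing.
			http.Error(w, "too many requests in flight", http.StatusServiceUnavailable)
		}
	})
}

func main() {
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok\n")
	})
	http.ListenAndServe(":8080", maxInFlightLimit(20, hello))
}
```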

@ghost

ghost commented Mar 31, 2015

How do we monitor the number of inflight requests over time? I'm guessing that the current answer is "we don't", in which case please add an issue requesting this, at least.
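
One way such monitoring could look (a sketch only, not something this PR adds) is a gauge published through the standard library's expvar package, which serves its variables as JSON at /debug/vars on the default mux; the variable and function names are illustrative:

```go
package filters

import (
	"expvar"
	"net/http"
)

// inFlightRequests is an illustrative gauge: the current number of requests
// being served. expvar publishes it at /debug/vars when the default mux is used.
var inFlightRequests = expvar.NewInt("inflight_requests")

// CountInFlight increments the gauge on entry and decrements it when the
// wrapped handler returns, so the value can be scraped over time.
func CountInFlight(handler http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		inFlightRequests.Add(1)
		defer inFlightRequests.Add(-1)
		handler.ServeHTTP(w, r)
	})
}
```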

@vmarmol assigned ghost on Mar 31, 2015
@vmarmol
Contributor

vmarmol commented Mar 31, 2015

Assigning to @quinton-hoole, feel free to reassign.

@brendandburns
Contributor Author

Comments addressed, and #6227 filed for the issue Quinton asked for.

@ghost

ghost commented Mar 31, 2015

Getting back to smarterclayton@ 's comment above, surely with the low default of 20 in-flight requests that you've set in this PR, watch requests will pretty much always use up all of those, and we'll just serve errors for everything else? Am I missing something?

@smarterclayton
Contributor

Also, clients can open watches pretty easily, so suddenly our watch count could be 50-100.


@ghost

ghost commented Mar 31, 2015

One option would be to set the default limit closer to the default per-process file handle limit of 1024. A better solution, as you mentioned, would be to split the limits for watches and other stuff into separate counters.

@alex-mohr
Contributor

+1 to separate buckets for long-running requests vs. short ones. May also be worth having separate buckets for different operation types that have different dependencies -- e.g. reading from etcd vs. writing?

Is there a separate issue for raising the limit via Go's equivalent of setrlimit(RLIMIT_NOFILE)? File descriptors are relatively cheap and 1024 is tiny. See golang/go#5949 for sample code?
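
For reference, raising the soft file-descriptor limit from Go looks roughly like the following Unix-only sketch; taking the hard limit as the new soft limit and the plain error handling are illustrative assumptions, not code from this PR or from golang/go#5949:

```go
package main

import (
	"fmt"
	"syscall"
)

// A Unix-only sketch: raise the soft RLIMIT_NOFILE to the hard limit, the Go
// equivalent of calling setrlimit(RLIMIT_NOFILE) with a larger value.
func main() {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	rl.Cur = rl.Max // assume the hard limit is an acceptable ceiling
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		fmt.Println("setrlimit:", err)
		return
	}
	fmt.Printf("soft file descriptor limit raised to %d\n", rl.Cur)
}
```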

@brendandburns
Contributor Author

OK, added a regexp so that watch and proxy requests are not counted, and expanded the tests to cover that functionality.

@brendandburns
Contributor Author

Please take another look.

block.Add(1)
sem := make(chan bool, Iterations)

re := regexp.MustCompile("[.*\\/watch][^\\/proxy.*]")

Contributor

In v1beta3 we are not going to be using /watch - we'll be using a query param on the list response. I don't know where that leaves us.

Contributor Author

To fix the implementation so the test passes when that goes live?

Contributor

It's there now, it's just that no one is using it yet. It doesn't have to block this, but it's not clear to me how we tell that.
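
A possible shape for that check, sketched with an illustrative regexp (not the PR's exact expression) plus the v1beta3-style watch query parameter mentioned above; the function and variable names are assumptions:

```go
package filters

import (
	"net/http"
	"regexp"
)

// longRunningRE is an illustrative pattern: it matches a watch or proxy
// path segment anywhere in the request path.
var longRunningRE = regexp.MustCompile(`(^|/)(watch|proxy)(/|$)`)

// isLongRunning reports whether a request should be exempt from the
// in-flight limit: /watch/ and /proxy/ style paths, or (per the v1beta3
// discussion) a list request carrying a ?watch=true query parameter.
func isLongRunning(req *http.Request) bool {
	if longRunningRE.MatchString(req.URL.Path) {
		return true
	}
	return req.URL.Query().Get("watch") == "true"
}
```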

@brendandburns
Contributor Author

Fixed gofmt issues (and installed the git hooks on my new machine... ;)

@ghost

ghost commented Apr 2, 2015

Travis is still whining about a build failure. I don't know enough about Travis yet to know whether or not we can trust its judgement. Other than that, I think this can be merged.

@zmerlynn
Member

zmerlynn commented Apr 2, 2015

Travis kicked.

@ghost

ghost commented Apr 2, 2015

Travis still complaining :-(

@ghost

ghost commented Apr 2, 2015

PS: zmerlynn@ when you say "Travis kicked", what exactly do you kick?

@smarterclayton
Contributor

There's a restart-build button in the Travis UI if you have commit rights on the underlying repo.


@brendandburns
Contributor Author

Travis is now happy. I can haz mergez?

@brendandburns
Contributor Author

(and you need to log in to Travis via GitHub)

ghost pushed a commit that referenced this pull request Apr 2, 2015
Add a limit to the number of in-flight requests that a server processes.
@ghost merged commit 4a2000c into kubernetes:master on Apr 2, 2015
@timothysc
Member

Can folks please label this as perf for tracking?

@roberthbailey added the sig/scalability label on Apr 6, 2015
@roberthbailey
Contributor

@timothysc done.

@timothysc
Member

@wojtek-t have you assessed the impact of this and #6203 yet?

@wojtek-t
Member

wojtek-t commented Apr 6, 2015

Sorry - we have a holiday in Poland today - will take a look tomorrow.

@fgrzadkowski
Contributor

A side effect of this PR is that the density test creates pods much more slowly than before (~6 pods per second). For a 50-node cluster and an rc with 1500 replicas (30 pods per node), that gives 300 seconds just to create the rc.

@fgrzadkowski
Contributor

Actually my previous comment should be in #6203

@timothysc
Member

@fgrzadkowski thx, I figured this would be the case; we're currently working through issues with running load-balanced apiservers as a result.

@abhgupta, @smarterclayton, @rrati ^ FYI.

@jayunit100
Member

Related?

  • I'm seeing very slow proxy returns in networking.go in my cluster.
  • However, I don't see this when I run locally (i.e. hack/local-up-cluster.sh), so these 6-second request latencies only happen in a real cluster.
INFO: Proxy status call returned in 6.015694382s
INFO: About to make a proxy status call
INFO: Proxy status call returned in 6.013796121s 
...

@timothysc
Member

The cluster response curves from this and #6203 are bad; imho this is not the correct route to take, or it should at least default to off until good defaults can be found. Time to fill a pool has regressed sharply.

[image: pod creation rate graph]

~2.3 pods per second, which is a fairly drastic decrease from about a week ago.

@wojtek-t
Member

wojtek-t commented Apr 8, 2015

@timothysc - I agree. I observed a very similar scheduling rate, and indeed it's much worse than before.

This pull request was closed.
Labels
sig/scalability Categorizes an issue or PR as relevant to SIG Scalability.
Development

Successfully merging this pull request may close these issues.

API server should limit the number of concurrent requests it processes