
scheduler: using inefficient math library causes significant slowdown #18126

Closed
xiang90 opened this issue Dec 3, 2015 · 11 comments
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence.
Milestone
v1.2
Comments

xiang90 (Contributor) commented Dec 3, 2015

We ran a few benchmark and profiling tests of the scheduler on a 1000-node cluster.

We found that the scheduler component is the bottleneck for starting pods, and that it spends most of its time in an external pkg (https://github.com/kubernetes/kubernetes/blob/master/pkg/api/resource/quantity.go#L27) that handles math operations.

Profiling result for scheduling 1000 pods over 1000 nodes:
https://storage.googleapis.com/profiling/org.svg

Taking a closer look at the library, I found that it does many unnecessary allocations, which causes GC pressure.

By mocking up an efficient version of the rounding function (one that does not handle overflow), we cut the latency of scheduling 1000 pods by nearly 50% (23s vs. 53s). As we schedule more pods, the effect becomes more significant, since more objects are generated per scheduling pass.

New profiling result for scheduling 1000 pods over 1000 nodes:
https://storage.googleapis.com/profiling/fix.svg
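
For illustration, a minimal sketch of the kind of fast-path rounding described above (hypothetical names, not the actual mock-up code): convert a small decimal quantity to a milli-value with plain int64 arithmetic and no big-number allocations, deliberately skipping overflow handling.

```go
package main

import "fmt"

// fastMilliValue returns value * 10^(scale+3), i.e. the quantity expressed
// in thousandths, using only int64 math. Overflow is deliberately not
// handled, matching the "do not handle overflow" caveat above.
func fastMilliValue(value int64, scale int32) int64 {
	exp := scale + 3
	for exp > 0 {
		value *= 10 // may overflow for large inputs; ignored here
		exp--
	}
	for exp < 0 {
		value /= 10 // truncates toward zero
		exp++
	}
	return value
}

func main() {
	fmt.Println(fastMilliValue(250, -3)) // 250m CPU -> 250
	fmt.Println(fastMilliValue(2, 0))    // 2 CPU    -> 2000
}
```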

Any suggestions on how we should improve this?

/cc @hongchaodeng @lavalamp @philips

xiang90 (Contributor, Author) commented Dec 3, 2015

Also /cc @davidopp @wojtek-t @gmarek @timothysc

lavalamp (Member) commented Dec 3, 2015

Probably just doing the rounding once per pod (instead of per pod-node combo as I'm sure it does now) would be sufficient--caching or doing something like a constructor.

Alternatively, writing a special-purpose Value/MilliValue function probably could solve this.

@timstclair is making some changes to quantity right now.
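
A minimal sketch of this caching idea, using hypothetical types rather than the scheduler's real ones: do the expensive Quantity math once per pod and reuse the cached integers for every pod-node fit check.

```go
package main

import "fmt"

// podRequest caches the pod's summed resource request as plain integers.
type podRequest struct {
	milliCPU int64
	memory   int64
}

// computeRequest stands in for the expensive Quantity math; it runs once per
// pod instead of once per pod-node combination.
func computeRequest(containerMilliCPU, containerMemory []int64) podRequest {
	var r podRequest
	for i := range containerMilliCPU {
		r.milliCPU += containerMilliCPU[i]
		r.memory += containerMemory[i]
	}
	return r
}

// fits is then a cheap integer comparison per node.
func fits(req podRequest, freeMilliCPU, freeMemory int64) bool {
	return req.milliCPU <= freeMilliCPU && req.memory <= freeMemory
}

func main() {
	req := computeRequest([]int64{250, 100}, []int64{64 << 20, 128 << 20})
	fmt.Println(fits(req, 1000, 512<<20)) // true
	fmt.Println(fits(req, 200, 1<<30))    // false: not enough CPU
}
```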

xiang90 (Contributor, Author) commented Dec 3, 2015

Alternatively, writing a special-purpose Value/MilliValue function probably could solve this.

That is what I did for the mock-up. It does not seem very hard. Or we can fix the upstream library if we want.

@timstclair Any idea?

timstclair commented

I think once #17548 goes in, we should just optimize ScaledValue(). The math around scaling values up without overflow is a little tricky, though (I think we just need to grab the right chunk of bits from q.Amount().UnscaledBig().Bits() and twiddle them to get the base-10 scaled value).
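
To illustrate the overflow subtlety, a rough sketch of an int64 fast path that reports overflow so the caller could fall back to a big-number path (hypothetical helper, not the actual ScaledValue() code):

```go
package main

import (
	"fmt"
	"math"
)

// scaleInt64 multiplies value by 10^pow (or divides for negative pow) and
// reports whether the result fit in an int64. Illustrative only.
func scaleInt64(value int64, pow int) (int64, bool) {
	for ; pow > 0; pow-- {
		if value > math.MaxInt64/10 || value < math.MinInt64/10 {
			return 0, false // would overflow; caller falls back to math/big
		}
		value *= 10
	}
	for ; pow < 0; pow++ {
		value /= 10 // truncates toward zero
	}
	return value, true
}

func main() {
	fmt.Println(scaleInt64(5, 3))               // 5000 true
	fmt.Println(scaleInt64(math.MaxInt64/2, 3)) // 0 false
}
```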

xiang90 (Contributor, Author) commented Dec 3, 2015

@timstclair Great! Let me know when it is merged. I am happy to work on the optimization.

gmarek (Contributor) commented Dec 3, 2015

@xiang90 - this sounds very promising. Thanks a lot!

davidopp added the team/control-plane and priority/backlog labels Dec 3, 2015
davidopp added this to the v1.2 milestone Dec 3, 2015
wojtek-t (Member) commented Dec 3, 2015

@xiang90 - great job!

timothysc (Member) commented

👍

xiang90 added a commit to xiang90/kubernetes that referenced this issue Dec 4, 2015
The original scale function takes around 800ns/op with more
than 10 allocations. It significantly slows down the scheduler
and other components that rely heavily on the resource pkg.
For more information see kubernetes#18126.

This pull request tries to optimize the scale function. It takes
two approaches:

1. When the value is small, use only normal math ops.
2. When the value is large, use math/big with a buffer pool.

The final result is:

BenchmarkScaledValueSmall-4	20000000	        66.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkScaledValueLarge-4	 2000000	       711 ns/op	      48 B/op	       1 allocs/op

I also ran the scheduler benchmark again. It doubles the throughput of the scheduler for the 1000-node case.
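
As a rough sketch of the second approach (assumed names, not the code from the PR itself): the large-value path can reuse big.Int scratch space through a sync.Pool so each call allocates little or nothing.

```go
package main

import (
	"fmt"
	"math/big"
	"sync"
)

// intPool hands out reusable *big.Int scratch values.
var intPool = sync.Pool{
	New: func() interface{} { return new(big.Int) },
}

// scaledValueBig scales value by 10^pow (pow >= 0) using pooled big.Ints.
// The result is undefined if it no longer fits in an int64; illustrative only.
func scaledValueBig(value int64, pow int) int64 {
	tmp := intPool.Get().(*big.Int)
	ten := intPool.Get().(*big.Int)
	defer intPool.Put(tmp)
	defer intPool.Put(ten)

	tmp.SetInt64(value)
	ten.SetInt64(10)
	for ; pow > 0; pow-- {
		tmp.Mul(tmp, ten)
	}
	return tmp.Int64()
}

func main() {
	fmt.Println(scaledValueBig(123456789, 6)) // 123456789000000
}
```
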
davidopp (Member) commented Dec 9, 2015

ref #18418

kevin-wangzefeng (Member) commented

/cc @HardySimpson

xiang90 (Contributor, Author) commented Dec 9, 2015

Closed by #18170

xiang90 closed this as completed Dec 9, 2015
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this issue Jan 25, 2018
simplify openshift controller lease

Origin-commit: 2166a2a977f12b6ce930a8fefdf67ea6c266064a
openshift-publish-robot pushed a commit to openshift/kubernetes that referenced this issue Feb 27, 2018
simplify openshift controller lease

Origin-commit: 2166a2a977f12b6ce930a8fefdf67ea6c266064a