ForkJoinPool-based EventLoopGroup #2250
Comments
Depending on the amount of breakage, we might be able to put this into 4.1+. However, it's almost certain we will have to change the signature of some constructors, so 5.0 might be a better target. Doing some prototyping will give us a better idea.
Play is using ForkJoinPool
@trustin could you give some more details on how you would ensure the Channel is always handled by the same Thread?
@trustin could you answer my question if you have 5 minutes... would really like to understand it better. Thanks buddy!
@normanmaurer I believe the reason why a channel will always be handled by the same thread is that "Subtasks generated in tasks run by a given worker thread are pushed onto that worker's own deque." (Source: http://gee.cs.oswego.edu/dl/papers/fj.pdf). So if an event loop keeps submitting itself for execution, it will never be the case that a thread is idle with an empty queue, and thus the work-stealing mechanism should not be triggered.

@trustin I would also be interested in the rationale behind choosing a `ForkJoinPool`. My understanding is that the main feature of a `ForkJoinPool` is its work-stealing mechanism. Why not keep the current implementation and use a `ThreadPoolExecutor`? Thanks!
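The self-submission behavior described above can be sketched as follows. This is illustrative code, not Netty's implementation, and all names in it are made up: the loop runs one iteration and then resubmits itself, so the resubmitted task lands on the current worker's own deque.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;

// Illustrative sketch: an "event loop" that runs one iteration and then
// resubmits itself to the ForkJoinPool. Because the resubmit happens from
// the worker thread that just ran the iteration, the task is pushed onto
// that worker's own deque; as long as no worker sits idle with an empty
// queue, work stealing never kicks in and the loop tends to stay on one
// thread.
public class SelfSubmittingLoop implements Runnable {
    private final ForkJoinPool pool;
    private final CountDownLatch done;
    final Set<String> threadsSeen = ConcurrentHashMap.newKeySet();
    private int remaining;

    SelfSubmittingLoop(ForkJoinPool pool, int iterations, CountDownLatch done) {
        this.pool = pool;
        this.remaining = iterations;
        this.done = done;
    }

    @Override
    public void run() {
        threadsSeen.add(Thread.currentThread().getName());
        // ... a real event loop would select() and process I/O events here ...
        if (--remaining > 0) {
            pool.execute(this); // resubmit for the next iteration
        } else {
            done.countDown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ForkJoinPool pool = new ForkJoinPool(4);
        CountDownLatch done = new CountDownLatch(1);
        SelfSubmittingLoop loop = new SelfSubmittingLoop(pool, 10_000, done);
        pool.execute(loop);
        done.await();
        // Usually 1, but a steal is still possible while other workers are idle.
        System.out.println("distinct threads used: " + loop.threadsSeen.size());
        pool.shutdown();
    }
}
```

Note that the single-thread property is a tendency, not a hard guarantee: if other workers are idle, they scan for work and may occasionally steal the resubmitted task.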
@jakobbuchgraber Thanks... yeah, this makes sense now, but I wonder how this will be any better than just doing what you suggest and using a ThreadPoolExecutor. So @trustin please enlighten us ;)
Among other things, of which I am happy to elucidate, a big advantage of using FJ or a similar mechanism here is to allow for managed blocking.
@mariusaeriksen ah I see... Makes more sense now :) Did you benchmark a prototype yet (in netty or finagle), or was it just an idea? Thanks for chiming in.
@mariusaeriksen It would be extremely nice if you would explain your thinking a bit more :). Also, what would be the main use case for FJ.ManagedBlocker?
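For context, `ForkJoinPool.ManagedBlocker` is the JDK hook being referred to: a task tells the pool it is about to block, and the pool may activate a compensation thread so overall parallelism is preserved. A minimal sketch (the `takeBlocking` helper is invented for illustration):

```java
import java.util.concurrent.*;

// Sketch of ForkJoinPool.managedBlock: while a worker thread blocks inside
// block(), the pool can spawn a spare thread to keep its target parallelism.
public class ManagedBlockDemo {
    static <T> T takeBlocking(BlockingQueue<T> queue) throws InterruptedException {
        final Object[] result = new Object[1];
        ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker() {
            @Override public boolean block() throws InterruptedException {
                if (result[0] == null) {
                    result[0] = queue.take();   // the actual blocking call
                }
                return true;                    // no further blocking needed
            }
            @Override public boolean isReleasable() {
                // try a non-blocking poll first; may avoid blocking entirely
                return result[0] != null || (result[0] = queue.poll()) != null;
            }
        });
        @SuppressWarnings("unchecked")
        T value = (T) result[0];
        return value;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ForkJoinPool pool = new ForkJoinPool(2);
        Future<String> f = pool.submit(() -> takeBlocking(queue));
        queue.put("hello");
        System.out.println(f.get());  // prints "hello"
        pool.shutdown();
    }
}
```

The pool calls `isReleasable()` and `block()` until one of them returns true, so the blocker must be written to tolerate repeated invocation.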
Let me reply to a few different issues here. First, I don’t think Netty should actually do any pool management/scheduling directly — what I suggested to @trustin was more a rephrasing of the selector loops so that they are amenable to a third party (e.g. finagle) to schedule them in whatever manner is appropriate. In other words: the proposal is to separate Netty from scheduling of threads, and the suggestion is to make each run of the select loop a “task.” @normanmaurer said:

> could you give some more details on how you would ensure the Channel is always handled by the same Thread?
The idea here is actually to avoid this altogether. We can ensure that a channel belongs to a single selector, but that should be the extent of the guarantees afforded to Netty. Being able to migrate channels between selectors would be a bonus.

Now let me try to motivate this a little more. Not being able to block in a Netty-managed thread is a giant abstraction leak, and it also substantially complicates the cost model for programs. We’d like to strike a better balance: it should be o.k. to block, at some expense.

Secondly, we have a number of applications for which work distribution is highly modal. There’s either very cheap work or very expensive work. In the current arrangement, with threads pinned, we get a lot of head-of-line blocking, even for infrequent expensive tasks. This has undue systemic effects: for example, while a thread is handling an expensive task, a timeout may trigger, which might cause a perfectly good request to fail, causing elevated error rates. You might say: but those tasks should be handled by a separate thread pool! See my above comment: in large software, we don’t have a good enough grip on our cost models — indeed, the cost of a request may not even be apparent until very late in the process, complicating scheduling even further. In other words, manually managing computational resources does not scale.

Furthermore, down the line there will be support in Linux for user-kernel space thread scheduling. This is inherently compatible with the model proposed above, and would allow us to handle all blocking, not just managed blocking, cheaply.
That makes sense. Thanks @mariusaeriksen. On a lower-level view, would this mean that we have to manually put memory barriers in place for custom fields in ChannelHandlers to be visible to other threads, or does the FJ framework take care of this?
What @mariusaeriksen says is basically about allowing a user to dictate whatever thread scheduling he or she wants. We currently can't do that because we create threads for the user. We should instead let the user create threads and decide which thread runs the I/O loop. To make this happen, an I/O event loop has to be transformed into a `Runnable`.

Because most users don't want such a mighty power to control everything, we should provide a default thread pool (or scheduler) implementation that yields the same behavior as the current Netty thread model, but an advanced user like Finagle will want to implement its own scheduler for full control. @mariusaeriksen, please feel free to correct me if I understood incorrectly.

I believe this will introduce quite a bit of core changes in Netty, but it shouldn't be difficult to make the changes. I also think it's OK even if the scheduler (ForkJoinPool or whatever that is) runs the I/O task from different threads for each run, as long as the handler methods are invoked in the correct order and visibility is ensured by the scheduler. Actually, we won't have any visibility issues with almost any thread pool implementation we have today.
From the implementation perspective, I think we can achieve this like the following:

Step 1. Convert all SingleThreadEventLoop implementations such as NioEventLoop into Runnables that schedule themselves again. SingleThreadEventLoop needs quite a bit of changes. Perhaps it is safe to rename it to 'AbstractEventLoop' because it's agnostic to any thread model. Also, make the EventLoop implementations public so that a user can schedule them by themselves. At this point, a user like Marius can use his own scheduler and plug Netty's event loop into it.

Misc. idea: instead of making an EventLoop a Runnable, we could just add a method like 'asRunnable()'. This way, an ordinary user will not see the 'run()' method, which they should never call by themselves.

Step 2. Modify MultithreadedEventLoopGroup and its subclasses such as NioEventLoopGroup so that they work with the changes made in step 1. Internally, MultithreadedEventLoopGroup could use ForkJoinPool, but it's not an absolute necessity (because at step 1 we made this completely customizable by a user.)

Things to consider:
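A rough sketch of what step 1's 'asRunnable()' idea could look like (hypothetical names, not Netty's API): the loop body drains its task queue, then reschedules itself on whatever Executor the user chose, and ordinary users never see a public run() method.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch of an event loop whose iterations are scheduled on a
// user-supplied Executor. The loop body is hidden behind asRunnable() so
// users cannot accidentally call it directly.
public class SchedulableLoop {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final Executor scheduler;
    private volatile boolean shutdown;

    SchedulableLoop(Executor scheduler) { this.scheduler = scheduler; }

    public void execute(Runnable task) { tasks.add(task); }
    public void shutdown() { shutdown = true; }

    // The loop body is not exposed as run(); users obtain it explicitly.
    public Runnable asRunnable() {
        return this::runOnce;
    }

    private void runOnce() {
        Runnable task;
        while ((task = tasks.poll()) != null) {
            task.run();  // a real loop would also select() and process I/O here
        }
        if (!shutdown) {
            // Resubmit for the next iteration. A real loop would block in
            // select() rather than spinning when there is no work.
            scheduler.execute(asRunnable());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ForkJoinPool pool = new ForkJoinPool();
        SchedulableLoop loop = new SchedulableLoop(pool);
        CountDownLatch latch = new CountDownLatch(1);
        loop.execute(latch::countDown);
        pool.execute(loop.asRunnable());  // the user decides where it runs
        latch.await();
        loop.shutdown();
        pool.shutdown();
    }
}
```

The same `asRunnable()` could just as well be handed to a pinned dedicated thread, which would reproduce today's behavior.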
This idea makes me think:

Thoughts?
If we can make it happen without a force-push, I would prefer it. A force push is a mess, as it rewrites the history.
We can do that of course with a giant revert patch + re-application commits.
At least this would keep the history etc... Maybe we should see how much effort it takes, and if it turns out to be too complicated, fall back to the force push. WDYT?
What was the issue with deregister in the first place? http://netty.io/wiki/new-and-noteworthy-in-5.x.html#wiki-h3-4
@jakobbuchgraber we thought it was pretty hard to manage the correct event order etc. I think this is not the case anymore with the proposed changes.
Motivation:

Due to the complexity of handling deregistration and re-registration of a channel, we previously decided to remove the deregister() operation completely to simplify our code. However, we realized during our discussion about making I/O scheduling more flexible and more customizable [1] that it shouldn't be that complicated to implement, and thus the removal of deregistration and re-registration is unnecessary now.

Modification:

- Revert commit c149f4b
- Revert commit e743a27
- Make some additional adjustments

Result:

- deregister(), fireChannelUnregistered(), and channelRegistered() were added back.
- Channel constructors do not require an EventLoop anymore.

[1] #2250
Motivation:

As discussed in #2250, it will become much less complicated to implement deregistration and re-registration of a channel once #2250 is resolved. Therefore, there's no need to deprecate deregister() and channelUnregistered().

Modification:

- Undeprecate deregister() and channelUnregistered()
- Remove SuppressWarnings annotations where applicable

Result:

We (including @jakobbuchgraber) are now ready to play with #2250 at master
All of this sounds good to me. Excited to see this being worked on!
@mariusaeriksen same here... I will mentor @jakobbuchgraber during GSoC and help him move forward with it.
I've recently been struggling to migrate channels to different event loops in 4.0 and wound up digging through the open issues to see what is going on with deregister/register. I work on a product that uses Netty under the covers to aggregate multiple backend database connections and expose a virtual database on frontend sockets. Our workloads aren't as modal as what @mariusaeriksen has described, but the pinned threads are a performance/manageability issue for us.

It is counter-intuitive, but having the event loop pinned to a channel to avoid locking, queuing, context switching, etc. actually makes it harder for us to avoid doing these things. Which channels we can use together without locking becomes fixed at channel construction time, and we either have to add locks/atomics/queues to everything so it all works with multiple event loops in play, or we have to organize our work around event loops, creating arenas/pools based on the loops.

Anyway, I'm super happy to see a split between threading and netty core. I realize most users won't care, but it will make my life much better. Uh, short version: +1. :P
@sgossard thanks for your input. I appreciate it. 👍
What are the criteria? How does your netty app make the decision?
Sorry, my previous post made it sound like we dynamically choose between using locks or arenas. Our previous design used JDBC pooling on the backend, which required multiple threads to process backend connections in parallel. The JDBC driver we were using buffered result sets in memory, which really didn't scale, so we adapted the backend pools to make DB calls via Netty. Our adoption of Netty has been iterative, so our current design still uses multiple event loops and locking/atomics/queuing to ensure thread safety. We are looking to improve our performance and scalability by reducing thread synchronization. Given the existing Netty 4.x threading model, we were considering setting up separate arenas in our backend pools for each event loop, but that was going to require us to make significant changes. I'm excited about the design proposed here, because migrating a channel to a different event loop is much more natural in our existing design.
So I modified Netty to use a `ForkJoinPool` [1]. I had a discussion with @normanmaurer earlier today where we talked about the granularity of the parallelism that we would like to introduce with this change. The current implementation keeps everything the same, in that each eventloop is a task that submits itself to the FJP. (Is that good enough for Finagle's use case, @mariusaeriksen? Would be great if you could chime in.)

However, in order for Netty to be able to use FJP features like managed blocking and work stealing, we need to increase that granularity level while at the same time keeping the ordering guarantees a `Channel` currently has. One idea by @normanmaurer is to make each I/O event a task. Also, @trustin could you please explain in some detail what you had in mind when you wrote your earlier comment. Was it what I just explained? WDYT? Thanks.

[1] https://gist.github.com/normanmaurer/987992648523c0fbe16d
@jakobbuchgraber one small correction here: my idea was not that every I/O event becomes a task, but rather that all events that are not triggered from the EventLoop itself become tasks (like it is now), plus register and deregister all the time.
Granularization is a bit tricky. Tasks that are too fine-grained can end up producing a lot of scheduling overhead, while with tasks that are too coarse you lose opportunities for parallelism and fine-grained managed blocking. My opinion is this: separate mechanism from policy. Netty should not be opinionated about this, but rather let the user choose. One way to do this is to create an executor abstraction which is well-defined and which must provide certain guarantees. The default implementation should probably just be pinned threads, as today. It's very important (for my use, at least) that Netty provide a means by which the user injects the actual executor.
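The mechanism/policy split described here could look roughly like this (a hypothetical sketch; the class and method names are invented, not Netty's API): the group accepts any `Executor`, and the no-arg default reproduces today's pinned-thread behavior.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

// Sketch of "mechanism, not policy": the group takes any Executor the user
// injects; the default policy preserves current behavior by dedicating a
// pinned thread to each scheduled loop.
public class PinnedDefaultGroup {
    private final Executor executor;

    PinnedDefaultGroup() {
        // default policy: one fresh pinned thread per submitted loop
        this(command -> new Thread(command, "event-loop").start());
    }

    PinnedDefaultGroup(Executor executor) {
        this.executor = executor;
    }

    public void schedule(Runnable loopIteration) {
        executor.execute(loopIteration);
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch ran = new CountDownLatch(2);
        // pinned-thread default
        new PinnedDefaultGroup().schedule(ran::countDown);
        // user-injected scheduler, e.g. a ForkJoinPool
        new PinnedDefaultGroup(ForkJoinPool.commonPool()).schedule(ran::countDown);
        ran.await();
        System.out.println("both policies ran");
    }
}
```

Since `Executor` is a one-method interface, this keeps Netty agnostic: a framework like Finagle can pass in its own scheduler, while ordinary users never touch the knob.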
@trustin can you also chime in?
@normanmaurer will do. Didn't have a chance to read the code yet.
@trustin

If that's good enough, we are done. However, what @normanmaurer and I discussed was whether it would make sense to decouple Netty's task execution from the event loop by introducing a per-channel task queue.
Having a per-channel task queue sounds very interesting. It will indeed make channel migration extremely simple, which is awesome. Also, we can exploit the full potential of managed blocking. However, I guess it will have its own challenges:

Perhaps we could keep the current behavior (i.e. the event loop task invokes the pipeline directly) and use an alternative
@normanmaurer and I were in a meeting earlier today and we think that it's best not to have a task queue per channel. Some of the reasons for this are:
Related issue: #2250

Motivation:

Prior to this commit, Netty's non-blocking EventLoops were each assigned a fixed thread by which all of the EventLoop's I/O and handler logic would be performed. While this is a fine approach for most users of Netty, some advanced users require more flexibility in scheduling the EventLoops.

Modifications:

Remove all direct usages of threads in MultithreadEventExecutorGroup, SingleThreadEventExecutor et al., and introduce an Executor abstraction instead. The way to think about this change is that each iteration of an event loop is now a task that gets scheduled in a ForkJoinPool. While the ForkJoinPool is the default, one also has the ability to plug his/her own Executor (aka thread pool) into an EventLoop(Group).

Result:

Netty hands off thread management to a ForkJoinPool by default. Users can also provide their own thread pool implementation and get some control over scheduling Netty's EventLoops.
Can be closed?
Yes ... Just link to the PR.
That's the pull request that resolved this issue: #2681
Closing - cleaning up 5.0.0.Alpha2 release assignments. Let me know if this is in error.
Idea from @mariusaeriksen.

Imagine that we turn a `SingleThreadEventLoop` into a `Runnable` (and also `SingleThreadEventExecutor` into a `Runnable`) and schedule it into a `ForkJoinPool`.

The advantage of this change is: `ForkJoinPool` without a hiccup (i.e. fewer knobs to play with while achieving the same or better performance.)

The problem of this change is: `inEventLoop()` has to be implemented differently.