High CPU usage with SO_LINGER and sudden connection close (4.0.26.Final+) #4449
Comments
So you have a lot of DirectByteBuffers and MemoryRegionCache entries, but no channel outbound buffer entries there? What exactly are you doing with these buffers? What's your use case, e.g. an HTTP proxy? By the way, that's how a memory leak in your code could look: not releasing the buffers properly in all cases. Have you tried running with the resource leak detector in paranoid mode?
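For reference, the usual pattern that avoids this kind of leak is to release every inbound buffer on all code paths (SimpleChannelInboundHandler does this automatically). A minimal sketch, not taken from the reporter's code; the handler name is hypothetical:

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.ReferenceCountUtil;

public class ReleaseAwareHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        try {
            // ... process msg ...
        } finally {
            // Release the (possibly pooled) inbound buffer on every code path.
            ReferenceCountUtil.release(msg);
        }
    }
}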
Thanks Ninja. Replies follow. No, this is not an HTTP proxy but a typical request/response service. We are checking that the channel is active before writing. I don't think it is related to that though, because the same should be reproducible with older Netty versions as well (maybe not in the form of too many objects, but at least in CPU usage).
Accidentally closed. Reopening.
So verify this with the leak detector on paranoid first, because that's how it looks from the screenshot.
Just ran the leak detector at PARANOID level. Was expecting logs from the class io.netty.util.ResourceLeakDetector. However, no leak-related logs were found in 5 minutes. I will redo the test over a longer period, though.
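For anyone reproducing this: the detector level can be raised with the JVM flag -Dio.netty.leakDetectionLevel=paranoid, or programmatically before any buffers are allocated. A minimal sketch:

import io.netty.util.ResourceLeakDetector;

public class LeakDetectionSetup {
    public static void main(String[] args) {
        // Equivalent to -Dio.netty.leakDetectionLevel=paranoid; must be set
        // before the first ByteBuf is allocated to take effect.
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
        // ... bootstrap the server here ...
    }
}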
@ashish-tyagi could you check whether these "leaked" buffers have any references, or are left with no references for GC?
@ashish-tyagi can you try to adjust the cache sizes of the PooledByteBufAllocator?
@ninja- These buffers (DirectByteBuffer) get collected by GC within a second or two. During high CPU periods, the memory churn is really high: millions of buffers get created and destroyed in a couple of seconds. When we hit the low CPU periods, we can hardly notice any DirectByteBuffer objects. The high memory churn is evident in the attached screenshots. During high churn (and high CPU), the memory usage increases by 1-2 GB within a second and then comes back down to the same level in the next second. In contrast, during low memory churn (low CPU), the memory increases by 2 GB in 3 minutes (i.e. ~11 MB/sec). Nothing changes from the incoming requests' perspective.
@normanmaurer We are using the default cache sizes. Can you please suggest test values for the cache sizes?
@ashish-tyagi - It depends on which buffer sizes you allocate most. You can also use the PoolArenaMetric exposed by PooledByteBufAllocator to get more insight.
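A rough sketch of what tuning and inspecting the allocator could look like; the exact constructor arguments and metric methods differ across 4.0.x releases, so treat the signatures below as assumptions to verify against the version in use:

import io.netty.buffer.PoolArenaMetric;
import io.netty.buffer.PooledByteBufAllocator;

public class AllocatorTuning {
    // preferDirect, nHeapArena, nDirectArena, pageSize, maxOrder,
    // tinyCacheSize, smallCacheSize, normalCacheSize
    static PooledByteBufAllocator smallCacheAllocator() {
        return new PooledByteBufAllocator(true, 2, 2, 8192, 11, 128, 64, 16);
    }

    // Print per-arena allocation counters to see which sizes dominate.
    static void dumpDirectArenaStats(PooledByteBufAllocator alloc) {
        for (PoolArenaMetric arena : alloc.directArenas()) {
            System.out.println("allocations=" + arena.numAllocations()
                    + ", deallocations=" + arena.numDeallocations());
        }
    }
}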
@ashish-tyagi ping
Sorry guys, it has been a long time without an update. It was the holiday season in India. Will give you an update by tomorrow.
Will try to reproduce it in a standalone test case.
Thanks!
@ashish-tyagi yeah, looks like there might be a bug involved; could you consider moving to epoll for now?
I’m sure if there is a bug it has nothing to do with using epoll or not…
Well, if there is, it seems to come from the NIO backend at least.
Tried to figure out the exact version between 4.0.18.Final and 4.0.29.Final where the problem started. It seems that it started with 4.0.26.Final (changing this issue title to reflect that). However, the behavior of 4.0.26.Final differs from 4.0.29.Final in that 4.0.26.Final is heavy on GC too, because of this issue: #3913, which got fixed here: 275e2f0#diff-cfab6ab9111be41b2f39beb8d154e79fR540. Adding CPU/memory usage and memory profiles of versions 4.0.26.Final to 4.0.28.Final.
Reproduced the issue in a standalone test case. Server environment / steps to reproduce:
Server code:
import static io.netty.handler.codec.http.HttpHeaders.Names.CONNECTION;
import static io.netty.handler.codec.http.HttpHeaders.Names.CONTENT_LENGTH;
import static io.netty.handler.codec.http.HttpHeaders.Values.KEEP_ALIVE;
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelOption;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.http.DefaultFullHttpResponse;
import io.netty.handler.codec.http.FullHttpRequest;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpHeaders;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpServerCodec;
import io.netty.handler.codec.http.HttpVersion;
public class TestServer {
public static void main(String[] args) throws Exception {
new TestServer(4000).start();
}
private final int port;
private final int availableProcessors = Runtime.getRuntime().availableProcessors();
public TestServer(int port) {
this.port = port;
}
public void start() throws Exception {
getServerBootstrap().bind(port).sync();
System.out.printf("Netty Server started at port [%s]\n", port);
}
public ServerBootstrap getServerBootstrap() {
ServerBootstrap bootstrap = new ServerBootstrap();
final EventLoopGroup bossGroup = new NioEventLoopGroup(1);
final EventLoopGroup workerGroup = new NioEventLoopGroup(availableProcessors);
bootstrap.group(bossGroup, workerGroup).channel(NioServerSocketChannel.class)
.childHandler(new ChannelInitializer<SocketChannel>() {
@Override
protected void initChannel(SocketChannel ch) throws Exception {
ch.pipeline().addLast(new HttpServerCodec());
ch.pipeline().addLast(new HttpObjectAggregator(1048576));
ch.pipeline().addLast(new EchoHandler());
}
});
bootstrap.childOption(ChannelOption.SO_LINGER, 1000);
return bootstrap;
}
public static class EchoHandler extends SimpleChannelInboundHandler<FullHttpRequest> {
@Override
public void channelRead0(final ChannelHandlerContext ctx, FullHttpRequest req) throws Exception {
if (ctx.channel().isActive()) {
FullHttpResponse response = new DefaultFullHttpResponse(HttpVersion.HTTP_1_1, HttpResponseStatus.OK,
req.content().copy());
HttpHeaders.setIntHeader(response, CONTENT_LENGTH, req.content().readableBytes());
HttpHeaders.setHeader(response, CONNECTION, KEEP_ALIVE);
ctx.channel().write(response);
ctx.flush();
}
}
}
}
@ashish-tyagi will try to reproduce and fix... thanks! Did you also try using Netty 4.0.33.Final?
@ashish-tyagi btw, what is a little bit confusing is that in your original report you were talking about PoolThreadCache, while in your reproducer you are not using the PooledByteBufAllocator. Is this by mistake or not?
Hi @normanmaurer, We use PooledByteBufAllocator.DEFAULT in production. The reproducer works both with and without PooledByteBufAllocator. I originally believed that too many instances were getting created and that this was causing the CPU load. With this reproducer, I think the root cause lies somewhere else (maybe an infinite loop). Also, the issue reproduces only when SO_LINGER is set.
@normanmaurer The high CPU usage issue remains in 4.0.33.Final as well. The memory profile is different from 4.0.29.Final, though. I am changing the title again to make high CPU the main issue.
@ashish-tyagi thanks for the clarification.
@normanmaurer Sure, will try this by tonight.
@normanmaurer Yes, returning null fixed the issue.
Interesting... Will have a look now.
@ashish-tyagi I have now tried to reproduce it for 2 hours without any luck :( As you can reproduce it quite easily, can you give me some more info about the client that runs ab? Like what version, what OS, etc.?
@ashish-tyagi are you talking about a "short spike", i.e. 1 sec, or a long period of high CPU usage?
Hi @normanmaurer, I looked at the code and it seemed that the CPU spike should remain for a time equal to SO_LINGER. However, in my setup I have seen that sometimes the CPU usage comes down in 3-4 seconds, but most of the time it stays high even after 15 minutes. There is no new traffic.
Client details: as far as I know, all software is standard off-the-shelf, and there are no custom patches.
@ashish-tyagi can you do me a last favour and alter NioSocketChannel.closeExecutor like this:
@Override
protected Executor closeExecutor() {
try {
if (javaChannel().isOpen() && config().getSoLinger() > 0) {
return ForkJoinPool.commonPool();
}
} catch (Throwable ignore) {
// Ignore the error as the underlying channel may be closed in the meantime and so
// getSoLinger() may produce an exception. In this case we just return null.
// See https://github.com/netty/netty/issues/4449
}
return null;
}
Would be interesting to know if this helps at all.
Hi @normanmaurer, Tried the new code. It is still the same. Btw, you can ask me to test code as many times as you like; I really appreciate your effort on this issue in particular and on open source in general. I cannot easily share the environment where this issue reproduces, as it belongs to a corporate network. I am trying to reproduce the issue in Google Cloud. If that works, I will share the machine instances with you.
@ashish-tyagi you could also try sampling CPU with VisualVM to confirm where the busy loop is.
@johnou The CPU profile is already attached in the comment dated Nov 14, 2015. The busy loop is NioEventLoop, which keeps reading the same key; the read keeps failing because the underlying connection has been closed, which causes Netty to schedule a close() in an executor.
The executor which issues the close() gets stuck. I see these stack traces, and they are not recovering either. Ideally the close() should block for at most 1 sec (= SO_LINGER).
I measured the time for which the threads were stuck in close(); it was equal to SO_LINGER. I was under the assumption that SO_LINGER is in milliseconds; it turns out that it is in seconds.
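In other words, the reproducer's value of 1000 lingers for 1000 seconds, not one second; a one-second linger would be configured like this (sketch, using the same ServerBootstrap as in the reproducer):

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelOption;

public class LingerConfig {
    // SO_LINGER maps to Socket#setSoLinger, whose timeout is in seconds,
    // so 1 means a one-second linger (1000 is roughly 16 minutes).
    static void configureOneSecondLinger(ServerBootstrap bootstrap) {
        bootstrap.childOption(ChannelOption.SO_LINGER, 1);
    }
}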
@normanmaurer The problem is that if the channel is not closed in the IO thread, the IO thread goes into an infinite loop, issuing close() again and again. You can easily reproduce this issue even without SO_LINGER. Just comment out the code in
@ashish-tyagi thanks a lot for all the info. I now know what is wrong and have started working on a fix. Without all your help I would never have found the issue!
@normanmaurer any chance the fix for this will make it into netty-4.0.34.Final?
Motivation: We need to remove all registered events for a Channel from the EventLoop before doing the actual close, to ensure we do not produce a CPU spin when the actual close operation is delayed or executed outside of the EventLoop. Modifications: - Add preClose() method that can be overridden. - Implement it for NIO and EPOLL implementations when SO_LINGER is used. Result: No more CPU spin.
@ashish-tyagi @johnou was on vacation... here is the proper fix: #4548 :)
Nice. I will test and verify.
Motivation: We need to remove all registered events for a Channel from the EventLoop before doing the actual close, to ensure we do not produce a CPU spin when the actual close operation is delayed or executed outside of the EventLoop. Modifications: Deregister from events for NIO and EPOLL socket implementations when SO_LINGER is used. Result: No more CPU spin.
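A rough illustration of the underlying NIO idea (not Netty's actual code): once the close has been handed off, the channel's SelectionKey should stop reporting readiness, e.g. by clearing its interest ops, so the event loop no longer spins on the same key while the delayed close is pending:

import java.nio.channels.SelectionKey;

public final class SelectionKeys {
    // Illustrative only: clear the interest set so the Selector stops
    // selecting this key until the pending close actually runs.
    static void deregisterInterest(SelectionKey key) {
        if (key.isValid()) {
            key.interestOps(0);
        }
    }
}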
This was fixed.
@ashish-tyagi Could I ask what the monitoring tool in the picture is?
@liboLiao VisualVM
We are using Netty 4.0.29.Final in one of our services. We are seeing high CPU usage (~100%) in certain periods and almost negligible CPU usage in the remaining periods, even though there is no change in request rate. The request rate itself is very small (less than 100 per second per box). The service mostly does an in-memory cache lookup. The CPU and memory graphs are below:
We found that what differentiated periods of high CPU usage from periods of low CPU usage was the number of objects created per second (notice the zigzag during the high CPU phase and a smooth line during low CPU). When we did memory profiling during the high CPU periods, we saw that more than 90% of the instances were of PoolThreadCache$MemoryRegionCache$Entry, DirectByteBuffer, Recycler$DefaultHandle and Integer. The screenshot is below:
When we tried Netty 4.0.33.Final on one of our boxes, we found that the DirectByteBuffer instances remained the same but the instances of PoolThreadCache$MemoryRegionCache$Entry dropped drastically (~600). This was possibly because of this PR: #4071
I scanned the Netty commit logs to find all the changes that have gone into PoolThreadCache. There was one major PR (#2264) that went in with Netty 4.0.18.Final.
We tried the previous Netty version, 4.0.17.Final, on one of the production instances. There was hardly any CPU usage on this instance. The graphs and memory profiler report are attached below. These were taken at the same time as the previous graphs, under exactly the same load.