mds: fix client cap/message replay order on restart #7199
Conversation
for (map<inodeno_t, list<MDSInternalContextBase*> >::iterator p = cap_reconnect_waiters.begin();
     p != cap_reconnect_waiters.end();
     ++p)
  mds->queue_waiters(p->second);
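The loop above drains every per-inode waiter list once and hands the contexts back to the MDS. A minimal, self-contained sketch of the same pattern, using hypothetical stand-ins (`inodeno_t` as an integer, a `std::function` in place of `MDSInternalContextBase*`), not the actual MDS types:

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <map>

// Hypothetical stand-ins for inodeno_t and MDSInternalContextBase*.
using inodeno_t = unsigned long;
using Waiter = std::function<void()>;

// Drain every per-inode waiter list, mirroring the loop over
// cap_reconnect_waiters: each queued context runs exactly once,
// then the map is emptied. Returns how many waiters fired.
inline int drain_waiters(std::map<inodeno_t, std::list<Waiter>>& waiters) {
  int fired = 0;
  for (auto& entry : waiters) {
    for (auto& w : entry.second) {
      w();  // in the MDS this is mds->queue_waiters(p->second)
      ++fired;
    }
  }
  waiters.clear();
  return fired;
}
```

The point of the question below is not the mechanics of the loop but *where* it runs, i.e. whether any waiters can still be pending at that point at all.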
Why is this the right place to requeue any remaining cap flushes? I don't think it has much to do with exporting caps...in fact I don't think we should even have any leftover flushes, should we?
Yes, we should not have any leftover flushes.
In that case can we put it somewhere less buried?
And perhaps we should print out central log warnings for any unhandled cap flushes rather than just blindly applying them? Not sure, but I generally don't like quiet cleanups of bugs like this.
Not quiet. Locker::handle_client_caps will print an error message.
Mmm, true, but those just get buried in the mds log and don't look like a client reconnect bug. Anything remaining here is a specific, special category of unknown ino, right?
Maybe we don't need anything special flagging it but such things make me nervous.
I think you're right about this basic scheme being fine. Please remember to describe what testing you've performed when submitting PRs (especially for stable branches!). I know it'll be annoying to set up, but we need a test to make sure this behavior doesn't regress in the future.
I ran a modified ceph-mds which does not write its journal and crashes on setattr, then ran the following commands:

fstest create testfile 0644
fstest chown testfile 65534 65533
fstest chown testfile 65534 65532

The second chown makes the modified ceph-mds crash. Then I ran the unmodified ceph-mds. Without this fix, 'fstest lstat testfile uid,gid' returns 65534,65533; with this fix, it returns 65534,65532.
Yeah, we need to automate that test. I'm not sure if we should just add another config kill point, or if the timing is certain enough that we can just kill the mds via a ceph-qa-suite task set up to push this process.
While the MDS recovers, the client may send the same cap flushes twice. The first time is when the MDS enters the reconnect stage (only for flushing caps which are also being revoked); the second time is when the MDS goes active. If we send cap flushes when entering the reconnect stage, we should avoid sending them again. Signed-off-by: Yan, Zheng <zyan@redhat.com>
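One common way to make a re-sent flush idempotent is to key flushes by a transaction id and ignore any tid already seen. This is only a hedged sketch of that idea (hypothetical `FlushTracker` type, not the actual Ceph code path):

```cpp
#include <cassert>
#include <set>

// Hypothetical sketch: remember which flush tids have already been
// submitted, so a flush re-sent at reconnect time is not processed
// a second time when the MDS goes active.
struct FlushTracker {
  std::set<unsigned long> submitted;

  // Returns true if this flush should be processed now,
  // false if it duplicates one already submitted.
  bool should_process(unsigned long flush_tid) {
    // set::insert returns {iterator, bool}; bool is false on duplicate.
    return submitted.insert(flush_tid).second;
  }
};
```

The real MDS/client protocol tracks flush tids per cap; the sketch just shows why a monotonic id makes the duplicate send harmless.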
The client may flush and drop caps at the same time, and may need to send the cap reconnect before the caps get flushed. In that case the issued caps in the cap reconnect do not include the flushing caps, so when choosing lock states we should take the flushing caps into account. The check for caps that haven't been flushed is wrong; fix it. Signed-off-by: Yan, Zheng <zyan@redhat.com>
When handling client caps in the clientreplay stage, it's possible that the corresponding inode does not exist yet, because the client request which creates the inode hasn't been replayed. To handle this corner case, we delay handling the cap message until the corresponding inode is created. Fixes: ceph#14254 Signed-off-by: Yan, Zheng <zyan@redhat.com>
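The "delay until the inode exists" idea can be sketched as parking messages keyed by inode number and replaying them once the inode is created. Hypothetical names throughout (`DelayedCaps`, string messages standing in for MClientCaps); a sketch of the technique, not the actual MDS implementation:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: during clientreplay, a cap message may arrive
// before the request that creates its inode has been replayed. Park
// such messages by ino and process them when the inode appears.
struct DelayedCaps {
  std::map<unsigned long, std::vector<std::string>> parked;
  std::vector<std::string> handled;

  void handle_caps(unsigned long ino, const std::string& msg,
                   bool inode_exists) {
    if (!inode_exists) {
      parked[ino].push_back(msg);  // inode not replayed yet: delay
      return;
    }
    handled.push_back(msg);
  }

  // Called after the client request creating `ino` has been replayed.
  void inode_created(unsigned long ino) {
    auto it = parked.find(ino);
    if (it == parked.end())
      return;
    for (auto& m : it->second)
      handled.push_back(m);  // process in original arrival order
    parked.erase(it);
  }
};
```

Keeping the parked messages in arrival order matters: the caps must be applied in the order the client sent them.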
The client re-sends cap flushes when the MDS restarts. The cap flush message may release some caps even though the corresponding flush has already completed. Fixes: ceph#13546 Signed-off-by: Yan, Zheng <zyan@redhat.com>
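The hazard described here is a stale, re-sent flush releasing caps the client still holds. A guard on the flush tid avoids it; this is a hedged sketch with invented names (`CapState`, an integer bitmask for caps), not the actual Locker code:

```cpp
#include <cassert>

// Hypothetical sketch: ignore the cap release carried by a re-sent
// flush whose tid is not newer than the last flush already completed,
// so replaying an old flush after MDS restart cannot drop live caps.
struct CapState {
  unsigned long last_completed_flush = 0;
  int caps_held = 4;  // pretend the client holds 4 cap bits

  // Returns true if the release was applied.
  bool handle_flush(unsigned long flush_tid, int caps_to_release) {
    if (flush_tid <= last_completed_flush)
      return false;  // stale re-sent flush: do not release anything
    caps_held -= caps_to_release;
    last_completed_flush = flush_tid;
    return true;
  }
};
```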
The option creates long-standing unsafe MDS requests. It helps in testing corner cases related to unsafe requests. Signed-off-by: Yan, Zheng <zyan@redhat.com>
…-fs-testing #7199 Reviewed-by: Greg Farnum <gfarnum@redhat.com>
mds: fix client cap/message replay order on restart Reviewed-by: Greg Farnum <gfarnum@redhat.com>