Bug #12297
ceph-fuse 0.94.2-1trusty segfaults / aborts
Status: Closed
Description
Running CephFS for data volumes on a compute cluster.
ceph-fuse aborts and leaves the mount point unusable (transport endpoint not connected).
Excerpt of the client log, including the 20 latest ops:
-20> 2015-07-13 12:08:13.905733 7f60be7fc700 20 client.412413 trim_caps counting as trimmed: 100015d4d13.head(ref=2 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.875809 caps=pAsLsXsFscr(0=pAsLsXsFscr) 0x7f60ac8c6320)
-19> 2015-07-13 12:08:13.905744 7f60be7fc700 10 client.412413 put_inode on 100015d4d13.head(ref=2 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.875809 caps=pAsLsXsFscr(0=pAsLsXsFscr) 0x7f60ac8c6320)
-18> 2015-07-13 12:08:13.905755 7f60be7fc700 20 client.412413 trying to trim dentries for 100015d4d14.head(ref=2 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.893082 caps=pAsLsXsFscr(0=pAsLsXsFscr) parents=0x7f5fb43408e0 0x7f60acc44830)
-17> 2015-07-13 12:08:13.905767 7f60be7fc700 15 client.412413 trim_dentry unlinking dn raw_reads.136.raw_reads.296.N2.las in dir 100012fa7e7
-16> 2015-07-13 12:08:13.905770 7f60be7fc700 15 client.412413 unlink dir 0x7f60ac59d5a0 'raw_reads.136.raw_reads.296.N2.las' dn 0x7f5fb43408e0 inode 0x7f60acc44830
-15> 2015-07-13 12:08:13.905773 7f60be7fc700 20 client.412413 unlink inode 0x7f60acc44830 parents now
-14> 2015-07-13 12:08:13.905774 7f60be7fc700 10 client.412413 put_inode on 100015d4d14.head(ref=3 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.893082 caps=pAsLsXsFscr(0=pAsLsXsFscr) 0x7f60acc44830)
-13> 2015-07-13 12:08:13.905785 7f60be7fc700 15 client.412413 unlink removing 'raw_reads.136.raw_reads.296.N2.las' dn 0x7f5fb43408e0
-12> 2015-07-13 12:08:13.905789 7f60be7fc700 20 client.412413 trim_caps counting as trimmed: 100015d4d14.head(ref=2 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.893082 caps=pAsLsXsFscr(0=pAsLsXsFscr) 0x7f60acc44830)
-11> 2015-07-13 12:08:13.905801 7f60be7fc700 10 client.412413 put_inode on 100015d4d14.head(ref=2 ll_ref=1 cap_refs={} open={} mode=120777 size=74/0 mtime=2015-07-13 12:08:05.893082 caps=pAsLsXsFscr(0=pAsLsXsFscr) 0x7f60acc44830)
-10> 2015-07-13 12:08:13.905840 7f60a3fff700 10 client.412413 _async_dentry_invalidate 'ch' ino 0 in dir 100009de0f2.head
-9> 2015-07-13 12:08:13.905849 7f606bfff700 15 client.412413 de raw_reads.124.raw_reads.296.C2.las off 6170202972637346 = 0
-8> 2015-07-13 12:08:13.906701 7f60a06f7700 2 -- 192.168.2.22:0/20333 >> 192.168.6.5:6810/4247 pipe(0x7f607c030cf0 sd=2 :37085 s=2 pgs=28199 cs=1 l=1 c=0x7f607c034f90).reader couldn't read tag, (11) Resource temporarily unavailable
-7> 2015-07-13 12:08:13.906738 7f60a06f7700 2 -- 192.168.2.22:0/20333 >> 192.168.6.5:6810/4247 pipe(0x7f607c030cf0 sd=2 :37085 s=2 pgs=28199 cs=1 l=1 c=0x7f607c034f90).fault (11) Resource temporarily unavailable
-6> 2015-07-13 12:08:13.906808 7f60be7fc700 1 client.412413.objecter ms_handle_reset on osd.61
-5> 2015-07-13 12:08:13.906818 7f60be7fc700 1 -- 192.168.2.22:0/20333 mark_down 0x7f607c034f90 -- pipe dne
-4> 2015-07-13 12:08:13.907127 7f60be7fc700 10 monclient: renew_subs
-3> 2015-07-13 12:08:13.907135 7f60be7fc700 10 monclient: _send_mon_message to mon.mon-i1 at 192.168.6.50:6789/0
-2> 2015-07-13 12:08:13.907141 7f60be7fc700 1 -- 192.168.2.22:0/20333 --> 192.168.6.50:6789/0 -- mon_subscribe({mdsmap=4312+,monmap=14+,osdmap=175910}) v2 -- ?+0 0x7f60ac68f8b0 con 0x3b50160
-1> 2015-07-13 12:08:13.907279 7f606bfff700 -1 *** Caught signal (Segmentation fault) **
 in thread 7f606bfff700
 ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
 1: ceph-fuse() [0x6235ca]
 2: (()+0x10340) [0x7f60c9dd8340]
 3: (std::string::assign(std::string const&)+0x1c) [0x7f60c8ff748c]
 4: (Client::_readdir_cache_cb(dir_result_t*, int (*)(void*, dirent*, stat*, int, long), void*)+0x39a) [0x55263a]
 5: (Client::readdir_r_cb(dir_result_t*, int (*)(void*, dirent*, stat*, int, long), void*)+0xfc5) [0x596045]
 6: ceph-fuse() [0x546d2d]
 7: (()+0x13e76) [0x7f60ca20fe76]
 8: (()+0x1522b) [0x7f60ca21122b]
 9: (()+0x11e49) [0x7f60ca20de49]
 10: (()+0x8182) [0x7f60c9dd0182]
 11: (clone()+0x6d) [0x7f60c875547d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
0> 2015-07-13 12:08:13.908290 7f60be7fc700 1 -- 192.168.2.22:0/20333 <== mon.3 192.168.6.50:6789/0 266 ==== osd_map(175910..175910 src has 172194..175910) v3 ==== 222+0+0 (8575416 0 0) 0x7f60b00008c0 con 0x3b50160
The OSD mentioned in op -8 is up and running, but might take some time to respond due to ongoing backfilling operations. The compute jobs are set up to write output to the same file, so several CephFS clients are trying to open a single file for writing.
Updated by Greg Farnum almost 9 years ago
- Project changed from Ceph to CephFS
- Category set to 45
Do you have a core file from this that you can look at with gdb, and the ceph debug packages installed? (You can pull them down if they aren't already present.)
If you can look at it and just paste the output of "backtrace" here, that would be very helpful; as it is, I'm not sure which of the string users is broken.
Updated by Burkhard Linke almost 9 years ago
Sorry, no core file available. I have a number of hosts with CephFS running without any problems; some other hosts show the described behaviour.
Since some of the compute jobs are still pending, I may reactivate one of the nodes in the cluster and hope that it fails again (it's not 100% reproducible).
How do I manipulate the limits for fuse mounts? Is setting the limit prior to mounting enough?
Updated by Greg Farnum almost 9 years ago
You mean the ulimit for dumping the core file? That will depend on how you're invoking it; if you're just running "ceph-fuse ..." then you can set "ulimit -c unlimited" in the shell before doing so and that will dump the core, yes.
Updated by Burkhard Linke almost 9 years ago
(gdb) bt
#0 0x00007f70d7bf020b in raise (sig=7)
at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1 0x000000000062368d in reraise_fatal (signum=7)
at global/signal_handler.cc:59
#2 handle_fatal_signal (signum=7) at global/signal_handler.cc:109
#3 <signal handler called>
#4 Client::_readdir_cache_cb (this=this@entry=0x38a8ab0,
dirp=dirp@entry=0x7f6f7c018400, cb=cb@entry=
0x547820 <fuse_ll_add_dirent(void*, dirent*, stat*, int, off_t)>,
p=p@entry=0x7f707d7cac20) at client/Client.cc:6048
#5 0x0000000000596045 in Client::readdir_r_cb (this=0x38a8ab0,
d=d@entry=0x7f6f7c018400,
cb=cb@entry=0x547820 <fuse_ll_add_dirent(void*, dirent*, stat*, int, off_t)>, p=p@entry=0x7f707d7cac20) at client/Client.cc:6171
#6 0x0000000000546d2d in fuse_ll_readdir (req=0x7f6f9c004ce0,
ino=1099533194016, size=4096, off=129, fi=<optimized out>)
at client/fuse_ll.cc:582
#7 0x00007f70d8027e76 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#8 0x00007f70d802922b in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#9 0x00007f70d8025e49 in ?? () from /lib/x86_64-linux-gnu/libfuse.so.2
#10 0x00007f70d7be8182 in start_thread (arg=0x7f707d7cb700)
at pthread_create.c:312
#11 0x00007f70d656d47d in clone ()
Updated by Greg Farnum almost 9 years ago
That's pretty bizarre. Signal 7 is SIGBUS "Bus error (bad memory access)", which is separate from SIGSEGV. And given the surrounding code that's pretty well guarded against. In GDB again can you print the value of "dn" and (if it's valid) dn->inode?
[run gdb]
f 4
p dn
p dn->inode
Or you can join us on irc, oftc.net #ceph-devel and ping gregsfortytwo. Thanks!
Updated by Burkhard Linke almost 9 years ago
(gdb) f 4
#4 Client::_readdir_cache_cb (this=this@entry=0x38a8ab0, dirp=dirp@entry=0x7f6f7c018400, cb=cb@entry=
0x547820 <fuse_ll_add_dirent(void*, dirent*, stat*, int, off_t)>, p=p@entry=0x7f707d7cac20) at client/Client.cc:6048
6048 client/Client.cc: No such file or directory.
(gdb) p dn
$1 = (Dentry *) 0x342e746e65696c63
(gdb) p dn->inode
Cannot access memory at address 0x342e746e65696c9b
Updated by Greg Farnum almost 9 years ago
Okay, I think I see what happened, one more please.
[run gdb]
f 4
p dir
p dir->dentry_list
And if you could take the log for the ceph-fuse instance you're looking at now and upload it with ceph-post-file that is also likely to be helpful.
Updated by Burkhard Linke almost 9 years ago
(gdb) p dir
$2 = (Dir *) 0x7f709000f9a0
(gdb) p dir->dentry_list
$3 = {_front = 0x7f6f8002a260, _back = 0x7f70b83aaa60, _size = 9}
The log file is 13 GB and the host has no direct connection to the internet. I can transfer it to another host and upload it, but given its size I would prefer to upload only a part.
Updated by Greg Farnum almost 9 years ago
- Status changed from New to 12
Zheng, can you take a look at this and see if you need the log or can identify the issue? I presume something's gone wrong in advancing to the at_cache_name dentry, but I'm not quite sure how.
Updated by Greg Farnum almost 9 years ago
- Priority changed from Normal to Urgent
Updated by Zheng Yan almost 9 years ago
I can't figure out what's wrong either.
Updated by Greg Farnum almost 9 years ago
We've got logs and a core file via ceph-post-file:
- c00fb309-e2a1-482c-b093-1733affb2901: gzipped core dump
- 60fbd6a7-ffde-4cf0-988b-a80737a1d1ec: ceph-fuse log file
Distribution is Ubuntu 14.04.
ceph-fuse --version: ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
Updated by Greg Farnum almost 9 years ago
Okay, I dug into this today and I think our readdir is a bit broken (obviously). In particular, we drop the client lock while invoking the callback, and I can see in the log that during one of those unlocked windows a whole bunch of dentries got unlinked. So when we reacquire the lock we try to look at the next dentry via our xlist iterator (which was advanced to a valid dentry while we still held the lock), but the dentry we're trying to look at has (it appears) been freed!
I'm not sure how this is supposed to work; investigating now.
Updated by Greg Farnum almost 9 years ago
- Status changed from 12 to Fix Under Review
I think I have a fix for this issue but I want to get a few eyes on it before handing it off for user testing.
Updated by Greg Farnum almost 9 years ago
Burkhard, if you can test out the hammer-12297 branch on one of the boxes seeing this bug and find out if that fixes it, that would be helpful. I've not yet figured out a good way to reproduce the issue.
Updated by Greg Farnum almost 9 years ago
- Assignee set to Greg Farnum
- Source changed from other to Community (user)
Updated by Burkhard Linke almost 9 years ago
The branch is deployed to the compute cluster host and the application causing the problem has been started.
Debug output and core file generation are enabled. Since the problem is not 100% reproducible, all cluster nodes are running the new branch.
Testing will take some hours; if no ceph-fuse process has crashed by then, it may indicate that the fix is successful.
Updated by Burkhard Linke almost 9 years ago
The cluster has been processing data without any problems over the weekend. I consider the bug fixed.
Updated by Greg Farnum almost 9 years ago
- Status changed from Fix Under Review to Pending Backport
- Priority changed from Urgent to Normal
- Backport set to hammer
Excellent! This is merged to master now.
Updated by Loïc Dachary over 8 years ago
- Status changed from Pending Backport to Resolved