Bug #14080: ceph-disk: use blkid instead of sgdisk -i
Status: Closed
Description
On CentOS 7 on OpenStack, udevadm settle may take 45 seconds or more. This blocks testing.
DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
INFO:ceph-disk:Running command: /usr/bin/udevadm settle
Traceback (most recent call last):
  File "/sbin/ceph-disk", line 4036, in <module>
    main(sys.argv[1:])
  File "/sbin/ceph-disk", line 3990, in main
    args.func(args)
  File "/sbin/ceph-disk", line 2849, in main_destroy
    zap(base_dev)
  File "/sbin/ceph-disk", line 1289, in zap
    raise Error(e)
__main__.Error: Error: Command '['/usr/bin/udevadm', 'settle']' returned non-zero exit status 1
Updated by Loïc Dachary over 8 years ago
Nothing conclusive on dmesg. After a while (a few minutes at most), it comes back to normal.
Updated by Loïc Dachary over 8 years ago
Could be that it takes more than 120 seconds.
udevadm settle [options]
    Watches the udev event queue, and exits if all current events are handled.

--timeout=seconds
    Maximum number of seconds to wait for the event queue to become empty. The default value is 120 seconds. A value of 0 will check if the queue is empty and always return immediately.
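The fix discussed in this ticket raises that timeout explicitly. A minimal sketch in Python (ceph-disk is a Python script); the helper names here are made up, and ceph-disk's own wrapper is command_check_call:

```python
import subprocess

def settle_command(timeout):
    """Build the `udevadm settle` argv with an explicit timeout,
    overriding the 120-second default quoted above. The value 600
    used later in this ticket comfortably exceeds the ~150s waits
    observed on the OpenStack instances."""
    return ['/usr/bin/udevadm', 'settle', '--timeout=%d' % timeout]

def settle(timeout=600):
    # In ceph-disk this goes through command_check_call().
    subprocess.check_call(settle_command(timeout))
```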
Updated by Loïc Dachary over 8 years ago
- Status changed from New to Fix Under Review
- Assignee set to Loïc Dachary
Updated by Loïc Dachary over 8 years ago
- Blocks Bug #13970: ceph-disk list fails on /dev/cciss!c0d0 added
Updated by Loïc Dachary over 8 years ago
- Blocks deleted (Bug #13970: ceph-disk list fails on /dev/cciss!c0d0)
Updated by Loïc Dachary over 8 years ago
- Blocked by Bug #13988: new OSD re-using old OSD id fails to boot added
Updated by Loïc Dachary over 8 years ago
- Subject changed from ceph-disk: zap mail fail on udevadm settle to ceph-disk: may timeout on udevadm settle / partprobe
Updated by Loïc Dachary over 8 years ago
Immediately after a disk was zapped, udevadm settle waited about 2 minutes 30 seconds on an OpenStack instance: this is more than the default 2-minute timeout.
2015-12-16 12:06:28,567.567 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --zap-all -- /dev/vdb
2015-12-16 12:06:29,074.074 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Caution: invalid backup GPT header, but valid main header; regenerating
2015-12-16 12:06:29,075.075 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:backup header from main header.
2015-12-16 12:06:29,075.075 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:
2015-12-16 12:06:29,708.708 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
2015-12-16 12:06:29,708.708 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:on the recovery & transformation menu to examine the two tables.
2015-12-16 12:06:29,709.709 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:
2015-12-16 12:06:29,709.709 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning! One or more CRCs don't match. You should repair the disk!
2015-12-16 12:06:29,709.709 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:
2015-12-16 12:06:30,854.854 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:****************************************************************************
2015-12-16 12:06:30,855.855 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
2015-12-16 12:06:30,855.855 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:verification and recovery are STRONGLY recommended.
2015-12-16 12:06:30,856.856 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:****************************************************************************
2015-12-16 12:06:30,856.856 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
2015-12-16 12:06:30,857.857 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
2015-12-16 12:06:30,857.857 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:GPT data structures destroyed! You may now partition the disk using fdisk or
2015-12-16 12:06:30,857.857 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:other utilities.
2015-12-16 12:06:30,858.858 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --clear --mbrtogpt -- /dev/vdb
2015-12-16 12:06:31,877.877 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Creating new GPT entries.
2015-12-16 12:06:31,878.878 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
2015-12-16 12:06:31,878.878 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
2015-12-16 12:06:31,878.878 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:The operation has completed successfully.
2015-12-16 12:06:31,879.879 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
2015-12-16 12:06:31,879.879 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
2015-12-16 12:08:56,623.623 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
2015-12-16 12:08:57,140.140 INFO:tasks.workunit.client.0.target167114224086.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
Updated by Mykola Dvornik over 8 years ago
CentOS Linux 7.1.1503 Core
sdb is a spinning drive (HGST HUS724020ALA640)
Solved by putting time.sleep(5) before command_check_call(['partprobe', dev]) in ceph-disk.
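As a sketch, the workaround amounts to the following (the wrapper function is hypothetical; Mykola edited ceph-disk in place, where partprobe is run via command_check_call):

```python
import subprocess
import time

def partprobe_after_delay(dev, delay=5,
                          run=subprocess.check_call, sleep=time.sleep):
    """Give the kernel a moment to finish its asynchronous
    partition-table update before partprobe asks it to re-read.
    `run` and `sleep` are injectable only to keep the sketch testable;
    ceph-disk calls the tools directly."""
    sleep(delay)
    run(['partprobe', dev])
```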
[004-s-ragnarok][WARNING] DEBUG:ceph-disk:Creating osd partition on /dev/sdb
[004-s-ragnarok][WARNING] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:d92ad27f-caa5-47d7-a2e2-a403eda815ca --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdb
[004-s-ragnarok][DEBUG ] Creating new GPT entries.
[004-s-ragnarok][DEBUG ] The operation has completed successfully.
[004-s-ragnarok][WARNING] DEBUG:ceph-disk:Calling partprobe on created device /dev/sdb
[004-s-ragnarok][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle
[004-s-ragnarok][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdb
[004-s-ragnarok][WARNING] Error: Error informing the kernel about modifications to partition /dev/sdb1 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdb1 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[004-s-ragnarok][WARNING] Error: Failed to add partition 1 (Device or resource busy)
[004-s-ragnarok][WARNING] Traceback (most recent call last):
[004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 3576, in <module>
[004-s-ragnarok][WARNING]     main(sys.argv[1:])
[004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 3530, in main
[004-s-ragnarok][WARNING]     args.func(args)
[004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 1893, in main_prepare
[004-s-ragnarok][WARNING]     luks=luks
[004-s-ragnarok][WARNING]   File "/sbin/ceph-disk", line 1594, in prepare_dev
[004-s-ragnarok][WARNING]     raise Error(e)
[004-s-ragnarok][WARNING] __main__.Error: Error: Command '['/sbin/partprobe', '/dev/sdb']' returned non-zero exit status 1
[004-s-ragnarok][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdb /dev/sda
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs
Updated by Matthew Taylor over 8 years ago
As I had mentioned in ceph-users (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-December/006780.html), I worked around this issue by using partx instead of partprobe:
[root@kvsrv02 ~]# diff -u /usr/sbin/ceph-disk_orig /usr/sbin/ceph-disk
--- /usr/sbin/ceph-disk_orig    2015-12-16 05:05:19.636866273 +0000
+++ /usr/sbin/ceph-disk         2015-12-16 05:27:07.905817825 +0000
@@ -35,6 +35,7 @@
 import shlex
 import pwd
 import grp
+from subprocess import call

 """
 Prepare:
@@ -1218,7 +1219,8 @@
     """
     LOG.debug('Calling partprobe on %s device %s', description, dev)
     command_check_call(['udevadm', 'settle'])
-    command_check_call(['partprobe', dev])
+    call(['partx', '-a', dev])
+    #command_check_call(['partprobe', dev])
     command_check_call(['udevadm', 'settle'])

 def zap(dev):
I also tried what Mykola did on a different Ceph node (sleeping for 5 seconds before invoking partprobe) without any success:
http://pastebin.com/raw/Q26CeUge
More info if required about the Ceph node that I tested the above on (our cluster all run identical hardware):
[root@kvsrv03 ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
[root@kvsrv03 ~]# uname -r
3.10.0-327.3.1.el7.x86_64
[root@kvsrv03 ~]# ceph --version
ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
[root@kvsrv03 ~]# lsscsi
[0:0:0:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sda
[0:0:1:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdb
[0:0:2:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdc
[0:0:3:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdd
[0:0:4:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sde
[0:0:5:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdf
[0:0:6:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdg
[0:0:7:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdh
[0:0:8:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdi
[0:0:9:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdj
[0:0:10:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdk
[0:0:11:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdl
[0:0:12:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdm
[0:0:13:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdn
[0:0:14:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdo
[0:0:15:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdp
[0:0:16:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdq
[0:0:17:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdr
[0:0:18:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sds
[0:0:19:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdt
[0:0:20:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdu
[0:0:21:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdv
[0:0:22:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdw
[0:0:23:0] disk ATA INTEL SSDSC2BG40 DL27 /dev/sdx
[0:2:0:0] disk DELL PERC H730P Mini 4.25 /dev/sdy
[root@kvsrv03 ~]# smartctl -i /dev/sdx
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-327.3.1.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSC2BG400G4R
Serial Number: BTHC509204U0400VGN
LU WWN Device Id: 5 5cd2e4 04b79b8f8
Add. Product Id: DELL
Firmware Version: G201DL27
User Capacity: 400,088,457,216 bytes [400 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 (unknown minor revision code: 0x006d)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Dec 16 23:08:32 2015 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Updated by Loïc Dachary over 8 years ago
@Matthew: using partx is appealing (and that is what ceph-disk used in previous versions of Ceph). The reason partprobe was preferred is that its behavior is easier to predict and debug.
In the past that perceived stability / robustness was a matter of opinion and debate. Now that we have ceph-disk integration tests, we can actually verify, by running them over and over, whether it holds. These integration tests are only two months old and it is too early to draw a conclusion. But I'm hopeful that we'll eventually have enough tests covering border cases to try substituting partx for partprobe and verify it is at least as good.
So, in a nutshell, I'm not saying partx should not be used, nor that partprobe is definitely more stable. But I'm currently working with partprobe to make a more robust ceph-disk that works everywhere.
Regarding the error mentioned in http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-December/006780.html, I think the right fix is to increase the udevadm settle timeout and retry partprobe a few times should it give up too soon (because it will eventually succeed).
Would you mind trying https://raw.githubusercontent.com/dachary/ceph/ff273c9654b27b629c3e9e8f06c868ce225eb458/src/ceph-disk to verify that it works for you? I'm in the process of testing it, and it would be great to have your feedback (the details of the changes are at https://github.com/ceph/ceph/pull/6926).
Updated by Matthew Taylor over 8 years ago
Loic,
15-second sleep (failed): http://pastebin.com/g5J5G5Wq
25-second sleep (failed): http://pastebin.com/mjMfsAFa
35-second sleep (failed): http://pastebin.com/hsrPLYzi
40-second sleep (failed): http://pastebin.com/ShbvQKZG
45-second sleep (successful): http://pastebin.com/PAyADKiJ
Using partx instead of partprobe, no sleep (successful): http://pastebin.com/dasLxxt8
Note: disks were zapped prior to running 'ceph-deploy disk prepare'
I tried just running sgdisk outside of ceph-deploy, which was successful, including running partprobe: http://pastebin.com/VbQii1gS
Note: I had to quote the partition name, since bash would otherwise look for a file named 'journal' because of the space in it.
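The shell-quoting pitfall is easy to reproduce: without quotes the space in the partition name 'ceph journal' splits it into two words. For instance:

```python
import shlex

# Unquoted, the shell splits "ceph journal" into two words, so the
# second half is treated as a separate argument (or a file named
# 'journal' in an I/O redirection).
unquoted = shlex.split("sgdisk --change-name=2:ceph journal -- /dev/sdb")

# Quoted, the partition name survives as a single argument.
quoted = shlex.split("sgdisk --change-name='2:ceph journal' -- /dev/sdb")
```

ceph-disk itself is unaffected by this because it passes argv lists straight to exec and never goes through a shell; the quoting only matters when reproducing the sgdisk call by hand.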
I will try https://raw.githubusercontent.com/dachary/ceph/ff273c9654b27b629c3e9e8f06c868ce225eb458/src/ceph-disk as requested, please give me a few minutes.
Updated by Matthew Taylor over 8 years ago
Tried using https://raw.githubusercontent.com/dachary/ceph/ff273c9654b27b629c3e9e8f06c868ce225eb458/src/ceph-disk (successful):
Tested on another node to clarify (also successful):
I'll finish preparing another 10 or 15 disks now.
Updated by Matthew Taylor over 8 years ago
- File kvsrv01.log kvsrv01.log added
I looped through 9 disks on one of our Ceph servers using the ceph-disk script above; 8 failed and 1 completed.
Log is attached!
Updated by Loïc Dachary over 8 years ago
I filed http://tracker.ceph.com/issues/14099 so ceph-deploy stops calling partx / partprobe (I noticed from your log that it does).
Updated by Loïc Dachary over 8 years ago
@Matthew the log is very useful, thank you. It would be great if you could repeat the same operation without ceph-deploy so that there are no interfering partx -a launched by ceph-deploy while ceph-disk is running. If it turns out to fail as well, it will narrow down the source of errors.
Note that I think partx -a looks like it succeeds because it is non-blocking (i.e. it signals the kernel about the partition change but does not wait for a confirmation the way partprobe does). In your case (i.e. only adding different disks with colocated journals) it works fine because there is no chance of a race.
Updated by Loïc Dachary over 8 years ago
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:5120M --change-name=2:ceph journal --partition-guid=2:cddb96da-6ab0-49fc-9bdc-16c40fe59e03 --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdq
[kvsrv01][DEBUG ] The operation has completed successfully.
[kvsrv01][WARNING] DEBUG:ceph-disk:Calling partprobe on prepared device /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/cddb96da-6ab0-49fc-9bdc-16c40fe59e03
[kvsrv01][WARNING] DEBUG:ceph-disk:Journal is GPT partition /dev/disk/by-partuuid/cddb96da-6ab0-49fc-9bdc-16c40fe59e03
[kvsrv01][WARNING] DEBUG:ceph-disk:get_dm_uuid /dev/sdq uuid path is /sys/dev/block/65:0/dm/uuid
[kvsrv01][WARNING] DEBUG:ceph-disk:get_dm_uuid /dev/sdq uuid path is /sys/dev/block/65:0/dm/uuid
[kvsrv01][WARNING] DEBUG:ceph-disk:Creating osd partition on /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/sgdisk --largest-new=1 --change-name=1:ceph data --partition-guid=1:8261ff6e-545b-42ee-99b1-92ac86bc0af3 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be -- /dev/sdq
[kvsrv01][DEBUG ] The operation has completed successfully.
[kvsrv01][WARNING] DEBUG:ceph-disk:Calling partprobe on created device /dev/sdq
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[kvsrv01][WARNING] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdq
[kvsrv01][WARNING] DEBUG:ceph-disk:partprobe /dev/sdq failed : unknown error (ignored)
[kvsrv01][WARNING] Traceback (most recent call last):
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 4055, in <module>
[kvsrv01][WARNING]     main(sys.argv[1:])
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 4009, in main
[kvsrv01][WARNING]     args.func(args)
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 1935, in main_prepare
[kvsrv01][WARNING]     luks=luks
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 1634, in prepare_dev
[kvsrv01][WARNING]     update_partition(data, 'created')
[kvsrv01][WARNING]   File "/sbin/ceph-disk", line 1262, in update_partition
[kvsrv01][WARNING]     raise Error('partprobe %s failed : %s' % (dev, error))
[kvsrv01][WARNING] __main__.Error: Error: partprobe /dev/sdq failed : unknown error
[kvsrv01][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sdq
The above suggests that trying partprobe in a loop three times is not always enough: the first partition update succeeds but the second fails, presumably because the kernel needs more time to complete the update of the partition table. The report that partx -a succeeds also suggests that.
Updated by Loïc Dachary over 8 years ago
@Matthew could you please try with the updated https://raw.githubusercontent.com/dachary/ceph/f4cd0d45be68900c223f779e7d80b85828a27382/src/ceph-disk ? A diff will show you that it differs from the previous one as follows:
- fix the error message
- loop 5 times instead of 3
- sleep 60 seconds on each loop
Thanks!
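The changes listed above amount to a retry loop roughly like the following (a sketch under the stated parameters of 5 tries and a 60-second sleep, not the exact patch; the real ceph-disk helper also captures partprobe's stderr for the error message):

```python
import subprocess
import time

def update_partition(dev, description, tries=5, delay=60):
    """Settle with a long timeout, then retry partprobe up to
    `tries` times, sleeping `delay` seconds between failed attempts.
    Raises only after every attempt has failed."""
    error = None
    for attempt in range(tries):
        subprocess.check_call(['udevadm', 'settle', '--timeout=600'])
        try:
            subprocess.check_call(['partprobe', dev])
        except subprocess.CalledProcessError as e:
            error = str(e)
            time.sleep(delay)
        else:
            # Wait for the udev events triggered by the re-read.
            subprocess.check_call(['udevadm', 'settle', '--timeout=600'])
            return
    raise RuntimeError('partprobe %s failed : %s' % (dev, error))
```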
Updated by Loïc Dachary over 8 years ago
In the following we see that partprobe fails immediately the first time. Since this happens right after a udevadm settle, on a machine that is doing nothing except run the test, it is unlikely that a udev event was still in flight. Waiting 60 seconds resolves the error, which suggests that an asynchronous event related to the kernel partition tables was in progress and then finished.
2015-12-17 11:46:12,506.506 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
2015-12-17 11:46:12,507.507 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
2015-12-17 11:46:15,427.427 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:partprobe /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:(ignored, waiting 60s)
2015-12-17 11:47:16,925.925 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
2015-12-17 11:47:19,681.681 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
2015-12-17 11:47:20,125.125 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
Updated by Matthew Taylor over 8 years ago
Loic,
I've emailed you directly! :)
Matt.
Updated by Loïc Dachary over 8 years ago
- Related to Bug #14094: ceph-disk list: OSError: [Errno 2] No such file or directory: '/dev/vdb1' added
Updated by Loïc Dachary over 8 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Loïc Dachary over 8 years ago
- Copied to Backport #14145: infernalis: ceph-disk: use blkid instead of sgdisk -i added
Updated by Loïc Dachary over 8 years ago
- Status changed from Pending Backport to Fix Under Review
Updated by Loïc Dachary over 8 years ago
- Subject changed from ceph-disk: may timeout on udevadm settle / partprobe to ceph-disk: use blkid instead of sgdisk -i
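The new subject names the eventual approach: reading partition metadata with blkid, which probes the device read-only, instead of `sgdisk -i`, which can disturb the device and trigger udev activity. A hedged sketch of such a query (the helper names are made up and the real ceph-disk code differs; `blkid -p -o udev` prints KEY=value pairs including ID_PART_ENTRY_TYPE):

```python
import subprocess

def parse_blkid_udev(output):
    """Turn `blkid -o udev` KEY=value lines into a dict."""
    pairs = {}
    for line in output.splitlines():
        key, sep, value = line.partition('=')
        if sep:
            pairs[key] = value
    return pairs

def get_partition_type(partition):
    # -p probes the device directly (read-only); -o udev emits
    # ID_PART_ENTRY_TYPE among other keys. This replaces parsing
    # the human-oriented output of `sgdisk -i <num>`.
    out = subprocess.check_output(
        ['blkid', '-o', 'udev', '-p', partition]).decode('utf-8')
    return parse_blkid_udev(out).get('ID_PART_ENTRY_TYPE')
```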
Updated by Loïc Dachary over 8 years ago
- Related to deleted (Bug #14094: ceph-disk list: OSError: [Errno 2] No such file or directory: '/dev/vdb1')
Updated by Loïc Dachary over 8 years ago
- Related to Bug #14094: ceph-disk list: OSError: [Errno 2] No such file or directory: '/dev/vdb1' added
Updated by Loïc Dachary over 8 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Darrell Enns over 8 years ago
I'm still seeing some failures, even with the 5x 60s delay.
System info:
OS: centOS 7.2.1511
HBA: IBM ServeRAID 8k/8k-l8 (Adaptec AAC-RAID), configured as 1 disk per volume
Ceph: Infernalis (9.2.0-0.el7)
[ceph01][WARNIN] DEBUG:ceph-disk:Calling partprobe on zapped device /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_btrfs
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[ceph01][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdf uuid path is /sys/dev/block/8:80/dm/uuid
[ceph01][WARNIN] INFO:ceph-disk:Will colocate journal with data on /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdf uuid path is /sys/dev/block/8:80/dm/uuid
[ceph01][WARNIN] DEBUG:ceph-disk:get_dm_uuid /dev/sdf uuid path is /sys/dev/block/8:80/dm/uuid
[ceph01][WARNIN] DEBUG:ceph-disk:Creating journal partition num 2 size 5120 on /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/sgdisk --new=2:0:5120M --change-name=2:ceph journal --partition-guid=2:bc22d537-a8e6-4bfb-8bc8-60d24652763a --typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdf
[ceph01][DEBUG ] The operation has completed successfully.
[ceph01][WARNIN] DEBUG:ceph-disk:Calling partprobe on prepared device /dev/sdf
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN] (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN] (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN] (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN] (ignored, waiting 60s)
[ceph01][WARNIN] INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
[ceph01][WARNIN] INFO:ceph-disk:Running command: /sbin/partprobe /dev/sdf
[ceph01][WARNIN] DEBUG:ceph-disk:partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN] (ignored, waiting 60s)
[ceph01][WARNIN] Traceback (most recent call last):
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 4031, in <module>
[ceph01][WARNIN]     main(sys.argv[1:])
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 3985, in main
[ceph01][WARNIN]     args.func(args)
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1909, in main_prepare
[ceph01][WARNIN]     luks=luks
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1506, in prepare_journal
[ceph01][WARNIN]     return prepare_journal_dev(data, journal, journal_size, journal_uuid, journal_dm_keypath, cryptsetup_parameters, luks)
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1421, in prepare_journal_dev
[ceph01][WARNIN]     update_partition(journal, 'prepared')
[ceph01][WARNIN]   File "/sbin/ceph-disk", line 1261, in update_partition
[ceph01][WARNIN]     raise Error('partprobe %s failed : %s' % (dev, error))
[ceph01][WARNIN] __main__.Error: Error: partprobe /dev/sdf failed : Error: Error informing the kernel about modifications to partition /dev/sdf2 -- Device or resource busy. This means Linux won't know about any changes you made to /dev/sdf2 until you reboot -- so you shouldn't mount it or use it in any way before rebooting.
[ceph01][WARNIN] Error: Failed to add partition 2 (Device or resource busy)
[ceph01][WARNIN]
[ceph01][ERROR ] RuntimeError: command returned non-zero exit status: 1
Updated by Wes Dillingham over 8 years ago
I am experiencing an identical issue with the updated ceph-disk as provided here:
https://raw.githubusercontent.com/dachary/ceph/f4cd0d45be68900c223f779e7d80b85828a27382/src/ceph-disk
I am running:
CentOS Linux release 7.2.1511 (Core)
Controller:
Dell Perc h700
I am seeing approximately a 30% osd build success rate.
Is the above ceph-disk the latest attempt at resolving this issue or might there be another version?
Updated by Wes Dillingham over 8 years ago
Edit: I am colocating my journal and data partitions:
ceph-deploy osd create --fs-type xfs ceph-osd06:/dev/sdb
Updated by Wes Dillingham over 8 years ago
I adjusted it to try 20 times instead of 5 (a 20-minute wait) and got only a negligible improvement (one more OSD deployed, out of 36), so I am not sure this is in fact a timing issue.
I also get the following error in /var/log/ceph/ceph-osd.admin.log each time it iterates over attempting to probe the disk; I am not sure if it is related.
2016-02-01 14:46:05.958049 7f0bca77c900 0 ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 27389
2016-02-01 14:46:05.958063 7f0bca77c900 -1 unrecognized arg --get-device-fsid
Updated by Loïc Dachary over 8 years ago
There are a number of other problems that were fixed recently and that should be backported. I'm not sure yet how we should do it, though. Ideally most of https://github.com/ceph/ceph/pull/6926 and https://github.com/ceph/ceph/pull/5999 should be in infernalis, and we should go over each modification to figure out what to backport.
Updated by Loïc Dachary over 8 years ago
- Backport changed from infernalis to infernalis,hammer
Updated by Loïc Dachary over 8 years ago
- Copied to Backport #14602: hammer: ceph-disk: use blkid instead of sgdisk -i added
Updated by Loïc Dachary over 8 years ago
- Status changed from Pending Backport to Resolved