bluestore: latest and greatest #6896

Merged
merged 197 commits into from Jan 3, 2016

liewegas
Member

rename newstore -> bluestore
use a block device
back rocksdb with BlueRocksEnv, which shares block device(s) with bluestore

@yuyuyu101
Member

COOL, so fast

@dzafman
Copy link
Contributor

dzafman commented Dec 11, 2015

Jenkins:
os/bluestore/bluefs_tool.cc:38:27: error: no matching function for call to 'BlueFS::mount(int, int)'
int r = fs.mount(0, 4096);

@dzafman
Contributor

dzafman commented Dec 11, 2015

Once it builds I'm tempted to modify the test/ceph_objectstore_tool.py to vstart using --bluestore option and test using --type bluestore to see what happens.

@liewegas
Member Author

On Fri, 11 Dec 2015, David Zafman wrote:

> Once it builds I'm tempted to modify the test/ceph_objectstore_tool.py to vstart using --bluestore option and test using --type bluestore to see what happens.

That'd be great!

@dzafman
Contributor

dzafman commented Dec 14, 2015

4 problems found:
keyvaluestore_rocksdb_options set to an empty string is not handled, possibly because RocksDBStore::ParseOptionsFromString() never calls rocksdb::GetOptionsFromString() in that case. I set it to "write_buffer_size=1024,max_write_buffer_number=2" as a workaround.

Code to handle bluestore_bluefs_env_mirror won't link, so I just disabled it, since the option is false by default anyway.

_open_bdev can't delete bdev without first calling bdev->close();

Bluestore disklabel is overwritten after --mkfs is used to create it, so OSDs just crash with bad decode on start-up with vstart. I saw that dev/osd0/block had the disklabel right after it was written, but when I let the mkfs complete and checked again, it was gone.
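
The third problem above (close before delete) is the kind of ordering rule that is easy to miss on error paths. A minimal sketch of one way to enforce it follows. `FakeBlockDevice`, `CloseThenDelete` and `open_and_check` are hypothetical names for illustration only, not Ceph's classes, and the destructor assertion is an assumption about how the failure shows up.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a block device handle; NOT Ceph's BlockDevice.
class FakeBlockDevice {
public:
  int open(const std::string& path) { path_ = path; opened_ = true; return 0; }
  void close() { opened_ = false; }
  bool is_open() const { return opened_; }
  ~FakeBlockDevice() {
    // Assumption: destroying a device that is still open is a bug.
    assert(!opened_);
  }
private:
  std::string path_;
  bool opened_ = false;
};

// Custom deleter: always close() before delete, so that early returns
// cannot skip the close step.
struct CloseThenDelete {
  void operator()(FakeBlockDevice* d) const {
    if (d) {
      if (d->is_open()) d->close();
      delete d;
    }
  }
};

using BdevPtr = std::unique_ptr<FakeBlockDevice, CloseThenDelete>;

// Returns 0 on success or a negative value on failure. The device is
// closed and then deleted when bdev goes out of scope, on every path.
int open_and_check(const std::string& path, bool label_ok) {
  BdevPtr bdev(new FakeBlockDevice);
  int r = bdev->open(path);
  if (r < 0) return r;
  if (!label_ok) return -5;  // early error return; the deleter still closes first
  return 0;
}
```

Tying the close to the deleter means a failed label check cannot leave an open device behind.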

@dzafman
Contributor

dzafman commented Dec 14, 2015

See changes in dzafman@a99dbfd

@dzafman
Contributor

dzafman commented Dec 14, 2015

The empty keyvaluestore_rocksdb_options isn't an issue. It might have been related to the rocksdb build in my tree at one point, or I misidentified the cause of a mount failure.

@liewegas liewegas force-pushed the wip-bluestore branch 2 times, most recently from e3acbcf to d13458d Compare December 14, 2015 22:14
@dzafman
Contributor

dzafman commented Dec 19, 2015

I was able to use vstart.sh after making _check_or_set_bdev_label() always return success without even calling _read_bdev_label(), which fails.

The test/ceph_objectstore_tool.py test unfortunately relies on looking at the filestore format. So instead of trying to rewrite that test, I created some objects and ran a good portion of the ceph-objectstore-tool commands using --type bluestore. I saw no failures. Maybe we should have a generic non-filestore ceph-objectstore-tool test which can supplement the current test. It could then also be used for keyvaluestore or any future objectstores.

liewegas and others added 23 commits January 1, 2016 13:08
The write process may do a read/modify/write on a stripe.  In order to
allow multiple writes to coexist within the same transaction, we need to
be able to "see" our writes.

Clear the "cached" stripe values when the last TransContext touching an
onode is finished.  In theory we could pin memory with a constant stream
of updates to an object; we may need to address that later.

Signed-off-by: Sage Weil <sage@redhat.com>
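
The commit message above describes a read/modify/write on a stripe that must see earlier writes from the same transaction, plus a cache that is cleared when the last transaction finishes. A toy model of that idea is sketched here. `ToyOnode` and all its members are hypothetical, and folding "commit" and "clear cache" into one step is a simplification, not how BlueStore does it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical model of an onode: committed stripes plus a cache of
// stripes written by transactions that are still in flight.
struct ToyOnode {
  std::map<uint64_t, std::string> committed;  // stripe index -> committed data
  std::map<uint64_t, std::string> pending;    // stripe index -> newest uncommitted data
  int txc_refs = 0;                           // in-flight transactions touching this onode

  void txc_start() { ++txc_refs; }

  // Read a stripe, preferring our own uncommitted writes.
  std::string read_stripe(uint64_t idx, std::size_t stripe_size) const {
    auto p = pending.find(idx);
    if (p != pending.end()) return p->second;
    auto c = committed.find(idx);
    if (c != committed.end()) return c->second;
    return std::string(stripe_size, '\0');  // unwritten stripe reads as zeros
  }

  // Read/modify/write of part of one stripe.
  void write(uint64_t idx, std::size_t off, const std::string& data,
             std::size_t stripe_size) {
    assert(off + data.size() <= stripe_size);  // toy: writes stay inside one stripe
    std::string s = read_stripe(idx, stripe_size);
    s.replace(off, data.size(), data);
    pending[idx] = s;
  }

  // Called when a transaction finishes; the last one drops the cache.
  void txc_finish() {
    if (--txc_refs == 0) {
      for (auto& kv : pending) committed[kv.first] = kv.second;
      pending.clear();
    }
  }
};
```

The memory-pinning concern in the commit message shows up here too: while `txc_refs` never reaches zero, `pending` is never cleared.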
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
In particular, we may get a shard specified along with hobject_t min or
max (from PGBackend::objects_list_range()).

Signed-off-by: Sage Weil <sage@redhat.com>
It didn't like k=1 with the default profile.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
This lets us do leak checking.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
It's possible for the IO to be in flight when the caller closes the
writer handle (although dangerous of them).  Queue the IOContext for
async cleanup when we sync everything to disk.

Signed-off-by: Sage Weil <sage@redhat.com>
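
The deferred cleanup described in the commit message above could look roughly like the sketch below: closing a writer hands its I/O context to a queue, and the contexts are only freed at sync time, after in-flight I/O is done. `ToyIOContext` and `ToyFS` are hypothetical names, and the "wait" is a stand-in, so treat this as an illustration of the pattern rather than the BlueFS code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-writer I/O context.
struct ToyIOContext {
  int num_in_flight = 0;
};

class ToyFS {
public:
  // On close, the writer hands over its context instead of destroying it,
  // because I/O may still be in flight.
  void close_writer(std::unique_ptr<ToyIOContext> ioc) {
    ioc_reap_queue_.push_back(std::move(ioc));
  }

  // Sync: wait for outstanding I/O, then free every queued context.
  // Returns the number of contexts reaped.
  std::size_t sync() {
    for (auto& ioc : ioc_reap_queue_) {
      ioc->num_in_flight = 0;  // stand-in for waiting on I/O completion
    }
    std::size_t n = ioc_reap_queue_.size();
    ioc_reap_queue_.clear();   // unique_ptr frees each context here
    return n;
  }

  std::size_t pending_reap() const { return ioc_reap_queue_.size(); }

private:
  std::vector<std::unique_ptr<ToyIOContext>> ioc_reap_queue_;
};
```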
Signed-off-by: Sage Weil <sage@redhat.com>
This makes all the IOs effectively synchronous (while holding the lock),
which isn't great, but at least it's correct.  We can contemplate async
later.

Signed-off-by: Sage Weil <sage@redhat.com>
Just like ceph-osd.

Signed-off-by: Sage Weil <sage@redhat.com>
Assume journal symlink is present.

Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Sage Weil <sage@redhat.com>
New backends don't work if it's off.

Signed-off-by: Sage Weil <sage@redhat.com>
Fixes: ceph#14210
Signed-off-by: xie.xingguo <xie.xingguo@zte.com.cn>
Signed-off-by: YiQiang Chen <cyqsign@163.com>
Signed-off-by: Ning Yao <zay11022@gmail.com>
Signed-off-by: Sage Weil <sage@redhat.com>
Signed-off-by: Sage Weil <sage@redhat.com>
liewegas added a commit that referenced this pull request Jan 3, 2016
bluestore: latest and greatest
@liewegas liewegas merged commit fbd056b into ceph:master Jan 3, 2016
8 participants