Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Secondary index traversal takes 50% longer than primary index traversal #4862

Closed
danielmewes opened this issue Sep 18, 2015 · 6 comments
Closed
Assignees
Milestone

Comments

@danielmewes
Copy link
Member

On a table with 1 million documents and a simple secondary index on a numeric field, I'm getting the following performance:

r.table("test2").map(r.row.toJsonString()).count()
-> ~2.0s
r.table("test2").between(1, 5, {index: 'survey_id'}).map(r.row.toJsonString()).count()
-> ~3.1s

I used the script from #4569 (comment) to generate the data.

I wouldn't expect such a drastic slow-down from traversing the secondary rather than the primary index tree.

@danielmewes danielmewes self-assigned this Sep 18, 2015
@danielmewes danielmewes added this to the 2.2 milestone Sep 18, 2015
@danielmewes
Copy link
Member Author

The slowdown appears to come from us checking that the secondary index key is in the range on every document:

         // Check whether we're out of sindex range.
         ql::datum_t sindex_val; // NULL if no sindex.
         if (sindex) {
            // Secondary index functions are deterministic (so no need for an
            // rdb_context_t) and evaluated in a pristine environment (without global
            // optargs).
            ql::env_t sindex_env(job.env->interruptor,
                                 ql::return_empty_normal_batches_t::NO,
                                 sindex->func_reql_version);
            sindex_val = sindex->func->call(&sindex_env, val)->as_datum();
            if (sindex->multi == sindex_multi_bool_t::MULTI
                && sindex_val.get_type() == ql::datum_t::R_ARRAY) {
                boost::optional<uint64_t> tag = *ql::datum_t::extract_tag(key);
                guarantee(tag);
                sindex_val = sindex_val.get(*tag, ql::NOTHROW);
                guarantee(sindex_val.has());
            }
            if (!sindex->range.contains(sindex_val)) {
                return continue_bool_t::CONTINUE;
            }
         }

I don't think we actually need to do that, unless the current secondary index key is truncated.
@mlucy Does it sound correct to you to skip that check for non-truncated keys?

@mlucy
Copy link
Member

mlucy commented Sep 18, 2015

Yes, it should be perfectly fine to skip that step for non-truncated keys.

@danielmewes danielmewes modified the milestones: 2.1.x, 2.2 Sep 18, 2015
@coffeemug
Copy link
Contributor

@danielmewes -- could this also be shipped in a point release?

@danielmewes
Copy link
Member Author

@coffeemug It could if necessary. I'd prefer to ship it in 2.2 otherwise.

@danielmewes
Copy link
Member Author

Implemented this together with @mlucy. This is in CR 3282 by @VeXocide

@danielmewes
Copy link
Member Author

With the improvements:

r.table("test2").between(1, 5, {index: 'survey_id'}).map(r.row.toJsonString()).count()
-> ~2.2s

(i.e. only about 10% more time than for the primary key case, vs. 50% more before)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants