Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update substring search to use the Two Way algorithm #26327

Merged
merged 4 commits into from Jun 30, 2015

Conversation

bluss
Copy link
Member

@bluss bluss commented Jun 15, 2015

Update substring search to use the Two Way algorithm

To improve our substring search performance, revive the two way searcher
and adapt it to the Pattern API.

Fixes #25483, a performance bug: that particular case now completes faster
in optimized rust than in ruby (but they share the same order of magnitude).

Many thanks to @gereeter who helped me understand the reverse case
better and wrote the comment explaining next_back in the code.

I had quickcheck to fuzz test forward and reverse searching thoroughly.

The two way searcher implements both forward and reverse search,
but not double ended search. The forward and reverse parts of the two
way searcher are completely independent.

The two way searcher algorithm has very small, constant space overhead,
requiring no dynamic allocation. Our implementation is relatively fast,
especially due to the byteset addition to the algorithm, which speeds
up many no-match cases.

A bad case for the two way algorithm is:

let haystack = (0..10_000).map(|_| "dac").collect::<String>();
let needle = (0..100).map(|_| "bac").collect::<String>());

For this particular case, two way is not much faster than the naive
implementation it replaces.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@bluss
Copy link
Member Author

bluss commented Jun 15, 2015

cc @BurntSushi, you know substring search very well, cc @Kimundi about the pattern API

@bluss
Copy link
Member Author

bluss commented Jun 15, 2015

I'll show some benchmark results. My workspace for this is temporarily hosted as a repo, if you want to check the quickcheck & benchmark implementations.

find and rfind are current libcore's str::find(&str) and str::rfind(&str). tw_contains is two-way forward search to the first match (if any) and tw_contains_rev is two-way reverse search, and tw_paper is an unmodified transcription of two-way from the paper.

I got a bit pessimistic when I tried out the different bad cases I could find. The performance bug #25483 is the benchmark called aa_100k below. From experience, the main improvement in runtime for that benchmark comes from the byteset optimization, not due to the TW algorithm itself.

Look at the short_short and short_long benchmarks for something of a more normal case for substring search.

Edit: Updated to current benchmarks. The old results are archived.

running 45 tests
test bad_naive::find                                    ... bench:       2,235 ns/iter (+/- 31) = 33 MB/s
test bad_naive::rfind                                   ... bench:       2,184 ns/iter (+/- 60) = 33 MB/s
test bad_naive::tw_contains                             ... bench:         418 ns/iter (+/- 201) = 177 MB/s
test bad_naive::tw_contains_rev                         ... bench:         528 ns/iter (+/- 119) = 140 MB/s
test bad_naive::tw_paper                                ... bench:         585 ns/iter (+/- 124) = 126 MB/s
test bad_naive_reversed::find                           ... bench:         657 ns/iter (+/- 41) = 112 MB/s
test bad_naive_reversed::rfind                          ... bench:         696 ns/iter (+/- 216) = 106 MB/s
test bad_naive_reversed::tw_contains                    ... bench:         263 ns/iter (+/- 22) = 281 MB/s
test bad_naive_reversed::tw_contains_rev                ... bench:         507 ns/iter (+/- 341) = 145 MB/s
test bad_naive_reversed::tw_paper                       ... bench:         588 ns/iter (+/- 56) = 125 MB/s
test pathological_aa_100k::find                         ... bench:     943,632 ns/iter (+/- 16,813) = 105 MB/s
test pathological_aa_100k::rfind                        ... bench:   1,004,048 ns/iter (+/- 9,096) = 99 MB/s
test pathological_aa_100k::tw_contains                  ... bench:       3,386 ns/iter (+/- 49) = 29533 MB/s
test pathological_aa_100k::tw_contains_rev              ... bench:       3,764 ns/iter (+/- 102) = 26567 MB/s
test pathological_aa_100k::tw_paper                     ... bench:     706,501 ns/iter (+/- 1,893) = 141 MB/s
test pathological_aab_100k::find                        ... bench:   3,708,442 ns/iter (+/- 38,586) = 80 MB/s
test pathological_aab_100k::rfind                       ... bench:   3,820,770 ns/iter (+/- 76,434) = 78 MB/s
test pathological_aab_100k::tw_contains                 ... bench:   1,004,961 ns/iter (+/- 66,086) = 298 MB/s
test pathological_aab_100k::tw_contains_rev             ... bench:   1,063,171 ns/iter (+/- 278,171) = 282 MB/s
test pathological_aab_100k::tw_paper                    ... bench:   2,236,639 ns/iter (+/- 100,281) = 134 MB/s
test pathological_two_way_10k::find                     ... bench:     279,777 ns/iter (+/- 1,326) = 107 MB/s
test pathological_two_way_10k::rfind                    ... bench:     297,624 ns/iter (+/- 12,199) = 100 MB/s
test pathological_two_way_10k::tw_contains              ... bench:     114,635 ns/iter (+/- 28,467) = 261 MB/s
test pathological_two_way_10k::tw_contains_rev          ... bench:       2,961 ns/iter (+/- 22) = 10131 MB/s
test pathological_two_way_10k::tw_paper                 ... bench:     223,103 ns/iter (+/- 8,931) = 134 MB/s
test pathological_two_way_20k::find                     ... bench:     937,518 ns/iter (+/- 4,423) = 106 MB/s
test pathological_two_way_20k::rfind                    ... bench:     997,079 ns/iter (+/- 4,680) = 100 MB/s
test pathological_two_way_20k::tw_contains              ... bench:     362,096 ns/iter (+/- 34,481) = 276 MB/s
test pathological_two_way_20k::tw_contains_rev          ... bench:       4,975 ns/iter (+/- 130) = 20100 MB/s
test pathological_two_way_20k::tw_paper                 ... bench:     729,971 ns/iter (+/- 17,161) = 136 MB/s
test pathological_two_way_20k_reversed::find            ... bench:     679,464 ns/iter (+/- 3,195) = 88 MB/s
test pathological_two_way_20k_reversed::rfind           ... bench:     706,299 ns/iter (+/- 5,581) = 84 MB/s
test pathological_two_way_20k_reversed::tw_contains     ... bench:       3,209 ns/iter (+/- 33) = 18697 MB/s
test pathological_two_way_20k_reversed::tw_contains_rev ... bench:     179,106 ns/iter (+/- 18,190) = 334 MB/s
test pathological_two_way_20k_reversed::tw_paper        ... bench:     447,336 ns/iter (+/- 20,443) = 134 MB/s
test short_long::find                                   ... bench:      28,582 ns/iter (+/- 4,112) = 89 MB/s
test short_long::rfind                                  ... bench:      30,111 ns/iter (+/- 432) = 84 MB/s
test short_long::tw_contains                            ... bench:       2,398 ns/iter (+/- 665) = 1063 MB/s
test short_long::tw_contains_rev                        ... bench:       2,233 ns/iter (+/- 78) = 1142 MB/s
test short_long::tw_paper                               ... bench:      15,741 ns/iter (+/- 431) = 162 MB/s
test short_short::find                                  ... bench:         201 ns/iter (+/- 3) = 278 MB/s
test short_short::rfind                                 ... bench:         402 ns/iter (+/- 28) = 139 MB/s
test short_short::tw_contains                           ... bench:         102 ns/iter (+/- 6) = 549 MB/s
test short_short::tw_contains_rev                       ... bench:         116 ns/iter (+/- 7) = 482 MB/s
test short_short::tw_paper                              ... bench:         419 ns/iter (+/- 6) = 133 MB/s

'search: loop {
// Check that we have room to search in
if self.position + needle.len() > haystack.len() {
if start == haystack.len() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check be done outside of the loop? We will only return Done on the first iteration, if ever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no real loop -- it's used to return back to this point. So this if check can be visited multiple times per call to next.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm I see now, that's a good point. it evolved into this.

@bluss
Copy link
Member Author

bluss commented Jun 16, 2015

Thanks for your extensive review @gereeter! I've pushed a commit to address some of this, with some cleanup. I also realized I could move the break for Reject until after the fast reject by the byteset. This improves the benchmarks for the forward search uniformly.

@bluss
Copy link
Member Author

bluss commented Jun 16, 2015

The benchmark list was very long, so here's just some improvements after the later Rejects

commit 1:
test pathological_aa_100k::tw_contains                  ... bench:      16,289 ns/iter (+/- 18) = 6139 MB/s
test pathological_aa_100k::tw_contains_rev              ... bench:       7,196 ns/iter (+/- 197) = 13896 MB/s
test short_long::tw_contains                            ... bench:       7,733 ns/iter (+/- 100) = 329 MB/s
test short_long::tw_contains_rev                        ... bench:       5,910 ns/iter (+/- 41) = 431 MB/s
test short_short::tw_contains                           ... bench:         164 ns/iter (+/- 0) = 341 MB/s
test short_short::tw_contains_rev                       ... bench:         169 ns/iter (+/- 1) = 331 MB/s

commit 1 + 2:
test pathological_aa_100k::tw_contains                  ... bench:       3,533 ns/iter (+/- 420) = 28304 MB/s
test pathological_aa_100k::tw_contains_rev              ... bench:       3,840 ns/iter (+/- 252) = 26041 MB/s
test short_long::tw_contains                            ... bench:       3,322 ns/iter (+/- 18) = 767 MB/s
test short_long::tw_contains_rev                        ... bench:       3,343 ns/iter (+/- 44) = 763 MB/s
test short_short::tw_contains                           ... bench:         113 ns/iter (+/- 2) = 495 MB/s
test short_short::tw_contains_rev                       ... bench:         124 ns/iter (+/- 2) = 451 MB/s

@bluss
Copy link
Member Author

bluss commented Jun 16, 2015

Now I feel good about the reverse algorithm, thanks @gereeter. I've split out the add-on commits to make it easier to review hopefully, so I'll squash before merge later. Edited: Now rebased to just two commits.

At this point it should be algorithmically safe and sound. Short of switching algorithm, I guess followup changes can focus on performance optimizations. Sad to say, the loops are indexed, not using iterators, and they perform better the way they are than using .zip with two slice iterators.

I've corrected & edited the main PR message a bit, I will incorporate those in the commit log when I rebase. Done.

@bluss bluss changed the title core: Update StrSearch to use the Two Way searcher core: Update StrSearcher to use the Two Way searcher Jun 16, 2015
@bluss bluss changed the title core: Update StrSearcher to use the Two Way searcher core: Update StrSearcher to use the Two Way algorithm Jun 16, 2015

// We have found a match!
let match_pos = self.end - needle.len();
self.end -= needle.len(); // add self.period for all matches
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To update this comment for next_back and make it more clear, maybe this should read "subtract self.period instead of needle.len() for overlapping matches" (similarly, the next case could be changed, but with add, not subtract)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this

@gereeter
Copy link
Contributor

This looks really good! Short of doing word-level things, which is messy and requires unsafe and should probably be in a separate PR, I'm pretty much out of ideas on how to speed this up. I might add more docs and change some things stylistically, but that can be done later, and all the content here looks good. I think this should be merged, assuming the change to the Pattern API is cleared.

@bluss
Copy link
Member Author

bluss commented Jun 16, 2015

@gereeter I found a paper on that kind of extension to this algorithm http://drops.dagstuhl.de/opus/volltexte/2011/3355/

@gereeter
Copy link
Contributor

That is the paper I had in mind when I made my comment - I saw your comment on #25483.

@bluss bluss force-pushed the two-way branch 3 times, most recently from 167bad3 to 00e97bf Compare June 16, 2015 23:09
@bluss
Copy link
Member Author

bluss commented Jun 17, 2015

Ok, the twoway search can be faster if instead of emitting Rejects, we just search to the next match. Let the StrSearcher handle emitting the implied intermediate Rejects. I'm going to incorporate that now, because it means we don't have to change the pattern API's specification.

I found this by looking at what @BurntSushi's regex does.

@bluss
Copy link
Member Author

bluss commented Jun 17, 2015

Results of moving reject handling out of the algorithm, that gave it some air to optimize the loop to find a match better. The reverse case is sped up the same way, showing results just for the forward case.

before:
test bad_naive::tw_contains                             ... bench:         796 ns/iter (+/- 191) = 92 MB/s
test bad_naive_reversed::tw_contains                    ... bench:         297 ns/iter (+/- 9) = 249 MB/s
test pathological_aa_100k::tw_contains                  ... bench:       3,512 ns/iter (+/- 582) = 28473 MB/s
test pathological_aab_100k::tw_contains                 ... bench:   1,951,935 ns/iter (+/- 57,089) = 153 MB/s
test pathological_two_way_10k::tw_contains              ... bench:     241,923 ns/iter (+/- 4,562) = 123 MB/s
test pathological_two_way_20k::tw_contains              ... bench:     749,007 ns/iter (+/- 39,225) = 133 MB/s
test pathological_two_way_20k_reversed::tw_contains     ... bench:       3,216 ns/iter (+/- 55) = 18656 MB/s
test short_long::tw_contains                            ... bench:       3,338 ns/iter (+/- 16) = 764 MB/s
test short_short::tw_contains                           ... bench:         112 ns/iter (+/- 1) = 499 MB/s

after:
test bad_naive::tw_contains                             ... bench:         418 ns/iter (+/- 201) = 177 MB/s
test bad_naive_reversed::tw_contains                    ... bench:         263 ns/iter (+/- 22) = 281 MB/s
test pathological_aa_100k::tw_contains                  ... bench:       3,386 ns/iter (+/- 49) = 29533 MB/s
test pathological_aab_100k::tw_contains                 ... bench:   1,004,961 ns/iter (+/- 66,086) = 298 MB/s
test pathological_two_way_10k::tw_contains              ... bench:     114,635 ns/iter (+/- 28,467) = 261 MB/s
test pathological_two_way_20k::tw_contains              ... bench:     362,096 ns/iter (+/- 34,481) = 276 MB/s
test pathological_two_way_20k_reversed::tw_contains     ... bench:       3,209 ns/iter (+/- 33) = 18697 MB/s
test short_long::tw_contains                            ... bench:       2,398 ns/iter (+/- 665) = 1063 MB/s
test short_short::tw_contains                           ... bench:         102 ns/iter (+/- 6) = 549 MB/s

@bluss
Copy link
Member Author

bluss commented Jun 17, 2015

Updated to add two new commits. Will squash to one commit after review is done, and I'll update the commit logs with the fact that we don't need to change the Pattern API.

@gereeter
Copy link
Contributor

First, that improvement is impressive enough that I think we should keep the new coalesing of Rejects even if it has some negative consequences.

Second, it seems that the primary users who benifit from smaller Rejects are is_prefix_of and is_suffix_of, which should be overridden to not use two way searching at all anyway.

@bluss
Copy link
Member Author

bluss commented Jun 17, 2015

I just looked at benchmarks for is_prefix_of, and even before the latest Reject change, the results were 3 to 220 times in the old naive string search's favour.

We need to implement all the trait's methods in the long run, and yes, use something else (memcmp-like) for the prefix and suffix versions.

@Kimundi
Copy link
Member

Kimundi commented Jun 17, 2015

@bluss: Yeah, the old impls never got optimized.
As for uses, I'm specifically referring to the .trim_...() methods: They skip forward until the point where the first Reject appears, and then stop. Example:

"abababcccccccccccccccccccccccccccccccccab".trim_left_matches("ab")
   => "cccccccccccccccccccccccccccccccccab"

If the string searcher where to be written with only Match instead, then the algorithm would first scan through the whole string until the final "ab" just to emit the start position of the Reject.

Having the duplication is unfortunate, but I don't see a better way apart from ensuring the generic codegen will optimize out the right cases on its own.

@bluss
Copy link
Member Author

bluss commented Jun 17, 2015

I just addressed is_prefix_of / is_suffix_of, so they are a bit faster with this change than the naive impl they replace.

Finding the first reject could be done simpler too, we could special case just that part maybe.

@Kimundi
Copy link
Member

Kimundi commented Jun 17, 2015

@bluss: It doesn't have to match the naive impl exactly. It can split the Rejects up at arbitrary points if it helps.

@bluss
Copy link
Member Author

bluss commented Jun 18, 2015

I've experimented back and forth (macros, closures for specialization) and it looks like duplicating the forward and backward searches is the best unfortunately.

However, StrSearcher is a very specific algorithm for finding matches, and it is a suboptimal choice for reject-only search. In that case naive search is always better and any time we spend precomputing needle properties is wasted. For that reason, it would be best if this was given its own method on the Pattern API that circumvents the StrSearcher. For example fn maximal_prefix_of(haystack: &str) -> Option<usize> and so on.

I'm away for the weekend, so enjoy midsummer everyone!

@bors
Copy link
Contributor

bors commented Jun 18, 2015

☔ The latest upstream changes (presumably #26192) made this pull request unmergeable. Please resolve the merge conflicts.

Ulrik Sverdrup added 3 commits June 21, 2015 19:58
To improve our substring search performance, revive the two way searcher
and adapt it to the Pattern API.

Fixes rust-lang#25483, a performance bug: that particular case now completes faster
in optimized rust than in ruby (but they share the same order of magnitude).

Much thanks to @gereeter who helped me understand the reverse case
better and wrote the comment explaining `next_back` in the code.

I had quickcheck to fuzz test forward and reverse searching thoroughly.

The two way searcher implements both forward and reverse search,
but not double ended search. The forward and reverse parts of the two
way searcher are completely independent.

The two way searcher algorithm has very small, constant space overhead,
requiring no dynamic allocation. Our implementation is relatively fast,
especially due to the `byteset` addition to the algorithm, which speeds
up many no-match cases.

A bad case for the two way algorithm is:

```
let haystack = (0..10_000).map(|_| "dac").collect::<String>();
let needle = (0..100).map(|_| "bac").collect::<String>());
```

For this particular case, two way is not much faster than the naive
implementation it replaces.
Use a trait to be able to implement both the fast search that skips to
each match, and the slower search that emits `Reject` intervals
regularly. The latter is important for uses of `next_reject`.
@bluss bluss changed the title core: Update StrSearcher to use the Two Way algorithm Update substring search to use the Two Way algorithm Jun 21, 2015
@bluss
Copy link
Member Author

bluss commented Jun 21, 2015

Rebased.

Updated with:

  • Use a trait to split the algorithm in two cases without code duplication. The cases are the MatchOnly strategy, and the RejectAndMatch strategy. This way we both offer the quickest possible next_match() impl and a general next() that emits Rejects too.
  • The RejectAndMatch case is less streamlined for now.
  • The RejectAndMatch case manually steps to the next character boundary so that the Pattern API can stay as it is.

@bluss bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label Jun 21, 2015
This is needed to not drop performance, after the trait-based changes.
Force separate versions of the next method to be generated for the short
and long period cases.
@bluss
Copy link
Member Author

bluss commented Jun 24, 2015

Here are some updated benchmarks with caveats.

  • rustc nightly is updated, llvm is updated, so things might have changed
  • I can't seem to get cpu scaling to fall into the same gear again that produced the previous pretty consistent results
  • So: Don't compare these literally to the previous benchmarks.

The cases are:

  • find: current forward search
  • rfind: current forward search
  • tw_contains: new forward search in this PR
  • tw_contains_rev: new forward search in this PR
running 36 tests
test bad_naive::find                                    ... bench:       1,386 ns/iter (+/- 121) = 53 MB/s
test bad_naive::rfind                                   ... bench:       1,463 ns/iter (+/- 19) = 50 MB/s
test bad_naive::tw_contains                             ... bench:         470 ns/iter (+/- 7) = 157 MB/s
test bad_naive::tw_contains_rev                         ... bench:         314 ns/iter (+/- 18) = 235 MB/s
test bad_naive_reversed::find                           ... bench:         533 ns/iter (+/- 90) = 138 MB/s
test bad_naive_reversed::rfind                          ... bench:         570 ns/iter (+/- 39) = 129 MB/s
test bad_naive_reversed::tw_contains                    ... bench:         263 ns/iter (+/- 13) = 281 MB/s
test bad_naive_reversed::tw_contains_rev                ... bench:         434 ns/iter (+/- 200) = 170 MB/s
test pathological_aa_100k::find                         ... bench:     765,670 ns/iter (+/- 51,184) = 130 MB/s
test pathological_aa_100k::rfind                        ... bench:     823,406 ns/iter (+/- 1,706) = 121 MB/s
test pathological_aa_100k::tw_contains                  ... bench:       3,893 ns/iter (+/- 32) = 25687 MB/s
test pathological_aa_100k::tw_contains_rev              ... bench:       3,848 ns/iter (+/- 97) = 25987 MB/s
test pathological_aab_100k::find                        ... bench:   2,704,123 ns/iter (+/- 99,663) = 110 MB/s
test pathological_aab_100k::rfind                       ... bench:   2,824,903 ns/iter (+/- 163,534) = 105 MB/s
test pathological_aab_100k::tw_contains                 ... bench:   1,149,895 ns/iter (+/- 73,320) = 260 MB/s
test pathological_aab_100k::tw_contains_rev             ... bench:   1,120,750 ns/iter (+/- 136,251) = 267 MB/s
test pathological_two_way_10k::find                     ... bench:     227,447 ns/iter (+/- 157) = 131 MB/s
test pathological_two_way_10k::rfind                    ... bench:     244,858 ns/iter (+/- 14,582) = 122 MB/s
test pathological_two_way_10k::tw_contains              ... bench:     136,919 ns/iter (+/- 13,430) = 219 MB/s
test pathological_two_way_10k::tw_contains_rev          ... bench:       3,208 ns/iter (+/- 18) = 9351 MB/s
test pathological_two_way_20k::find                     ... bench:     762,829 ns/iter (+/- 1,937) = 131 MB/s
test pathological_two_way_20k::rfind                    ... bench:     820,271 ns/iter (+/- 780) = 121 MB/s
test pathological_two_way_20k::tw_contains              ... bench:     426,550 ns/iter (+/- 35,134) = 234 MB/s
test pathological_two_way_20k::tw_contains_rev          ... bench:       5,263 ns/iter (+/- 10) = 19000 MB/s
test pathological_two_way_20k_reversed::find            ... bench:     505,162 ns/iter (+/- 44,729) = 118 MB/s
test pathological_two_way_20k_reversed::rfind           ... bench:     538,927 ns/iter (+/- 30,293) = 111 MB/s
test pathological_two_way_20k_reversed::tw_contains     ... bench:       3,564 ns/iter (+/- 16) = 16834 MB/s
test pathological_two_way_20k_reversed::tw_contains_rev ... bench:     179,662 ns/iter (+/- 3,419) = 333 MB/s
test short_long::find                                   ... bench:      23,059 ns/iter (+/- 118) = 110 MB/s
test short_long::rfind                                  ... bench:      24,691 ns/iter (+/- 75) = 103 MB/s
test short_long::tw_contains                            ... bench:       2,126 ns/iter (+/- 4) = 1199 MB/s
test short_long::tw_contains_rev                        ... bench:       2,040 ns/iter (+/- 16) = 1250 MB/s
test short_short::find                                  ... bench:         160 ns/iter (+/- 1) = 350 MB/s
test short_short::rfind                                 ... bench:         311 ns/iter (+/- 1) = 180 MB/s
test short_short::tw_contains                           ... bench:          81 ns/iter (+/- 58) = 691 MB/s
test short_short::tw_contains_rev                       ... bench:          99 ns/iter (+/- 0) = 565 MB/s

@bluss
Copy link
Member Author

bluss commented Jun 24, 2015

I'll tentatively nominate this for the beta because I had hoped I would get this in before we branched for 1.2. It's a big performance improvement, no other perks.

@bluss bluss added the beta-nominated Nominated for backporting to the compiler in the beta channel. label Jun 24, 2015
@gereeter
Copy link
Contributor

I just realized that I never actually said this, so LGTM.

@bluss
Copy link
Member Author

bluss commented Jun 26, 2015

@gereeter Thank you, you've provided a lot of valuable help & feedback!

@aturon
Copy link
Member

aturon commented Jun 30, 2015

@bors: r+

Thanks for doing this!

@bors
Copy link
Contributor

bors commented Jun 30, 2015

📌 Commit 274bb24 has been approved by aturon

@bors
Copy link
Contributor

bors commented Jun 30, 2015

⌛ Testing commit 274bb24 with merge 7fc0675...

bors added a commit that referenced this pull request Jun 30, 2015
Update substring search to use the Two Way algorithm

To improve our substring search performance, revive the two way searcher
and adapt it to the Pattern API.

Fixes #25483, a performance bug: that particular case now completes faster
in optimized rust than in ruby (but they share the same order of magnitude).

Many thanks to @gereeter who helped me understand the reverse case
better and wrote the comment explaining `next_back` in the code.

I had quickcheck to fuzz test forward and reverse searching thoroughly.

The two way searcher implements both forward and reverse search,
but not double ended search. The forward and reverse parts of the two
way searcher are completely independent.

The two way searcher algorithm has very small, constant space overhead,
requiring no dynamic allocation. Our implementation is relatively fast,
especially due to the `byteset` addition to the algorithm, which speeds
up many no-match cases.

A bad case for the two way algorithm is:

```
let haystack = (0..10_000).map(|_| "dac").collect::<String>();
let needle = (0..100).map(|_| "bac").collect::<String>());
```

For this particular case, two way is not much faster than the naive
implementation it replaces.
@bors bors merged commit 274bb24 into rust-lang:master Jun 30, 2015
@bluss bluss deleted the two-way branch June 30, 2015 20:12
@alexcrichton
Copy link
Member

triage: not accepting for a beta backport

@alexcrichton alexcrichton removed the beta-nominated Nominated for backporting to the compiler in the beta channel. label Jun 30, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants