Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Geospatial index #1158

Closed
neumino opened this issue Jul 10, 2013 · 42 comments
Closed

Geospatial index #1158

neumino opened this issue Jul 10, 2013 · 42 comments
Assignees
Milestone

Comments

@neumino
Copy link
Member

neumino commented Jul 10, 2013

Asked on Twitter by marcuspoehls.

I was pretty sure we had an issue for that but couldn't find it.
We talk a little about it on this issue #809

From what I've seen, people would want to

  • Get all points in a polygon
  • Get the k nearest neighbors
@coffeemug
Copy link
Contributor

Moving to backlog for obvious reasons.

@PerfectedApp
Copy link

+1

@namtih58
Copy link

With more and more apps becoming ( needing to be ) location aware, geo-spatial indexes are absolutely necessary. This is one of the features you should seriously consider taking off the backlog.

@coffeemug
Copy link
Contributor

Thanks for your input -- once we iron out remaining performance issues and ship an LTS release, we'll reevaluate feature requests. We'll definitely find a way for Rethink users to use geospacial easily and effectively.

@mstdokumaci
Copy link

this is exactly why i cannot choose rethinkdb over mongo

@codingkevin
Copy link
Contributor

Bump for a feature that would is greatly needed (once you guys have an LTS release)

@dminkovsky
Copy link

Geospatial indexing would be awesome, though obviously this is big work, given the depth of the the GIS world beyond just the indexing. May be some good code over at the PostGIS project? PostGIS is a spatial extension to Postgres.

@coffeemug
Copy link
Contributor

Moving to 2.x, this is something we should try to fit into the product roadmap for 2014.

Not sure if we could reuse PostGIS off the top of my head, but we'll definitely investigate (code reuse > NIH).

@dminkovsky
Copy link

Yeah, obviously no idea how their code might relate to Rethink, but I've loosely followed their changelogs and it's been interesting to see their evolution (i.e. "in this version we've switched algo B for algo A; we had thought algo A would have been better, but here's why algo B is better").

@fedosov
Copy link

fedosov commented Feb 26, 2014

+1

@maximveksler
Copy link

+1 for not being able to drop mongo because of geo support.

@chriserik
Copy link

+1

@jcspencer
Copy link

👍 Nearest neighbour searches would be a great addition to RethinkDB!

@ipeisong
Copy link

ipeisong commented Apr 4, 2014

+1

1 similar comment
@amscotti
Copy link

amscotti commented Apr 8, 2014

+1

@coffeemug
Copy link
Contributor

Ok, talked to @danielmewes and he'd like to do this. This will likely be in 1.14.

@srh
Copy link
Contributor

srh commented Apr 8, 2014

How is this going to be implemented?

@danielmewes
Copy link
Member

@srh: I haven't figured out all of the details yet, but it's probably going to be a quad-tree like structure, implemented on top of our existing btree.
Something similar to what MongoDB is doing http://blog.mongodb.org/post/50984169045/new-geo-features-in-mongodb-2-4 .
The key point there is that polygons are indexed based on their intersection with a fixed multi-resolution grid.
Each grid cell can be addressed by its path through a quad tree, and the keys of the coarser cells are prefixes of the keys of the finer cells contained in the coarse cell. Finding everything stored in a given cell can be easily translated to a range get in the one dimensional btree.

We will be able to store polygons or points (and maybe lines, which are kind of degenerate polygons) into the index, and perform intersection and nearest neighbor queries.

We should support Euclidean spaces for 2 and possibly 3 dimensions, and geographic coordinates with the corresponding intrinsic distances on a sphere.

I will write up an API suggestion and more details of the potential implementation once I get to working on this.

@mlucy
Copy link
Member

mlucy commented Apr 9, 2014

One really simple implementation of 2d indexes would be to just splice the bits of 1d indexes together (e.g. if you have two doubles which serialize to the bytes abcde and ABCDE, their 2d index would be aAbBcCdDeE). This wouldn't require implementing any new data structures, just some mildly-annoying logic to turn 2D ranges into a set of ranges on the spliced bits and filter out any extraneous values.

Basically every two bits in the new spliced key would specify a quadrant:

00|01
--+--
10|11

@danielmewes
Copy link
Member

@mlucy: Yeah, that's the Morton order http://en.wikipedia.org/wiki/Z-order_%28curve%29. It's actually equivalent to taking the path in a quad (or some other power of 2) tree as the key.

@danielmewes
Copy link
Member

In general the tricky part is how to efficiently index polygons and lines.
We could decide to support only points in the index in the first version, but it would be nice if we could add support for polygons later.

@coffeemug
Copy link
Contributor

@danielmewes -- do other products support indexing polygons? (another one to look at is Postgres, btw)

@danielmewes
Copy link
Member

@coffeemug Postgres, MongoDB and MySQL all do.

MySQL seems to just index by the bounding rectangle of a polygon by the way, which simplifies things and might be a trick worth considering ( http://dev.mysql.com/doc/refman/5.0/en/optimizing-spatial-analysis.html ).

@coffeemug
Copy link
Contributor

Ok, thanks!

@mlucy
Copy link
Member

mlucy commented Apr 9, 2014

What does it mean to index a polygon, exactly? What sort of operations would we want to support?

@mlucy
Copy link
Member

mlucy commented Apr 9, 2014

If it's just intersection with other polygons, we could definitely get away with the bounding rectangle and an extra filter (or maybe a set of bounding rectangles if we really want to be fancy).

@danielmewes
Copy link
Member

@mlucy: MongoDB supports intersection as well as "within". The latter is searching for things that are completely within a given query polygon.
Bounding rectangles are bad if the polygons are not very "dense" if that makes sense (I could draw it on a board here if you like).

A set of bounding rectangles is essentially what the grid approach comes down to, except that the set of possible rectangles is fixed in case of the grid (which makes them easier to index and process).

@mlucy
Copy link
Member

mlucy commented Apr 9, 2014

Alright, it sounds like the more complicated implementation is worth it here.

@danielmewes
Copy link
Member

The good news is that mapping a polygon to the set of grid cells it intersects with at a reasonable resolution is already implemented in S2 https://code.google.com/p/s2-geometry-library/source/browse/geometry/s2regioncoverer.h

Intersection and coverage tests on polygons are also implemented in that library, as well as converting between different coordinate systems. The only disadvantage is that S2 supports geometry on the surface of a sphere only, and not Euclidean geometry (unless I didn't look carefully enough; but the sphere is even in its name, S2 being the sphere with a 2 dimensional surface).
MongoDB, also using this library, supports many of the more advanced operations on spherical geometry only.

We could probably implement our own versions of many of the operations for Euclidean spaces, but I suggest that we only support geometry on the sphere at first. That should satisfy most real world applications.

@ded
Copy link

ded commented May 5, 2014

I'm using Postgres for the geo queries on OpenLikes ( see openlikes.com/i/design/nearby ) using this technique and would love to see what this would look like in Rethink.

@pensierinmusica
Copy link

+1 :)

@dieu
Copy link

dieu commented May 9, 2014

+1 When?)

@coffeemug
Copy link
Contributor

Scheduled for 1.14 (tentatively, July).

@dieu
Copy link

dieu commented May 9, 2014

@coffeemug thx

@cultofmetatron
Copy link

+1 would also love some sort of support for routing functionality similar to pgrouting. My team is currently working on two geospatial apps, one using postgres, the other on mongodb. Rethink was ruled out immediately due to the lack of geospatial indexes/querying but I'd love to reinvestigate this in the future if that feature got added. Being able to have a distributed alternative to pgrouting would be a killer feature. neo4j is the only other db in the space I'm aware of that offers routing.

Am I correct in assuming that the query syntax would be using geojson? its much easier to read in my experience than the equivalent postgis queries.

@PatrickJS
Copy link
Contributor

+1

@coffeemug
Copy link
Contributor

@cultofmetatron -- we'll be debating a specific query proposal here in about ~two - three weeks. I don't know yet what queries will look like, but it'd be awesome if you participated in the discussion when it starts!

Geo routing looks interesting, but that's a separate feature. We should talk about that separately, it won't be a part of the first geospacial implementation.

@hcarvalhoalves
Copy link

+1

Could this be solved more generally though? GIS is more complex than just "find me nearby points", and it might be better as a community effort. Does Rethink have an extension architecture in place so 3rd parties can define indexes and query functions?

@danielmewes
Copy link
Member

@hcarvalhoalves We don't have an extension architecture at this point.

Our implementation will very likely support at least the following features:

  • storing points, lines and arbitrary polygons
  • geodesic earth geometry
  • querying by intersection with a polygon
  • querying the nearest neighbors of a point

Implementing a generic GiST (http://en.wikipedia.org/wiki/GiST) API is worth a consideration, though there could be some problems with combining it with our other features (like compound indexes).

Do you have specific features in mind that you would need?

@danielmewes
Copy link
Member

@AvivLander, @namtih58, @mstdokumaci, @codingkevin, @dminkovsky, @fedosov, @maximveksler, @chriserik, @JamesS237, @ipeisong, @amscotti, @ded, @pensierinmusica, @dieu, @cultofmetatron, @gdi2290, @hcarvalhoalves

We are currently discussing the API for geospatial support in RethinkDB in #2571 (comment) . Please feel free to add your comments.

@coffeemug
Copy link
Contributor

@danielmewes -- since the discussion has moved to #2571, is it ok to close this issue?

@danielmewes
Copy link
Member

Sure, closing in favor of #2571.

@larkost larkost modified the milestones: 1.14, duplicate Aug 19, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests