WL#7236: Spatial Analysis Functions

Status: Complete

Implement MySQL GIS spatial analysis functions using Boost.Geometry, including area, centroid, convexhull, distance and envelope, which are all standard GIS analysis functions defined by OGC.

GIS features are becoming more powerful, important and pervasive in RDBMS products, and MySQL GIS needs extensive improvements in functionality, reliability and performance. We want to achieve this by replacing the home-grown GIS computation core with Boost.Geometry, just as PostGIS uses GEOS, so that we get all the features Boost.Geometry provides. Boost.Geometry is more powerful and robust and keeps growing, and we will grow with it.

F-1: This WL refactors the spatial analysis functions area(), centroid(), convexhull(), distance() and envelope() to use Boost.Geometry functions directly or indirectly. The semantics of these functions are defined in the OGC standard specifications, which we follow.

F-2: All functions in this WL are *new* to end users, because the documentation says they did not exist before. In fact only 'convexhull' did not exist before; the others already existed in the code. Thus we need to cover the usage of these functions in the MySQL documentation.

F-3: The usage and input/output data formats of existing functions and features shall not change, and existing SQL queries shall need no change at all. Where an argument or return value of these GIS functions is a geometry, it is in GEOMETRY format.

F-4: This WL adds a new SQL GIS function 'convexhull', which is defined by OGC and fully supported by Boost.Geometry, but which the old GIS code does not have. This function accepts a geometry argument in the same way as other GIS functions such as 'envelope' and 'buffer', and returns a polygon geometry in exactly the same way as those two functions. The mathematical meaning of 'convexhull' can be found on Wikipedia.

F-5: The formal name of each of these functions can optionally be prefixed with st_, and the effect is exactly the same. For example, st_buffer and buffer are a pair of equivalent functions, as are st_envelope and envelope. This is for historical reasons: the existing functions already behave this way, so new functions follow the convention in order not to break user expectations.

F-6: Given invalid GEOMETRY blob input, all SQL functions in this WL report the ER_GIS_INVALID_DATA error. Beyond this we do not check the validity of geometries, so a polygon input having a ring with spikes is accepted and passed to Boost.Geometry (BG). BG always assumes all inputs are valid and thus might produce an incorrect result. If a polygon isn't CCW, it is implicitly and transiently reversed before being passed to BG.
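
The transient reversal described in F-6 can be sketched as follows. This is only an illustration in Python, not the server code: rings are plain lists of (x, y) tuples, and the names signed_area and normalize_polygon are hypothetical. It assumes the first ring of a polygon is the outer ring and that rings are simple (no self-intersections).

```python
def signed_area(ring):
    """Shoelace formula: positive for a counter-clockwise ring, negative for clockwise."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wraps around, so closed and unclosed rings both work
        total += x1 * y2 - x2 * y1
    return total / 2.0


def normalize_polygon(rings):
    """Return a copy with the outer ring CCW and inner rings CW.

    The input is left untouched, mirroring the "transient" reversal:
    only the copy handed to the computation is reordered.
    """
    normalized = []
    for index, ring in enumerate(rings):
        want_ccw = (index == 0)          # assumption: ring 0 is the outer ring
        is_ccw = signed_area(ring) > 0
        normalized.append(list(ring) if is_ccw == want_ccw else list(reversed(ring)))
    return normalized


# A clockwise outer ring is reversed in the copy; the original is unchanged.
cw_square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
fixed = normalize_polygon([cw_square])
```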

F-7: area() returns 0 for 0- and 1-dimensional geometries, and for a geometry collection it returns the sum of the areas of all components. For a polygon operand, the polygon is converted to CCW if it is not already (outer ring CCW and inner rings CW). If the input polygon's inner ring(s) are larger than the outer ring, a negative value is returned. If the operand is an empty geometry collection, area() returns 0.
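
The rules in F-7 can be illustrated with a small Python sketch. It is not the BG-based implementation: geometries are modelled as hypothetical (kind, data) tuples, the function names are made up for illustration, and a polygon's area is taken as the outer ring's area minus the inner rings' areas, which is what produces a negative value when an inner ring is larger than the outer one.

```python
def ring_area(ring):
    """Unsigned shoelace area of one ring (a list of (x, y) tuples)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def area(geom):
    """Area of a (kind, data) geometry following the F-7 rules."""
    kind, data = geom
    if kind in ("Point", "MultiPoint", "LineString", "MultiLineString"):
        return 0.0                                   # 0- and 1-dimensional geometries
    if kind == "Polygon":
        outer, *inners = data                        # assumption: ring 0 is the outer ring
        return ring_area(outer) - sum(ring_area(r) for r in inners)
    if kind == "MultiPolygon":
        return sum((area(("Polygon", rings)) for rings in data), 0.0)
    if kind == "GeometryCollection":
        # Sum over all components; an empty collection yields 0.
        return sum((area(g) for g in data), 0.0)
    raise ValueError("unknown geometry kind: %r" % (kind,))


big = [(0, 0), (4, 0), (4, 4), (0, 4)]
small = [(1, 1), (1, 2), (2, 2), (2, 1)]
with_hole = ("Polygon", [big, small])        # ordinary polygon with a hole
inverted = ("Polygon", [small, big])         # "hole" larger than the outer ring
```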

F-8: centroid() processes geometry collections in this way: it computes the centroid point of the components of highest dimension in the collection. Such components are extracted and made into a single multipolygon, multilinestring or multipoint for BG to compute the centroid. If the operand is an empty geometry collection, the item returns NULL, i.e. we internally set null_value=true for the Item_func_centroid object.
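
A rough Python sketch of the highest-dimension rule in F-8 follows. It does not use BG; the (kind, data) tuples and all function names are hypothetical, only Point, LineString, Polygon and GeometryCollection are handled, and rings and linestrings are assumed to be non-degenerate (non-zero area or length). None stands in for SQL NULL.

```python
import math

DIMENSION = {"Point": 0, "LineString": 1, "Polygon": 2}


def flatten(geom):
    """Yield the basic components of a geometry, descending into nested collections."""
    kind, data = geom
    if kind == "GeometryCollection":
        for component in data:
            yield from flatten(component)
    else:
        yield geom


def ring_area_centroid(ring):
    """Return (unsigned area, centroid) of one ring using the shoelace sums."""
    a = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        a += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    a /= 2.0
    return abs(a), (cx / (6.0 * a), cy / (6.0 * a))


def weighted_parts(geom):
    """Yield (weight, point) pairs whose weighted mean is the geometry's centroid."""
    kind, data = geom
    if kind == "Point":
        yield 1.0, data
    elif kind == "LineString":
        for p, q in zip(data, data[1:]):             # weight each segment by its length
            yield math.dist(p, q), ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    elif kind == "Polygon":
        for index, ring in enumerate(data):          # inner rings subtract their area
            a, c = ring_area_centroid(ring)
            yield (a if index == 0 else -a), c


def centroid(geom):
    """Centroid of the highest-dimension components; None for an empty collection."""
    parts = list(flatten(geom))
    if not parts:
        return None
    top = max(DIMENSION[kind] for kind, _ in parts)
    total = sx = sy = 0.0
    for part in parts:
        if DIMENSION[part[0]] != top:
            continue                                 # lower-dimension components are ignored
        for weight, (x, y) in weighted_parts(part):
            total += weight
            sx += weight * x
            sy += weight * y
    return (sx / total, sy / total)
```

With this sketch, a collection holding a far-away point, a linestring and a square yields the centroid of the square alone, because the polygon is the only component of dimension 2.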

F-9: distance() processes geometry collections in this way: it returns the shortest distance among all combinations of the components of the two geometry operands. If either operand is an empty geometry collection, the item returns NULL, i.e. we internally set null_value=true for the Item_func_distance object.
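
The pairwise rule in F-9 can be sketched in Python as below. This is an illustration under simplifying assumptions, not the BG-based code: geometries are hypothetical (kind, data) tuples, only Point and LineString components are supported (polygons would also need a containment test), and None stands in for SQL NULL.

```python
import math
from itertools import product


def flatten(geom):
    """Yield the basic components of a geometry, descending into nested collections."""
    kind, data = geom
    if kind == "GeometryCollection":
        for component in data:
            yield from flatten(component)
    else:
        yield geom


def segments(geom):
    """Represent a component as segments; a point is a zero-length segment."""
    kind, data = geom
    if kind == "Point":
        return [(data, data)]
    if kind == "LineString":
        return list(zip(data, data[1:]))
    raise ValueError("unsupported component in this sketch: %r" % (kind,))


def point_segment(p, a, b):
    """Distance from point p to segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    t = 0.0 if denom == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
    t = max(0.0, min(1.0, t))                        # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segment_distance(s, t):
    (a, b), (c, d) = s, t
    if orient(a, b, c) * orient(a, b, d) < 0 and orient(c, d, a) * orient(c, d, b) < 0:
        return 0.0                                   # the segments properly cross
    # Otherwise the minimum is attained at an endpoint of one of the segments.
    return min(point_segment(a, c, d), point_segment(b, c, d),
               point_segment(c, a, b), point_segment(d, a, b))


def component_distance(g1, g2):
    return min(segment_distance(s, t) for s, t in product(segments(g1), segments(g2)))


def distance(g1, g2):
    """Shortest distance over all component pairs; None if either side is empty."""
    parts1, parts2 = list(flatten(g1)), list(flatten(g2))
    if not parts1 or not parts2:
        return None
    return min(component_distance(a, b) for a, b in product(parts1, parts2))
```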

F-10: When computing a geometry's convex hull, we first check whether the vertex points of the geometry operand are collinear; if so we return a linear hull, otherwise we return a polygon hull. convexhull() processes geometry collections in this way: it extracts all vertex points of all components of the collection, makes them into a multipoint, and computes the convex hull of the multipoint. If the operand is an empty geometry collection, the item returns NULL, i.e. we internally set null_value=true for the Item_func_convexhull object.
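
The steps in F-10 (collect all vertices, test for collinearity, then build the hull) can be sketched in Python. The hull itself is computed here with Andrew's monotone chain algorithm as a stand-in for the BG call; the (kind, data) tuples and function names are hypothetical, and returning a Point for a single distinct vertex is an assumption of this sketch, not something the requirement states.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); zero when o, a, b are collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def vertices(geom):
    """Yield every vertex point of a geometry, descending into collections."""
    kind, data = geom
    if kind == "Point":
        yield data
    elif kind in ("MultiPoint", "LineString"):
        yield from data
    elif kind == "Polygon":
        for ring in data:
            yield from ring
    elif kind == "GeometryCollection":
        for component in data:
            yield from vertices(component)
    else:
        raise ValueError("unsupported kind in this sketch: %r" % (kind,))


def convex_hull(geom):
    """Linear hull for collinear vertices, polygon hull otherwise, None if no vertices."""
    pts = sorted(set(vertices(geom)))                # the "multipoint" of all vertices
    if not pts:
        return None                                  # stands in for SQL NULL
    if len(pts) == 1:
        return ("Point", pts[0])                     # assumption of this sketch
    if all(cross(pts[0], pts[-1], p) == 0 for p in pts):
        # After sorting, the first and last points are the two extremes of the line.
        return ("LineString", [pts[0], pts[-1]])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    ring = lower[:-1] + upper[:-1]                   # counter-clockwise, no repeated points
    return ("Polygon", [ring + [ring[0]]])           # close the ring
```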

F-11: Before calculating the minimum bounding rectangle (MBR), envelope() does a ring normalization to verify that the input polygon or multipolygon has valid rings, although such normalization doesn't detect all types of invalid rings. If the operand is an empty geometry collection, the item returns NULL, i.e. we internally set null_value=true for the Item_func_envelope object.
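
The MBR computation itself can be sketched in Python as below. The sketch skips the ring normalization step, uses hypothetical (kind, data) tuples and function names, and always returns a five-point closed polygon ring; how degenerate cases such as a single point are represented is not covered here. None stands in for SQL NULL.

```python
def vertices(geom):
    """Yield every vertex point of a geometry, descending into collections."""
    kind, data = geom
    if kind == "Point":
        yield data
    elif kind in ("MultiPoint", "LineString"):
        yield from data
    elif kind == "Polygon":
        for ring in data:
            yield from ring
    elif kind == "GeometryCollection":
        for component in data:
            yield from vertices(component)
    else:
        raise ValueError("unsupported kind in this sketch: %r" % (kind,))


def envelope(geom):
    """Minimum bounding rectangle as a closed polygon ring, None if there are no vertices."""
    pts = list(vertices(geom))
    if not pts:
        return None                                  # empty geometry collection
    min_x = min(x for x, _ in pts)
    max_x = max(x for x, _ in pts)
    min_y = min(y for _, y in pts)
    max_y = max(y for _, y in pts)
    ring = [(min_x, min_y), (max_x, min_y), (max_x, max_y), (min_x, max_y), (min_x, min_y)]
    return ("Polygon", [ring])


mixed = ("GeometryCollection", [("Point", (5, -1)), ("LineString", [(0, 0), (2, 3)])])
box = envelope(mixed)
```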

F-12: SRIDs of geometry arguments are ignored for now; we always assume all geometries are in an abstract Cartesian coordinate system. However, distance() requires that its two operands have identical SRIDs.

I-1: No new files
I-2: No new syntax
I-3: No new commands
I-4: No new tools
I-5: No impact on existing functionality
I-6: Added one new GIS function: st_convexhull/convexhull

These analysis functions are relatively easy to implement because all are unary except distance, and Boost.Geometry claims to support all 6 types of geometries in all of these functions. However, we need to implement support for geometry collections on top of Boost.Geometry features, and for this we will also use a divide-and-conquer technique. Other than that, most of the implementation should be straightforward: construct the Boost.Geometry object from the Geometry object, pass it to the corresponding Boost.Geometry analysis function, and get the result.

We do have to make sure polygons are in the same ring order as declared, otherwise the area computation will be wrong for multipolygons.

The WL#7236 changes have been removed from mysql-trunk-wl7220; they are in the WL#7236-Removed.diff file. Reverse-apply it to add all the changes for WL#7236 back. This patch works for the mysql-trunk-wl7220 branch and may have conflicts on others. If it can't be applied, remove the changes to new test files first.