WL#1763: Avoid creating temporary table in UNION ALL

Status: Complete

Currently, union queries always use a temporary table to store the
result before it is returned to the user. This worklog is about
avoiding creating a temporary table for the result of UNION ALL when
there is no need for it, i.e., when there is no top-level ORDER BY.

This will save the cost of creating, writing to and reading from the
temporary table, which may be put on disk if the result is large. This
will also reduce the need for disk space for large unions, and the
client will get the first rows quicker.

Bug#11758470 DO NOT CREATE TEMPORARY TABLES FOR UNION ALL

RB#2651

User Documentation
==================

http://dev.mysql.com/doc/refman/5.7/en/internal-temporary-tables.html
http://dev.mysql.com/doc/relnotes/mysql/5.7/en/news-5-7-3.html
Description
-----------

Currently, a temporary table is always created to hold the result of a
union query. If the union is a UNION ALL rather than UNION DISTINCT,
there is no filtering of duplicates. If the UNION also doesn't have an
ORDER BY clause, there is no need for a temporary table, and the
result can be sent directly to the client.

Since the queries in the union may have different types in the same
column, some values may have to be converted. Currently, this is done
when writing to and reading from the temporary table. When skipping
the temporary table, all queries must either have the same column
types, or types must be converted behind the scenes so that the query
result has one type for each column.


Skipping temporary tables
-------------------------

Setting up a temporary table is a two-stage process:

1) Creating
   - Allocate a TABLE structure and fill it
   - Done by create_tmp_table()
2) Instantiating
   - Setting up internal structures, etc.
   - Done by instantiate_tmp_table()

In this worklog, step 2 is skipped. Step 1, creating the temporary
table is still done so that the fields of the table can be used to do
typecasting. Since the table is never instantiated, no rows are ever
written to it.


Changes to syntax, results, errors and warnings
-----------------------------------------------

There are no changes to syntax, results, errors or warnings as a
result of this worklog. If some values must be typecast, the query
should return the same type (and value) as when temporary tables are
used.

Optimizer trace and EXPLAIN output will change to show that temporary
tables are not used for affected queries. This means that the UNION
RESULT query block is no longer part of the EXPLAIN output, since this
block is the query that reads data from the temporary table.

Typecasting to a single output type for each column should happen
behind the scenes and should not be visible in any output.


Rules
-----

1) Skip instantiating a temporary table and send output of the union
directly to the client if all these conditions apply:

 - The union is not DISTINCT
 - There is no global ORDER BY clause
 - The union is not the top level query block of an INSERT ... SELECT
   statement
 - The union is not the top level query block of a REPLACE ... SELECT
   statement

The data is written to and read from a field of the temporary table to
do type conversion if needed.

2) Otherwise, use a temporary table.


Examples
--------

The EXPLAIN output should show that there is no query reading the
UNION result from a temporary table.

CREATE TABLE t1 (a INT, b INT, c INT);
CREATE TABLE t2 (a INT, b INT, c TINYINT);
INSERT INTO t1 VALUES (1,1,1), (2,2,2), (3,3,3);
INSERT INTO t2 VALUES (1,1,1), (2,2,2), (3,3,3);

EXPLAIN
SELECT a, b, c FROM t1
UNION ALL
SELECT a, b, c FROM t2;
id      select_type     table   type    possible_keys   key     key_len ref    
rows    Extra
1       PRIMARY t1      ALL     NULL    NULL    NULL    NULL    3       NULL
2       UNION   t2      ALL     NULL    NULL    NULL    NULL    3       NULL


EXPLAIN EXTENDED output will not show that typecast items are used:

EXPLAIN EXTENDED
SELECT a, b, c FROM t1
UNION ALL
SELECT a, b, c FROM t2;
id      select_type     table   type    possible_keys   key     key_len ref    
rows    filtered        Extra
1       PRIMARY t1      ALL     NULL    NULL    NULL    NULL    3       100.00  NULL
2       UNION   t2      ALL     NULL    NULL    NULL    NULL    3       100.00  NULL
Warnings:
Note    1003    /* select#1 */ select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS
`b`,`test`.`t1`.`c` AS `c` from `test`.`t1` union all /* select#2 */ select
`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b`,`test`.`t2`.`c` AS `c` from
`test`.`t2`


EXPLAIN FORMAT=JSON output will show that no temporary tables are
used, and will not show typecast items:

EXPLAIN FORMAT=JSON
SELECT a, b, c FROM t1
UNION ALL
SELECT a, b, c FROM t2;
EXPLAIN
{
  "query_block": {
    "union_result": {
      "using_temporary_table": false,
      "query_specifications": [
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 1,
            "table": {
              "table_name": "t1",
              "access_type": "ALL",
              "rows": 3,
              "filtered": 100
            }
          }
        },
        {
          "dependent": false,
          "cacheable": true,
          "query_block": {
            "select_id": 2,
            "table": {
              "table_name": "t2",
              "access_type": "ALL",
              "rows": 3,
              "filtered": 100
            }
          }
        }
      ]
    }
  }
}
Warnings:
Note    1003    /* select#1 */ select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS
`b`,`test`.`t1`.`c` AS `c` from `test`.`t1` union all /* select#2 */ select
`test`.`t2`.`a` AS `a`,`test`.`t2`.`b` AS `b`,`test`.`t2`.`c` AS `c` from
`test`.`t2`
DROP TABLE t1, t2;
Overall design
==============

A new subclass of select_union, select_union_direct, implements result
handling for UNION ALL without temporary tables. A select_union_direct
object is created for each UNION query. The object is a wrapper around
another select_result (e.g., select_send or select_insert) and
receives the output from each query of the union in sequence, casts
fields to the correct data type and forwards the rows to the wrapped
select_result.


Detailed design
===============

New classes
-----------

select_union_direct (subclass of select_union)

    Result class for UNION ALL without temporary table. This is a
    wrapper around another select_result, allowing several
    st_select_lexes to deliver their output into the same result
    object.

    Most member function calls are forwarded to the wrapped
    select_result, but some can't be called more than once (e.g.,
    prepare()). These are implemented so that the wrapped function is
    called only the first time.

    At construction time, the st_select_lexes that use the
    select_union_direct object is known. A pointer to the last
    st_select_lex of the union is kept to know when send_eof() should
    forward its call to the wrapped select_result.

    A temporary table is created to do the typecasting, but it is
    never instantiated. When a query returns a row, Item_fields for
    the temporary table are filled and read in order to do the type
    conversion, but no rows are written to the table, and the row is
    forwarded to the wrapped select_result immediately.


New functions
-------------

select_result::change_result()

    Change the wrapped select_result object of wrapper
    select_results.

    The introduction of this function leads to changes in
    JOIN::change_result(), since it now needs to make this call to its
    select_result. As a consequence, the function signatures for
    change_result() becomes the same for all classes that implement
    it.

st_select_lex_unit::union_needs_tmp_table()

    Assuming that the unit represents a union, return true if the unit
    needs a temporary table to store the result of queries. The
    correct way to check if a st_select_lex_unit represents a union
    that needs a temporary table, is to check (is_union() &&
    union_needs_tmp_table()).


Changes to st_select_lex_unit
-----------------------------

The choice of select_result defines how results are returned (to the
client with select_send, to a table with select_insert, etc.). This
select_result is taken as parameter sel_result to
st_select_lex_unit::prepare(), which decides on the select_result for
each st_select_lex. The choice is between select_union,
select_union_direct or the value of sel_result.

Despite the name, select_union is used not only for union queries, but
also for other queries that need to collect their results in a
temporary table.

If the unit represents a union query that doesn't need temporary
tables, a select_union_direct is created. There is no need for a
fake_select_lex. The parser creates one, but st_select_lex::prepare()
removes it when it finds out that a temporary table is not needed.

In st_select_lex::prepare(), st_select_lex::exec() and
st_select_lex::explain(), there are numerous steps for setting up and
executing fake_select_lex, and these are skipped by testing if, after
choosing a select_result, fake_select_lex != NULL.


Changes to EXPLAIN code
-----------------------

Traditional explain is more or less untouched since it just loops
through the st_select_lexes and explains them one by one. Checks for
fake_select_lex != NULL are added to support unions without
fake_select_lex.

JSON explain output wraps the details of the union query in an
"object" representing the fake_select_lex. When explaining a union
without a temporary table, since there is no fake_select_lex, a
wrapping "object" in the JSON output is made by
union_ctx::format_body().