WL#7083: GTIDS: set gtid_mode=ON online

Affects: Server-5.7   —   Status: Complete

EXECUTIVE SUMMARY
=================

This worklog provides a way to turn on GTIDs online, so that:
 1. Reads and writes are allowed always during the procedure; and
 2. servers do not need to synchronize.

Before this worklog, the user had to stop updates, then synchronize all servers,
then restart all servers simultaneously. Thus, turning on GTIDs implied several 
minutes of planned downtime.

After this worklog, we still require the server to restart, but it is enough to 
restart one server at a time, so the replication cluster can still be online and 
accept updates. Thus, the mode of operation is similar to that of a rolling 
upgrade.


REFERENCES
==========
  - BUG#69059: GTIDS LACK A REASONABLE DEPLOYMENT STRATEGY
  - WL#6559  : Optimize GTIDs for passive slave - store GTIDs in table


Functional Requirements
=======================

FR1. The procedures for turning ON or OFF GTIDs must not impose a requirement to
     synchronize the entire topology at any given point in time.

FR2. The procedures for turning ON or OFF GTIDs must not impose a requirement to
     restart any server at any time (other than possibly upgrading the server
     version to one that contains the feature).

FR3. This is a secondary goal / positive side effect.
     The new functionality shall make it possible for a multi-source slave
     to have both masters with GTID_MODE=OFF and masters with GTID_MODE=ON.

Non-functional Requirements
===========================

NFR1. The procedures must work in arbitrary replication topologies.

NFR2. If the user makes any mistake, it should be detected as soon as possible,
      if at all possible. Mistakes must never lead to situations where wrong
      transactions are applied on the database.

NFR3. The extra functionality should not introduce new ways for the user to
      make the kind of mistakes that cause the slave to go out of sync with
      the master at a later fail-over operation.


1. ANALYSIS OF REQUIREMENTS
===========================

If servers that generate GTIDs coexist in the same topology as old
servers that do not generate GTIDs, we will have a mixture of
transactions that have identifiers and transactions that do not have
identifiers.

Terminology. There are two types of transactions:

- A transaction that has a GTID in the form UUID:NUMBER is called a
  *GTID-transaction*.

  In binary and relay logs, every GTID-transaction is always preceded
  by a Gtid_log_event.

  GTID-transactions can be addressed using either the GTID or using
  filename and position.

- A transaction that does not have a GTID assigned is called an
  *anonymous transaction*.

  After WL#7592, anonymous transactions are always preceded by an
  Anonymous_gtid_log_event.  Before WL#7592, anonymous transactions
  are not preceded by any particular event at all.

  So after this worklog, transactions in a relay log that was received
  from an old master may not be preceded by any particular event at
  all, but after being replayed and logged in the slave's binary log,
  they will be preceded with an Anonymous_gtid_log_event.

  See also section 2.7 for a description of how to detect anonymous
  transactions in new and old binary and relay logs.

  Anonymous transactions can only be addressed using filename and
  position.

1.1. Requirement: ANONYMOUS TRANSACTION MUST REMAIN ANONYMOUS ON RE-EXECUTION
-----------------------------------------------------------------------------

An anonymous transaction must be kept anonymous when re-executed. If
an old master generates an anonymous transaction, then the new slave
must preserve anonymity and not generate a new GTID. The following
example illustrates what would happen if anonymous transactions were
not kept anonymous when replicated:

                              +---------->  Server B
                              |             GTID_MODE=ON
     Server A         --------+             Binlog: T1 (GTID=B:1)
     GTID_MODE=OFF            |
     Binlog: T1 (GTID=anon)   +---------->  Server C
                                            GTID_MODE=ON
                                            Binlog: T1 (GTID=C:1)

Server A is a master, Servers B and C are slaves of A. A is old and
generates only anonymous transactions. B and C are new and each
generates a GTID for every anonymous transaction that it re-executes.
A has executed one transaction, T1. B and C have re-executed T1 and
each of them has generated a new GTID. The GTIDs are different since
the server UUIDs for B and C are different.  Suppose A crashes. Then B
should become the new master and C should become a slave of B.  Since
C does not have any transaction with GTID B:1, B will send T1 to C and
C will re-execute it. This will lead to inconsistent data on C since T1 is
executed twice.
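
The fail-over in this example can be sketched in a few lines of
Python. This is only an illustrative model, not server code: the
function name transactions_to_send and the representation of a binary
log as a mapping from GTID to transaction are assumptions made for the
sketch.

```python
# Illustrative model only: a binlog is a dict mapping GTID -> transaction.
def transactions_to_send(master_binlog, slave_binlog):
    """Return what a GTID-based master would send: every transaction
    whose GTID does not appear in the slave's binlog."""
    return {gtid: trx for gtid, trx in master_binlog.items()
            if gtid not in slave_binlog}

# B and C each re-executed A's anonymous T1 and (wrongly) minted a GTID.
binlog_b = {"B:1": "T1"}
binlog_c = {"C:1": "T1"}

# After A crashes, C becomes a slave of B. C lacks B:1, so T1 is resent.
resent = transactions_to_send(binlog_b, binlog_c)
print(resent)  # {'B:1': 'T1'}: T1 would be applied on C a second time
```

If both slaves had kept T1 anonymous, there would be no GTID for B to
compare against and nothing would be resent on the basis of GTIDs.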

1.2. Requirement: GTID-TRANSACTION MUST KEEP ITS GTID WHEN RE-EXECUTED
----------------------------------------------------------------------

If a master with GTID_MODE=ON generates a GTID for a transaction, then
an error should be generated if an old server or a server with
GTID_MODE=OFF tries to process the transaction. The following example
illustrates what would happen if an old server would remove the GTID
from a transaction that it re-executes and which originates from a new
server.

     Server A         ------------------->  Server B
     GTID_MODE=ON                           GTID_MODE=OFF
     Binlog: T1 (GTID=A:1)                  Binlog: T1 (GTID=anon)

Server A is a master, Server B is a slave of A. A has GTID_MODE=ON
whereas B has GTID_MODE=OFF (or is old). A has executed one
transaction, T1, and assigned it GTID (A:1). B has re-executed T1 and
stripped away the GTID, converting T1 to an anonymous
transaction. Suppose that B is upgraded and starts to use the
AUTO_POSITION protocol. Then, when B reconnects next time using the
AUTO_POSITION protocol, A:1 will be retransmitted and B will execute
it again. This leads to inconsistent data on B.

1.3. Requirement: GENERATE GTIDS/ANONYMOUS TRANSACTIONS ACCORDING TO GTID_MODE
------------------------------------------------------------------------------

A server that has GTID_MODE=ON must only generate GTIDs. If it would
receive anonymous transactions from a master, it must fail to execute
the transactions.

A server that has GTID_MODE=OFF must only generate anonymous
transactions. If it would receive GTID-transactions from a master, it
must fail to execute the transactions.

1.4. Requirement: GTIDS MUST NEVER BE LOST
------------------------------------------

Even if the user switches between GTID_MODE=ON and GTID_MODE=OFF
several times, the GTID state (i.e., the value of
@@GLOBAL.GTID_EXECUTED) must not be lost.

So if, for example, one server generates an anonymous transaction by
accident, other servers can temporarily change GTID_MODE so that they
can replicate the anonymous transaction (in the meantime they cannot
perform an automatic fail-over). This does not lose any of the
existing GTID state, so the servers can continue to work correctly
after setting GTID_MODE=ON again.

1.5. Requirement: GTID-TRANSACTIONS MUST BE GTID-CONSISTENT
-----------------------------------------------------------

The variable @@GLOBAL.ENFORCE_GTID_CONSISTENCY (which already exists
in the server) disallows certain types of transactions that cannot be
safely logged using GTIDs.  Therefore, we require that any
GTID-transaction is subject to the checks implied by
@@GLOBAL.ENFORCE_GTID_CONSISTENCY.

1.6. Requirement: ALLOW MULTIPLE MASTERS WITH OPPOSITE GTID_MODES
-----------------------------------------------------------------

The architecture should be open-ended. In particular, if in the future
slaves are capable of connecting to multiple sources, this should not
be limited by the feature.

Suppose that a future slave is connecting to two masters. One of the
masters has GTID_MODE = ON and the other has GTID_MODE = OFF.  This
may happen e.g. if the masters are owned by two different DBAs (maybe
even different organizations) and the slave DBA has no influence over
them. Still the slave DBA may need to aggregate the data from the two
masters.

There must be a way to configure the slave so that this scenario is
possible.

1.7. Requirement: SLAVE MUST BE ABLE TO PROCESS WHAT IT GETS FROM THE MASTER
----------------------------------------------------------------------------

This is a requirement on the way the feature is used.

Any slave must be configured so that it is able to process the
transactions coming from its master. In particular, if master is ON,
slave cannot be OFF, and if slave is OFF, master cannot be ON. (These
are only examples of restrictions; there will be more restrictions;
see below.)

1.8. Requirement: AUTO_POSITION REQUIRES GTID-ONLY MASTER
---------------------------------------------------------

In order to use the AUTO_POSITION protocol, the master must only
generate GTID-transactions, not anonymous transactions.

Thus, if the AUTO_POSITION protocol is enabled, connecting to a master
that has GTID_MODE!=ON must fail. This is because anonymous
transactions cannot be addressed using GTIDs; they can only be
addressed using (filename, offset) pairs.

1.9. SUMMARY OF REQUIREMENTS
----------------------------

This is a summary of the requirements listed in the sections above.

First, the implementation requirements:

 IR1. An anonymous transaction must be kept anonymous when
      re-executed.

 IR2. A GTID-transaction must keep its GTID when re-executed.

 IR3. When GTID_MODE = OFF, only anonymous transactions must be
      generated.

 IR4. When GTID_MODE = ON, only GTID-transactions must be generated.

 IR5. GTID_EXECUTED shall always be persisted and never lose GTIDs.

 IR6. All GTID-transactions must be subject to
      ENFORCE_GTID_CONSISTENCY checks.

 IR7. It shall be possible to configure a slave so that it can accept
      updates both from masters that have GTID_MODE = ON and masters
      that have GTID_MODE = OFF.

Second, the requirements for the procedure to turn on GTIDs itself:

 PR1. All servers must generate anonymous transactions until all
      servers know how to preserve GTIDs.

 PR2. All anonymous transactions in the topology must have been
      processed before using AUTO_POSITION.


2. PROPOSED SOLUTION
====================

To satisfy the implementation requirements and the procedure
requirements summarized in section 1.9, turning on GTIDs needs to be
done in multiple steps. The DBA therefore needs to tell the server
which step of the procedure it is in. For that we use the GTID_MODE
variable.

2.1. MAKING GTID_MODE SETTABLE
------------------------------

To instruct the server in which mode it should operate, we change the
GTID_MODE system variable:

  - Make GTID_MODE accept more values than just ON and OFF. (In fact,
    it does already but these modes are not implemented yet):

    - OFF: Both new and replicated transactions must be anonymous.

    - OFF_PERMISSIVE: New transactions are anonymous. Replicated
      transactions can be either anonymous or GTID-transactions.

    - ON_PERMISSIVE: New transactions are
      GTID-transactions. Replicated transactions can be either
      anonymous or GTID-transactions.

    - ON: Both new and replicated transactions must be GTID-transactions.

  - Make the variable settable dynamically, except in the following
    case:

    - Changing GTID_MODE from ON_PERMISSIVE to ON requires a server
      restart.

    The reason is explained in Section 2.8.

The default remains GTID_MODE = OFF.  The variable remains global-only.
The variable can only be set by SUPER, from a top-level statement,
outside a transaction.

To see why we need the intermediate steps, consider bidirectional
replication between two servers:

- Initially the two servers have GTID_MODE = OFF. You cannot switch
  any of the servers directly to GTID_MODE = ON_PERMISSIVE or ON,
  because the other server would still have GTID_MODE = OFF and thus
  it would generate an error when it tried to process the
  transactions. So the first step must be to set GTID_MODE =
  OFF_PERMISSIVE.

- Suppose the two servers have GTID_MODE = OFF_PERMISSIVE. You cannot
  switch any of the servers directly to GTID_MODE = ON, because that
  server would still receive anonymous transactions from the other
  server and therefore it would generate an error. So the second step
  must be to set GTID_MODE = ON_PERMISSIVE.

- Once the two servers have GTID_MODE = ON_PERMISSIVE, it suffices to
  process all anonymous transactions of all relay logs, and after that
  all transactions are GTID-transactions. So then it is safe to switch
  to GTID_MODE = ON.

The same procedure works in any topology.
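
The four modes and the step-by-step argument above can be modelled as
follows. This is a minimal sketch, not the server implementation: the
names GtidMode, new_transaction_kind, accepts_replicated and
safe_topology are assumptions introduced here, and the model
conservatively assumes that any server may receive transactions from
any other server. It looks only at newly generated transactions; it
does not model anonymous transactions that still sit in relay logs.

```python
from enum import IntEnum

class GtidMode(IntEnum):
    OFF = 0
    OFF_PERMISSIVE = 1
    ON_PERMISSIVE = 2
    ON = 3

def new_transaction_kind(mode):
    """Kind of transaction generated for new (non-replicated) work."""
    return "gtid" if mode >= GtidMode.ON_PERMISSIVE else "anonymous"

def accepts_replicated(mode, kind):
    """Whether a replicated transaction of the given kind is allowed."""
    if mode == GtidMode.OFF:
        return kind == "anonymous"
    if mode == GtidMode.ON:
        return kind == "gtid"
    return True  # both permissive modes accept either kind

def safe_topology(modes):
    """True if every server can apply what every other server generates."""
    return all(accepts_replicated(receiver, new_transaction_kind(sender))
               for sender in modes for receiver in modes)

# Moving one server at a time, one step at a time, stays safe.
print(safe_topology([GtidMode.OFF, GtidMode.OFF_PERMISSIVE]))            # True
print(safe_topology([GtidMode.OFF_PERMISSIVE, GtidMode.ON_PERMISSIVE]))  # True
# Skipping a step on one server breaks the other server.
print(safe_topology([GtidMode.OFF, GtidMode.ON_PERMISSIVE]))             # False
print(safe_topology([GtidMode.OFF_PERMISSIVE, GtidMode.ON]))             # False
```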

2.2. PROCEDURE FOR TURNING ON GTIDS WITH THE REPLICATION CLUSTER ONLINE 
-----------------------------------------------------------------------

The procedure to start using GTIDs is as follows. Note that it is
crucial that you complete every step before continuing to the next
step.

U1. The pre-conditions for using GTIDs are:

    U1.1. *All* servers in your topology must use MySQL 5.6.X or
          later.  You cannot use the GTID feature if one server in the
          topology is old.

          *All* servers that need to switch GTID_MODE to ON online
          must use MySQL 5.7.Y or later.  If any server is older than
          that, it can still switch GTID_MODE to ON, but all such old
          servers have to be offline at the same time, during part of
          the procedure (see below).

          Here, 5.6.X is the first release that supports GTIDs and
          5.7.Y is the first release that supports the four GTID
          operation modes.

    U1.2. All servers leave GTID_MODE with the default value OFF.

U2. On each server, execute:

    SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = WARN;

    Then, let it run for a while with your normal workload.  If this
    causes any warnings in the log, adjust your application so that it
    only uses GTID-compatible features and does not generate any
    warning.

U3. On each server, execute:

    SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON;

U4. On each server, execute:

    SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;

    It does not matter which server executes this statement first, but
    it is important that all servers complete this step before any
    server begins the next step.

    U4.1. If any servers use a version older than 5.7.Y, switch them
          off at this point.  Steps U5 and U6 apply only to the servers
          of version 5.7.Y or later.

U5. On each server, execute:

    SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;

    It does not matter which server executes this statement first.

U6. On each server, wait until the status variable
    ANONYMOUS_TRANSACTION_COUNT is zero.  This can be checked using:

    SHOW STATUS LIKE 'ANONYMOUS_TRANSACTION_COUNT';

    On a replication slave, it is theoretically possible that this
    shows zero and then non-zero again.  This is ok; it suffices that
    it shows zero once.

U7. Wait for all transactions generated up to step U6 to replicate to
    all servers.  You can do this without stopping updates: the only
    important thing is that all anonymous transactions get replicated.

    There are several possible ways to wait for transactions to
    replicate:

    U7.1. A simple method which works regardless of your topology, but
          relies on timing: if you are sure that the slave never lags
          more than N seconds, just wait for a bit more than N
          seconds.  Or wait for a day, or whatever time period you
          consider safe for your deployment.

    U7.2. A safer method in the sense that it does not depend on
          timing: if you only have a master with one or more slaves,
          do the following:

          U7.2.1. On the master, execute:

                  SHOW MASTER STATUS;

                  Note down the values in the File and Position
                  columns.

          U7.2.2. On every slave, execute:

                  SELECT MASTER_POS_WAIT(file, position);

                  Here, file and position are the values noted in
                  step U7.2.1.

    U7.3. If you have a master and multiple levels of slaves (slaves
          of the slaves), repeat U7.2 on each level, starting from the
          master, then all the direct slaves, then all the
          slaves-of-slaves, etc.

    U7.4. If you use a circular replication topology where multiple
          servers may have write clients, perform step U7.2 for each
          master-slave connection, until you have completed the full
          circle.  Repeat so that you do the full circle *twice*.

          Here is an example: Suppose you have three servers A, B, and
          C, replicating in a circle like A -> B -> C -> A.  The
          procedure is then:

          - Do step U7.2.1 on A and step U7.2.2 on B.
          - Do step U7.2.1 on B and step U7.2.2 on C.
          - Do step U7.2.1 on C and step U7.2.2 on A.
          - Do step U7.2.1 on A and step U7.2.2 on B.
          - Do step U7.2.1 on B and step U7.2.2 on C.
          - Do step U7.2.1 on C and step U7.2.2 on A.

U8. On each server, execute:

    SET @@GLOBAL.GTID_MODE = ON;

U9. On each server, add gtid-mode=ON to my.cnf.

U10. You are now guaranteed that all transactions have a GTID (except
    transactions generated in step U5 or earlier, which have already
    been processed).  To start using the GTID protocol so that you can
    later perform automatic fail-over, execute on each slave:

     U10.1. Wait until at least one GTID-transaction has been replicated
            to all slaves.
     U10.2. STOP SLAVE;
     U10.3. CHANGE MASTER TO MASTER_AUTO_POSITION = 1;
     U10.4. START SLAVE;

    (If a future version of the server supports replication from
    multiple masters, step U10.3 must be performed once for each
    replication channel.)

    (Step U10.1 is needed to avoid spurious errors due to C5.3; see
    below.)
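
The order of waits described in step U7.4 can be generated
mechanically. The following is a small sketch under stated
assumptions: the function name wait_schedule is hypothetical, and the
ring is given as a list in replication order, so that ["A", "B", "C"]
means A -> B -> C -> A.

```python
def wait_schedule(ring, rounds=2):
    """Return (master, slave) pairs in the order in which step U7.2
    should be performed for a circular topology."""
    pairs = [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]
    return pairs * rounds  # go around the full circle `rounds` times

for master, slave in wait_schedule(["A", "B", "C"]):
    print(f"U7.2.1 on {master}, then U7.2.2 on {slave}")
```

For the three-server example this prints six lines, matching the six
bullet points listed under U7.4.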

2.3. PROCEDURE FOR TURNING OFF GTIDS WITH THE REPLICATION CLUSTER ONLINE 
------------------------------------------------------------------------

Users who want to turn off GTIDs can follow almost the same procedure
as in the previous section, but backwards. The only thing that differs
is the point at which you wait for logged transactions to replicate.

D1. On each slave, execute:

    D1.1. STOP SLAVE;
    D1.2. CHANGE MASTER TO MASTER_AUTO_POSITION = 0,
                        MASTER_LOG_FILE = file,
                        MASTER_LOG_POS = position;
    D1.3. START SLAVE;

    (If a future version of the server supports replication from
    multiple masters, step D1.2 must be performed once for each
    replication channel.)

D2. On each server, execute:

    SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;

D3. On each server, execute:

    SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;

D4. On each server, wait until the variable @@GLOBAL.GTID_OWNED is
    equal to the empty string. This can be checked using:

    SELECT @@GLOBAL.GTID_OWNED;

    On a replication slave, it is theoretically possible that this
    is empty and then nonempty again.  This is ok; it suffices that
    it is empty once.

D5. Wait for all transactions that currently exist in any binary log
    to replicate to all slaves.  Use the same method as in U7 of the
    procedure for turning on GTIDs.

D6. On each server, execute:

    SET @@GLOBAL.GTID_MODE = OFF;

D7. On each server, set gtid-mode=OFF in my.cnf.

D8. If you want to set ENFORCE_GTID_CONSISTENCY = OFF, you can do so now.

D9. If you want to downgrade to an earlier version of MySQL, you can
    do so now, using the normal downgrade procedure.

2.4. COMBINATIONS OF MASTER GTID_MODE, SLAVE GTID_MODE, AND AUTO_POSITION
-------------------------------------------------------------------------

As exemplified in section 2.1, the only rule that works in all
topologies is that master and slave differ by at most one step.

However, in order to satisfy IR7, it is necessary to relax this
restriction and allow master and slave to differ by more than one
step:

- Suppose a future slave is capable of replicating from two masters at
  the same time, and suppose it has one master with GTID_MODE = OFF
  and another master with GTID_MODE = ON.

- A slave that has GTID_MODE = OFF_PERMISSIVE or ON_PERMISSIVE can
  accept any GTID_MODE from the master without problems.

- A slave with GTID_MODE = OFF_PERMISSIVE or ON_PERMISSIVE can do
  fail-over, as long as the old and new master have GTID_MODE = ON.

- So the only combinations of master GTID_MODE and slave GTID_MODE
  that must be disallowed are when the slave has GTID_MODE = OFF and
  the master has GTID_MODE = ON or ON_PERMISSIVE; or when the slave
  has GTID_MODE = ON and the master has GTID_MODE = OFF or
  OFF_PERMISSIVE.  These settings do not make sense because the slave
  cannot handle the identifiers of the transactions committed on the
  master.

- The combinations of AUTO_POSITION and GTID_MODE that are necessary
  to disallow are (1) the slave has AUTO_POSITION = 1 and the master
  has GTID_MODE != ON; (2) the slave has AUTO_POSITION = 1 and the slave
  has GTID_MODE == OFF.

The following table illustrates the allowed combinations.

   Master GTID_MODE    OFF  OFF_PERMISSIVE  ON_PERMISSIVE  ON
    Slave GTID_MODE
                OFF     Y           Y           N          N
     OFF_PERMISSIVE     Y           Y           Y          Y*
      ON_PERMISSIVE     Y           Y           Y          Y*
                 ON     N           N           Y          Y*

   N - Slave thread will stop with an error instead of connecting
   Y - GTID_MODEs are compatible
   * - AUTO_POSITION can be used
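
The table can also be written as two small predicates. This is a
sketch that restates the table; the function names modes_compatible
and auto_position_allowed are assumptions and do not correspond to
server functions.

```python
MODES = ["OFF", "OFF_PERMISSIVE", "ON_PERMISSIVE", "ON"]

def modes_compatible(master_mode, slave_mode):
    """'Y' cells: the slave can handle what the master generates."""
    if slave_mode == "OFF":
        return master_mode in ("OFF", "OFF_PERMISSIVE")
    if slave_mode == "ON":
        return master_mode in ("ON_PERMISSIVE", "ON")
    return True  # permissive slaves accept every master mode

def auto_position_allowed(master_mode, slave_mode):
    """'*' cells: the master must be ON and the slave must not be OFF."""
    return master_mode == "ON" and slave_mode != "OFF"

# Reprint the table row by row (one row per slave mode).
for slave in MODES:
    cells = []
    for master in MODES:
        cell = "Y" if modes_compatible(master, slave) else "N"
        if auto_position_allowed(master, slave):
            cell += "*"
        cells.append(cell)
    print(f"{slave:>15}  " + "  ".join(f"{c:<2}" for c in cells))
```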

Notice that there is a use case where this additional flexibility is
needed:

- Suppose that a future slave is connecting to two masters. One of the
  masters has GTID_MODE = ON and the other has GTID_MODE = OFF.  This
  may happen e.g. if the masters are owned by two different DBAs
  (maybe even different organizations) and the slave DBA has no
  influence over them. Still the slave DBA may need to aggregate the
  data from the two masters.

- In this case, the slave must run with either GTID_MODE =
  OFF_PERMISSIVE or GTID_MODE = ON_PERMISSIVE. So a slave that has one
  of these two modes must be compatible with a master that has
  GTID_MODE = ON or OFF, i.e., with all modes.

- The slave DBA may want to perform a fail-over for the channel
  connected to the GTID_MODE = ON master, to another GTID_MODE = ON
  master. So AUTO_POSITION = 1 should be allowed as long as the slave
  has GTID_MODE != OFF, and the slave SQL thread should generate an
  error if master has GTID_MODE != ON.

Note: When CHANGE MASTER TO MASTER_AUTO_POSITION = 1 is executed, the
      slave is not connected to the master, so the only check we can
      do at that time is to give an error if the slave has GTID_MODE =
      OFF. Then, we may have an additional point of error generation
      in the IO thread connect code, for the case where the master is
      not using GTID_MODE = ON.

2.5. COMBINATIONS OF GTID_MODE AND GTID_NEXT
--------------------------------------------

The following table shows the behavior of the server for the different
values of GTID_MODE and GTID_NEXT. This summarizes the discussion
above.

        GTID_NEXT    AUTOMATIC  AUTOMATIC   ANONYMOUS  UUID:NUMBER
                     binlog on  binlog off
        GTID_MODE
              OFF    anonymous  anonymous   anonymous  error
   OFF_PERMISSIVE    anonymous  anonymous   anonymous  UUID:NUMBER
    ON_PERMISSIVE    new GTID   anonymous   anonymous  UUID:NUMBER
               ON    new GTID   anonymous   error      UUID:NUMBER

  Legend:
    anonymous - Generate an anonymous transaction.
        error - Generate an error and fail to execute 'SET GTID_NEXT'.
  UUID:NUMBER - Generate a GTID with the specified UUID:NUMBER.
     new GTID - Generate a GTID with an automatically generated number.

Note: When the binary log is off and GTID_NEXT = 'AUTOMATIC', then no
      GTID is generated. This is consistent with how the server works
      now.
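
The same table expressed as a function, as a sketch only. The function
name gtid_next_behavior and the string return values are assumptions
chosen to mirror the legend; any gtid_next value other than
'AUTOMATIC' or 'ANONYMOUS' is treated as an explicit UUID:NUMBER.

```python
def gtid_next_behavior(gtid_mode, gtid_next, binlog_enabled=True):
    """Return 'anonymous', 'new GTID', 'UUID:NUMBER' or 'error'."""
    if gtid_next == "AUTOMATIC":
        # A new GTID is generated only when the binary log is on and
        # the mode is ON_PERMISSIVE or ON.
        if binlog_enabled and gtid_mode in ("ON_PERMISSIVE", "ON"):
            return "new GTID"
        return "anonymous"
    if gtid_next == "ANONYMOUS":
        return "error" if gtid_mode == "ON" else "anonymous"
    # Explicit UUID:NUMBER: rejected only when GTID_MODE = OFF.
    return "error" if gtid_mode == "OFF" else "UUID:NUMBER"

print(gtid_next_behavior("ON_PERMISSIVE", "AUTOMATIC"))         # new GTID
print(gtid_next_behavior("ON", "AUTOMATIC", binlog_enabled=False))  # anonymous
print(gtid_next_behavior("ON", "ANONYMOUS"))                    # error
```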

2.6. ENFORCE_GTID_CONSISTENCY
-----------------------------

ENFORCE_GTID_CONSISTENCY shall be changed to be a dynamic variable.
The variable shall be global and settable only by SUPER in a top-level
statement outside a transaction.  The default shall remain OFF.  The
variable shall increase its range to values 0, 1, 2, with the
following symbolic names:

  0 = OFF

    All transactions are allowed to violate GTID consistency.

  1 = ON

    No transaction is allowed to violate GTID consistency.

  2 = WARN

    All transactions are allowed to violate GTID consistency, but a
    warning is generated in this case.

    The WARN value is useful in order to pre-check the workload before
    turning this variable to ON.  If this was not possible, there
    would be a risk for downtime when switching the value to ON and a
    lot of errors would be generated.

When GTID_MODE = ON, only ENFORCE_GTID_CONSISTENCY = ON is allowed.
When GTID_MODE != ON, all values are allowed.  However, even if
ENFORCE_GTID_CONSISTENCY != ON, the check is enforced in the following
cases:

- For transactions that use GTID_NEXT = 'UUID:NUMBER', an error is
  generated if the transaction violates GTID consistency, regardless
  of the value of ENFORCE_GTID_CONSISTENCY.

- When GTID_MODE = ON or ON_PERMISSIVE, for transactions that use
  GTID_NEXT = 'AUTOMATIC', an error is generated if the transaction
  violates GTID consistency, regardless of the value of
  ENFORCE_GTID_CONSISTENCY.

Notice that this logic ensures that IR7 is satisfied, i.e., a slave
that uses GTID_MODE = ON_PERMISSIVE or OFF_PERMISSIVE can have two
masters, one that uses GTID_MODE = OFF and another that uses GTID_MODE
= ON.  Such a slave can use ENFORCE_GTID_CONSISTENCY = OFF or WARN.
Then, transactions coming from the GTID_MODE = ON master are still
subject to the consistency checks (since such transactions use
GTID_NEXT = 'UUID:NUMBER'), while transactions coming from the
GTID_MODE = OFF master are accepted even if they violate GTID
consistency.
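
The rules above for a transaction that violates GTID consistency can
be summarized in one decision function. This is a sketch of the rules
as stated in this section, not the server code; the name
violation_outcome and the return strings are assumptions.

```python
def violation_outcome(enforce, gtid_mode, gtid_next):
    """Outcome for a transaction that violates GTID consistency.

    enforce   -- 'OFF', 'ON' or 'WARN' (ENFORCE_GTID_CONSISTENCY)
    gtid_next -- 'AUTOMATIC', 'ANONYMOUS' or an explicit 'UUID:NUMBER'
    Returns 'error', 'warning' or 'allowed'.
    """
    if gtid_next not in ("AUTOMATIC", "ANONYMOUS"):
        return "error"   # explicit UUID:NUMBER is always checked
    if gtid_next == "AUTOMATIC" and gtid_mode in ("ON", "ON_PERMISSIVE"):
        return "error"   # a new GTID would be generated
    return {"ON": "error", "WARN": "warning", "OFF": "allowed"}[enforce]

# A permissive slave with ENFORCE_GTID_CONSISTENCY = OFF and two masters:
print(violation_outcome("OFF", "OFF_PERMISSIVE", "uuid:7"))     # error
print(violation_outcome("OFF", "OFF_PERMISSIVE", "ANONYMOUS"))  # allowed
```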

The following error conditions are checked:

- An error shall be generated if GTID_MODE = ON and the user executes
  SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = OFF or WARN.

- An error shall be generated if GTID_MODE = OFF_PERMISSIVE and there
  is an ongoing transaction that uses GTID_NEXT = 'AUTOMATIC' and
  violates GTID consistency, and the user executes SET
  @@GLOBAL.GTID_MODE = ON_PERMISSIVE.  See subsection 2.

- An error shall be generated if ENFORCE_GTID_CONSISTENCY is changed
  from OFF or WARN to ON and there are ongoing transactions that
  violate GTID consistency. See also subsection 2.8.1.

- A warning shall be generated if ENFORCE_GTID_CONSISTENCY is changed
  from OFF to WARN and there are ongoing transactions that violate
  GTID consistency.

2.7. DETECTING ANONYMOUS TRANSACTIONS
-------------------------------------

This is a clarification of existing behavior; all the following is
implemented in the server prior to this worklog.

There are two components that generate SQL statements from binary logs
or relay logs: the slave applier thread and mysqlbinlog. Both these
must take care to detect which transactions are anonymous and which
are GTID-transactions, and they must set GTID_NEXT accordingly.

In binary logs or relay logs that originate from a server where the
present worklog is implemented, this is detected only based on the
type of event that precedes the transaction: either a Gtid_log_event
or an Anonymous_gtid_log_event.

In binary logs or relay logs that originate from an old server, this
is detected by noticing that the file contains transactions without
any Gtid_log_event. The implementation is as follows:

- Slave thread: When it applies a Format_description_log_event that
  originates from a master, it sets THD.variables.gtid_next.type to a
  special value, NOT_YET_DETERMINED_GROUP. This will be converted to a
  correct value later; there are two cases:

   1. If a Gtid_log_event or Anonymous_gtid_log_event appears, then
      that event will set THD.variables.gtid_next.type accordingly.

   2. If no Gtid_log_event or Anonymous_gtid_log_event appears, then
      the next time an SQL statement is executed, it will set
      THD.variables.gtid_next.type to ANONYMOUS_GROUP. This is done in
      gtid_pre_statement_checks, which is called from mysql_parse for
      SQL statements and from Rows_log_event::do_apply_event for row
      events.

- mysqlbinlog: When it reads a binary log, it will output a BINLOG
  statement containing a base64-encoding of the initial
  Format_description_log_event.  When a client replays this, the
  Format_description_log_event does the same as it does in the slave
  thread, i.e., sets THD.variables.gtid_next.type to
  NOT_YET_DETERMINED_GROUP. After that it works exactly as in the case
  for the slave thread above.
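
The detection logic can be illustrated with a toy state machine. This
is a simplified sketch, not the server code: events are reduced to
plain strings, each "Query" stands for any executed statement or row
event, and the constant name GTID_GROUP is an assumption (the text
above only names NOT_YET_DETERMINED_GROUP and ANONYMOUS_GROUP).

```python
NOT_YET_DETERMINED = "NOT_YET_DETERMINED_GROUP"
ANONYMOUS = "ANONYMOUS_GROUP"
GTID = "GTID_GROUP"  # assumed name for the GTID-transaction case

def classify(events):
    """Walk a simplified event stream and return the gtid_next type
    that is in effect when each statement executes."""
    gtid_next_type = None
    seen = []
    for ev in events:
        if ev == "Format_description":
            gtid_next_type = NOT_YET_DETERMINED
        elif ev == "Gtid":
            gtid_next_type = GTID
        elif ev == "Anonymous_gtid":
            gtid_next_type = ANONYMOUS
        elif ev == "Query":
            if gtid_next_type == NOT_YET_DETERMINED:
                # Case 2: no Gtid/Anonymous_gtid event was seen, so the
                # pre-statement check decides the transaction is anonymous.
                gtid_next_type = ANONYMOUS
            seen.append(gtid_next_type)
    return seen

old_log = ["Format_description", "Query", "Query"]
new_log = ["Format_description", "Gtid", "Query", "Anonymous_gtid", "Query"]
print(classify(old_log))
print(classify(new_log))
```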

2.8. ONGOING TRANSACTIONS AND SERVER RESTARTS
---------------------------------------------

In section 2.1, we mentioned that server restart is required when
changing from GTID_MODE = ON_PERMISSIVE to ON. The reason for this is
that we need to handle ongoing transactions correctly. In this section
we explain how ongoing transactions are handled by all the steps.

 2.8.1. ENFORCE_GTID_CONSISTENCY: OFF -> ON or WARN -> ON:

        When ENFORCE_GTID_CONSISTENCY = ON, the server is not allowed
        to execute any transaction that violates GTID consistency.

        However, ENFORCE_GTID_CONSISTENCY is checked at transaction
        start.  So if we allow ongoing transactions while changing
        from ENFORCE_GTID_CONSISTENCY = OFF or WARN to ON, it is
        possible to have the following erroneous execution:

         1. trx1 violates GTID consistency.

         2. trx1 passes the ENFORCE_GTID_CONSISTENCY check because
            ENFORCE_GTID_CONSISTENCY = OFF.

         3. Another client executes
            SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON.

         4. trx1 commits.

        One solution to this problem would be to require a server
        restart in order to change ENFORCE_GTID_CONSISTENCY to ON,
        since that definitely ensures that there are no ongoing
        transactions.  However, in order to reduce the number of
        restarts, we use a different method.

        We maintain a counter of the number of transactions that
        violate GTID-consistency.  When a violating transaction begins
        to execute, the counter is incremented, and when a violating
        transaction ends, the counter is decremented.  The statement
        SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON is only allowed
        when the counter is zero.

        There is a cost for maintaining a global counter, even if we
        use lock-free atomic integer operations.  However, we estimate
        that the cost is not significant since it only affects those
        transactions that violate GTID consistency.
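
The counter scheme can be sketched as follows. This is an illustrative
model under stated assumptions: the class and method names are
hypothetical, and a mutex is used in place of the lock-free atomic
operations mentioned above, purely to keep the sketch short.

```python
import threading

class ViolationCounter:
    """Tracks ongoing transactions that violate GTID consistency."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0
        self.enforce = "OFF"

    def begin_violating(self):
        # Called when a violating transaction starts to execute.
        with self._lock:
            if self.enforce == "ON":
                raise RuntimeError("transaction violates GTID consistency")
            self._count += 1

    def end_violating(self):
        # Called when a violating transaction commits or rolls back.
        with self._lock:
            self._count -= 1

    def set_enforce_on(self):
        # Models SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON.
        with self._lock:
            if self._count != 0:
                raise RuntimeError("ongoing violating transactions exist")
            self.enforce = "ON"
```

Because the check and the update of the mode happen under the same
lock as the increments, a violating transaction cannot slip in between
the check and the switch to ON in this model.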

 2.8.2. ENFORCE_GTID_CONSISTENCY: ON -> OFF, ON -> WARN, or WARN -> OFF:

        These transitions are not problematic for ongoing
        transactions. Since we go from more restrictive to more
        permissive modes, any transaction that was ongoing before the
        SET statement will be allowed after the statement as well.

 2.8.3. ENFORCE_GTID_CONSISTENCY: OFF -> WARN:

        Here we go from a more permissive to a more restrictive mode.
        Consider the following execution:

         1. trx1 violates GTID consistency, but no warning is
            generated since ENFORCE_GTID_CONSISTENCY = OFF when the
            transaction begins to execute.

         2. Another client executes:
            SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = WARN

         3. trx1 commits

        Then, trx1 has committed when ENFORCE_GTID_CONSISTENCY = WARN,
        without generating any warning.  To ensure that there is some
        warning, we generate a warning for the statement SET
        @@GLOBAL.ENFORCE_GTID_CONSISTENCY = WARN, if there is any
        ongoing transaction that violates GTID consistency.
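
        The counter described in 2.8.1 and the warning described in
        2.8.3 can be sketched as a small model.  This is only an
        illustration, not server code: the class and method names
        are hypothetical, and a lock stands in for the lock-free
        atomic operations mentioned above.

```python
import threading


class ConsistencyGuard:
    """Toy model of the violating-transaction counter (not server code)."""

    LEVELS = {"OFF": 0, "WARN": 1, "ON": 2}

    def __init__(self, mode="OFF"):
        self.mode = mode
        self.violating = 0             # ongoing GTID-violating transactions
        self._lock = threading.Lock()  # stands in for atomic operations

    def begin_violating_trx(self):
        with self._lock:
            if self.mode == "ON":
                raise RuntimeError("statement violates GTID consistency")
            self.violating += 1

    def end_violating_trx(self):
        with self._lock:
            self.violating -= 1

    def set_mode(self, new_mode):
        """Return a list of warnings; raise if the change is refused."""
        warnings = []
        with self._lock:
            stricter = self.LEVELS[new_mode] > self.LEVELS[self.mode]
            if stricter and self.violating > 0:
                if new_mode == "ON":
                    # 2.8.1: only allowed when the counter is zero.
                    raise RuntimeError(
                        "Cannot set ENFORCE_GTID_CONSISTENCY = ON because "
                        "there are ongoing transactions that violate GTID "
                        "consistency.")
                # 2.8.3: OFF -> WARN succeeds, but with a warning.
                warnings.append(
                    "There are ongoing transactions that violate GTID "
                    "consistency.")
            self.mode = new_mode
        return warnings
```

        Transitions to a more permissive mode (2.8.2) never consult
        the counter in this model, which matches the reasoning
        above.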

 2.8.4. GTID_MODE: OFF -> OFF_PERMISSIVE:

        This transition is not problematic in 5.7. Since we go from a
        more restrictive to a more permissive mode, any transaction
        that was ongoing before the SET statement will be allowed
        after the SET statement too.

        In 5.6, this would have been problematic. The reason is:

        - Gtid transactions always have a Gtid_log_event.

        - Anonymous transactions have an Anonymous_gtid_log_event.

        - There is one exception to the above rules: in 5.6, anonymous
          transactions do not have any event when GTID_MODE = OFF.

        - In 5.6, the Gtid event is allocated at the beginning of the
          transaction, but written only at the end of the transaction.

        Hence, in 5.6, if we would allow ongoing transactions while
        changing GTID_MODE from OFF to OFF_PERMISSIVE, the following
        erroneous execution would be possible:

         1. trx1 begins to execute. Since GTID_MODE = OFF, it does not
            allocate any event.

         2. Another client executes
            SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE.

         3. trx1 commits. Then it tries to write the Gtid_log_event to
            unallocated space.

        This is not a problem in 5.7, because the allocation of the
        Gtid/Anonymous event was moved to the end of the transaction
        (this was a big refactoring).

 2.8.5. GTID_MODE: OFF_PERMISSIVE -> ON_PERMISSIVE:

        - Both anonymous transactions and GTID-transactions are
          allowed in both modes, so ongoing replicated transactions
          (which execute with GTID_NEXT = 'ANONYMOUS' or GTID_NEXT =
          'UUID:NUMBER') can commit without problem.

        - The GTID of a new transaction (which executes with GTID_NEXT
          = 'AUTOMATIC') is generated when the transaction prepares
          (if the binary log is disabled) or flushes (if the binary log
          is enabled).  So it is possible to have the following
          execution:

           1. trx1 prepares or flushes, and is determined to be
              'ANONYMOUS' since GTID_MODE = OFF_PERMISSIVE.

           2. Another client executes SET @@GLOBAL.GTID_MODE =
              ON_PERMISSIVE.

           3. trx1 commits.

          Then, trx1 commits as an anonymous transaction even if
          GTID_MODE = ON_PERMISSIVE at the time of the commit.

          This could cause problems in the upgrade procedure in case
          step U7.2.1 of the procedure is performed before trx1 has
          been fully flushed to the binary log.  To prevent this from
          happening, we have introduced step U6.

        - GTID_MODE = ON_PERMISSIVE is more restrictive than GTID_MODE
          = OFF_PERMISSIVE for transactions that use GTID_NEXT =
          'AUTOMATIC' and violate GTID consistency (cf. subsection
          2.6).  Specifically, such transactions must generate an error
          when GTID_MODE = ON_PERMISSIVE but not when GTID_MODE =
          OFF_PERMISSIVE.

          Thus, if there is any ongoing transaction that uses
          GTID_NEXT = 'AUTOMATIC' and violates GTID consistency, SET
          @@GLOBAL.GTID_MODE = ON_PERMISSIVE shall generate an error.
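
          The race in the second bullet can be reproduced with a toy
          model in which the transaction type is fixed at flush time
          and never re-evaluated at commit.  The class below is
          hypothetical and only models the decision rule stated
          above; it does not reflect actual server data structures.

```python
class Trx:
    """Toy transaction using GTID_NEXT = 'AUTOMATIC' (not server code)."""

    def __init__(self):
        self.kind = None  # decided at prepare/flush, not at commit

    def flush(self, gtid_mode):
        # The GTID_MODE in effect at flush time fixes the transaction type.
        if gtid_mode in ("OFF", "OFF_PERMISSIVE"):
            self.kind = "ANONYMOUS"
        else:
            self.kind = "GTID"

    def commit(self, gtid_mode):
        # Committing does not re-evaluate the decision made at flush time.
        return self.kind, gtid_mode


gtid_mode = "OFF_PERMISSIVE"
trx1 = Trx()
trx1.flush(gtid_mode)             # step 1: determined to be anonymous
gtid_mode = "ON_PERMISSIVE"       # step 2: another client changes the mode
result = trx1.commit(gtid_mode)   # step 3: still commits as anonymous
```

          Running the same model with the two modes swapped gives
          the mirror-image execution for the transition from
          ON_PERMISSIVE to OFF_PERMISSIVE.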

 2.8.6. GTID_MODE: ON_PERMISSIVE -> ON:

        When GTID_MODE = ON, all transactions must be
        GTID-transactions; anonymous transactions are disallowed.

        Transactions can be set to anonymous before they begin to
        execute, using SET @@SESSION.GTID_NEXT = 'ANONYMOUS'.  So if
        we allow ongoing transactions while changing GTID_MODE =
        ON_PERMISSIVE to ON, we can have the following erroneous
        execution:

         1. trx1 sets GTID_NEXT = 'ANONYMOUS'. This is allowed because
            GTID_MODE = ON_PERMISSIVE.

         2. Another client executes SET @@GLOBAL.GTID_MODE = ON.

         3. trx1 commits.

        One solution to this problem would be to require a server
        restart in order to change GTID_MODE to ON, since that
        definitely ensures that there are no ongoing transactions.
        However, in order to reduce the number of restarts, we use a
        different method, similar to the one we use to allow online SET
        @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON.

        We maintain a counter of the number of ongoing transactions
        that are anonymous.  When an anonymous transaction starts to
        execute, the counter is incremented, and when it ends, the
        counter is decremented.  The statement SET @@GLOBAL.GTID_MODE
        = ON fails with an error if the counter is not zero. The
        counter is exposed as the status variable
        ANONYMOUS_TRANSACTION_COUNT so that the user can know when it
        is allowed to set GTID_MODE = ON. (In fact, this status
        variable should be zero already at step U6 of the upgrade
        procedure; see section 2.2.)

        Maintaining a global counter has a cost, as it requires either
        a global lock or an atomic integer operation.  This has a
        performance impact on all transactions, even on a server that
        has GTID_MODE = OFF and does not plan to ever turn GTID_MODE
        to ON.  (In contrast, the counter used for
        ENFORCE_GTID_CONSISTENCY is only incremented for the small set
        of unusual statements/transactions that violate GTID
        consistency; we consider the cost acceptable for such cases.)
        We shall do a performance test to see if this is a significant
        problem.  If it is, we can remove the counter and require a
        server restart instead.  That would change steps U6 and U7 of
        the upgrade procedure: see Appendix A.

 2.8.7. GTID_MODE: ON -> ON_PERMISSIVE:

        This transition does not cause any problems for ongoing
        transactions, since ON is more restrictive than ON_PERMISSIVE.

 2.8.8. GTID_MODE: ON_PERMISSIVE -> OFF_PERMISSIVE:

        The considerations for this step are similar to those in step
        2.8.5:

        - Both anonymous transactions and GTID-transactions are
          allowed in both modes, so ongoing replicated transactions
          (which execute with GTID_NEXT = 'ANONYMOUS' or GTID_NEXT =
          'UUID:NUMBER') can commit without problem.

        - The GTID of a new transaction (which executes with GTID_NEXT
          = 'AUTOMATIC') is generated when the transaction prepares
          (if the binary log is disabled) or flushes (if the binary log
          is enabled).  So it is possible to have the following
          execution:

           1. trx1 prepares or flushes, and is determined to be
              'UUID:NUMBER' since GTID_MODE = ON_PERMISSIVE.

           2. Another client executes SET @@GLOBAL.GTID_MODE =
              OFF_PERMISSIVE.

           3. trx1 commits.

          Then, trx1 commits as a GTID-transaction even if GTID_MODE =
          OFF_PERMISSIVE at the time of the commit.

          This could cause problems in the downgrade procedure in case
          step U7.2.1 of the procedure (part of D5) is performed
          before trx1 has been fully flushed to the binary log.  To
          prevent this from happening, we have introduced step D4.

        - ON_PERMISSIVE is more restrictive than OFF_PERMISSIVE for
          transactions that violate GTID consistency (cf. subsections
          2.6 and 2.8.5).  Since this transition goes to the more
          permissive mode, GTID consistency does not cause any trouble
          for this transition.

 2.8.9. GTID_MODE: OFF_PERMISSIVE -> OFF:

        When GTID_MODE = OFF, all transactions that commit must be
        anonymous; GTID-transactions are disallowed.

        Transactions can be set to GTID-transactions before they begin
        to execute. If we would allow GTID-transactions to execute
        while changing GTID_MODE to OFF, we could have the following
        erroneous execution:

         1. trx1 sets GTID_NEXT = 'UUID:NUMBER'. This is allowed
            because GTID_MODE = OFF_PERMISSIVE.

         2. Another client executes SET @@GLOBAL.GTID_MODE = OFF.

         3. trx1 commits.

        Luckily, it is easy to detect if there are any ongoing
        GTID-transactions: this can be checked using the function
        gtid_state->owned_gtids->is_empty().  So we disallow setting
        GTID_MODE = OFF when this function returns false.

        In 5.6, this transition would have been problematic for the
        same reason as in 2.8.4: an ongoing transaction would have
        allocated space for a Gtid_log_event, but since it commits
        when GTID_MODE = OFF it would never write to the allocated
        memory, and eventually the uninitialized memory would get
        written to the binary log.  In 5.7 this problem does not exist
        since a Gtid_log_event is generated unconditionally at the end
        of the transaction.

2.9. BINLOG_CONFIGURATION_LOG_EVENT
-----------------------------------

In order to perform some of the safety checks, we need to know what
value of GTID_MODE was in effect when the binary log was generated.

To make this possible, we introduce a new event type:
Binlog_configuration_log_event.  The Binlog_configuration_log_event is
stored in the beginning of the binary log, just after the
Format_description_log_event and Previous_gtids_log_event.  It shall
contain a single field: GTID_MODE, with value 0 (OFF), 1
(OFF_PERMISSIVE), 2 (ON_PERMISSIVE), or 3 (ON).

When GTID_MODE is changed, the binary log shall be rotated, so that
the Binlog_configuration_log_event correctly matches the GTID_MODE.

Binlog_configuration_log_event shall have the LOG_EVENT_IGNORABLE_F
flag set.  It shall be replicated to the slave as any other event.
The member function do_apply_event shall do nothing.

When the relay log is generated, Binlog_configuration_log_event shall
be generated.  It shall be generated just after the
Previous_gtids_log_event.

When AUTO_POSITION is enabled, the master send thread and the slave
receive thread generate errors if a Binlog_configuration_log_event is
found which has GTID_MODE != ON.
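
The numeric encoding of the single GTID_MODE field can be sketched as
follows.  The values 0 to 3 are the ones listed above; the one-byte
layout and the function names are assumptions made for illustration,
since this section does not specify the wire format.

```python
import struct
from enum import IntEnum


class GtidMode(IntEnum):
    """Numeric values as listed for Binlog_configuration_log_event."""
    OFF = 0
    OFF_PERMISSIVE = 1
    ON_PERMISSIVE = 2
    ON = 3


def encode_payload(mode):
    # Assumption: a single unsigned byte holds the GTID_MODE value.
    return struct.pack("<B", int(mode))


def decode_payload(payload):
    (value,) = struct.unpack("<B", payload[:1])
    return GtidMode(value)  # raises ValueError for values outside 0..3
```

A reader of the event, such as the send or receive thread, would
decode the field and compare it with GtidMode.ON when AUTO_POSITION is
enabled.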

2.10. ANONYMOUS_TRANSACTION_COUNT
---------------------------------

In step U6 of the upgrade procedure, the user has to wait for possible
ongoing anonymous transactions to commit, so that they are not missed
in the synchronization step U7.  In order to know when this is done,
we introduce the status variable ANONYMOUS_TRANSACTION_COUNT.  This
will be equal to the number of ongoing transactions for which it has
been decided that they must be anonymous.

For transactions coming from a user session, it is decided at the time
of transaction prepare whether the transaction is going to be
anonymous or have a GTID.  The decision is based on GTID_MODE.
Therefore, it is possible that there are N ongoing transactions, all
of which will eventually be committed as anonymous transactions, but
at the same time ANONYMOUS_TRANSACTION_COUNT < N since the decision to
make the transactions anonymous has not yet been taken.
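
A client-side wait for the status variable to reach zero could look
like the sketch below.  The helper and its parameters are
hypothetical; read_count stands for whatever mechanism the
administrator uses to read ANONYMOUS_TRANSACTION_COUNT from the
server, which is deliberately left out here.  The caveat above still
applies: the counter only covers transactions that have already been
determined to be anonymous.

```python
import time


def wait_for_zero(read_count, timeout_s=60.0, poll_interval_s=0.5,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll read_count() until it returns 0; return False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if read_count() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```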


3. COMPATIBILITY CHECKS
=======================

There are a number of places where the server needs to check
compatibility of GTID_MODE, GTID_NEXT, AUTO_POSITION,
ENFORCE_GTID_CONSISTENCY, GTID_MODE of a master, GTID_MODE of a slave,
and GTIDs of running transactions.

3.1. CHECKS PERFORMED WHEN SETTING GTID_MODE
--------------------------------------------

When the user changes GTID_MODE, the following compatibility checks
are possible to implement:

C1.1. GTID_MODE must only change one step.

      Rationale:

      It would conceivably be possible to allow changing directly from
      OFF to ON_PERMISSIVE and from ON to OFF_PERMISSIVE.  However,
      this would not have any significant advantage since:

       1. It is not needed in the recommended procedure.

       2. The workaround is obvious (use the intermediate step).

      Moreover, enabling this would also have significant drawbacks:

       1. It is more uniform and easy to understand that a variable
          can change one step at a time, rather than one step in some
          cases and one or two steps in other cases.

       2. If we allow two steps, it is easier for the user to make a
          mistake in the upgrade or downgrade procedure.

       3. The analysis of changing just one step at a time is complex
          as it is (cf section 2.8).  Allowing more than one step at a
          time would imply even more case analysis, would be harder to
          maintain, etc.

      Error message:

      "The value of GTID_MODE can only change one step at a time: OFF
      <-> OFF_PERMISSIVE <-> ON_PERMISSIVE <-> ON. Also note that this
      value must be stepped up or down simultaneously on all
      servers. See the Manual for instructions."

C1.2. SET @@GLOBAL.GTID_MODE = ON is not allowed when there are
      ongoing anonymous transactions. See subsection 2.8.6.

      Error message:

      "SET GTID_MODE = ON is not allowed when there are ongoing,
      anonymous transactions. Before setting GTID_MODE = ON, wait
      until SHOW STATUS LIKE 'ANONYMOUS_TRANSACTION_COUNT' shows zero
      on all servers. Then wait for all existing, anonymous
      transactions to replicate to all slaves, and then execute SET
      @@GLOBAL.GTID_MODE = ON on all servers. See the Manual for
      details."

C1.3. SET @@GLOBAL.GTID_MODE = OFF is not allowed if there are ongoing
      GTID-transactions.  That is, generate an error if
      @@GLOBAL.GTID_OWNED != ''.

      Error message:

      "SET GTID_MODE = OFF is not allowed when there are ongoing
      transactions that have a GTID. Before you set GTID_MODE = OFF,
      wait until SELECT @@GLOBAL.GTID_OWNED is empty on all servers.
      Then wait for all GTID-transactions to replicate to all servers,
      and then execute SET @@GLOBAL.GTID_MODE = OFF on all servers.
      See the Manual for details."

C1.4. When GTID_MODE changes from OFF_PERMISSIVE to ON_PERMISSIVE, and
      there is any ongoing transaction that uses GTID_NEXT =
      'AUTOMATIC' and violates GTID consistency, an error shall be
      generated.

      Note: this means that users have to adjust their workload to be
      GTID-consistent before setting the option.

      Error message:

      "SET GTID_MODE = ON_PERMISSIVE is not allowed when there are
      ongoing transactions that use GTID_NEXT = 'AUTOMATIC', which
      violate GTID consistency. Make sure to adjust your workload to
      be GTID-consistent before setting GTID_MODE = ON_PERMISSIVE.
      See the Manual for @@GLOBAL.ENFORCE_GTID_CONSISTENCY for
      details."

C1.5. The AUTO_POSITION mode of every replication channel must be
      compatible with the new GTID_MODE. I.e., if AUTO_POSITION = 1
      and the new GTID_MODE is OFF, then generate an error.

      Error message:

      "SET GTID_MODE = OFF is not allowed since replication
      channel '%.192s' is configured in AUTO_POSITION mode. Execute
      CHANGE MASTER TO MASTER_AUTO_POSITION = 0 FOR CHANNEL '%.192s'
      before you set GTID_MODE = OFF."

C1.6. When setting GTID_MODE = ON, SQL_SLAVE_SKIP_COUNTER must be 0,
      since a nonzero SQL_SLAVE_SKIP_COUNTER is not allowed when
      GTID_MODE = ON (see section 4.2).

      Error message:

      "SET GTID_MODE = ON is only allowed when SQL_SLAVE_SKIP_COUNTER
      = 0."
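
Checks C1.1 to C1.6 can be summarized as one validation function.
The sketch below is a model only: ServerState and its field names are
hypothetical stand-ins for the server state each check consults, and
the order in which the checks are evaluated is an assumption.

```python
from dataclasses import dataclass, field

MODES = ["OFF", "OFF_PERMISSIVE", "ON_PERMISSIVE", "ON"]


@dataclass
class ServerState:
    gtid_mode: str = "OFF"
    anonymous_trx_count: int = 0                    # consulted by C1.2
    gtid_owned: set = field(default_factory=set)    # consulted by C1.3
    violating_automatic_trx: int = 0                # consulted by C1.4
    auto_position_channels: list = field(default_factory=list)  # C1.5
    sql_slave_skip_counter: int = 0                 # consulted by C1.6


def check_set_gtid_mode(state, new_mode):
    """Return the label of the first failing check, or None if allowed."""
    old_i, new_i = MODES.index(state.gtid_mode), MODES.index(new_mode)
    if abs(new_i - old_i) > 1:
        return "C1.1"  # only one step at a time
    if new_mode == "ON" and state.anonymous_trx_count > 0:
        return "C1.2"  # ongoing anonymous transactions
    if new_mode == "OFF" and state.gtid_owned:
        return "C1.3"  # ongoing GTID-transactions
    if ((state.gtid_mode, new_mode) == ("OFF_PERMISSIVE", "ON_PERMISSIVE")
            and state.violating_automatic_trx > 0):
        return "C1.4"  # ongoing GTID-violating AUTOMATIC transactions
    if new_mode == "OFF" and state.auto_position_channels:
        return "C1.5"  # a channel still uses AUTO_POSITION = 1
    if new_mode == "ON" and state.sql_slave_skip_counter != 0:
        return "C1.6"  # SQL_SLAVE_SKIP_COUNTER must be 0
    return None
```

For example, check_set_gtid_mode(ServerState(), "ON_PERMISSIVE")
returns "C1.1", because OFF to ON_PERMISSIVE skips a step.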

* CHECKS NOT PERFORMED WHEN SETTING GTID_MODE

  Some checks which seem desirable to perform when setting GTID_MODE
  are too difficult to implement and not strictly necessary.  In
  particular, the following checks will NOT be implemented:

  C1.7. Unprocessed transactions in the relay log must be compatible
        with the new GTID_MODE.  The check could use one or both of
        the following methods:

        - Read the Binlog_configuration_log_event of all unprocessed
          relay logs.

        - Let the receiver thread store the position of the last
          received anonymous transaction, and let the applier thread
          store the position of the last committed anonymous
          transaction. If the former is greater than the latter,
          generate an error.

        Both of these checks are somewhat complex, and the error will
        be detected by the applier thread anyway (see C8.1 and C8.2).

  C1.8. The GTID_MODE of connected masters must be compatible with the
        new GTID_MODE.

        Since the upgrade procedure is online, masters will change
        GTID_MODE without the slave reconnecting.  There is currently
        no way for slaves to read master configuration at other points
        than reconnect time.  In any case, the receiver thread will
        stop once it receives transactions generated by the master in
        the incompatible mode (see C7.1 - C7.6).

  C1.9. The GTID_MODE of connected slaves must be compatible with the
        new GTID_MODE.

        Since the upgrade procedure is online, slaves will change
        GTID_MODE without the master knowing about it. There is
        currently no way for masters to read slave configuration other
        than what the slave specifies at reconnect time.  In any case,
        the slave's receiver thread will check compatibility with the
        master's GTID_MODE (see C7.1 - C7.6), so the error will be
        detected by the slave when it receives transactions generated
        using the incompatible mode.

  C1.10.The AUTO_POSITION of connected slaves must be compatible with
        the new GTID_MODE.  I.e., if GTID_MODE = ON and there is some
        slave running with AUTO_POSITION = 1, then SET
        @@GLOBAL.GTID_MODE = ON_PERMISSIVE shall generate an error.

        This is hard to implement because it requires iterating over
        all connected send threads.  In any case, the sender and
        receiver threads will check compatibility with AUTO_POSITION
        (see C6.1, C6.4, C7.1, and C7.4).

3.2. CHECKS PERFORMED WHEN SETTING AUTO_POSITION
------------------------------------------------

When the user executes CHANGE MASTER TO MASTER_AUTO_POSITION = 1, the
following check must be performed:

C2.1. If GTID_MODE == OFF, an error is generated and the CHANGE MASTER
      TO command fails.

      Error message:

      "CHANGE MASTER TO MASTER_AUTO_POSITION = 1 cannot be executed
      because GTID_MODE = OFF."

This check is already performed by the server (but with a slightly
different error message).

3.3. CHECKS PERFORMED WHEN SETTING GTID_NEXT
--------------------------------------------

The checks performed when setting GTID_NEXT are already in place, so
there is nothing to change. (The checks are: generate error when
setting GTID_NEXT = 'ANONYMOUS' and GTID_MODE = ON, and generate error
when setting GTID_NEXT = 'UUID:NUMBER' and GTID_MODE = OFF.)

3.4. CHECKS PERFORMED BY SLAVE WHEN CONNECTING TO A MASTER
----------------------------------------------------------

In the master-slave handshake, the slave shall read the master's
GTID_MODE and perform the following checks:

(this is mostly a summary of section 2.4)

C4.1. If slave has GTID_MODE = OFF and master has GTID_MODE =
      ON_PERMISSIVE or ON, the slave receiver thread shall generate an
      error and stop.

      Error message:

      "The replication receiver thread cannot start because the master
      has GTID_MODE = %.192s and this server has GTID_MODE = %.192s."

      This message already exists in the server.

C4.2. If slave has GTID_MODE = ON and master has GTID_MODE =
      OFF_PERMISSIVE or OFF, the slave receiver thread shall generate
      an error and stop.

      Error message: Same as for C4.1.

C4.3. If slave is using the AUTO_POSITION protocol and master does not
      have GTID_MODE = ON, the slave receiver thread shall generate an
      error and stop.

      Error message:

      "The replication receiver thread cannot start in AUTO_POSITION
      mode: the master has GTID_MODE = %.192s instead of ON."

      This error already exists in the server.

C4.4. If slave has GTID_MODE = OFF and AUTO_POSITION = 1, the slave
      IO thread shall generate an error and stop.

      This cannot normally happen because of the checks performed when
      setting GTID_MODE = OFF and AUTO_POSITION = 1.  However, it can
      happen if the user changes GTID_MODE from ON to OFF in the
      configuration file when the server is offline and then starts
      the server with --force-gtid-mode-on-startup.

      Error message:

      "The replication receiver thread cannot start in AUTO_POSITION
      mode: this server uses GTID_MODE = OFF."
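
The handshake checks C4.1 to C4.4 amount to a small compatibility
function.  The sketch below returns the labels of all checks that
fail, so that it makes no assumption about the order in which the
server evaluates them; the function name and signature are
hypothetical.

```python
def check_slave_handshake(slave_mode, master_mode, auto_position):
    """Return the labels of the C4.x checks that fail (possibly empty)."""
    failed = []
    if slave_mode == "OFF" and master_mode in ("ON_PERMISSIVE", "ON"):
        failed.append("C4.1")
    if slave_mode == "ON" and master_mode in ("OFF_PERMISSIVE", "OFF"):
        failed.append("C4.2")
    if auto_position and master_mode != "ON":
        failed.append("C4.3")
    if slave_mode == "OFF" and auto_position:
        failed.append("C4.4")
    return failed
```

Note that a slave in OFF_PERMISSIVE or ON_PERMISSIVE passes C4.1 and
C4.2 for every master mode; only AUTO_POSITION restricts it further.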

3.5. CHECKS PERFORMED BY MASTER WHEN A SLAVE CONNECTS
-----------------------------------------------------

In the master-slave handshake, the master performs the following checks:

C5.1. When a server connects as slave using the AUTO_POSITION
      protocol, and the master does not have GTID_MODE = ON, the
      master shall generate an error and stop the send thread.

      This may seem redundant because we have C4.3, but it is not,
      because there is a race: the master may change GTID_MODE after
      the slave has checked it and before the slave connects. To avoid
      surprises, the master should take a lock to prevent changing
      GTID_MODE, then perform the check, then perform its
      initialization, and then release the lock.

      Error message:

      "The replication sender thread cannot start in AUTO_POSITION
      mode: this server has GTID_MODE = %.192s instead of ON."

      The check already exists in the server, with a typo in the error
      message.

C5.2. When a server connects as slave using the AUTO_POSITION
      protocol, and the master finds that the slave is missing
      transactions that were generated when the master was using
      GTID_MODE != ON (as detected by reading the
      Binlog_configuration_log_event), the master shall generate an
      error and stop the send thread.

      Error message:

      "The replication sender thread cannot start in AUTO_POSITION
      mode: the binary log file '%.256s' contains GTIDs that are
      missing on slave, and which were generated using GTID_MODE =
      %.192s instead of ON."

C5.3. When a server connects as slave using the AUTO_POSITION
      protocol, and the master finds that the last transaction that is
      *not* to be sent is anonymous, then the master shall generate an
      error and stop the send thread.

      This prevents loss of transactions in the following case:

       1. Master and slave are in sync and both use GTID_MODE = ON and
          auto_position protocol.

       2. Slave threads stop.

       3. Master server changes to GTID_MODE = OFF or OFF_PERMISSIVE.

       4. Master server generates some transactions.

       5. Master server restarts and sets GTID_MODE = ON again.

       6. Slave threads start.

      If the check was not there, the anonymous transactions would be
      silently skipped.

      Error message:

      "The replication sender thread cannot start in AUTO_POSITION
      mode: the first transaction to send is preceded by an anonymous
      transaction. Replicate at least one GTID-transaction to the
      slave before you enable AUTO_POSITION."

      This step implies that step 9.1 is needed in the upgrade
      procedure.  If the user forgets step 9.1, the only consequence
      is an error in the applier thread after step 9.4, after which
      steps 9.1-9.4 have to be redone.  There is no risk of data
      loss, only a minor inconvenience.

3.6. CHECKS PERFORMED BY A RUNNING SEND THREAD ON MASTER
--------------------------------------------------------

The master does not know the slave's GTID_MODE, as it may legally
change during the online procedure for turning on or off
GTIDs. Therefore, the master's send thread shall not perform any
checks for GTID_MODE.

When using the auto-positioning protocol, the send thread should
perform the following checks:

C6.1. If AUTO_POSITION = 1 and the send thread reads a
      Binlog_configuration_log_event that contains GTID_MODE != ON, an
      error is generated and the send thread is stopped.

      Error message:

      "Cannot replicate binary log generated with GTID_MODE = %.192s
      when AUTO_POSITION is enabled, at file %.512s, position %lld."

C6.2. If GTID_MODE = ON and the send thread reads a
      Binlog_configuration_log_event with GTID_MODE = OFF, an error is
      generated and the send thread is stopped.

      Error message:

      "Cannot replicate binary log generated with GTID_MODE = %.192s
      when GTID_MODE = %.192s, at file %.512s, position %lld."

C6.3. If GTID_MODE = OFF and the send thread reads a
      Binlog_configuration_log_event with GTID_MODE = ON, an error is
      generated and the send thread is stopped.

      Error message: same as C6.2.

C6.4. If AUTO_POSITION = 1 and the send thread reads an
      Anonymous_gtid_log_event, an error is generated and the send
      thread is stopped.

      Error message:

      "Cannot replicate anonymous transaction when AUTO_POSITION = 1,
      at file %.512s, position %lld."

C6.5. If GTID_MODE = ON and the send thread reads an
      Anonymous_gtid_log_event, an error is generated and the send
      thread is stopped.

      Error message:

      "Cannot replicate anonymous transaction when GTID_MODE = ON, at
      file %.512s, position %lld."

C6.6. If GTID_MODE = OFF and the send thread reads a
      Gtid_log_event, an error is generated and the send
      thread is stopped.

      Error message:

      "Cannot replicate GTID-transaction when GTID_MODE = OFF, at file
      %.512s, position %lld."
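
As an illustration, checks C6.1 to C6.6 can be written as a single
dispatch on the type of event the send thread reads.  The event type
names ('config', 'anonymous', 'gtid'), the function name and the
evaluation order are assumptions made for this sketch; only the
conditions themselves come from the list above.

```python
def check_sent_event(event_type, server_gtid_mode, auto_position,
                     event_gtid_mode=None):
    """Return the label of the first failing C6.x check, or None.

    event_type is 'config' (Binlog_configuration_log_event),
    'anonymous' (Anonymous_gtid_log_event) or 'gtid' (Gtid_log_event).
    event_gtid_mode is only meaningful for 'config' events.
    """
    if event_type == "config":
        if auto_position and event_gtid_mode != "ON":
            return "C6.1"
        if server_gtid_mode == "ON" and event_gtid_mode == "OFF":
            return "C6.2"
        if server_gtid_mode == "OFF" and event_gtid_mode == "ON":
            return "C6.3"
    elif event_type == "anonymous":
        if auto_position:
            return "C6.4"
        if server_gtid_mode == "ON":
            return "C6.5"
    elif event_type == "gtid":
        if server_gtid_mode == "OFF":
            return "C6.6"
    return None
```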

3.7. CHECKS PERFORMED BY A RUNNING RECEIVE THREAD ON SLAVE
----------------------------------------------------------

The slave must not receive transactions that are incompatible with the
current GTID_MODE. Therefore, the slave receive thread shall implement
the following checks:

C7.1. If AUTO_POSITION = 1 and the receive thread receives a
      Binlog_configuration_log_event that contains GTID_MODE != ON,
      the receive thread shall stop with an error.

      Error message: same as for C6.1.

C7.2. If the server uses GTID_MODE = ON and the receive thread
      receives a Binlog_configuration_log_event that contains
      GTID_MODE = OFF_PERMISSIVE or OFF, then the receive thread shall
      stop with an error.

      Error message: Same as for C6.2

C7.3. If the server uses GTID_MODE = OFF and the receive thread
      receives a Binlog_configuration_log_event that contains
      GTID_MODE = ON_PERMISSIVE or ON, then the receive thread shall
      stop with an error.

      Error message: Same as for C6.3.

C7.4. If AUTO_POSITION = 1 and the receive thread receives an
      Anonymous_gtid_log_event, the receive thread shall stop with an
      error.

      Error message: same as for C6.4.

C7.5. If GTID_MODE = ON and the receive thread receives an
      Anonymous_gtid_log_event, the receive thread shall stop with an
      error.

      Error message: same as for C6.5.

C7.6. If GTID_MODE = OFF and the receive thread receives a
      Gtid_log_event, the receive thread shall stop with an error.

      Error message: same as for C6.6.

      The check already exists, but with a slightly different error
      message.

3.8. CHECKS PERFORMED BY A RUNNING APPLIER THREAD
-------------------------------------------------

The applier thread automatically performs the checks imposed by
setting GTID_NEXT according to the events it reads from the relay log
(see 3.3). The checks for setting GTID_NEXT are done already in MySQL
5.6.

In addition, the following checks are performed:

C8.1. If the server is using GTID_MODE = OFF and the applier thread
      reads a Binlog_configuration_log_event that contains GTID_MODE =
      ON_PERMISSIVE or ON, then the applier thread shall stop with an
      error.

      Error message: Same as for C6.2.

C8.2. If the server is using GTID_MODE = ON and the applier thread
      reads a Binlog_configuration_log_event that contains GTID_MODE =
      OFF_PERMISSIVE or OFF, then the applier thread shall stop with an
      error.

      Error message: Same as for C6.2.

3.9. CHECKS PERFORMED BY SERVER STARTUP
---------------------------------------

C9.1. If server starts with GTID_MODE = OFF and the replication
      connection has AUTO_POSITION = 1, then a warning shall be
      generated. The channel will still have AUTO_POSITION = 1, which
      will later cause an error when starting the slave receiver
      thread (see C4.4).

      Warning message:

      "Detected misconfiguration: replication channel '%.192s' was
      configured with AUTO_POSITION = 1, but the server was started
      with --gtid-mode=off.  Either reconfigure replication using
      CHANGE MASTER TO MASTER_AUTO_POSITION = 0 FOR CHANNEL '%.192s',
      or change GTID_MODE to some value other than OFF, before
      starting the slave receiver thread."

C9.2. It is easy for the user to forget updating my.cnf after
      performing the online upgrade procedure. Changing the GTID_MODE
      in a server restart is dangerous and can lead to the DBA having
      to temporarily downgrade to AUTO_POSITION = 0 and ON_PERMISSIVE
      on all servers in order to process the anonymous transactions.
      This is unwanted and presents a risk since automatic fail-over
      will not be allowed in the meantime.

      To protect against such mistakes, the following check shall be
      performed at server startup: If the
      Binlog_configuration_log_event of the last binary log contains
      'GTID_MODE = ON', and the server starts with a GTID_MODE other
      than ON, the server shall generate an error and fail to start.

      Error message:

      "Cannot start the server because the server was last
      running with GTID_MODE = ON and is now being started with
      --gtid-mode=%.192s. Unintentionally starting in the wrong
      GTID_MODE can be harmful. If you are intentionally changing the
      GTID_MODE, suppress the check using
      --force-gtid-mode-on-startup."

C9.3. A command-line option shall be provided so that the user can
      circumvent C9.2 and start the server anyway:

      --force-gtid-mode-on-startup

      If this option is enabled, and GTID_MODE is different from the
      GTID_MODE of the last binary log, then only a warning is
      generated instead of an error.

      Warning message:

      "The server was last using GTID_MODE = ON and is now being
      started with --gtid-mode=%.192s. This is allowed because
      --force-gtid-mode-on-startup is used."

C9.4. If the server starts with GTID_MODE = ON and
      ENFORCE_GTID_CONSISTENCY != ON, an error shall be generated.

      Error message:

      "GTID_MODE = ON requires ENFORCE_GTID_CONSISTENCY = ON."

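The startup checks can be modelled as a function that returns the
errors (which abort startup) and the warnings separately.  This is a
sketch under assumptions: the function name, its parameters and the
tuple format are hypothetical, and C9.4 is read as its error message
states it, namely that GTID_MODE = ON requires
ENFORCE_GTID_CONSISTENCY = ON.

```python
def check_startup(startup_mode, last_binlog_mode, enforce_gtid_consistency,
                  auto_position_channels=(), force=False):
    """Return (errors, warnings) as lists of (check label, detail)."""
    errors, warnings = [], []
    # C9.1: channels with AUTO_POSITION = 1 under GTID_MODE = OFF only warn.
    if startup_mode == "OFF":
        for channel in auto_position_channels:
            warnings.append(("C9.1", channel))
    # C9.2: the last binary log was written with GTID_MODE = ON.
    # C9.3: --force-gtid-mode-on-startup downgrades the error to a warning.
    if last_binlog_mode == "ON" and startup_mode != "ON":
        if force:
            warnings.append(("C9.3", startup_mode))
        else:
            errors.append(("C9.2", startup_mode))
    # C9.4: GTID_MODE = ON requires ENFORCE_GTID_CONSISTENCY = ON.
    if startup_mode == "ON" and enforce_gtid_consistency != "ON":
        errors.append(("C9.4", enforce_gtid_consistency))
    return errors, warnings
```
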
3.10. CHECKS PERFORMED WHEN SETTING ENFORCE_GTID_CONSISTENCY
------------------------------------------------------------

C10.1. When GTID_MODE = ON, and the user tries to change
      ENFORCE_GTID_CONSISTENCY to OFF or WARN, the statement shall
      fail and an error shall be generated.

      The error message shall be the same as in C9.4.

C10.2. When ENFORCE_GTID_CONSISTENCY is changed from OFF or WARN to
      ON, and there is any ongoing transaction that violates GTID
      consistency, the statement shall fail and an error shall be
      generated.

      Error message:

      "Cannot set ENFORCE_GTID_CONSISTENCY = ON because there are
      ongoing transactions that violate GTID consistency."

C10.3. When ENFORCE_GTID_CONSISTENCY is changed from OFF to WARN, and
      there are ongoing transactions that violate GTID consistency,
      the statement shall generate a warning.

      Warning message:

      "There are ongoing transactions that violate GTID consistency."

3.11. CHECKS PERFORMED WHEN STARTING AN APPLIER THREAD
------------------------------------------------------

No compatibility checks will be performed when starting an applier
thread.  We could conceivably check that the
Binlog_configuration_log_events of all unprocessed relay logs are
compatible with the current GTID_MODE, but this will be detected when
the thread reaches the relevant relay log in any case.


4. OTHER GTID FEATURES
======================

4.1. WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS
--------------------------------------

It should be possible to execute WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS
whenever GTID_MODE != OFF.

Rationale: If multi-source is implemented, a slave can use
multi-source to aggregate data from two masters. If one master has
GTID_MODE = ON and the other has GTID_MODE = OFF, the slave must have
GTID_MODE = ON_PERMISSIVE or OFF_PERMISSIVE.  Still the slave may want
to wait for a given GTID set from the master that uses GTID_MODE = ON.

There is nothing to change: the server already works this way.

In case similar functions are implemented, e.g., to wait for
transactions to be received, then the same restriction shall apply.

4.2. SQL_SLAVE_SKIP_COUNTER
---------------------------

It should be possible to set SQL_SLAVE_SKIP_COUNTER whenever
GTID_MODE != ON.

Rationale: It is currently not allowed to set it when GTID_MODE = ON,
since the correct and only GTID-safe way to skip transactions is using
an empty transaction. However, when GTID_MODE != ON, there can be
anonymous transactions which cannot be skipped using empty
transactions.

There is nothing to change: the server already works this way.

4.3. GTID_EXECUTED, GTID_PURGED, and FIELDS IN SHOW SLAVE STATUS,
     PERFORMANCE_SCHEMA, ETC
----------------------------------------------------------------

Fields displaying GTID sets (e.g. GTID_EXECUTED, GTID_PURGED, SHOW
SLAVE STATUS / RETRIEVED_GTID_SET,
PERFORMANCE_SCHEMA.replication_connection_status.RECEIVED_TRANSACTION_SET,
etc.) should work the same way regardless of GTID_MODE. These sets
may be nonempty even when GTID_MODE = OFF, because GTID_MODE may
have been ON earlier. If GTID_MODE has always been OFF and the server
has not executed any GTID transactions, the sets should simply be
empty ('').

Before this worklog, GTID_EXECUTED and GTID_PURGED are empty if
GTID_MODE is OFF and binlog is disabled. This must change so that they
are initialized on server startup regardless of GTID_MODE and binlog
being enabled.

Notation:
- Fields displaying a GTID set should contain an empty string if
  the set is empty.

- Fields displaying a single GTID (e.g.,
  PERFORMANCE_SCHEMA.REPLICATION_EXECUTE_STATUS_BY_WORKER /
  CURRENT_TRANSACTION) should display "ANONYMOUS" if the current
  transaction is anonymous.
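
The notation rules can be illustrated with a short Python sketch.  The
function names and the in-memory representation (a dict from server
UUID to a list of inclusive intervals) are assumptions made for this
example; only the empty-string and "ANONYMOUS" conventions come from
the text above.

```python
def format_gtid_set(gtid_set):
    """Render {uuid: [(first, last), ...]} as text; '' for an empty set."""
    parts = []
    for uuid in sorted(gtid_set):
        intervals = []
        for first, last in sorted(gtid_set[uuid]):
            # A single-number interval is written without the dash.
            intervals.append(str(first) if first == last else f"{first}-{last}")
        if intervals:
            parts.append(uuid + ":" + ":".join(intervals))
    return ",".join(parts)


def format_current_transaction(gtid):
    """Render a single GTID given as (uuid, number), or None if anonymous."""
    if gtid is None:
        return "ANONYMOUS"
    uuid, number = gtid
    return f"{uuid}:{number}"
```

An empty dict renders as the empty string, which is what a server
that never executed a GTID-transaction would display in this model.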

4.4. SET GTID_PURGED
--------------------

It should be allowed to execute SET GTID_PURGED regardless of the
GTID_MODE, since GTID_EXECUTED and GTID_PURGED are preserved e.g. when
going from GTID_MODE = ON to OFF to ON.

4.5. THE GTID_EXECUTED TABLE
----------------------------

The table mysql.gtid_executed was introduced in WL#6559.  When a
transaction is committed, its GTID is stored in the table
mysql.gtid_executed.

The table is range-compressed once for every N committed transactions.
The range compression is performed by a separate thread.  Currently,
when GTID_MODE = OFF, the thread is not started at all; otherwise, it
is started when the server starts and stopped when the server stops.

Since GTID_MODE is now dynamic, we need to change the logic.  To make
it simple, we start the thread unconditionally when the server starts
and stop it unconditionally when the server stops.  Since no
GTID-transactions are committed when GTID_MODE = OFF, the thread will
never wake up in this case and thus will not use any CPU.
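
To make "range compression" concrete, here is a minimal Python sketch
of the merging step.  It assumes, for illustration only, that each row
is a tuple (uuid, first, last) with inclusive bounds; the function
name is hypothetical and the real table layout is defined in WL#6559,
not here.

```python
def compress_ranges(rows):
    """Merge adjacent or overlapping (uuid, first, last) rows (sketch only)."""
    merged = []
    for uuid, first, last in sorted(rows):
        if merged and merged[-1][0] == uuid and first <= merged[-1][2] + 1:
            # The row continues or overlaps the previous range: extend it.
            prev = merged[-1]
            merged[-1] = (uuid, prev[1], max(prev[2], last))
        else:
            merged.append((uuid, first, last))
    return merged
```

Three single-transaction rows 1, 2 and 3 for the same UUID collapse
into one row covering 1 to 3, so the table stays small even though
one row is stored per committed GTID-transaction.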

5. SUMMARY OF USER-VISIBLE CHANGES
==================================

- GTID_MODE is now dynamic. It can be set by SUPER from a top-level
  statement.

- GTID_MODE now takes the following values:

  - 0 = OFF: Both new and replicated transactions must be anonymous.

  - 1 = OFF_PERMISSIVE: New transactions are anonymous. Replicated
    transactions can be either anonymous or GTID-transactions.

  - 2 = ON_PERMISSIVE: New transactions are GTID-transactions.
    Replicated transactions can be either anonymous or
    GTID-transactions.

  - 3 = ON: Both new and replicated transactions must be
    GTID-transactions.

- GTID_MODE can only be altered one step at a time:
  OFF <-> OFF_PERMISSIVE <-> ON_PERMISSIVE <-> ON

- Every step, including ON_PERMISSIVE -> ON, can be taken dynamically
  with SET @@GLOBAL.GTID_MODE; no server restart is required.

- ENFORCE_GTID_CONSISTENCY is now dynamic. It can be set by SUPER from
  a top-level statement.

- ENFORCE_GTID_CONSISTENCY now takes the following values:

  0 = OFF

    All transactions are allowed to violate GTID consistency.

  1 = ON

    No transaction is allowed to violate GTID consistency.

  2 = WARN

    All transactions are allowed to violate GTID consistency, but a
    warning is generated in this case.

  Additionally, transactions that use GTID_NEXT = 'UUID:NUMBER' are
  not allowed to violate GTID consistency, regardless of the value of
  ENFORCE_GTID_CONSISTENCY.  Transactions that use GTID_NEXT =
  'AUTOMATIC' are not allowed to violate GTID consistency when
  GTID_MODE = ON_PERMISSIVE or ON.

- GTID_MODE = ON is only allowed when ENFORCE_GTID_CONSISTENCY = ON

- The binary log contains a new event type:
  Binlog_configuration_log_event.

- The existing binary log event Previous_gtids_log_event has been
  extended with one more field.

- A new command line option --force-gtid-mode-on-startup has been
  introduced.

- The status variable ANONYMOUS_TRANSACTION_COUNT has been introduced.
  This shows the number of transactions for which it has been
  determined that they will be anonymous.
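
The GTID_MODE values and the one-step-at-a-time rule listed above can
be modeled as follows.  This Python sketch is illustrative only: the
class and function names are hypothetical, and treating an unchanged
value as an allowed "step" is an assumption of the sketch.

```python
from enum import IntEnum


class GtidMode(IntEnum):
    OFF = 0
    OFF_PERMISSIVE = 1
    ON_PERMISSIVE = 2
    ON = 3


def is_single_step(old, new):
    """True if old -> new moves at most one step along
    OFF <-> OFF_PERMISSIVE <-> ON_PERMISSIVE <-> ON."""
    return abs(int(new) - int(old)) <= 1


def new_transactions_have_gtid(mode):
    """New transactions are GTID-transactions from ON_PERMISSIVE upwards."""
    return mode >= GtidMode.ON_PERMISSIVE


def accepts_replicated(mode, has_gtid):
    """Whether a replicated transaction of the given kind is acceptable."""
    if mode == GtidMode.OFF:
        return not has_gtid      # only anonymous transactions
    if mode == GtidMode.ON:
        return has_gtid          # only GTID-transactions
    return True                  # both permissive modes accept either kind
```

For instance, is_single_step(GtidMode.OFF, GtidMode.ON_PERMISSIVE) is
False in this model, because it would skip OFF_PERMISSIVE.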


==== APPENDIX A: alternative that requires one server restart ====

The above algorithm requires every transaction to increase and
decrease an atomic counter.  This is needed in order to allow
ON_PERMISSIVE -> ON without restarting the server.  If the overhead
imposed by the atomic operations is deemed unacceptable, we could
remove them and require a server restart for the ON_PERMISSIVE -> ON
step.  This means that steps 6 and 8 of the upgrade procedure become
more complex:

U6'.Wait for any anonymous transactions that may still be executing
    to commit.  This cannot be checked with 100% certainty.  However,
    transactions are only assigned their anonymity a very short time
    before they get committed (normally a fraction of a second).
    Therefore, you can wait a minute to be safe.

U8'.Restart each server with gtid-mode=ON.  If any servers older than
    5.7.X were switched off in step 4.1, then they can be switched on
    now as well.

    It does not matter which server executes this step first.

    When performing this step on a master, typically a switch-over
    will be needed.  In a tree topology, this can be done as follows:

    U8.1. Ensure the slaves are not lagging too much behind the
          master.

    U8.2. Stop updates on the master.

    U8.3. Wait until some slave is up to date with the master; we call
          this slave the stand-in. Record the binlog positions shown
          in SHOW MASTER STATUS on the stand-in.

    U8.4. Allow clients to connect to the stand-in to do updates.

    U8.5. If there are other slaves of the master, wait for them to
          catch up with the master.  Then redirect all other slaves to
          the stand-in, using MASTER_LOG_FILE and MASTER_LOG_POS as
          recorded in step U8.3.

    U8.6. Restart the master with GTID_MODE = ON.

    U8.7. Connect the master as a slave of the stand-in, using
          MASTER_LOG_FILE and MASTER_LOG_POS as recorded in step U8.3.

    U8.8. Wait until the master does not lag too much behind the
          stand-in.

    U8.9. Stop updates on the stand-in.

   U8.10. Wait until the master is up to date with the
          stand-in. Record the binlog positions shown in SHOW MASTER
          STATUS on the master.

   U8.11. Allow clients to connect to the master to do updates.

   U8.12. Wait until all direct slaves of the stand-in are up to date
          with the stand-in.

   U8.13. Connect the slaves and the stand-in as slaves of the master,
          using MASTER_LOG_FILE and MASTER_LOG_POS as recorded in step
          U8.10.

Moreover, check C1.2 needs to be replaced by an error generated
unconditionally when the user executes SET @@GLOBAL.GTID_MODE = ON.

SUMMARY OF CHANGES
==================

 1. Small simplifications.

    While debugging the feature, a few small things had to be fixed, e.g.
    more DBUG output, etc. This patch collects all such simplifications,
    so that they don't distract from the rest of the worklog.

 2. Currently, the code uses numeric constants instead of enumeration
    values for GTID_MODE. Also it uses the names UPGRADE_STEP_1 and
    UPGRADE_STEP_2 instead of OFF_PERMISSIVE and ON_PERMISSIVE. Change
    to use the new names and to use enumeration values always. Use an
    enum type to give better compilation checks (e.g. warning for
    missing enumeration value in switch). Encapsulate all access to
    the global variable using getter functions.

 3. Make GTID_MODE settable and allow OFF_PERMISSIVE and ON_PERMISSIVE.

    Make reads to gtid_mode be guarded by global_sid_lock.rdlock and
    writes by global_sid_lock.wrlock.

    Make the GTID table compression thread start and stop
    unconditionally.

    Make GTID_EXECUTED and GTID_PURGED be initialized unconditionally.

    Allow SET GTID_PURGED when GTID_MODE = OFF.

    Rotate the binary log when changing GTID_MODE.

 4. Implement check C1.2. This requires that we implement the following
    counter:

    - anonymous_gtid_count:
      The number of active transactions that use GTID_NEXT =
      'ANONYMOUS'.

    To implement this, we need:

    - Update anonymous_gtid_count whenever thd->gtid_owned changes to
      THD::OWNED_SIDNO_ANONYMOUS, or when it changes from
      THD::OWNED_SIDNO_ANONYMOUS to something else.

 5. Make ENFORCE_GTID_CONSISTENCY settable, allow WARN.

    Use an enum instead of a boolean. Use symbolic names always.
    Encapsulate all access to the global variable using getter
    functions.

    Make reads of ENFORCE_GTID_CONSISTENCY be protected by
    global_sid_lock.rdlock and writes by global_sid_lock.wrlock.

    Make every transaction check for GTID consistency regardless of
    the value of GTID_MODE.  Then, it is enough to take
    global_sid_lock.rdlock for transactions that violate GTID
    consistency.

 6. Implement simple checks:
    C1.1-C1.3, C1.6, C2.1, C4.1-C4.4, C5.1, C6.4-C6.6, C7.4-C7.6, C9.4, C10.1

 7. Implement checks that need to read AUTO_POSITION from multiple channels:
    C1.5, C9.1

 8. Introduce two global counters:

    - automatic_gtid_consistency_violation_count:
      The number of active transactions that use GTID_NEXT =
      'AUTOMATIC' and violate GTID consistency.

    - anonymous_gtid_consistency_violation_count:
      The number of active transactions that use GTID_NEXT =
      'ANONYMOUS' and violate GTID consistency.

    To implement these, we need:

    - Introduce a flag THD::gtid_consistency_violation that indicates
      if the current transaction has increased one of these counters.

    - Currently, consistency is checked only if
      ENFORCE_GTID_CONSISTENCY = ON.  We shall change so that
      consistency is checked unconditionally.  If consistency is
      violated, do this:

      - Fail with an error if one of the following holds:
        - ENFORCE_GTID_CONSISTENCY = ON, or
        - GTID_NEXT = 'UUID:NUMBER', or
        - GTID_NEXT = 'AUTOMATIC' and GTID_MODE = ON_PERMISSIVE or ON

      - Otherwise:
        if GTID_NEXT = 'AUTOMATIC':
          increase automatic_gtid_consistency_violation_count
          set thd->gtid_consistency_violation = 1
        if GTID_NEXT = 'ANONYMOUS':
          increase anonymous_gtid_consistency_violation_count
          set thd->gtid_consistency_violation = 2
        if ENFORCE_GTID_CONSISTENCY = WARN:
          generate a warning
        allow the statement to execute

      At the end of the statement, if thd->gtid_consistency_violation
      != 0, decrease the corresponding global counter and set
      thd->gtid_consistency_violation = 0.

 9. Implement checks that depend on the counters introduced in the
    previous step: C1.4, C10.2, C10.3

10. Add a new event type, Binlog_configuration_log_event, that
    contains the GTID_MODE in use when the binary log was
    created. (But make the event format extensible so that other
    fields can be added if needed.) The event should have the
    LOG_EVENT_IGNORABLE_F flag set.

    Flush the binary log every time GTID_MODE is changed.  This is
    needed so that every binary log contains the correct GTID_MODE in
    the Binlog_configuration_log_event.

11. Implement checks that depend on Binlog_configuration_log_event:
    C5.2, C6.1-C6.3, C7.1-C7.3, C8.1, C8.2, C9.2.

12. Implement C9.3, which depends on C9.2 implemented in the previous
    step.

13. Implement C5.3.  We postpone it until this step because it is
    somewhat complex: it requires adding functionality to
    MYSQL_BIN_LOG::read_gtids_from_binlog.
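
The handling of GTID-consistency violations described in the step that
introduces the two global counters can be sketched in Python as
follows.  This is a model under stated assumptions, not the server
implementation: Counters and Session are hypothetical stand-ins for
the global counters and the per-connection THD state, and the guard
that counts a session at most once is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    automatic: int = 0   # automatic_gtid_consistency_violation_count
    anonymous: int = 0   # anonymous_gtid_consistency_violation_count


@dataclass
class Session:
    gtid_next: str = "AUTOMATIC"         # 'AUTOMATIC', 'ANONYMOUS' or 'UUID:NUMBER'
    gtid_consistency_violation: int = 0  # 0 = none, 1 = automatic, 2 = anonymous
    warnings: int = 0


def on_violation(session, counters, enforce, gtid_mode):
    """Handle a statement that violates GTID consistency.

    Returns True if the statement may execute, False if it must fail.
    """
    uuid_number = session.gtid_next not in ("AUTOMATIC", "ANONYMOUS")
    if (enforce == "ON"
            or uuid_number
            or (session.gtid_next == "AUTOMATIC"
                and gtid_mode in ("ON_PERMISSIVE", "ON"))):
        return False
    # Assumption of this sketch: a session is counted at most once.
    if session.gtid_consistency_violation == 0:
        if session.gtid_next == "AUTOMATIC":
            counters.automatic += 1
            session.gtid_consistency_violation = 1
        else:
            counters.anonymous += 1
            session.gtid_consistency_violation = 2
    if enforce == "WARN":
        session.warnings += 1
    return True


def release_violation(session, counters):
    """Undo the counter increment recorded in the session flag."""
    if session.gtid_consistency_violation == 1:
        counters.automatic -= 1
    elif session.gtid_consistency_violation == 2:
        counters.anonymous -= 1
    session.gtid_consistency_violation = 0
```

Keeping a separate flag value per counter lets release_violation
decrement exactly the counter that was increased, which is what the
checks that read these counters rely on.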