Skip to content

Commit

Permalink
Add logical read-ahead to InnoDB.
Browse files Browse the repository at this point in the history
Summary:
When the session variable innodb_lra_size is set to N, we issue async
read requests for the next M logical pages where the total size of the M
pages on disk is N megabytes. The max allowed value of innodb_lra_size
is is 16384 which corresponds to prefetching 16GB of data. We may choose
to use smaller values in production.

When the flashcache is available, the logical-read-ahead tells
flashcache to not cache the pages it reads.

We always sort the page numbers before issuing read requests for them
because sorting is cheap and this way we don't have to rely on the block
layer of linux kernel to sort our read requests. Another advantage is
that if the read array is small then block layer doesn't get a chance to
coalesce reads.

I added status variables for the number of pages prefetched by
logical-read-ahead, the number of pages that are missed (a page is
missed if we notice that it was not prefetched while doing the scan.
this can happen if the b-tree was modified since the last time we
prefetched pages.), and the number of pages that were already in buffer
pool. These are

  innodb_logical_read_ahead_prefetched
  innodb_logical_read_ahead_missed
  innodb_logical_read_ahead_in_buf_pool.

There are two more session variables that control the behaviour of
logical read ahead:

  innodb_lra_n_node_recs_before_sleep: this variable determines how many
node pointer records should be traversed by logical read ahead before
going to sleep.

  innodb_lra_sleep: this is the amount of time (in milliseconds) that
logical read ahead sleeps in order to give other transactions a chance
to x-latch the index lock.

I had to make the following modifications:

* Persistent cursor can not be restored on a level other than the leaf
level. I provided a way to do this but it only works for PAGE_CUR_LE mode.

* Make btr_pcur_restore_position_func() always re-traverse the B-tree in
debug build. This is to test the above functionality in debug build.

Test Plan:
* innodb_logical_read_ahead.test tests that logical read ahead fetches
all pages necessary when the read ahead size is large enough. It also
tests whether the asynchronous io requests made for the table was the
same as the number of prefetches done by the logical read ahead.

* Add a unit test where one thread creates a lot of splits and merges on
the B-tree while the other scans the table. This is to stress test the
correctness of the changes.

* innodb_logical_read_ahead_correctness.test: This test tests for the
base cases (empty table, table with one row, table with many rows) and a
case where pages are merged while the scan is being performed. If the
code was not careful in restoring the cursor in row_read_ahead_logical()
or if row_search_for_mysql() did not check the return value of this
function, then we could skip a record. If the return value of
row_read_ahead_logical() in row_search_for_mysql() is ignored, this test
fails.

* Add stress test for logical read ahead: checksum of the table is
computed in a separate thread while the table is modified by bunch of
threads. This makes sure that innodb_lra_size works under concurrency.

* Run on a production table without traffic.

* Run on a production shadow with read/write traffic.

Reviewers: rongrong, mcallaghan, steaphan, yoshinori

Reviewed By: steaphan
  • Loading branch information
Nizameddin Ordulu authored and steaphangreene committed Jun 4, 2013
1 parent f9d1a53 commit f8e3619
Show file tree
Hide file tree
Showing 31 changed files with 1,307 additions and 26 deletions.
50 changes: 50 additions & 0 deletions mysql-test/suite/innodb/r/innodb_logical_read_ahead.result
@@ -0,0 +1,50 @@
DROP TABLE if exists t1;
CREATE TABLE t1 (a INT NOT NULL PRIMARY KEY AUTO_INCREMENT, b VARCHAR(256)) ENGINE=INNODB;
INSERT INTO t1 VALUES (0, REPEAT('a',256));
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
show global status like "innodb_buffered_aio_submitted";
Variable_name Value
Innodb_buffered_aio_submitted 0
show global status like "innodb_logical_read_ahead_misses";
Variable_name Value
Innodb_logical_read_ahead_misses 0
show global status like "innodb_logical_read_ahead_prefetched";
Variable_name Value
Innodb_logical_read_ahead_prefetched 0
show global status like "innodb_logical_read_ahead_in_buf_pool";
Variable_name Value
Innodb_logical_read_ahead_in_buf_pool 0
SET SESSION innodb_lra_size=1024;
SET SESSION innodb_lra_n_node_recs_before_sleep=128;
SET SESSION innodb_lra_sleep=100;
checksum table t1;
Table Checksum
test.t1 2920207201
select t1.variable_value=t2.variable_value prefetched_equals_submitted from information_schema.global_status t1, information_schema.global_status t2 where t1.variable_name = 'innodb_logical_read_ahead_prefetched' and t2.variable_name='innodb_buffered_aio_submitted';
prefetched_equals_submitted
1
show global status like "innodb_logical_read_ahead_misses";
Variable_name Value
Innodb_logical_read_ahead_misses 0
select variable_value > 1000 from information_schema.global_status where variable_name="innodb_logical_read_ahead_prefetched";
variable_value > 1000
1
select variable_value < 100 from information_schema.global_status where variable_name="innodb_logical_read_ahead_in_buf_pool";
variable_value < 100
1
DROP TABLE t1;
@@ -0,0 +1,88 @@
DROP TABLE IF EXISTS t1_small;
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t1_lra;
DROP TABLE IF EXISTS t2_small;
DROP TABLE IF EXISTS t3_small;
CREATE TABLE t1_small(a INT NOT NULL PRIMARY KEY AUTO_INCREMENT, b VARCHAR(256)) ENGINE=INNODB;
SET SESSION innodb_lra_size=1;
SELECT * FROM t1_small;
a b
SET SESSION innodb_lra_size=0;
INSERT INTO t1_small(b) VALUES(REPEAT('a',256));
SET SESSION innodb_lra_size=1;
SELECT a, LENGTH(b) FROM t1_small;
a LENGTH(b)
1 256
SET SESSION innodb_lra_size=0;
DROP TABLE t1_small;
CREATE TABLE `t2_small` (
`id1` bigint(20) unsigned NOT NULL DEFAULT '0',
`time` bigint(20) unsigned NOT NULL DEFAULT '0',
`id2` bigint(20) unsigned NOT NULL DEFAULT '0',
`id2_type` int(10) unsigned DEFAULT NULL,
`data` text,
`status` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`id1`,`time`,`id2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SET SESSION innodb_lra_size=1;
SELECT * FROM t2_small;
id1 time id2 id2_type data status
DROP TABLE t2_small;
CREATE TABLE `t3_small` (
`id` bigint(20) NOT NULL,
`a` text,
`b` text,
`c` text,
`d` text,
`e` text,
`f` text,
`g` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
SET SESSION innodb_lra_size=1;
SELECT * FROM t3_small;
id a b c d e f g
DROP TABLE t3_small;
CREATE TABLE t1(a INT NOT NULL PRIMARY KEY AUTO_INCREMENT, b VARCHAR(256)) ENGINE=INNODB;
CREATE TABLE t1_lra(a INT NOT NULL PRIMARY KEY AUTO_INCREMENT, b VARCHAR(256)) ENGINE=INNODB;
INSERT INTO t1 VALUES (0, REPEAT('a',256));
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1_lra SELECT * FROM t1;
CHECKSUM TABLE t1;
Table Checksum
test.t1 2793042655
SET SESSION innodb_lra_size=1;
SET SESSION innodb_lra_n_node_recs_before_sleep=128;
SET SESSION innodb_lra_sleep=100;
CHECKSUM TABLE t1_lra;
Table Checksum
test.t1_lra 2793042655
DELETE FROM t1 WHERE a >= 5480 AND a < 5520;
DELETE FROM t1 WHERE a >= 5520 AND a < 5550;
CHECKSUM TABLE t1;
Table Checksum
test.t1 1005864202
SET GLOBAL innodb_lra_test=1;
DELETE FROM t1_lra WHERE a >= 5480 AND a < 5520;
DELETE FROM t1_lra WHERE a >= 5520 AND a < 5550;
SET SESSION innodb_lra_size=1;
SET SESSION innodb_lra_n_node_recs_before_sleep=128;
SET SESSION innodb_lra_sleep=100;
CHECKSUM TABLE t1_lra;
Table Checksum
test.t1_lra 1005864202
DROP TABLE t1;
DROP TABLE t1_lra;
@@ -0,0 +1,2 @@
--innodb_use_native_aio=1
--force-restart
59 changes: 59 additions & 0 deletions mysql-test/suite/innodb/t/innodb_logical_read_ahead.test
@@ -0,0 +1,59 @@
--source include/have_innodb.inc
--source include/have_native_aio.inc

--disable_warnings
DROP TABLE if exists t1;
--enable_warnings

# Create table.
CREATE TABLE t1 (a INT NOT NULL PRIMARY KEY AUTO_INCREMENT, b VARCHAR(256)) ENGINE=INNODB;

# Populate table.
INSERT INTO t1 VALUES (0, REPEAT('a',256));
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;
INSERT INTO t1 SELECT 0, b FROM t1;

--source include/restart_mysqld.inc

show global status like "innodb_buffered_aio_submitted";
show global status like "innodb_logical_read_ahead_misses";
show global status like "innodb_logical_read_ahead_prefetched";
show global status like "innodb_logical_read_ahead_in_buf_pool";

# set the logical read ahead large enough to prefetch
# the entire table.
SET SESSION innodb_lra_size=1024;
SET SESSION innodb_lra_n_node_recs_before_sleep=128;
SET SESSION innodb_lra_sleep=100;
checksum table t1;

# the asynchronous io submits must be nonzero and equal
# to the innodb_logical_read_ahead_prefetched.
select t1.variable_value=t2.variable_value prefetched_equals_submitted from information_schema.global_status t1, information_schema.global_status t2 where t1.variable_name = 'innodb_logical_read_ahead_prefetched' and t2.variable_name='innodb_buffered_aio_submitted';

# there should be no misses, all pages must have been
# prefetched by the logical read ahead.
show global status like "innodb_logical_read_ahead_misses";
# the total number of pages prefetched must be close to the number
# of leaf pages of the table.
select variable_value > 1000 from information_schema.global_status where variable_name="innodb_logical_read_ahead_prefetched";
# innodb_logical_read_ahead_in_buf_pool is the number of pages
# of the table that were already in the buffer pool while doing the scan.
# This should be small.
select variable_value < 100 from information_schema.global_status where variable_name="innodb_logical_read_ahead_in_buf_pool";

DROP TABLE t1;
@@ -0,0 +1,2 @@
--innodb_use_native_aio=1
--force-restart
107 changes: 107 additions & 0 deletions mysql-test/suite/innodb/t/innodb_logical_read_ahead_correctness.test
@@ -0,0 +1,107 @@
--source include/have_debug.inc
--source include/have_innodb.inc
--source include/have_native_aio.inc

--disable_warnings
DROP TABLE IF EXISTS t1_small;
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t1_lra;
DROP TABLE IF EXISTS t2_small;
DROP TABLE IF EXISTS t3_small;
--enable_warnings

# The small table is for checking against a bug where the table's only page is the
# root page. In such a case the function called for getting the parent page caused
# the server to crash.
CREATE TABLE t1_small(a INT NOT NULL PRIMARY KEY AUTO_INCREMENT, b VARCHAR(256)) ENGINE=INNODB;

SET SESSION innodb_lra_size=1;
SELECT * FROM t1_small;

SET SESSION innodb_lra_size=0;
INSERT INTO t1_small(b) VALUES(REPEAT('a',256));
SET SESSION innodb_lra_size=1;
SELECT a, LENGTH(b) FROM t1_small;
SET SESSION innodb_lra_size=0;

DROP TABLE t1_small;

CREATE TABLE `t2_small` (
`id1` bigint(20) unsigned NOT NULL DEFAULT '0',
`time` bigint(20) unsigned NOT NULL DEFAULT '0',
`id2` bigint(20) unsigned NOT NULL DEFAULT '0',
`id2_type` int(10) unsigned DEFAULT NULL,
`data` text,
`status` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`id1`,`time`,`id2`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

SET SESSION innodb_lra_size=1;
SELECT * FROM t2_small;
DROP TABLE t2_small;

CREATE TABLE `t3_small` (
`id` bigint(20) NOT NULL,
`a` text,
`b` text,
`c` text,
`d` text,
`e` text,
`f` text,
`g` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

SET SESSION innodb_lra_size=1;
SELECT * FROM t3_small;
DROP TABLE t3_small;

CREATE TABLE t1(a INT NOT NULL PRIMARY KEY AUTO_INCREMENT, b VARCHAR(256)) ENGINE=INNODB;
CREATE TABLE t1_lra(a INT NOT NULL PRIMARY KEY AUTO_INCREMENT, b VARCHAR(256)) ENGINE=INNODB;

# Populate tables.
INSERT INTO t1 VALUES (0, REPEAT('a',256));
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;
INSERT INTO t1(b) SELECT b FROM t1;

INSERT INTO t1_lra SELECT * FROM t1;

--source include/restart_mysqld.inc

CHECKSUM TABLE t1;

SET SESSION innodb_lra_size=1;
SET SESSION innodb_lra_n_node_recs_before_sleep=128;
SET SESSION innodb_lra_sleep=100;
CHECKSUM TABLE t1_lra;

--source include/restart_mysqld.inc

DELETE FROM t1 WHERE a >= 5480 AND a < 5520;
DELETE FROM t1 WHERE a >= 5520 AND a < 5550;

CHECKSUM TABLE t1;

SET GLOBAL innodb_lra_test=1;
DELETE FROM t1_lra WHERE a >= 5480 AND a < 5520;
DELETE FROM t1_lra WHERE a >= 5520 AND a < 5550;

SET SESSION innodb_lra_size=1;
SET SESSION innodb_lra_n_node_recs_before_sleep=128;
SET SESSION innodb_lra_sleep=100;
CHECKSUM TABLE t1_lra;

DROP TABLE t1;
DROP TABLE t1_lra;
27 changes: 27 additions & 0 deletions mysql-test/suite/innodb_stress/r/innodb_stress_lra.result
@@ -0,0 +1,27 @@
include/master-slave.inc
Warnings:
Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
Note #### Storing MySQL user name or password information in the master.info repository is not secure and is therefore not recommended. Please see the MySQL Manual for more about this issue and possible alternatives.
[connection master]
DROP TABLE IF EXISTS t1;
CREATE TABLE t1(id INT AUTO_INCREMENT PRIMARY KEY,
msg_prefix VARCHAR(255),
msg VARCHAR(255),
msg_length int,
msg_checksum varchar(128),
KEY msg_i(msg_prefix))
ENGINE=INNODB;
call mtr.add_suppression(".*");
include/rpl_restart_server.inc [server_number=2]
Checksum master
stop slave;
Warnings:
Note 1255 Slave already has been stopped
start slave;
Checksum slave
DROP TABLE t1;
stop slave;
start slave;
include/stop_slave.inc
include/start_slave.inc
include/stop_slave.inc
9 changes: 9 additions & 0 deletions mysql-test/suite/innodb_stress/t/innodb_stress_lra-master.opt
@@ -0,0 +1,9 @@
--binlog-do-db=test
--innodb-file-per-table
--innodb_file_format='Barracuda'
--sync_binlog=10
--innodb_flush_log_at_trx_commit=2
--force-restart
--innodb_buffer_pool_size=128M
--innodb_use_native_aio=1
--innodb_flush_method=O_DIRECT
35 changes: 35 additions & 0 deletions mysql-test/suite/innodb_stress/t/innodb_stress_lra.test
@@ -0,0 +1,35 @@
# Don't test this under valgrind, memory leaks will occur
--source include/not_valgrind.inc
--source include/have_innodb.inc
--source include/master-slave.inc
--source include/big_test.inc
--source include/have_native_aio.inc


--disable_warnings
DROP TABLE IF EXISTS t1;
--enable_warnings

# create the actual table
CREATE TABLE t1(id INT AUTO_INCREMENT PRIMARY KEY,
msg_prefix VARCHAR(255),
msg VARCHAR(255),
msg_length int,
msg_checksum varchar(128),
KEY msg_i(msg_prefix))
ENGINE=INNODB;


let $use_blob=0;
let $do_crash=0;
let $do_compress=1;

--let $num_crashes = 1
--let $num_workers = 10
--let $num_transactions = 1000
--let $kill_db_after = 0
--let $num_records = 1000000
--let $max_rows = 1100000
--let $do_checksum = 1

--source suite/innodb_stress/include/innodb_stress.inc

1 comment on commit f8e3619

@a7853z
Copy link

@a7853z a7853z commented on f8e3619 Oct 13, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come the speed of full table scan(simple select * query) on my MySql-5.6.12 don't speed up after applying the LRA and 'Submitting multiple async I/O requests at once' methods to InnoDB engine as stated in: http://yoshinorimatsunobu.blogspot.jp/2013/10/making-full-table-scan-10x-faster-in.html .
Actually, it's even a bit slower than the original MySql-5.6.12. Is there anybody who can help me figure this out? I am beginner of MySQL. Here comes some information of my test:
+----------+
| count() |
+----------+
| 34076616 |
+----------+
+---------------------------------------+--------+
| Variable_name | Value |
+---------------------------------------+--------+
| Innodb_buffer_pool_read_ahead_rnd | 0 |
| Innodb_buffer_pool_read_ahead | 0 |
| Innodb_buffer_pool_read_ahead_evicted | 0 |
| Innodb_logical_read_ahead_misses | 0 |
| Innodb_logical_read_ahead_prefetched | 204341 |
| Innodb_logical_read_ahead_in_buf_pool | 613027 |
+---------------------------------------+--------+
+-------------------------------------+-------+
| Variable_name | Value |
+-------------------------------------+-------+
| innodb_lra_n_node_recs_before_sleep | 10000 |
| innodb_lra_size | 2048 |
| innodb_lra_sleep | 50 |
+-------------------------------------+-------+
*
************************* 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: homed_dbtest
TABLE_NAME: TestTable
TABLE_TYPE: BASE TABLE
ENGINE: InnoDB
VERSION: 10
ROW_FORMAT: Compact
TABLE_ROWS: 17269278
AVG_ROW_LENGTH: 195
DATA_LENGTH: 3383263232
MAX_DATA_LENGTH: 0
INDEX_LENGTH: 589299712
DATA_FREE: 5242880
AUTO_INCREMENT: NULL
CREATE_TIME: 2016-10-12 19:11:40
UPDATE_TIME: NULL
CHECK_TIME: NULL
TABLE_COLLATION: latin1_swedish_ci
CHECKSUM: NULL
CREATE_OPTIONS:
TABLE_COMMENT:

Please sign in to comment.