Deadlock with small buffer pool

Bug #1521905 reported by Laurynas Biveinis
This bug affects 2 people
Affects                                                          Status        Importance  Assigned to         Milestone
Percona Server (moved to https://jira.percona.com/projects/PS)  Status tracked in 5.7
  5.6                                                            Fix Released  Medium      Nickolay Ihalainen
  5.7                                                            Fix Released  Medium      Nickolay Ihalainen

Bug Description

Split from bug 1433432, by Krunal Bauskar:

I found a deadlock; here is the analysis.

T1 (query thread): runs a write-heavy workload until the free buffers are exhausted, at which point it needs a free buffer to complete its operation.

#0 0x00007f99e0d99743 in select () at ../sysdeps/unix/syscall-template.S:81
#1 0x0000000000c8dc8a in os_thread_sleep (tm=4096) at /opt/projects/codebase/5.6/storage/innobase/os/os0thread.cc:285
#2 0x0000000000e0a738 in buf_LRU_get_free_block (buf_pool=0x317bc18) at /opt/projects/codebase/5.6/storage/innobase/buf/buf0lru.cc:1403
#3 0x0000000000df2700 in buf_page_create (space=6, offset=7623, zip_size=0, mtr=0x7f99dc1054a0) at /opt/projects/codebase/5.6/storage/innobase/buf/buf0buf.cc:3961

Because T1 is waiting for a free block, the srv_empty_free_list_algorithm logic kicks in and suspends T1.
T1 now relies on buf_flush_lru_manager_thread or the page cleaner thread (buf_flush_page_cleaner_thread) to make a free buffer available.

----

The page cleaner thread schedules a flush of the flush list by scanning it (buf_flush_page).
The first dirty page to flush is space=x, offset=0. This page was dirtied by T1 as part of the write-heavy workload.
The page cleaner now needs an S-lock on this page but cannot obtain it, because T1 already holds an X-lock on the page as part of its insert.

DEADLOCK....

* The page cleaner thread can't proceed until T1 releases its X-lock, so it cannot complete the flush.

* T1 can't release the lock until the page cleaner thread makes a free buffer available. (T1 is in the middle of a pessimistic insert.)

* What about the LRU manager thread?
It cannot free pages beyond the hard limit of 256 pages, which is set to avoid thrashing, so the LRU manager thread, though active, has effectively gone passive and is just hogging CPU.

/** If LRU list of a buf_pool is less than this size then LRU eviction
should not happen. This is because when we do LRU flushing we also put
the blocks on free list. If LRU list is very small then we can end up
in thrashing. */
#define BUF_LRU_MIN_LEN 256
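
The three-way interaction described above can be sketched as a small wait-for graph. This is only an illustrative model, not InnoDB code: the names kBufLruMinLen, lru_eviction_allowed, build_graph and has_cycle are hypothetical, and the assumption that the LRU manager can help T1 exactly when the LRU list holds at least BUF_LRU_MIN_LEN pages is a simplification of the quoted comment.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Mirrors the BUF_LRU_MIN_LEN value quoted in the report.
constexpr std::size_t kBufLruMinLen = 256;

// Hypothetical helper: LRU eviction is skipped when the LRU list is
// shorter than the minimum length (simplified reading of the comment).
bool lru_eviction_allowed(std::size_t lru_len) {
    return lru_len >= kBufLruMinLen;
}

// Wait-for graph: each waiter maps to the single party it is waiting on.
using WaitForGraph = std::map<std::string, std::string>;

// Builds the (assumed) wait-for edges for a given LRU list length.
WaitForGraph build_graph(std::size_t lru_len) {
    WaitForGraph g;
    // The page cleaner needs an S-lock on a page that T1 holds X-locked.
    g["page_cleaner"] = "T1";
    if (lru_eviction_allowed(lru_len)) {
        // The LRU manager can supply a free block and waits on nobody.
        g["T1"] = "lru_manager";
    } else {
        // Only the page cleaner could free a block, but it is stuck on T1.
        g["T1"] = "page_cleaner";
    }
    return g;
}

// Follows wait-for edges from `start`; a revisited node means a cycle.
bool has_cycle(const WaitForGraph& g, const std::string& start) {
    std::set<std::string> seen;
    std::string cur = start;
    while (true) {
        if (!seen.insert(cur).second) return true;  // already visited
        auto it = g.find(cur);
        if (it == g.end()) return false;            // cur waits on nobody
        cur = it->second;
    }
}
```

In this model, has_cycle(build_graph(n), "T1") reports a cycle for any n below 256, which corresponds to the small buffer pool case in this report, and no cycle once the LRU list is long enough for eviction to proceed.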

---------------

I used the same test case as quoted in the bug, only increasing the number of inserts by a few more (18, to be precise).

tags: added: xtradb
Laurynas Biveinis (laurynas-biveinis) wrote :
no longer affects: percona-server/5.5
tags: added: lru-flusher
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1675
